Companion piece: From VS Code Copilot to Cursor: what changed in my workflow
Cost context: GitHub Copilot vs OpenRouter pricing
Memory stack: Three layers of external memory · Why deliberate file memory beats hoping agents remember
Direction habits: Training an AI is like managing an employee
Every few weeks a new frontier model appears in the model picker. The marketing promise is the same: fewer mistakes, longer context, better reasoning. For a solo builder running multiple production repos with agents as the primary implementer, model roulette has a hidden cost—not just dollars per token, but inconsistent obedience to the operating system you spent weeks writing into Obsidian and Cursor rules.
I standardized on Cursor Composer 2.5 as my only model for implementation work (for now). Not because it wins every benchmark, but because it is cost-effective inside a workflow I control: fixed rules, fixed bootstrap, fixed footer contract, and file memory that does not change when the vendor ships a new badge.
The problem: expensive models do not fix a loose operating system
The failure mode looks like this:
- Session 1 on Model A follows the footer and updates the vault.
- Session 2 on Model B ships good code but skips Obsidian and drops the handoff block.
- Session 3 on Model C argues with deploy rules that live in `00_Brain` because it never read them.
That is not "AI quality." That is variance in instruction-following across models—and variance destroys external memory systems that depend on repeatable session end behavior.
Frontier models can brute-force a refactor. They cannot replace:
- `Operations/AI Session Bridge.md` with current priority
- `memories/repo/open-loops.md` mirrored to the vault
- A ten-line footer with a required `Obsidian:` proof line (`read ✓` / `written ✓` / path)
When those are optional, every model change becomes an uncontrolled experiment.
Why a single baseline model matters for my stack
I am not anti-frontier. I am pro-controlled experiments.
Composer 2.5 is Cursor's agent-oriented model in the tier I use daily. Pinning it delivers:
| Goal | How one model helps |
|---|---|
| Cost ceiling | No surprise premium-model sessions on large refactors |
| Predictable tone | Fewer style swings between replies in one thread |
| Rule compliance | The same `.cursor/rules/*.mdc` files were tuned against this model's behavior |
| Auditability | Session summaries can say "Composer 2.5" without a footnote listing five models |
This parallels the argument in directing AI as primary engineer: productivity gains assume a stable operating model. Swapping the implementer's brain every sprint is the opposite of stable.
Cost is real but secondary. My Copilot vs OpenRouter comparison showed that token pricing only matters after you know how many tokens your workflow burns. A tighter bootstrap reduces rework tokens—which often beats a "smarter" model that wanders.
What I run (Composer 2.5 + tightened bootstrap)
Model policy: Cursor Agent for implementation; Composer 2.5 only unless a human explicitly escalates for a one-off review. No automatic model switching, no "let the agent pick."
Bootstrap order (unchanged philosophy, stricter enforcement after the Copilot → Cursor migration):
- `00_Brain` manual prompts (Start of Session)
- `.claude/NOTES.md` + `NEXT_SESSION.md`
- `memories/repo/index.md`, `open-loops.md`, `known-gotchas.md`
- `00_Brain/Conventions/Response Footer Contract.md`
- Project vault: Session Summaries → AI Session Bridge → relevant Features
- Session note created before code
- Confirm `.cursor/rules/response-footer.mdc` (`alwaysApply: true`)
The loop at the bottom is deliberate: the footer is not politeness. It is how the next Composer session knows what happened.
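The file-based steps of that bootstrap can be verified before a session starts. The following is a minimal sketch, not my actual tooling: the function names are hypothetical, and it assumes that `NEXT_SESSION.md` sits at the repo root, that all three memory files live under `memories/repo/`, and that the footer contract lives in a separate vault root. Adjust the paths to your own layout.

```python
from pathlib import Path

# (root, relative path) pairs in the order the bootstrap reads them.
# ASSUMPTION: which root each file lives under is a guess for illustration.
BOOTSTRAP_FILES = [
    ("repo", ".claude/NOTES.md"),
    ("repo", "NEXT_SESSION.md"),
    ("repo", "memories/repo/index.md"),
    ("repo", "memories/repo/open-loops.md"),
    ("repo", "memories/repo/known-gotchas.md"),
    ("vault", "00_Brain/Conventions/Response Footer Contract.md"),
    ("repo", ".cursor/rules/response-footer.mdc"),
]


def missing_bootstrap_files(repo_root, vault_root):
    """Return 'root:path' entries for bootstrap files that do not exist."""
    roots = {"repo": Path(repo_root), "vault": Path(vault_root)}
    return [
        f"{root}:{rel}"
        for root, rel in BOOTSTRAP_FILES
        if not (roots[root] / rel).is_file()
    ]


def footer_rule_is_always_on(repo_root):
    """True if the footer rule file exists and contains 'alwaysApply: true'.

    This is a plain substring check; it does not parse the rule's frontmatter.
    """
    rule = Path(repo_root) / ".cursor/rules/response-footer.mdc"
    return rule.is_file() and "alwaysApply: true" in rule.read_text(encoding="utf-8")
```

Running `missing_bootstrap_files(".", "/path/to/vault")` before opening a session gives a quick list of what the agent would fail to read. An empty list means every file in the checklist is present.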
How tighter bootstrap produced better results (without a frontier upgrade)
After I centralized one footer spec and copied always-on Cursor rules into each repo, three behaviors improved on Composer 2.5:
- Session context blocks at the top of work replies—three to six lines from session summaries and a "current priority" note—reduced "what was I doing?" restarts.
- Mandatory proof in the footer that notes were read or updated (`read ✓` / `written ✓` / path) cut the fiction where the agent claimed memory ran but wrote nothing.
- Self-improvements with file citations (`path:Lnn` or `None`) pushed lessons into rules files instead of chat-only apologies.
None of that requires GPT-5-class reasoning. It requires repeated structure on a model that follows structure reliably at the task sizes I use: multi-file edits, note updates, scripted checks, handoff files.
Where Composer 2.5 still struggles, I document the gap in a gotchas file and narrow the task—rather than switching models mid-session and blaming the toolchain.
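The footer proof line and the citation format described above lend themselves to a mechanical check. Here is a rough sketch under stated assumptions: the exact line shapes (`Obsidian: read ✓ / written ✓ / <path>` and `Self-improvements: <path>:L<nn>` or `None`) are my guess at a plausible layout, not the canonical footer contract, and `check_footer` is a hypothetical helper.

```python
import re

# ASSUMED line shapes; the real footer contract may use different labels.
OBSIDIAN_RE = re.compile(r"^Obsidian:[ \t]*(?P<body>.+)$", re.MULTILINE)
IMPROVE_RE = re.compile(r"^Self-improvements:[ \t]*(?P<body>.+)$", re.MULTILINE)
FILE_CITE_RE = re.compile(r"^\S.*:L\d+$")  # e.g. some/file.mdc:L12


def check_footer(reply):
    """Return a list of problems found in a reply's footer (empty if none)."""
    problems = []

    proof = OBSIDIAN_RE.search(reply)
    if proof is None:
        problems.append("missing Obsidian: line")
    else:
        body = proof.group("body")
        if "read ✓" not in body and "written ✓" not in body:
            problems.append("Obsidian: line has no read ✓ or written ✓ mark")
        # The note path is taken to be the last " / "-separated field.
        path = body.rsplit(" / ", 1)[-1].strip()
        if " / " not in body or "✓" in path:
            problems.append("Obsidian: line has no note path")

    improve = IMPROVE_RE.search(reply)
    if improve is None:
        problems.append("missing Self-improvements: line")
    else:
        value = improve.group("body").strip()
        if value != "None" and not FILE_CITE_RE.match(value):
            problems.append("Self-improvements: must be path:Lnn or None")

    return problems
```

A reply that ends with `Obsidian: read ✓ / written ✓ / Operations/AI Session Bridge.md` and `Self-improvements: None` passes; a reply that apologizes in prose without a file citation does not. The check only proves the agent wrote the proof line, so pairing it with a look at the cited file's modification time would be a reasonable next step.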
Tradeoffs I accept
What I give up
- One-shot "genius refactor" sessions that occasionally land on frontier models
- Benchmark bragging rights in social posts
- Model-specific plugins that only run on one provider's latest release
What I keep
- A note system that survives IDE and model churn (tool-agnostic files)
- Comparable session handoffs across every repo that uses the same bootstrap
- Bills that scale with work done, not with model curiosity
When I might escalate manually
- Security review on unfamiliar dependency trees
- One-off architecture critique where I want a second model as reviewer—not implementer
That escalation is intentional and rare. Implementation stays on Composer 2.5.
What improved after a better session start
The win was not a secret feature in Composer 2.5. It was starting every session the same way—and staying on one model long enough for that habit to stick. After I fixed session start:
| Advantage | What it means in practice |
|---|---|
| Less re-explaining | The agent opens with a short recap of last priority and today's goal. I stop pasting "here's where I left off" into chat. |
| Clearer handoffs | A fixed end-of-reply checklist records what changed, whether deploy is pending, and whether notes were updated. |
| Fewer repeat mistakes | Open loops and gotchas live in files read at start—not buried in a thread from last week. |
| Lower rework cost | One bootstrap checklist on one model burns fewer tokens than re-teaching rules every time the model picker changes. |
Optional extras help but are not required to get most of the benefit:
- Always-on Cursor rules — Small files that repeat non-negotiables (read handoff notes, append the footer) so they survive long chats.
- Session-start hook — A script that writes a snapshot (git status, inbox, health) into a file the agent reads first; useful for ops-heavy apps, optional for simpler repos.
- One canonical footer spec — One markdown note defines the footer; every other doc links to it so fields do not drift.
The limiting factor was never "the model cannot code." It was sessions that skipped session start because rules lived in five places. Tightening session start on one model beat upgrading to a frontier model on a loose bootstrap.
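For the optional session-start hook mentioned above, a small script is enough to get started. This sketch covers only the git status part; the inbox and health checks are app-specific and left out. The output file name `SESSION_SNAPSHOT.md` and both function names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def git_status(repo_root):
    """Return `git status --short --branch` output, or a note on failure."""
    try:
        result = subprocess.run(
            ["git", "status", "--short", "--branch"],
            cwd=repo_root,
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # Raised when git is not installed or repo_root does not exist.
        return "git unavailable"
    if result.returncode != 0:
        return f"git status failed: {result.stderr.strip()}"
    return result.stdout.strip()


def write_snapshot(repo_root, out_name="SESSION_SNAPSHOT.md"):
    """Write a snapshot file into repo_root and return its path."""
    repo_root = Path(repo_root)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    body = "\n".join(
        [
            f"# Session snapshot ({stamp})",
            "",
            "## git status",
            "",
            git_status(repo_root),
            "",
        ]
    )
    out_path = repo_root / out_name
    out_path.write_text(body, encoding="utf-8")
    return out_path
```

Calling `write_snapshot(".")` from the repo root before a session produces a file the agent can be told to read first. Overwriting the same file each time keeps the snapshot current instead of accumulating stale copies.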
What you can do next
- Pick one implementation model for two weeks. Measure footer compliance and vault writes, not vibe.
- Centralize your footer in one brain note; link from Cursor rules and copilot-instructions.
- Copy the Cursor templates from `00_Brain/Templates/cursor/` on your next retrofit, or add a user-level Cursor rule as a safety net until every repo has them.
- Read the migration article if you are still straddling VS Code Copilot and Cursor with one note system.
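To make the two-week measurement from the first step concrete, one option is to log a small record per session and compute two rates. This is an illustrative sketch: the `SessionRecord` fields and the `compliance` function are hypothetical, and any sample values you feed it are yours to collect, not results from my sessions.

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """One logged session. Field names are illustrative assumptions."""
    label: str           # any identifier, such as a date or session title
    has_footer: bool     # did the final reply include the footer?
    vault_written: bool  # did the session actually write to the vault?


def compliance(records):
    """Return session count plus footer and vault-write rates (0.0 to 1.0).

    Rates are None when there are no records, to avoid dividing by zero.
    """
    total = len(records)
    if total == 0:
        return {"sessions": 0, "footer_rate": None, "vault_write_rate": None}
    return {
        "sessions": total,
        "footer_rate": sum(r.has_footer for r in records) / total,
        "vault_write_rate": sum(r.vault_written for r in records) / total,
    }
```

Tracking the two rates separately matters: a session can print a perfect footer and still write nothing to the vault, which is exactly the gap the proof line is meant to expose.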
If your sessions still feel forgetful after that, add structure before you add model cost. Composer 2.5 is my proof of concept—not because it is magic, but because I built a runtime that treats memory and rules as part of shipping, not as optional documentation.







