← Back to all posts

AI & Building

Composer 2.5 as My Only Coding Model: Cost, Predictability, and a Tighter Bootstrap

I run Cursor on Composer 2.5 only—not to save money alone, but to get predictable rule compliance. A tighter session bootstrap beat chasing frontier models for my workflow.

·7 min read
Agentic AIDeveloper ToolsAI MemoryEnterprise AI
Composer 2.5 as My Only Coding Model: Cost, Predictability, and a Tighter Bootstrap

Companion piece: From VS Code Copilot to Cursor: what changed in my workflow
Cost context: GitHub Copilot vs OpenRouter pricing
Memory stack: Three layers of external memory · Why deliberate file memory beats hoping agents remember
Direction habits: Training an AI is like managing an employee

Every few weeks a new frontier model appears in the model picker. The marketing promise is the same: fewer mistakes, longer context, better reasoning. For a solo builder running multiple production repos with agents as the primary implementer, model roulette has a hidden cost—not just dollars per token, but inconsistent obedience to the operating system you spent weeks writing into Obsidian and Cursor rules.

I standardized on Cursor Composer 2.5 only for implementation work (for now). Not because it wins every benchmark, but because it is a cost-effective model inside the workflow I control: fixed rules, fixed bootstrap, fixed footer contract, and file memory that does not change when the vendor ships a new badge.


The problem: expensive models do not fix a loose operating system

The failure mode looks like this:

  • Session 1 on Model A follows the footer and updates the vault.
  • Session 2 on Model B ships good code but skips Obsidian and drops the handoff block.
  • Session 3 on Model C argues with deploy rules that live in 00_Brain because it never read them.

That is not "AI quality." That is variance in instruction-following across models—and variance destroys external memory systems that depend on repeatable session end behavior.

Frontier models can brute-force a refactor. They cannot replace:

  • Operations/AI Session Bridge.md with current priority
  • memories/repo/open-loops.md mirrored to the vault
  • A ten-line footer with a required Obsidian: proof line (read ✓ / written ✓ / path)

When those are optional, every model change becomes an uncontrolled experiment.


Why a single baseline model matters for my stack

I am not anti-frontier. I am pro-controlled experiments.

Composer 2.5 is Cursor's agent-oriented model in the tier I use daily. Pinning it delivers:

GoalHow one model helps
Cost ceilingNo surprise premium-model sessions on large refactors
Predictable toneFewer style swings between replies in one thread
Rule complianceThe same .cursor/rules/*.mdc files were tuned against this model's behavior
AuditabilitySession summaries can say "Composer 2.5" without a footnote of five models

This parallels the argument in directing AI as primary engineer: productivity gains assume a stable operating model. Swapping the implementer's brain every sprint is the opposite of stable.

Cost is real but secondary. My Copilot vs OpenRouter comparison showed that token pricing only matters after you know how many tokens your workflow burns. A tighter bootstrap reduces rework tokens—which often beats a "smarter" model that wanders.


What I run (Composer 2.5 + tightened bootstrap)

Model policy: Cursor Agent for implementation; Composer 2.5 only unless a human explicitly escalates for a one-off review. No automatic model switching, no "let the agent pick."

Bootstrap order (unchanged philosophy, stricter enforcement after the Copilot → Cursor migration):

  1. 00_Brain manual prompts (Start of Session)
  2. .claude/NOTES.md + NEXT_SESSION.md
  3. memories/repo/index.md, open-loops.md, known-gotchas.md
  4. 00_Brain/Conventions/Response Footer Contract.md
  5. Project vault: Session Summaries → AI Session Bridge → relevant Features
  6. Session note created before code
  7. Confirm .cursor/rules/response-footer.mdc (alwaysApply: true)
New Agent sessionsessionStart hook(optional snapshot).cursor/rulesalwaysApplyRead vault +00_BrainComposer 2.5implementsFooter +vault writeresponse-footer.mdcsession-protocol.mdc next session
New Agent sessionsessionStart hook(optional snapshot).cursor/rulesalwaysApplyRead vault +00_BrainComposer 2.5implementsFooter +vault writeresponse-footer.mdcsession-protocol.mdc next session

The loop at the bottom is deliberate: the footer is not politeness. It is how the next Composer session knows what happened.


How tighter bootstrap produced better results (without a frontier upgrade)

After I centralized one footer spec and copied always-on Cursor rules into each repo, three behaviors improved on Composer 2.5:

  1. Session context blocks at the top of work replies—three to six lines from session summaries and a "current priority" note—reduced "what was I doing?" restarts.
  2. Mandatory proof in the footer that notes were read or updated (read ✓ / written ✓ / path) cut the fiction where the agent claimed memory ran but wrote nothing.
  3. Self-improvements with file citations (path:Lnn or None) pushed lessons into rules files instead of chat-only apologies.

None of that requires GPT-5-class reasoning. It requires repeated structure on a model that follows structure reliably at the task sizes I use: multi-file edits, note updates, scripted checks, handoff files.

Where Composer 2.5 still struggles, I document the gap in a gotchas file and narrow the task—rather than switching models mid-session and blaming the toolchain.


Tradeoffs I accept

What I give up

  • One-shot "genius refactor" sessions that occasionally land on frontier models
  • Benchmark bragging rights in social posts
  • Model-specific plugins that only run on one provider's latest release

What I keep

  • A note system that survives IDE and model churn (tool-agnostic files)
  • Comparable session handoffs across every repo that uses the same bootstrap
  • Bills that scale with work done, not with model curiosity

When I might escalate manually

  • Security review on unfamiliar dependency trees
  • One-off architecture critique where I want a second model as reviewer—not implementer

That escalation is intentional and rare. Implementation stays on Composer 2.5.


What improved after a better session start

The win was not a secret feature in Composer 2.5. It was starting every session the same way—and staying on one model long enough for that habit to stick. After I fixed session start:

AdvantageWhat it means in practice
Less re-explainingThe agent opens with a short recap of last priority and today's goal. I stop pasting "here's where I left off" into chat.
Clearer handoffsA fixed end-of-reply checklist records what changed, whether deploy is pending, and whether notes were updated.
Fewer repeat mistakesOpen loops and gotchas live in files read at start—not buried in a thread from last week.
Lower rework costOne bootstrap checklist on one model burns fewer tokens than re-teaching rules every time the model picker changes.

Optional extras help but are not required to get most of the benefit:

  • Always-on Cursor rules — Small files that repeat non-negotiables (read handoff notes, append the footer) so they survive long chats.
  • Session-start hook — A script that writes a snapshot (git status, inbox, health) into a file the agent reads first; useful for ops-heavy apps, optional for simpler repos.
  • One canonical footer spec — One markdown note defines the footer; every other doc links to it so fields do not drift.

The limiting factor was never "the model cannot code." It was sessions that skipped session start because rules lived in five places. Tightening session start on one model beat upgrading to a frontier model on a loose bootstrap.


What you can do next

  1. Pick one implementation model for two weeks. Measure footer compliance and vault writes, not vibe.
  2. Centralize your footer in one brain note; link from Cursor rules and copilot-instructions.
  3. Copy the Cursor templates from 00_Brain/Templates/cursor/ on your next retrofit—or add a user-level Cursor rule as a safety net until every repo has them.
  4. Read the migration article if you are still straddling VS Code Copilot and Cursor with one note system.

If your sessions still feel forgetful after that, add structure before you add model cost. Composer 2.5 is my proof of concept—not because it is magic, but because I built a runtime that treats memory and rules as part of shipping, not as optional documentation.