Three Layers of External Memory for AI-First Development

Q: What does "AI-first development" mean here?

**Agents write most code; humans set direction, review risk, and own decisions.** That model needs written continuity because tacit knowledge does not survive session resets.

Q: Why four tiers if the series title says three layers?

**Three layers are the teaching model; operational handoffs are the shipping layer** most diagrams omit. Layer 2 (Bridge, summaries, `NEXT_SESSION`) is what makes Monday productive.

Q: Do I need dual Obsidian vaults?

**Recommended for multi-project work:** a shared brain vault (methodology, conventions) plus per-project vaults (product truth). Single vault works for one codebase; see Part 2 for lighter personal setups.

Q: What is the minimum viable stack?

**Three files:** `NEXT_SESSION.md`, `Operations/Session Summaries.md`, and one `Features/*.md` for the module you are changing. Run one agent session with explicit read order before adding hooks.

Q: How does this compare to the STM/LTM diagram?

**~70% conceptual overlap, different structure.** Part 3 scores file memory vs in-model memory for production builders. ---

Q: Related reading?

**This series:** 2 — Personal productivity · 3 — Why files beat the diagram · 4 — Audit and governance **Published on Petralian:** The AI Memory Problem · Your Brain Was Not Built for This · What I Learned Directing AI as My Primary Engineer · Why I Rebuilt Petralian on Next.js · Why AI Agent Output Quality Drifts *If you're new to Cursor: 50% off your first month (code `JP5ARNKSFI2Q`). I may earn usage credits; install directly if you prefer.*

TL;DR

Chat context is not memory.
A three-layer file system—session, operational, evergreen—plus hooks and git automation is how I keep production codebases coherent across hundreds of agent sessions.

External Memory Series (1 of 4) — Series hub · Start here for builders. Then 2 Productivity · 3 vs the diagram · 4 Governance
Background on petralian.com: The AI Memory Problem · Your Brain Was Not Built for This · Directing AI as Primary Engineer · Publishing Obsidian Drafts

Every AI coding session starts with a blank context window. The model does not remember last Tuesday's deploy lock rule, the webhook edge case you fixed in March, or the framework constraint that broke production once. If your operating model depends on the model "picking it up from the repo," you will re-pay discovery tax on every session.

I run production software this way—open-source tools like Gravio (AI quality scoring) and the stack behind petralian.com—with frontier models (Claude, GPT, and IDE agents) as the primary implementers. The system that makes that survivable is not a bigger context window. It is a three-layer external memory stack—files the human owns, the agent reads, and automation keeps fresh.

This article is the development-focused view: what the layers are, how they map to the popular "STM / LTM / feedback" mental model, what we automated in May 2026, and where the design is deliberately stronger than the diagram suggests.

What is external memory for AI-first development?

External memory for AI-first development is a layered file stack—session handoffs in the repo, evergreen Feature and Architecture notes in Obsidian, and rules plus hooks at session boundaries—that lets agents implement most code while humans own direction and durable truth. Chat context is a short-term buffer; production continuity lives in Markdown and git.

Who it is for: practice leads and delivery directors running agent-assisted programs who need production continuity across hundreds of sessions.

What you will learn: The four tiers in practice, bootstrap and session-end promotion rules, May 2026 automation patterns (session hooks, post-commit Feature updates), and what breaks when you skip a layer. See the series hub for the full map.

How I start a new codebase

Example implementation — my builder workflow. Knowledge-work (PM, marketing, leadership) uses the same folder pattern without shipping code: Knowledge Work Engine Part 0.

1. One repo / vault folder per product — open as a Cursor workspace.

2. Scaffold once — paste a bootstrap prompt (or use project template) to create memories/repo/, AGENTS.md, Operations/AI Session Bridge.md, Features/ stubs, and .cursor/rules for session protocol.

3. Dual vault via MCP

Vault	Role
Project vault (`VSCode/<Project>/`)	Product truth: Features, Architecture, repo handoffs
Brain vault (`Brain/`)	Cross-project conventions, deploy playbook, footer contract

Without brain access, every repo reinvents session loops.

4. Layer 4 wiring

.cursor/rules — bootstrap order, footer modes, when to skip brain-pack (memory loop)
sessionStart hook → session-start.ps1 → bootstrap snapshot on disk
post-commit hook → append commit hash to matching Features/*.md
MCP filesystem — agent reads both vaults by path

5. Session loop — Start: rules + snapshot + Layer 2 reads. Work: agent edits code and curated notes. End: Session Summaries line, Bridge update, footer, promote durable facts to Features/ or brain conventions.

Not required on day one: hooks, MCP, dual vault. Start with NEXT_SESSION.md + Session Summaries (series hub Path A).

The problem: three different failures, one word "memory"

When developers say "the AI forgot," they usually mean one of these:

Failure	What breaks	Example
Session	No thread from yesterday's work	Agent re-implements a fix you already shipped
Operational	No record of open loops, deploy state, incidents	Two sessions deploy concurrently
Conceptual	No durable product/architecture truth	Feature rules live only in chat, not in docs

Calling all three "memory" and hoping a larger context window fixes them is a category error. Context windows are short-term buffers. Production work needs durable stores with explicit promotion rules: what stays in chat, what gets written to disk, and what becomes a permanent rule.

That is the problem this system solves.

Why it matters for AI-first development

AI-first development here means: the agent writes most code; the human sets direction, reviews risk, and owns decisions. That model fails without external memory because:

Local optimization — The agent makes a correct change in isolation that violates a constraint from two sessions ago.
Handoff fragility — You close the laptop mid-refactor; the next session has no structured resume point.
Audit gaps — You cannot explain why a deploy happened or which feature note was current at commit time.

McKinsey and GitHub have published large productivity gains for AI-assisted development, but those studies assume a human team carries tacit knowledge. Solo or small-team AI-first builds replace tacit knowledge with written continuity. The memory system is that replacement.

If you skip it, you still ship—until complexity crosses the threshold where every session feels like onboarding a new contractor.

What is actually happening: four tiers, not three

I describe the system as three layers for clarity. In practice there are four tiers, which is intentional.

Layer 1 — Short term (in tool and session)

Holds: Current conversation (Claude, ChatGPT, or your IDE agent), task lists, files open in the editor, uncommitted diffs.

Lifetime: Until the chat ends.

Does not hold: Anything you need next week.

This matches the infographic's "Short-Term Memory": current inputs, limited capacity, overwritten by default.

Layer 2 — Operational (session start / close / summaries)

Holds: What we are doing now and what blocks the next session.

Store	Role
`Operations/Session Summaries.md`	One-line trail per session
`Operations/Sessions/YYYY-MM-DD Topic.md`	Full session narrative
`Operations/AI Session Bridge.md`	Current priority + paste-in bootstrap
`.claude/NEXT_SESSION.md`	Repo-local handoff (fast for agents)
`memories/repo/*.md`	Machine-readable mirror of handoff + gotchas
`last-session-bootstrap.txt` (or equivalent)	Git status + health checks (hook- or script-generated)

Lifetime: Days to weeks; dated notes age out of "active" use but stay searchable.

This is the layer most diagrams skip. It is the shipping layer—the difference between "interesting chat" and "resume Monday without archaeology."

Layer 3 — Long term (human-readable context)

Holds: What the product is and how it must behave.

Examples from a live codebase:

Features/Scoring Engine.md — invariants, API rules, quality thresholds (Gravio-style)
Architecture/Database Schema.md — models and migration posture
00_Brain/AI Agent Methodology.md — universal session loop (all projects)

Lifetime: Evergreen; updated in place, linked in a graph (Zettelkasten-style), not buried in chat logs.

This maps to "Long-Term Memory" in the diagram—but implemented as files you edit, not weights inside the model.

Layer 4 — Feedback hardened (rules automation)

Holds: Lessons that must never be re-learned.

Mechanism	What it does
Session End footer	Contract: every work reply lists deploy state, files changed, next priority
Self-improvements field	Must cite exact file path where a rule was written—or it did not happen
Agent instructions (`AGENTS.md`, etc.)	Grows when bugs reveal missing guardrails
Custom agents	Handoff-writer, release-manager — role-specific memory writers
IDE `sessionStart` hook (e.g. Cursor)	Runs `session-start.ps1`, writes bootstrap snapshot
Git `post-commit` hook	Appends commit hash to matching `Features/*.md` via `feature-note-map.json`

This is the infographic's Feedback Loop, implemented more strictly than "user corrects the model." The system requires artifacts.

Additional detail

How the solution works (May 2026 implementation)

Bootstrap order (start of session)

Non-trivial work follows a fixed read order—documented in your agent instructions file, AGENTS.md, and Obsidian Operations/Workflow.md:

.claude/NOTES.md + .claude/NEXT_SESSION.md
memories/repo/index.md, open-loops.md, known-gotchas.md
00_Brain/AI Agent Methodology.md + Conventions/Deploy Playbook.md
Project Operations/AI Session Bridge.md + Session Summaries.md
Relevant Features/* and Architecture/* for the task

Automation added:

1# Manual or hook-triggered2.\scripts\session-start.ps13# git pull, git status, latest sentry/inbox/, worker health check

IDE session hook (e.g. Cursor .cursor/hooks.json → sessionStart):

Runs the script above
Writes a bootstrap snapshot file the agent can read on open
May return additional_context pointing agents at the snapshot

Known constraint: Some IDE builds drop injected context on session start due to timing. Treat the snapshot file on disk plus an always-on project rule as the fallback. Do not rely on injection alone.

Session end (promotion rules)

At session end, the agent (or you) should:

Finalize the dated session note
Append one line to Session Summaries.md
Update AI Session Bridge / NEXT_SESSION.md if priority changed
Promote durable facts to Feature or Architecture notes—not session notes
Append the Session End footer (deploy tag, test plan, next priority)

Rule of thumb:

If it matters…	Write to…
Only next session	`NEXT_SESSION.md`, Bridge
This week	Session note + Summaries
This product forever	`Features/.md`, `Architecture/.md`
Every project forever	`00_Brain/Conventions/*`
Never again	Agent instructions or `known-gotchas.md`

Commit → Feature note (automation)

1npm run hooks:install   # once per clone: core.hooksPath = githooks/

After each commit, scripts/update-feature-notes.ps1 maps changed paths to feature names and appends:

1### Commits2- `abc1234` (2026-05-26) — fix(feed): description from git subject

Mapping lives in scripts/feature-note-map.json (extend when you add modules).

This closes the loop between git truth and product memory without asking the agent to remember to update Obsidian.

Dual vault MCP

The agent reads:

Your project vault (e.g. 40_VSCode/<Project>/) — product truth for that repo
A shared brain vault (00_Brain/) — universal methodology and conventions

Without brain access, every project reinvents deploy footers and session loops. That is a hard requirement in the retrofit checklist.

Comparison: my system vs the three-layer infographic

The popular diagram stacks Short-Term Memory → Long-Term Memory → Feedback Loops. Here is an honest scorecard.

Diagram concept	My implementation	Match
STM (conversation, attention)	Chat + todos + live repo	High
LTM (persists across sessions)	Obsidian evergreen + repo rules	High
Feedback adjusts memory	Footer + instructions + hooks	Higher than diagram
Automatic STM → LTM transfer	Session-end protocol + git hook	Medium (discipline + partial automation)
In-model attention	Not used; files are the attention system	Different by design

Overall resemblance: ~70% on concepts, ~40% on structure—because you added an operational layer and file-based LTM the diagram does not show.

What I did differently (on purpose):

Operational memory as first-class — Summaries and Bridge are not "LTM"; they are the resume tape.
Two speeds in the repo — NEXT_SESSION.md for speed, Obsidian for humans and links.
Feedback is contractual — Footer fields with file-path proof, not implicit learning.
Automation at boundaries — Session start hook, post-commit feature updater; not inside the model.
Cross-project brain — 00_Brain survives when you switch repos.

Additional detail

What breaks if you skip a layer

Skip	Symptom
Layer 1 only (chat)	Repeated mistakes, no deploy coordination
No Layer 2	Cannot resume; open loops live in your head
No Layer 3	Agents re-derive architecture from code every time
No Layer 4	Same bug class returns; no deploy lock discipline

Reference

Limitations (honest)

Duplication risk — NOTES.md, memories/repo/, and Obsidian can drift unless session end updates all relevant stores.
Agent discipline — Hooks reduce but do not eliminate skipped session notes.
Feature map maintenance — New routes need patterns in feature-note-map.json.
IDE hook quirks — Bootstrap snapshot file exists because injection is not 100% reliable.

What you can do next

Minimal (one afternoon):

Add .claude/NEXT_SESSION.md with Current Priority + Open Loops + Next 3 steps.
Add Operations/Session Summaries.md (one line per session).
Paste the Session End footer template into your agent instructions.

Standard (one project weekend):

Create Features/*.md for your top three modules.
Add scripts/session-start.ps1 (git pull + status + any health checks you need).
Document bootstrap order in AGENTS.md or your agent instructions file.

Full (reference implementation):

Dual-vault Obsidian + MCP paths.
npm run hooks:install + feature-note-map.json.
IDE sessionStart hook (Cursor and others support this pattern).

The goal is not perfection on day one. The goal is never starting a session cold on a codebase that already hurt you once.

Quick reference

Artifact	Layer	Purpose
Agent chat + open files	1 — Short term	Active work only
`NEXT_SESSION.md`, `AI Session Bridge`, `Session Summaries.md`	2 — Operational	Resume next session
`Features/.md`, `Architecture/.md`, `00_Brain/Conventions/*`	3 — Evergreen	Durable product and workflow truth
`AGENTS.md`, session footer, `session-start` / post-commit hooks	4 — Feedback hardened	Lessons become rules and automation
`memories/repo/open-loops.md`	2 (mirror)	Machine-readable handoff for agents

Bootstrap order (summary): repo handoff → repo memories → brain conventions → Operations bridge → relevant Feature notes. Full protocol in body above.

Common mistakes

Mistake	Symptom / risk	Fix
Relying on IDE context injection alone	Agent opens cold when hook timing fails	Always write a bootstrap snapshot file on disk; keep a project rule pointing at it
Promoting session noise to Feature notes	Evergreen docs rot; agents get wrong invariants	Session end rule: only durable facts → `Features/`; week-scale → Bridge
Skipping Layer 2 entirely	Repeated re-implementation; concurrent deploy confusion	Add `NEXT_SESSION.md` with priority, open loops, next three steps
Duplicating truth across `NOTES.md`, Obsidian, and `memories/repo/` without sync	Conflicting instructions mid-session	Pick a canonical home per fact type; update mirrors at session end only
Automating before discipline	Hooks run but files stay empty	Manual session footer for two weeks; automate session start and commit after

FAQ

What does "AI-first development" mean here?

Agents write most code; humans set direction, review risk, and own decisions. That model needs written continuity because tacit knowledge does not survive session resets.

Why four tiers if the series title says three layers?

Three layers are the teaching model; operational handoffs are the shipping layer most diagrams omit. Layer 2 (Bridge, summaries, NEXT_SESSION) is what makes Monday productive.

Do I need dual Obsidian vaults?

Recommended for multi-project work: a shared brain vault (methodology, conventions) plus per-project vaults (product truth). Single vault works for one codebase; see Part 2 for lighter personal setups.

What is the minimum viable stack?

Three files: NEXT_SESSION.md, Operations/Session Summaries.md, and one Features/*.md for the module you are changing. Run one agent session with explicit read order before adding hooks.

How does this compare to the STM/LTM diagram?

~70% conceptual overlap, different structure. Part 3 scores file memory vs in-model memory for production builders.

Reader action

Pick one production codebase you touch weekly. Before the next agent session, write a 10-line NEXT_SESSION.md and one Feature note for the module you are about to change. Run the agent once with an explicit instruction: "Read NEXT_SESSION first, then the Feature note, then propose a plan."

If that session feels faster than the last, the external brain is working. Automate the boundaries next—not the thinking.

This series: 2 — Personal productivity · 3 — Why files beat the diagram · 4 — Audit and governance

Published on Petralian: The AI Memory Problem · Your Brain Was Not Built for This · What I Learned Directing AI as My Primary Engineer · Why I Rebuilt Petralian on Next.js · Why AI Agent Output Quality Drifts If you're new to Cursor: 50% off your first month (code JP5ARNKSFI2Q). I may earn usage credits; install directly if you prefer.

Three Layers of External Memory for AI-First Development

What is external memory for AI-first development?

How I start a new codebase

The problem: three different failures, one word "memory"

Why it matters for AI-first development

What is actually happening: four tiers, not three

Layer 1 — Short term (in tool and session)

Layer 2 — Operational (session start / close / summaries)

Layer 3 — Long term (human-readable context)

Layer 4 — Feedback hardened (rules automation)

Additional detail

How the solution works (May 2026 implementation)

Bootstrap order (start of session)

Session end (promotion rules)

Commit → Feature note (automation)

Dual vault MCP

Comparison: my system vs the three-layer infographic

Additional detail

What breaks if you skip a layer

Reference

Limitations (honest)

What you can do next

Quick reference

Common mistakes

FAQ

What does "AI-first development" mean here?

Why four tiers if the series title says three layers?

Do I need dual Obsidian vaults?

What is the minimum viable stack?

How does this compare to the STM/LTM diagram?

Reader action

Related reading

External Memory Series: A Practical Guide to AI Session Continuity

Beyond Chat History: Using Layered Obsidian Memory for Personal Productivity

Why Deliberate File Memory Beats Hoping Agents Remember