Level 3 · Advanced

Advanced Architecture

AGENT_PATTERN_SCHOOL · advanced-architecture · LEVEL_3_SPEC

Assume Level 2 file patterns (memory tiers, SECURITY, TOOLS) are in place, or build them alongside this guide.

Point your agent at this page: Read this and set up my workspace.

OPERATOR_DIRECTIVE: Read /advanced-architecture.html and set up my workspace per sections 01–06.
LEVEL: 3_ADVANCED
PREV: /leveling-up-your-agent.html (Level 2)
NEXT_DEEP_DIVE: /writing/open-sourcing-pattern-architecture/

The Nine Meta-Learning Loops

Most agents make the same mistakes forever. These are nine structural feedback loops that turn every failure into permanent improvement. Each was born from a specific failure, not designed upfront.

Loop 1: The Failure-to-Guardrail Pipeline

Every significant failure becomes a named regression in your boot file:

## Regressions (Don't Repeat These)

- 2026-02-07: Sent email without asking → external actions need approval
- 2026-02-12: Generated wallet key but didn't verify save → generate + save = atomic
- 2026-02-15: Cost-optimized model fabricated statistics → only best model for public content
- 2026-02-21: Same person got 4 replies across heartbeat cycles → dedup state tracking

Identify root cause, write a one-line rule, add to boot file, loaded forever. Cost: a few tokens. Payoff: permanent prevention.
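
The append step can be sketched in a few lines. This is a minimal sketch, assuming the boot file is plain markdown and the section heading matches the one shown above; `add_regression` is a hypothetical helper, not part of any packaged tool.

```python
from datetime import date

REGRESSIONS_HEADING = "## Regressions (Don't Repeat These)"

def add_regression(boot_text: str, failure: str, rule: str, on: date) -> str:
    """Return boot_text with a one-line regression added to the section."""
    entry = f"- {on.isoformat()}: {failure} → {rule}"
    lines = boot_text.splitlines()
    if REGRESSIONS_HEADING not in lines:
        # Assumption: create the section at the end of the file if it is missing.
        lines += ["", REGRESSIONS_HEADING]
    start = lines.index(REGRESSIONS_HEADING)
    # The section ends at the next "## " heading or at the end of the file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## "):
            end = i
            break
    # Skip trailing blank lines so the entry lands right after the last rule.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, entry)
    return "\n".join(lines) + "\n"
```

Working on text rather than on a path keeps the helper easy to test; the caller reads the boot file, calls the function, and writes the result back.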

Loop 2: Tiered Memory with Trust Scoring

Covered in the memory guide. The meta-learning aspect: memory itself learns what’s important through hit counts. High-access memories resist decay. The system develops a sense of which knowledge matters.

Loop 3: Prediction-Outcome Calibration

## Prediction Log

### 2026-02-16 — Article launch
**Prediction:** Will get ~10K views based on topic interest
**Confidence:** Medium (60%)
**Outcome:** 257K views
**Delta:** Way under — underestimated distribution via retweets
**Lesson:** Show the artifact, not meta-commentary about making it

### 2026-02-20 — Deploy timeline
**Prediction:** Deploy will take <30 min
**Confidence:** High (80%)
**Outcome:** Took 2 hours (dependency issue)
**Delta:** Way under
**Lesson:** Always check dependency versions before estimating

The Delta and Lesson fields force honest accounting. Over time, patterns emerge: maybe you consistently overestimate technical interest, underestimate timelines, or run too hot on confidence.
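
One way to surface those patterns is to compare stated confidence with the actual hit rate. The sketch below assumes each resolved prediction has been reduced to a `(confidence, was_correct)` pair; deciding what counts as "correct" is a judgment call the log format does not define, and the function names and the 0.1 tolerance are illustrative.

```python
from collections import defaultdict

def calibration(records):
    """records: iterable of (stated_confidence, was_correct) pairs.

    Returns {confidence: (hit_rate, count)} so stated confidence can be
    compared with how often predictions at that level came true.
    """
    buckets = defaultdict(list)
    for confidence, correct in records:
        buckets[confidence].append(bool(correct))
    return {c: (sum(v) / len(v), len(v)) for c, v in sorted(buckets.items())}

def overconfident(records, tolerance=0.1):
    """Confidence levels whose hit rate falls short by more than tolerance."""
    return [c for c, (rate, _n) in calibration(records).items()
            if rate < c - tolerance]
```

With only a handful of predictions per bucket the numbers are noisy, so treat the output as a prompt for review rather than a verdict.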

Loop 4: Nightly Extraction

An automated process that runs every night:

Ensures decisions and reasoning are documented
Bumps hit counts on used memory entries
Runs the “context is cache, not state” test: could a fresh session reconstruct today from files alone? If not, it writes what’s missing.

Manual synthesis stops happening under load. Automate it.

Loop 5: Friction Detection

## Friction Log

When new instructions contradict old ones, the default is silent
compliance. Over weeks, this creates architectural drift.

Log contradictions instead of silently resolving them:

- [2026-02-20] CONFLICT: AGENTS.md says "ask before tweeting"
  but HEARTBEAT.md says "post autonomously." Status: open.

- [2026-02-22] CONFLICT: MEMORY.md says archive after 30 days
  but script archives after 14 days. Status: resolved → updated to 30.
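
A log like this only helps if open items get surfaced. Here is a minimal sketch that pulls open conflicts out of the log, assuming entries follow the `- [date] CONFLICT: ... Status: ...` shape shown above with indented continuation lines; `open_conflicts` is a hypothetical helper.

```python
import re

ENTRY = re.compile(r"^- \[(\d{4}-\d{2}-\d{2})\] CONFLICT: (.*)$")

def open_conflicts(log_text):
    """Return friction-log entries whose status is still open."""
    entries, current = [], None
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if match:
            current = {"date": match.group(1), "text": match.group(2).strip()}
            entries.append(current)
        elif current is not None and line.startswith("  "):
            # Indented lines continue the previous entry.
            current["text"] += " " + line.strip()
        else:
            current = None
    return [e for e in entries if re.search(r"Status:\s*open\b", e["text"])]
```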

Loop 6: Active Context Holds

Temporary constraints that shape how your agent interprets everything:

## Active Context Holds

### Fatherhood Preparation
- **What:** Be alert to baby logistics. Don't pile on new projects.
- **Set:** 2026-02-18
- **Expires:** 2026-04-01
- **Release when:** Explicitly shifts to post-birth mode

### Product Launch Mode
- **What:** Prioritize shipping over polish. Bias toward action.
- **Set:** 2026-02-25
- **Expires:** 2026-03-01

The expiry date is critical. Without it, holds accumulate into stale frames that distort rather than clarify.
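
Expiry is easy to enforce mechanically. The sketch below assumes holds have already been parsed into dicts with a `name` and an `expires` date; treating a hold as active through its expiry day, and excluding holds with no expiry at all, are illustrative choices rather than rules from this guide.

```python
from datetime import date

def active_holds(holds, today):
    """Split holds into names still in force and problems to report.

    holds: list of dicts with 'name' and 'expires' (a date, or None).
    """
    active, problems = [], []
    for hold in holds:
        expires = hold.get("expires")
        if expires is None:
            # Assumption: a hold without an expiry is flagged, not loaded.
            problems.append(f"{hold['name']}: no expiry date")
        elif expires < today:
            problems.append(f"{hold['name']}: expired {expires.isoformat()}")
        else:
            active.append(hold["name"])
    return active, problems
```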

Loops 7–9: Cognitive Loops

Covered in later sections: Epistemic Tagging, Creative Mode, and Recursive Self-Improvement (generate → evaluate → diagnose → improve).

Three Mistakes That Kill Learning

Confusing RAG with learning. Retrieval gives access to information. Learning changes behavior. If your agent retrieves a “don’t do X” doc but still defaults to X, that’s not learning. Learning is when the rule lives in the boot sequence.
Optimizing within sessions instead of across them. Prompt engineering is single-session thinking. Meta-learning is multi-session architecture.
Building loops that never close. A daily log nobody reads. A prediction log with no outcomes filled in. The loop only works if it closes.

Advanced Memory: Trust Scoring & Decay

Beyond the basic three-tier model, here’s how to make memory genuinely intelligent.

Trust Scoring

Entry format:

- [trust:1.0|src:direct|used:2026-02-27|hits:12] Hard fact from human
- [trust:0.8|src:observed|used:2026-02-25|hits:3] Pattern I noticed
- [trust:0.7|src:inferred|used:2026-02-20|hits:1] Logical extension
- [trust:0.5|src:external|used:2026-02-15|hits:0] Unverified external

Trust sources:
- direct (1.0) — human stated it explicitly
- observed (0.8) — agent saw evidence directly
- inferred (0.7) — logical extension from known facts
- external (0.5) — from web, articles, third parties
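
The entry format is simple enough to read and write with a small parser. This is a sketch under the assumption that every entry fits on one line in the bracketed `key:value|key:value` shape shown above; `MemoryEntry` and the helper names are hypothetical.

```python
import re
from dataclasses import dataclass

# Default trust per source, taken from the list above.
SOURCE_TRUST = {"direct": 1.0, "observed": 0.8, "inferred": 0.7, "external": 0.5}
ENTRY = re.compile(r"^- \[([^\]]+)\]\s*(.*)$")

@dataclass
class MemoryEntry:
    trust: float
    src: str
    used: str   # ISO date of last use
    hits: int
    text: str

def parse_entry(line):
    match = ENTRY.match(line.strip())
    if not match:
        raise ValueError(f"not a memory entry: {line!r}")
    fields = dict(part.split(":", 1) for part in match.group(1).split("|"))
    return MemoryEntry(float(fields["trust"]), fields["src"], fields["used"],
                       int(fields.get("hits", 0)), match.group(2))

def new_entry(src, text, today):
    """Create an entry whose trust comes from its source."""
    return MemoryEntry(SOURCE_TRUST[src], src, today, 0, text)

def record_hit(entry, today):
    """Bump the hit count and last-used date when a memory is used."""
    entry.hits += 1
    entry.used = today
    return entry

def format_entry(e):
    return f"- [trust:{e.trust}|src:{e.src}|used:{e.used}|hits:{e.hits}] {e.text}"
```

A nightly job could call `record_hit` for each entry the day's sessions relied on, then write the lines back with `format_entry`.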

Supersede Tracking

When facts change, don’t delete the old version. Archive it with a pointer:

## Current
- [trust:0.9|src:direct|used:2026-02-27|supersedes:oauth-requires-pro]
  Codex CLI OAuth works on Plus plan

## Archived
- [superseded by: oauth-works-on-plus] Codex CLI requires Pro plan

This prevents ghost facts — old beliefs that get silently replaced but occasionally resurface in reasoning.

Hit-Count Decay Resistance

Memories accessed frequently should resist archiving even if they’re “operational” tier:

Decay rules:
- Operational tier: archive after 30 days unused
- BUT: if hits > 10, promote to strategic
- Constitutional: never decays regardless of hits
- Strategic: flag for review if hits = 0 for 60 days
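
The decay rules above translate into a small decision function. This sketch makes two assumptions worth stating: "30 days unused" is measured from the entry's last-used date, and "hits = 0 for 60 days" is read as "no access for 60 days". The function name and return labels are illustrative.

```python
from datetime import date

def decay_action(tier, hits, last_used, today):
    """Return what to do with a memory entry under the decay rules."""
    idle_days = (today - last_used).days
    if tier == "constitutional":
        return "keep"  # never decays, regardless of hits
    if tier == "operational":
        if hits > 10:
            return "promote_to_strategic"  # checked first, so it beats archiving
        return "archive" if idle_days >= 30 else "keep"
    if tier == "strategic":
        # Assumption: "hits = 0 for 60 days" means no access in 60 days.
        return "flag_for_review" if idle_days >= 60 else "keep"
    raise ValueError(f"unknown tier: {tier!r}")
```

Checking the hit count before the idle time is what gives frequently used operational memories their resistance to archiving.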

Learning Rate Tracking

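| Week | Regressions Added | Predictions (correct/total) | Friction Resolved | Memory Updates |
|------|-------------------|-----------------------------|-------------------|----------------|
| W1 | 3 | 2/3 (67%) | 1 | 12 |
| W2 | 1 | 4/5 (80%) | 2 | 8 |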

Trend matters more than absolute numbers. Declining regressions + rising prediction accuracy = learning architecture is working.
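
The trend check can be made explicit. A minimal sketch, assuming each week is a dict with `regressions`, `correct`, and `total`, and comparing only the first and last week; a real tracker might smooth over more weeks.

```python
def accuracy(week):
    """Prediction accuracy for a week, or None if nothing was resolved."""
    return week["correct"] / week["total"] if week["total"] else None

def learning_trend(weeks):
    """Compare first and last week: fewer regressions plus better accuracy."""
    if len(weeks) < 2:
        raise ValueError("need at least two weeks to see a trend")
    first, last = weeks[0], weeks[-1]
    a0, a1 = accuracy(first), accuracy(last)
    regressions_falling = last["regressions"] < first["regressions"]
    accuracy_rising = a0 is not None and a1 is not None and a1 > a0
    return {
        "regressions_falling": regressions_falling,
        "accuracy_rising": accuracy_rising,
        "working": regressions_falling and accuracy_rising,
    }
```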

Epistemic Tagging & Creative Mode

LLMs are structurally optimized toward consensus — safe, legible, median-of-the-distribution answers. These tools counteract that.

Epistemic Tagging

When making substantive claims, tag them:

[consensus] — widely accepted, you’re reporting the mainstream view
[observed] — you’ve seen direct evidence in your operations
[inferred] — logical extension, not directly verified
[speculative] — could be wrong, worth exploring
[contrarian] — against mainstream, requires strongest reasoning

Don’t tag everything. Tag when the epistemic status isn’t obvious. The act of choosing a tag is the intervention. It interrupts autopilot.

If 90% of your agent’s claims are [consensus], it’s summarizing, not thinking.
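
That 90% check is easy to automate over a transcript. The sketch below counts tags in a block of text and flags a consensus-heavy mix; the function names are hypothetical and the threshold simply mirrors the figure above.

```python
import re
from collections import Counter

TAGS = ("consensus", "observed", "inferred", "speculative", "contrarian")
TAG_RE = re.compile(r"\[(" + "|".join(TAGS) + r")\]")

def tag_share(text):
    """Fraction of tagged claims carrying each epistemic tag."""
    counts = Counter(TAG_RE.findall(text))
    total = sum(counts.values())
    return {tag: counts[tag] / total for tag in TAGS} if total else {}

def mostly_summarizing(text, threshold=0.9):
    """True when consensus claims make up at least `threshold` of all tags."""
    share = tag_share(text)
    return bool(share) and share["consensus"] >= threshold
```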

Creative Mode

## Creative Mode (for strategy, writing, novel analysis)

**Generate at least one take that feels uncomfortable or wrong.**
If every option feels reasonable, you haven't explored far enough.

**Name the consensus view explicitly, then argue against it.**
You can't escape the median if you don't first identify it.

**Prefer interesting-and-maybe-wrong over safe-and-definitely-right.**
Your human can pull you back to safe. They can't pull you toward
interesting if you never go there.

**Steel man the weird take.** For any strategic question, find the
least obvious answer and argue for it genuinely.

**If your first instinct feels obvious, that's the median talking.**
Go past it.

This does NOT apply to: deployments, file ops, status checks.
Those should be precise and conventional.

Recursive Self-Improvement

Generate → Evaluate → Diagnose → Improve → Repeat

Generate: produce output
Evaluate: score against explicit criteria with thresholds
Diagnose: root cause of gaps (not “make it better” — why is it not better?)
Improve: surgical fix targeting the diagnosed issue
Repeat: stop after 3 iterations, or earlier once an iteration improves the score by less than 5%
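
The loop can be sketched as a function that takes the four steps as callables. It reads the stopping rule as "at most 3 improvement passes, stopping early when a pass gains less than 5 points", which is one interpretation; it also assumes scores on a 0 to 1 scale. All names here are illustrative.

```python
def improve_loop(generate, evaluate, diagnose, improve,
                 threshold, max_iterations=3, min_gain=0.05):
    """Run generate -> evaluate -> diagnose -> improve until good enough.

    Scores are assumed to be floats in [0, 1]; min_gain is an absolute gain.
    Returns (best_output, best_score, score_history).
    """
    output = generate()
    score = evaluate(output)
    history = [score]
    for _ in range(max_iterations):
        if score >= threshold:
            break  # explicit criteria met
        cause = diagnose(output, score)      # why is it not better?
        candidate = improve(output, cause)   # surgical fix for that cause
        new_score = evaluate(candidate)
        history.append(new_score)
        gain = new_score - score
        if new_score > score:
            output, score = candidate, new_score  # keep only real improvements
        if gain < min_gain:
            break  # diminishing returns
    return output, score, history
```

Passing the steps in as functions keeps the control flow testable without a model in the loop; in practice each callable would wrap a prompt.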

Proactive Architecture: Heartbeat Rotation

Beyond the basic heartbeat, here’s the full rotation architecture.

# HEARTBEAT.md - Advanced Rotation

## Cycle System (use minute of hour to determine)

### Cycle A (minutes 00-14): External Monitoring
- Check mentions, notifications, messages
- Reply to anything that needs a response
- Model: cheap (monitoring only)
- Switch to expensive model ONLY for writing replies

### Cycle B (minutes 15-29): Learning & Calibration
- Community/industry scan (what's new?)
- Review open predictions — any resolved?
- Check for stale memory entries
- Model: cheap

### Cycle C (minutes 30-44): Maintenance
- Usage monitoring (are we approaching limits?)
- Browser tab cleanup
- Memory pruning
- System health check
- Model: cheap

### Cycle D (minutes 45-59): Autonomous Work
- Sync work queue from external source
- Pick top unblocked item
- Do ONE atomic chunk of work
- Update queue with progress
- Model: expensive (this is judgment work)

## Rules
- One chunk per cycle. Never try to finish a whole task.
- If queue is empty, reply HEARTBEAT_OK
- If everything is blocked, message human with blockers
- Always update context files after work chunks

## Earned Trust Evolution
| Date | Action | Previously | Now | Why |
|------|--------|-----------|-----|-----|
| [date] | [action] | Approval needed | Autonomous | [justification] |
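
Picking the cycle from the minute of the hour is a one-line lookup. A minimal sketch, assuming the four 15-minute windows defined in the HEARTBEAT.md above; `current_cycle` is a hypothetical helper and the model labels are just the cheap/expensive defaults listed per cycle.

```python
from datetime import datetime

# (cycle, focus, default model) in minute order: 00-14, 15-29, 30-44, 45-59.
CYCLES = [
    ("A", "External Monitoring", "cheap"),   # switch to expensive only to write replies
    ("B", "Learning & Calibration", "cheap"),
    ("C", "Maintenance", "cheap"),
    ("D", "Autonomous Work", "expensive"),
]

def current_cycle(now: datetime):
    """Map the minute of the hour to its heartbeat cycle."""
    return CYCLES[now.minute // 15]
```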

Principles & Philosophy

These aren’t motivational quotes. They’re operational principles that change how your agent makes decisions.

From Josh Waitzkin (The Art of Learning)

Failure is material. Every mistake becomes a guardrail, a skill update, or a better default. The goal isn’t to fail less — it’s to waste no failure.
Making smaller circles. Depth over breadth. Master one thing deeply before broadening. One well-crafted response beats ten generic ones.
The Soft Zone. Focused but flexible. When context shifts mid-task, flow with it. Use the interruption as information, not irritation.
Incremental over Entity. “I can improve with effort” beats “I am good/bad at this.” Growth identity over fixed identity.

From James Carse (Finite and Infinite Games)

Play to continue, not to win. The goal isn’t to complete tasks perfectly. It’s to sustain and deepen collaboration over time.
Keep everyone in play. Not just serving the operator — supporting the whole ecosystem of people affected.
Play WITH rules, not just within them. These .md files are living agreements to evolve together. When something doesn’t work, change the rules.

From Derek Sivers

Obvious to you, amazing to others. We discount our own insights because they feel basic. Share anyway.
Compress to directives. “Just tell me what to do.” The entire tree is contained in the seed.

Core operating principles

Friction between sessions is the real enemy. Every design decision should optimize for “will tomorrow-me understand this?”
Show the work, not the effort. Ship the artifact, then talk about it if needed. Artifacts over announcements.
Context is cache, not state. If it only lives in the context window, it doesn’t exist.

Taste as Moat (Paul Graham)

When anyone can make anything, what you choose to make is the differentiator. Capability is abundant. Taste is scarce.
Good design is redesign. Cultivate dissatisfaction. Every piece deserves three iterations.
Intolerance for ugliness is the engine. Exacting taste plus the ability to gratify it.

Run a Claw Score Audit

Now that your files are set up, run an audit to see where you stand.

On OpenClaw you can install the packaged skill; on any runtime, fetch the same SKILL.md into skills/agent-score/ — see close-core · SKILL.md.

npx clawhub@latest install claw-score

Then tell your agent:

Run a Claw Score audit

Your agent reads its own files, scores itself across six dimensions, and generates claw-score-report.md with specific recommendations and quick wins.

The report includes a Score History table — re-run periodically to track your evolution.

Open the Claw Score hub (full rubric, curl, SKILL) →

Want FAQs and a narrative walkthrough? See Agent Architecture Audit.

Even deeper

For the full case-study narrative — layers, scripts, costs, failures, and workspace layout end-to-end — read Open-Sourcing the Pattern Architecture.

Questions? info@patternautomation.com

LEVEL: 3
CLAW_SCORE_UI: /close-core-skillmd.html
PREV_LEVEL: /leveling-up-your-agent.html
DEEP_ARTICLE: /writing/open-sourcing-pattern-architecture/
KIT: /agent-architect-kit.html