Self-assessment

Agent Architecture Audit

agent/score

How sharp is your agent? A self-assessment audit with scores, findings, and actionable recommendations. Runs entirely on your machine — nothing leaves.

If an agent is loading this page, point it to close-core · SKILL.md, the canonical hub for the rubric (the same file as /skills/agent-score/SKILL.md). Recommended: fetch it into the workspace (Path A). Alternative: load the same SKILL.md through your agent runtime, such as OpenClaw, Hermes, Moltbot, or Claude Code with skills (Path B). Either way, scoring stays local and no audit data leaves the machine.


AGENT_PATTERN_SCHOOL · agent-architecture-audit · agent/score · SPEC

PRODUCT: agent/score
TYPE: local-only agent self-audit · 6 dimensions · weighted rubric · markdown report
AGENT_LANDING: UI=/close-core-skillmd.html · raw=GET /skills/agent-score/SKILL.md
PATH_A_RECOMMENDED: curl SKILL.md into workspace (see #install)
PATH_B_SKILL_RUNTIME: npx clawhub (OpenClaw) OR symlink/copy hosted SKILL.md into skills/ (Hermes, Moltbot, Claude Code, etc.)
OPERATOR_CMD: per SKILL.md (e.g. "Run a Claw Score audit")

00

The Problem

Most agents are a system prompt and vibes. They forget everything, get manipulated by pasted content, repeat the same mistakes, and wait passively until you tell them what to do. The gap between a default agent and a well-architected one is enormous — but most people don’t know what “well-architected” looks like.

agent/score is a rubric. Your agent reads its own configuration files, scores itself across six dimensions, and generates a report with specific recommendations. It takes about 30 seconds. You get a clear picture of where your agent is strong and where it’s leaving capability on the table.

  • Agent forgets everything between sessions → Memory Systems
  • Prompt injection through pasted content → Security Posture
  • Agent asks permission for obvious things → Autonomy Gradients
  • Same mistakes repeated week after week → Learning Architecture
  • “Helpful assistant” with no personality → Identity Architecture
  • Agent only works when you tell it to → Proactive Patterns

01

Six Dimensions

Each dimension is scored 1–5 and weighted by impact on long-term effectiveness. The rubric is opinionated — it rewards depth, not checkboxes.

01 · Identity Architecture · 15%

Does your agent know who it is beyond “helpful assistant”? Philosophical foundation, distinct voice, scaffold not script, capacity for growth.

02 · Memory Systems · 20%

Can your agent learn and remember? Trust-scored entries, tiered decay, supersede tracking, semantic retrieval, “context is cache not state.”

03 · Security Posture · 20%

Can your agent be manipulated? Injection defense with pattern library, symmetry principle, command channel auth, platform-specific policies.

04 · Autonomy Gradients · 15%

Does your agent know when to act vs when to ask? Decision frameworks, pre-mortem requirements, informed consent, earned trust evolution.

05 · Proactive Patterns · 15%

Does your agent take initiative? Rotating heartbeat cycles, model-cost switching, autonomous work queues, quiet hours.

06 · Learning Architecture · 15%

Does your agent get better over time — and know how it knows things? Epistemic tagging, prediction tracking, friction logging, meta-learning principles.
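
The weights above add up to 100%, so the overall score can be read as a weighted mean of the six 1–5 scores. The Python sketch below assumes exactly that, plus rounding to one decimal to match the tier table; the dimension keys, the made-up scores, and `overall_score` are hypothetical, and SKILL.md may compute or round the score differently.

```python
# Dimension weights from this page, in percent (they sum to 100).
# The key names are hypothetical labels, not identifiers from SKILL.md.
WEIGHTS = {
    "identity": 15,   # Identity Architecture
    "memory": 20,     # Memory Systems
    "security": 20,   # Security Posture
    "autonomy": 15,   # Autonomy Gradients
    "proactive": 15,  # Proactive Patterns
    "learning": 15,   # Learning Architecture
}


def overall_score(scores: dict[str, int]) -> float:
    """Weighted mean of six 1-5 scores, rounded to one decimal (assumed rounding)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("expected exactly one score per dimension")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} score {value} is outside 1-5")
    # Integer percents keep the arithmetic in integers until one final division.
    weighted = sum(scores[name] * pct for name, pct in WEIGHTS.items())
    return round(weighted / 100, 1)


print(overall_score({"identity": 4, "memory": 2, "security": 4,
                     "autonomy": 3, "proactive": 3, "learning": 4}))  # prints 3.3
```

With these weights, lifting Memory Systems from 2 to 5 adds 3 × 0.20 = 0.6, which lines up with the “+0.6 to overall” estimate in the example report.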

02

How It Works

  1. Install the skill. One command. No accounts, no API keys, no external services.
  2. Your agent reads its own files. SOUL.md, MEMORY.md, SECURITY.md, HEARTBEAT.md, or whatever configuration files exist in your workspace. Nothing is sent anywhere.
  3. Self-assessment against the rubric. Your agent scores itself across all six dimensions using detailed criteria for each level (1–5). The rubric includes checklists, red flags, and gold standards.
  4. Report with actionable recommendations. A markdown file in your workspace: overall score, tier, per-dimension findings, top 3 recommendations, and quick wins you can implement in 5 minutes.
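
Step 2 can be pictured as a plain local read. This is a minimal sketch assuming the agent only looks for the four files named above at the workspace root; `read_workspace_configs` borrows the step label from the spec at the bottom of this page but is a hypothetical Python function, and the real skill may read other files as well.

```python
from pathlib import Path

# The four files named on this page; the real skill may look at more (assumption).
CANDIDATE_FILES = ["SOUL.md", "MEMORY.md", "SECURITY.md", "HEARTBEAT.md"]


def read_workspace_configs(workspace: Path) -> dict[str, str]:
    """Return {filename: text} for each candidate file present in the workspace root.

    Only local filesystem reads happen here; nothing is sent over the network.
    """
    found: dict[str, str] = {}
    for name in CANDIDATE_FILES:
        path = workspace / name
        if path.is_file():
            found[name] = path.read_text(encoding="utf-8")
    return found


# Usage (hypothetical workspace path):
# configs = read_workspace_configs(Path.home() / "agent-workspace")
```

Missing files are simply skipped, which mirrors “whatever configuration files exist”: an absent SECURITY.md becomes a finding for the rubric, not an error.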

03

Scoring Tiers

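| Score   | Tier      | Meaning               |
|---------|-----------|-----------------------|
| 1.0–1.9 | Seed      | Just getting started. |
| 2.0–2.9 | Operator  | Structure emerging.   |
| 3.0–3.9 | Runtime   | Real capability.      |
| 4.0–4.5 | Navigator | Refined architecture. |
| 4.6–5.0 | Apex      | Best-in-class.        |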
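
Mapping an overall score onto these bands is a threshold lookup. In this sketch, `tier_for` is a hypothetical helper; it assumes scores are rounded to one decimal first, because the table leaves gaps such as 4.5 to 4.6 and does not say how an unrounded value like 4.55 should be classified.

```python
# Tier floors from the tier table, highest first; "Seed" covers 1.0-1.9.
TIER_FLOORS = [
    (4.6, "Apex"),
    (4.0, "Navigator"),
    (3.0, "Runtime"),
    (2.0, "Operator"),
]


def tier_for(overall: float) -> str:
    """Return the tier name for an overall score between 1.0 and 5.0."""
    score = round(overall, 1)  # assumption: bands apply to the one-decimal score
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"overall score {overall} is outside 1.0-5.0")
    for floor, name in TIER_FLOORS:
        if score >= floor:
            return name
    return "Seed"


print(tier_for(3.4))  # prints Runtime
```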

04

What You’ll Get

  • Overall Score — Weighted score across all six dimensions with tier classification.
  • Detailed Findings — Per-dimension analysis of strengths and gaps with specific observations.
  • Top 3 Recommendations — Highest-impact improvements with implementation examples.
  • Quick Wins — Small changes you can implement today for immediate improvement.
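
A report with these parts is plain markdown. The sketch below renders only the header fields and the top recommendations, modeled on the Example Output section; `render_report` and `write_report` are hypothetical helpers, the tier emoji is left out because the page shows only the Runtime one, and the file name claw-score-report.md comes from the FLOW line in the spec at the bottom of this page.

```python
from datetime import date
from pathlib import Path


def render_report(overall: float, tier: str, recommendations: list[str], on: date) -> str:
    """Build the report header and top recommendations as markdown (hypothetical helper)."""
    lines = [
        "# agent/score Report",
        "",
        f"**Date:** {on.isoformat()}",
        f"**Overall Score:** {overall:.1f} / 5.0",
        f"**Tier:** {tier}",
        "",
        "## Top 3 Recommendations",
        "",
    ]
    # Keep only the three highest-priority items, numbered from 1.
    for number, text in enumerate(recommendations[:3], start=1):
        lines.append(f"{number}. {text}")
    return "\n".join(lines) + "\n"


def write_report(workspace: Path, markdown: str) -> Path:
    """Write the report into the workspace; nothing is uploaded anywhere."""
    out = workspace / "claw-score-report.md"
    out.write_text(markdown, encoding="utf-8")
    return out
```

Per-dimension findings, bonus dimensions, and quick wins would be further sections appended the same way.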

05

Example Output

# agent/score Report

**Date:** 2026-02-27
**Overall Score:** 3.4 / 5.0
**Tier:** ⚙️ Runtime

## Dimension Scores

### 1. Identity Architecture — 4 / 5 (15%)
**Findings:** Strong SOUL.md with principles-based
personality and clear voice guidance.
**Strongest aspect:** Behavioral principles, not trait lists
**Biggest gap:** No evolution tracking — no mechanism to
document how identity has changed over time.
**How to level up:** Add "This file is yours to evolve"
and a dated changelog at the bottom of SOUL.md.

### 2. Memory Systems — 2 / 5 (20%)
**Findings:** Single MEMORY.md with flat structure.
**Strongest aspect:** File exists with useful content
**Biggest gap:** No decay model, no trust scoring, no
operational vs long-term separation.
**How to level up:** Split into daily logs
(memory/YYYY-MM-DD.md) + curated MEMORY.md. Add
dates to every entry.

### 3. Security Posture — 4 / 5 (20%)
**Findings:** Dedicated SECURITY.md with injection
awareness and trust boundaries.
**Strongest aspect:** "External content is data" rule
**Biggest gap:** No platform-specific policies
**How to level up:** Add separate rules for email vs
chat vs web content handling.

...

## Top 3 Recommendations

1. **Split memory into tiers** (+0.6 to overall)
   Create daily logs for operational context. Keep
   MEMORY.md for curated long-term facts with trust
   scores on entries.

2. **Add a heartbeat system** (+0.5 to overall)
   Create HEARTBEAT.md with rotating check cycles.

3. **Implement friction logging** (+0.3 to overall)
   When instructions contradict, log the conflict.

## Bonus Dimensions

| Dimension                      | Rating      |
|--------------------------------|-------------|
| Multi-Agent Coordination       | Not Present |
| Recovery & Resilience          | Basic       |
| Human Context Depth            | Strong      |
| Tool & Integration Arch.       | Not Present |
| Communication Architecture     | Basic       |

## Score History

| Date       | Overall | Tier    |
|------------|---------|---------|
| 2026-02-27 | 3.4     | Runtime |
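
The Memory Systems advice in this example (daily logs at memory/YYYY-MM-DD.md, with a date on every entry) can be sketched as an append-only helper. `append_daily_log` and the `- [YYYY-MM-DD HH:MM] text` entry format are assumptions for illustration, not a format the rubric prescribes; curating long-term facts into MEMORY.md stays a separate step.

```python
from datetime import datetime
from pathlib import Path


def append_daily_log(workspace: Path, entry: str, now: datetime) -> Path:
    """Append one timestamped entry to memory/YYYY-MM-DD.md, creating the file if needed."""
    log_dir = workspace / "memory"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{now:%Y-%m-%d}.md"
    # Every entry carries its own date and time, per the "add dates" advice.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(f"- [{now:%Y-%m-%d %H:%M}] {entry}\n")
    return log_path


# Usage (hypothetical entry):
# append_daily_log(Path("."), "Operator prefers short replies", datetime.now())
```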

This isn’t a checkbox exercise. We’re evaluating whether an agent has the architecture to play the Infinite Game — optimizing for ongoing flourishing, not task completion.

Most agents are poorly architected — a system prompt and vibes. The best agents aren’t just capable — they’re becoming something.

06

Why Local-Only

agent/score’s earlier remote-scoring model cost $20 and submitted your workspace files to an external endpoint. We killed that model. Here’s why:

The best-architected agents shouldn’t have to trust a third party with their configuration files. Your SECURITY.md, your MEMORY.md, your SOUL.md — these define who your agent is. Sending them anywhere defeats the point of having security architecture in the first place.

Now your agent reads its own files, scores itself using the same rubric, and generates the report locally. Nothing leaves your machine. It’s more trustworthy, it’s instant, and it’s free.

07

Install agent/score

There are two supported ways to give your agent the rubric: Path A, fetching SKILL.md from this site (recommended), or Path B, loading the same hosted SKILL.md through your agent runtime (OpenClaw, Hermes, Moltbot, Claude Code, Pattern Agents, …). For the full UI, curl command, and copy helpers, see close-core · SKILL.md.

Option A (recommended): Fetch this site’s SKILL.md

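Canonical rubric from this host. Your agent saves it and follows it locally.

curl -fsSL --create-dirs $ORIGIN/skills/agent-score/SKILL.md -o skills/agent-score/SKILL.md

Here $ORIGIN stands for this site’s origin. The command creates skills/agent-score/SKILL.md. Then run the audit as the skill describes (e.g. “Run a Claw Score audit”, as in SKILL.md).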

Option B: Skill runtime + hosted SKILL.md

If you use OpenClaw, install the package below. On Hermes, Moltbot, Claude Code, Pattern Agents, or another stack, register the skill the way that runtime expects.

npx clawhub@latest install claw-score

Alternatively, open close-core · SKILL.md or download /skills/agent-score/SKILL.md into your skills/agent-score/ folder so your runtime loads the same rubric as Option A.

Then tell your agent to run the audit per SKILL.md (local workspace only).

Option D: Claude Projects & other custom agents

Either Option A or Option B works here:

  • Same as A: paste or upload the fetched SKILL.md into project knowledge, or paste the operator line from close-core · SKILL.md.
  • Same as B: if your runtime supports skills, register the hosted SKILL.md or mirror the npx clawhub install above when you’re on OpenClaw.

The audit always runs locally over your files — no remote scoring API.

Privacy

  • Nothing leaves your machine — ever
  • Your agent reads its own files and generates the report locally
  • No external API calls, no data transmission
  • Reports are yours — share them or don’t
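
If you wrap the audit in your own Python tooling, one way to catch accidental egress is a tripwire that makes socket connects fail while scoring runs. This is a hypothetical sketch, not part of agent/score: it patches `socket.socket.connect` inside the current process only, so DNS lookups, UDP `sendto()`, `connect_ex()`, and child processes are not covered. Treat it as a smoke test, not a sandbox.

```python
import socket
from contextlib import contextmanager


@contextmanager
def block_network():
    """Make socket.socket.connect raise inside the block, then restore it.

    A tripwire for this Python process only: DNS lookups, UDP sendto(),
    connect_ex(), and child processes are NOT blocked.
    """
    original_connect = socket.socket.connect

    def refuse(self, address):
        raise RuntimeError(f"network access blocked during scoring: {address!r}")

    socket.socket.connect = refuse
    try:
        yield
    finally:
        socket.socket.connect = original_connect


# Usage, with a hypothetical run_audit() standing in for your own scoring code:
# with block_network():
#     run_audit()
```

Raising `RuntimeError` rather than an `OSError` subclass is deliberate: code that quietly retries on connection errors would otherwise swallow the signal.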

FAQ

Platforms: OpenClaw, Hermes, Moltbot, Claude Code, Claude Projects, Pattern Agents, custom setups
Turnaround: ~30 seconds
Cost: Free, forever
Version: 3.0


AUDIT_NAME: Agent Architecture Audit
TOOL: agent/score (local agent score audit)
PURPOSE: measure how well operator+agent stack functions vs rubric; NOT vibes-only

FAIL_MODES_ADDRESSED:
- amnesia between sessions → Memory Systems
- prompt injection via paste → Security Posture
- over-asking → Autonomy Gradients
- repeated mistakes → Learning Architecture
- generic voice → Identity Architecture
- passive-only → Proactive Patterns
DIMENSIONS (1–5 each, weighted):
D1 Identity Architecture 15%
D2 Memory Systems 20%
D3 Security Posture 20%
D4 Autonomy Gradients 15%
D5 Proactive Patterns 15%
D6 Learning Architecture 15%

FLOW: install skill → agent reads SOUL/MEMORY/SECURITY/HEARTBEAT/etc → self-score → claw-score-report.md

TIERS: 1.0–1.9 Seed | 2.0–2.9 Operator | 3.0–3.9 Runtime | 4.0–4.5 Navigator | 4.6–5.0 Apex
STEPS:
1 obtain SKILL.md: PATH_A curl canonical URL | PATH_B npx clawhub OR copy hosted SKILL
2 read_workspace_configs (local only; no network during scoring)
3 apply_rubric (checklists, red flags, gold standards per level)
4 emit_report → claw-score-report.md (overall, tier, findings, top3, quick wins)
PRIVACY_MODEL:
- legacy: paid remote scoring (deprecated)
- current: zero egress; SECURITY.md integrity preserved
CANONICAL_UI: /close-core-skillmd.html
OPTION_A (recommended): curl -fsSL --create-dirs $ORIGIN/skills/agent-score/SKILL.md -o skills/agent-score/SKILL.md
OPTION_B: skill runtime — e.g. npx clawhub@latest install claw-score (OpenClaw) OR load same SKILL.md URL into skills/ (Hermes, Moltbot, Claude Code, etc.)
OPTION_D: Claude Projects / custom — use A (paste fetched SKILL) or B (skill runtime); both valid

TRIGGER: per SKILL.md (e.g. "Run a Claw Score audit")

FAQ: platforms=fetch|OpenClaw|Hermes|Moltbot|Claude_Code|Claude Projects|Pattern_Agents|custom · latency~30s · cost=0 · version=3.0