agents-md

Curated AGENTS.md preset that kills sycophancy, blocks drive-by refactors, and forces verification loops. Synthesizes Karpathy's principles with Cherny's Claude Code workflow.

repo ↗

claude-codecodexgemini-cliagents-mdprompt-engineering

Most AGENTS.md files in the wild are scratch pads - a list of build commands, a couple of “don’t break this” notes, and whatever the human thought was important last Tuesday. This one is a deliberate template, kept to roughly 200 lines on purpose so the rules actually stay resident in the agent’s context window across a long session.

The author cites two influences explicitly: Andrej Karpathy’s four principles on LLM coding failure modes, and Boris Cherny’s public Claude Code workflow that emphasises reactive pruning and tight files. Anthropic’s official Claude Code best practices and the author’s own “IJFW” principle (“it just f*cking works”) round out the credits. None of that matters if the rules don’t change agent behaviour, but they do.

What it actually changes

Nine behavioural shifts, in order of how often you’ll feel them:

Pushback over compliance. Replaces the reflexive “You’re absolutely right!” pattern with the agent flagging when the request itself is wrong.
Minimal diffs. Every changed line traces back to the request. No drive-by reformatting, no “while I was here” cleanups.
Verification first. The agent writes the test, runs it, and reports the actual result - not “this should work.”
Surface ambiguity. Stop the work and ask, instead of silently guessing the spec.
Tight file size. The template is intentionally short so the rules don’t get evicted as the conversation grows.

The drive-by-refactor rule is the one most teams fight over and the one most worth keeping. It’s why agents that follow this template produce reviewable PRs instead of 4,000-line “improvements.”

Quick start

curl -o AGENTS.md https://raw.githubusercontent.com/TheRealSeanDonahoe/agents-md/main/AGENTS.md

Drop it at the project root. If you also use Claude Code or Gemini CLI, symlink so all three agents read the same source of truth:

ln -s AGENTS.md CLAUDE.md
ln -s AGENTS.md GEMINI.md

The one bit of manual work: Section 10 (Project Context) wants your stack, build commands, and directory layout filled in. Budget five minutes. Skipping this section is the most common reason the template “doesn’t work” - without project context the rules are advisory at best.

Which agents read it natively

Native AGENTS.md readers - Codex, Cursor, Windsurf, Copilot, Aider, Devin, Amp, opencode, RooCode
Reads via symlink - Claude Code (via CLAUDE.md), Gemini CLI (via GEMINI.md)

For Claude Code specifically, the template also expects topic-scoped rules in .claude/rules/*.md with path frontmatter. Cursor’s equivalent is .cursor/rules/*.mdc. Both are optional but recommended for repo-specific overrides that shouldn’t bloat the root file.

The Section 11 trick

Section 11 (“Project Learnings”) starts empty. The agent is instructed to append to it whenever the human corrects something - a one-line entry per correction. Over a few weeks this becomes the highest-signal part of the file: it’s the actual list of things this agent gets wrong on this codebase.

Most teams that try this for the first time are surprised by the same thing: the corrections cluster. You learn that the model keeps making the same three mistakes, which is a cue to either generalise the rule or accept the limitation.

When to use it

New project, no existing AGENTS.md or CLAUDE.md. This is the highest-value moment to install it.
Existing project with a CLAUDE.md that’s grown to 800 lines and stopped working. Replace it.
Multi-agent team (Codex + Claude + Gemini) that needs one source of truth.

When not to

Heavily regulated codebases where every rule needs to be project-specific. The template is a starting point - you’ll spend more time stripping out generic rules than adding yours.
Throwaway prototypes. The setup cost isn’t worth it if the project lives a week.

The real test

Run the same task on the same codebase with and without the template. The win shows up in three places: PR diff size goes down, tests-before-claim becomes the default, and the agent stops agreeing with you when you’re wrong. That last one is the hardest to install with anything other than explicit rules - which is the whole reason this file exists.