pentest-ai-agents
Specialized Claude Code subagents that turn the CLI into a pentest assistant: plan engagements, analyze recon, research exploits, build detections, audit STIGs, and write reports.
This is a research harness, not a “jailbreak the agent” kit, and the README is unusually clear about that up front. The toolkit ships 31 Claude Code subagents that each carry deep methodology in one offensive-security domain - recon, web, Active Directory, cloud, mobile, wireless, payload crafting, reverse engineering, exploit chaining, detection, forensics, and reporting - and Claude Code routes you to the right one based on plain-English intent.
The install footprint is deliberately minimal: no servers, no Python deps, no MCP. Just markdown agent files copied to ~/.claude/agents/. The complexity lives in the prompt designs, not in infrastructure.
Authorized-use scope, stated plainly
The legal section is short and load-bearing:
“This toolkit is for authorized security testing only. Users must have proper written authorization before using these agents in any engagement.”
Prerequisites listed on the same page: Claude Code installed, a Pro or Max subscription, and - for any actual testing - signed rules of engagement with defined scope. The Tier 2 execution model (more on that below) operationalises this: agents validate every target against your declared scope before composing a command.
Quick start
curl -fsSL https://raw.githubusercontent.com/0xSteph/pentest-ai-agents/main/install.sh | bash
The script clones to a temp dir, copies agents to ~/.claude/agents/, and exits. Idempotent - re-run it for updates.
Other install modes:
./install.sh --project # local to current project
./install.sh --global --lite # advisory agents run on Haiku (cheaper)
./install.sh --tools # also install the underlying CLI tools (apt/brew/pacman + pipx/go/cargo)
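The "copy and exit" step is what makes re-running safe. As a rough sketch of that idea only (the real logic lives in install.sh; `install_agents` and its arguments are hypothetical names for illustration), copying markdown files over whatever is already there gives the same result on every run:

```python
import shutil
from pathlib import Path


def install_agents(src: Path, dest: Path) -> list[str]:
    """Copy every *.md agent file from src into dest, overwriting older copies.

    Hypothetical helper: it mirrors the behaviour the writeup describes,
    not the actual contents of install.sh.
    """
    dest.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    installed = []
    for agent_file in sorted(src.glob("*.md")):
        shutil.copy2(agent_file, dest / agent_file.name)  # overwrite in place
        installed.append(agent_file.name)
    return installed
```

Running it twice against the same directories leaves the destination unchanged after the first run, which is the property "idempotent" refers to.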
Once installed, Claude Code routes by intent. From the README’s own example flow:
"Plan an internal pentest for a 500-endpoint AD environment, 2-week window."
"I have a domain user, where do I look first in BloodHound?"
"Run a phishing simulation against acme-corp.com, set up GoPhish + Evilginx infrastructure."
The slash commands are the other half of the interface. /recommend "phish a small SaaS team's IT" returns the right agent plus concrete next commands. /agents-for web lists every agent relevant to web testing.
Tier 1 vs Tier 2 - the safety model
This is the design choice that distinguishes pentest-ai-agents from “give the agent root and pray.”
- Tier 1 (all 31 agents): Advisory only. You paste tool output; the agent analyses it and recommends next commands; you run the tools yourself.
- Tier 2 (a curated subset): Compose AND execute. You declare the authorized scope, the agent validates every target against it, and Claude Code shows each command for approval before running it.
Tier 2 covers Recon Advisor, Vuln Scanner, Web Hunter, AD Attacker, Exploit Chainer, PoC Validator, and Business Logic Hunter. Everything else stays advisory by design.
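To make "validates every target against it" concrete, here is a minimal sketch of a scope check, assuming scope is declared as CIDR ranges like the `--scope "10.0.0.0/24"` example used with findings.sh. `in_scope` is a hypothetical helper; the agents do this through their prompts, and nothing here is taken from the repo.

```python
import ipaddress


def in_scope(target: str, scope: list[str]) -> bool:
    """Return True only if target is an IP inside one of the declared CIDR ranges.

    Hypothetical illustration of a scope gate, not the toolkit's code.
    """
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Hostnames or malformed input are rejected: fail closed.
        return False
    for cidr in scope:
        # A malformed CIDR in the scope itself raises ValueError on purpose:
        # a mis-declared scope should be loud, not silently ignored.
        net = ipaddress.ip_network(cidr, strict=False)
        if addr in net:
            return True
    return False


declared = ["10.0.0.0/24"]
print(in_scope("10.0.0.5", declared))   # True
print(in_scope("10.0.1.5", declared))   # False
```

The design point is that the check fails closed: anything that cannot be positively matched to the declared scope is treated as out of scope.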
What the agents drive
The Coverage matrix in the README is the part to bookmark. A few highlights:
- Recon and OSINT - nmap, masscan, rustscan, dig, whois, subfinder, amass, httpx, theHarvester, sherlock, holehe
- Web app testing - ffuf, gobuster, feroxbuster, sqlmap, dalfox, Commix, dirsearch, whatweb
- Active Directory - BloodHound, Impacket, NetExec/CrackMapExec, Certipy, kerbrute, Responder
- Credentials - Hydra, Hashcat, John, cupp, CeWL, Mentalist, Crunch, hashid, haiti
- Cloud - aws/az/gcloud CLIs, Trivy, Prowler, ScoutSuite, Pacu
- Mobile - Frida, Objection, jadx, apktool, MobSF
- Wireless - aircrack-ng, hcxdumptool, bettercap, wifite
- Payload crafting - msfvenom, Donut, MSFvenom Payload Creator. Every payload is paired with YARA/Sigma detection content - detection-by-design is a feature, not an afterthought.
- Reverse engineering - Ghidra, Radare2, jadx, Binwalk, IDA, dnSpy
- Forensics - Volatility 3, exiftool, foremost, YARA, Wireshark, Autopsy
bash db/doctor.sh audits which of those are actually present on your box, grouped by agent, with ✔/✘ and install hints. Pass --agent ad-attacker to scope the check; pass --json for scripted use.
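The underlying idea is a PATH lookup per tool, grouped by agent. A small sketch of that idea, assuming a hand-written mapping: the agent ids and binary names below are illustrative guesses, and neither the mapping nor the output format is taken from doctor.sh.

```python
import shutil
from typing import Callable, Optional

# Hypothetical mapping: agent ids and binary names are illustrative only.
TOOLS_BY_AGENT = {
    "recon-advisor": ["nmap", "masscan", "dig", "whois"],
    "ad-attacker": ["kerbrute", "certipy"],
}


def audit(
    tools_by_agent: dict[str, list[str]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, dict[str, bool]]:
    """Map each agent to {tool: True if the binary is found on PATH}."""
    return {
        agent: {tool: which(tool) is not None for tool in tools}
        for agent, tools in tools_by_agent.items()
    }


def render(report: dict[str, dict[str, bool]]) -> str:
    """Format the report as one agent per block with a mark per tool."""
    lines = []
    for agent, tools in report.items():
        lines.append(agent)
        for tool, present in tools.items():
            lines.append(f"  {'✔' if present else '✘'} {tool}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(audit(TOOLS_BY_AGENT)))
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it is the standard library's `shutil.which`.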
Engagement state lives in SQLite
Findings persist across Claude Code sessions via a tiny SQLite-backed CLI:
findings.sh init acme-2024 --client "ACME Corp" --type internal --scope "10.0.0.0/24"
findings.sh stats
findings.sh list vulns
findings.sh export # full JSON
bash handoff.sh # Markdown handoff report for next session
Tier 2 agents write to the DB automatically when findings.sh is in PATH. The handoff report is the underrated piece - it’s the artifact you’d actually want when picking up an engagement on day 8 of 14.
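Why SQLite fits here: a single file survives between sessions and answers "what do we have so far?" with one query. The sketch below assumes a made-up `findings` table; the real schema behind findings.sh is not shown in this writeup, so every table, column, and function name is hypothetical.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the findings store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS findings (
               id INTEGER PRIMARY KEY,
               engagement TEXT NOT NULL,
               severity TEXT NOT NULL,
               title TEXT NOT NULL
           )"""
    )
    return conn


def add_finding(conn: sqlite3.Connection, engagement: str,
                severity: str, title: str) -> None:
    """Insert one finding; the `with` block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO findings (engagement, severity, title) VALUES (?, ?, ?)",
            (engagement, severity, title),
        )


def stats(conn: sqlite3.Connection, engagement: str) -> dict[str, int]:
    """Count findings per severity for one engagement."""
    rows = conn.execute(
        "SELECT severity, COUNT(*) FROM findings "
        "WHERE engagement = ? GROUP BY severity",
        (engagement,),
    )
    return dict(rows.fetchall())
```

Pointing `open_db` at a file path instead of `:memory:` is what would carry the data from one session to the next.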
Local models, if you need them
The agent files are plain markdown system prompts; only the YAML frontmatter is Claude-specific. ./opencode-setup.sh --full converts every agent into OpenCode commands that work with Ollama or LM Studio. The methodology survives the model swap; the model quality is on you.
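Since only the frontmatter is Claude-specific, the portable part of an agent file is everything after the second `---`. A minimal sketch of that split, assuming flat `key: value` frontmatter; this is a naive parser for illustration, not a YAML library and not what opencode-setup.sh does, and the sample file content is invented.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split an agent file into (frontmatter fields, prompt body).

    Naive on purpose: handles only flat `key: value` lines between
    two `---` markers, which is an assumption about the file layout.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is the prompt
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # unterminated frontmatter: leave the file untouched
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Invented sample; real agent files will differ.
SAMPLE = """---
name: ad-attacker
description: Active Directory attack methodology advisor.
---
You are an Active Directory testing advisor. Stay inside the declared scope.
"""

meta, prompt = split_frontmatter(SAMPLE)
print(meta["name"])
print(prompt)
```

The returned body is a plain system prompt that any chat-style model can take, which is why the methodology carries over while model quality does not.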
When to reach for it
- Authorized engagement work where you want methodology consistency across operators.
- Training and lab environments - HackTheBox, TryHackMe, internal red team exercises.
- Detection engineering - the Sigma / SPL / KQL output from detection-engineer is genuinely useful.
When not to
- Anywhere the engagement isn’t authorized. The README says it; this writeup repeats it.
- Production-sensitive targets without Tier 2 scope validation in place.
- If you want fully autonomous exploitation. The companion MCP server (pentest-ai) goes further toward auto-pipelines; this repo deliberately stops short.
Similar tools
- cc-telegram-bridge
Multi-bot, multi-engine Telegram bridge with per-bot personality, budget caps, streaming, session resume, and an Agent Bus for parallel pipelines.
- Claude Code Analysis
82 docs and 15 diagrams mapping every major subsystem of Claude Code's accidentally exposed 512K-line TypeScript source - YOLO classifier, 93% context compaction, prompt-cache layout, 88+ feature flags, the custom React-Fiber terminal renderer.
- Claudraband
Wraps the real Claude Code TUI with a session lifecycle layer. Resumable non-interactive workflows, HTTP daemon for remote/headless control, ACP server for editor integrations (Zed, Toad). Drives your existing Claude Code install rather than reimplementing it - keeps skills, hooks, MCPs, and approvals intact.
- Garden Skills
Three carefully-scoped skills: web-design-engineer (with an anti-cliche blocklist that breaks the generic-AI-landing-page loop), gpt-image-2 (80+ templates, three runtime modes including advisor-only fallback), and kb-retriever (layered data_structure.md navigation for bounded local-KB retrieval). Tested across Claude Code, Claude.ai, Cursor, Codex, Gemini, OpenCode.