pentest-ai-agents

Specialized Claude Code subagents that turn the CLI into a pentest assistant: plan engagements, analyze recon, research exploits, build detections, audit STIGs, and write reports.

Tags: claude-code, security, pentest, ctf, ai-agent

This is a research harness, not a “jailbreak the agent” kit, and the README is unusually clear about that up front. The toolkit ships 31 Claude Code subagents that each carry deep methodology in one offensive-security domain - recon, web, Active Directory, cloud, mobile, wireless, payload crafting, reverse engineering, exploit chaining, detection, forensics, and reporting - and Claude Code routes you to the right one based on plain-English intent.

The install footprint is deliberately minimal: no servers, no Python deps, no MCP. Just markdown agent files copied to ~/.claude/agents/. The complexity lives in the prompt designs, not in infrastructure.

Authorized-use scope, stated plainly

The legal section is short and load-bearing:

“This toolkit is for authorized security testing only. Users must have proper written authorization before using these agents in any engagement.”

Prerequisites listed on the same page: Claude Code installed, a Pro or Max subscription, and - for any actual testing - signed rules of engagement with defined scope. The Tier 2 execution model (more on that below) operationalises this: agents validate every target against your declared scope before composing a command.

Quick start

curl -fsSL https://raw.githubusercontent.com/0xSteph/pentest-ai-agents/main/install.sh | bash

The script clones to a temp dir, copies agents to ~/.claude/agents/, and exits. Idempotent - re-run it for updates.
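The copy step described above can be sketched as follows, to show why re-running is safe. This is an illustration, not the contents of install.sh: the function name install_agents and the assumption that agents sit as flat .md files in a single source directory are hypothetical.

```shell
# Hypothetical sketch of an idempotent "copy agents" step.
# Assumption: the source directory holds agent definitions as *.md files.
install_agents() (
  src_dir=$1
  dest_dir=${2:-"$HOME/.claude/agents"}

  mkdir -p "$dest_dir" || exit 1   # no-op if the directory already exists
  # Overwrite in place: a second run refreshes files instead of duplicating them.
  cp -f "$src_dir"/*.md "$dest_dir"/
)
```

Calling install_agents twice with the same source directory leaves the same set of files in the destination, each with the latest content, which is the property the one-liner relies on for updates.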

Other install modes:

./install.sh --project        # local to current project
./install.sh --global --lite  # advisory agents run on Haiku (cheaper)
./install.sh --tools          # also install the underlying CLI tools (apt/brew/pacman + pipx/go/cargo)

Once installed, Claude Code routes by intent. From the README’s own example flow:

"Plan an internal pentest for a 500-endpoint AD environment, 2-week window."
"I have a domain user, where do I look first in BloodHound?"
"Run a phishing simulation against acme-corp.com, set up GoPhish + Evilginx infrastructure."

The slash commands are the other half of the interface. /recommend "phish a small SaaS team's IT" returns the right agent plus concrete next commands. /agents-for web lists every agent relevant to web testing.

Tier 1 vs Tier 2 - the safety model

This is the design choice that distinguishes pentest-ai-agents from “give the agent root and pray.”

Tier 1 (all 31 agents): Advisory only. You paste tool output; the agent analyses it and recommends next commands; you run the tools yourself.
Tier 2 (a curated subset): Compose AND execute. You declare the authorized scope, the agent validates every target against it, and Claude Code shows each command for approval before it runs.

Tier 2 covers Recon Advisor, Vuln Scanner, Web Hunter, AD Attacker, Exploit Chainer, PoC Validator, and Business Logic Hunter. Everything else stays advisory by design.
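To make "validates every target against your declared scope" concrete, here is a minimal sketch of a scope check for IPv4 CIDR ranges. In the toolkit this check lives in the agent prompts, so this is not how the agents implement it. The function names ip_to_int, in_scope, and target_allowed are hypothetical, and the sketch assumes literal IPv4 addresses only (hostnames are rejected rather than resolved, and octets with leading zeros are not handled).

```shell
# Hypothetical sketch: IPv4-only scope check.

# Convert a dotted quad to a 32-bit integer; fail on anything else.
ip_to_int() (
  case $1 in
    ''|*[!0-9.]*) exit 1 ;;        # reject hostnames and empty input
  esac
  IFS=.
  set -- $1                        # split into octets
  [ $# -eq 4 ] || exit 1
  for octet in "$@"; do
    [ "$octet" -le 255 ] 2>/dev/null || exit 1
  done
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if the target address falls inside the given CIDR block.
in_scope() (
  target=$1
  cidr=$2
  case $cidr in
    */*) ;;
    *) cidr="$cidr/32" ;;          # a bare address means a single host
  esac
  net=${cidr%/*}
  bits=${cidr#*/}
  [ "$bits" -ge 0 ] 2>/dev/null && [ "$bits" -le 32 ] || exit 1
  # Netmask: the top $bits bits set, the rest clear.
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  t=$(ip_to_int "$target") || exit 1
  n=$(ip_to_int "$net") || exit 1
  [ $(( t & mask )) -eq $(( n & mask )) ]
)

# Succeed if the target matches at least one declared scope entry.
target_allowed() (
  target=$1
  shift
  for cidr in "$@"; do
    in_scope "$target" "$cidr" && exit 0
  done
  exit 1                           # default deny
)
```

For example, target_allowed 10.0.0.42 10.0.0.0/24 succeeds, while target_allowed 10.0.1.7 10.0.0.0/24 fails. The default-deny return at the end is the important design choice: anything that cannot be parsed or matched is treated as out of scope.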

What the agents drive

The Coverage matrix in the README is the part to bookmark. A few highlights:

Recon and OSINT - nmap, masscan, rustscan, dig, whois, subfinder, amass, httpx, theHarvester, sherlock, holehe
Web app testing - ffuf, gobuster, feroxbuster, sqlmap, dalfox, Commix, dirsearch, whatweb
Active Directory - BloodHound, Impacket, NetExec/CrackMapExec, Certipy, kerbrute, Responder
Credentials - Hydra, Hashcat, John, cupp, CeWL, Mentalist, Crunch, hashid, haiti
Cloud - aws/az/gcloud CLIs, Trivy, Prowler, ScoutSuite, Pacu
Mobile - Frida, Objection, jadx, apktool, MobSF
Wireless - aircrack-ng, hcxdumptool, bettercap, wifite
Payload crafting - msfvenom, Donut, MSFvenom Payload Creator. Every payload is paired with YARA/Sigma detection content - detection-by-design is a feature, not an afterthought.
Reverse engineering - Ghidra, radare2, jadx, Binwalk, IDA, dnSpy
Forensics - Volatility 3, exiftool, foremost, YARA, Wireshark, Autopsy

bash db/doctor.sh audits which of those are actually present on your box, grouped by agent, with ✔/✘ and install hints. Pass --agent ad-attacker to scope the check; pass --json for scripted use.
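The core of a presence audit like this can be sketched in a few lines. This is an assumption-laden illustration, not doctor.sh itself: the function name check_tools is hypothetical, and the real script's grouping by agent, install hints, and JSON output are not reproduced.

```shell
# Hypothetical sketch: report which CLI tools are on PATH.
# Prints one ✔/✘ line per tool; succeeds only if none are missing.
check_tools() (
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '✔ %s\n' "$tool"
    else
      printf '✘ %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
)
```

A call such as check_tools nmap subfinder httpx prints one line per tool, and the exit status can gate a script that should not proceed with tools missing.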

Engagement state lives in SQLite

Findings persist across Claude Code sessions via a tiny SQLite-backed CLI:

findings.sh init acme-2024 --client "ACME Corp" --type internal --scope "10.0.0.0/24"
findings.sh stats
findings.sh list vulns
findings.sh export        # full JSON
bash handoff.sh           # Markdown handoff report for next session

Tier 2 agents write to the DB automatically when findings.sh is in PATH. The handoff report is the underrated piece - it’s the artifact you’d actually want when picking up an engagement on day 8 of 14.
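To see why SQLite is enough for cross-session state, here is a minimal sketch of a findings store built on the sqlite3 command-line shell. Everything in it is an assumption for illustration: the FINDINGS_DB path, the table layout, and the function names are hypothetical and do not reflect the schema findings.sh actually uses.

```shell
# Hypothetical sketch of a SQLite-backed findings store.
# Assumption: the sqlite3 CLI is installed; schema and names are made up.
FINDINGS_DB=${FINDINGS_DB:-./engagement.db}

# Double any single quotes so values are safe inside SQL string literals.
sql_quote() { printf '%s' "$1" | sed "s/'/''/g"; }

# Create the table if it is missing; safe to call at the start of every session.
db_init() {
  sqlite3 "$FINDINGS_DB" 'CREATE TABLE IF NOT EXISTS findings (
    id INTEGER PRIMARY KEY,
    host TEXT NOT NULL,
    title TEXT NOT NULL,
    severity TEXT NOT NULL
  );'
}

# Usage: add_finding HOST TITLE SEVERITY
add_finding() {
  sqlite3 "$FINDINGS_DB" "INSERT INTO findings (host, title, severity)
    VALUES ('$(sql_quote "$1")', '$(sql_quote "$2")', '$(sql_quote "$3")');"
}

count_findings() { sqlite3 "$FINDINGS_DB" 'SELECT COUNT(*) FROM findings;'; }
```

Because the database is a single file on disk, a new session only needs to call db_init again and carry on; nothing has to stay running between sessions.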

Local models, if you need them

The agent files are plain markdown system prompts; only the YAML frontmatter is Claude-specific. ./opencode-setup.sh --full converts every agent into OpenCode commands that work with Ollama or LM Studio. The methodology survives the model swap; the model quality is on you.
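Since only the frontmatter is Claude-specific, getting at the portable prompt body amounts to dropping that leading block. The sketch below shows one way to do it with awk; it is not what opencode-setup.sh does internally, and the function name strip_frontmatter is hypothetical. It assumes the frontmatter, when present, starts on the first line and is delimited by lines containing exactly ---.

```shell
# Hypothetical sketch: print an agent file without its YAML frontmatter.
# Files with no frontmatter are printed unchanged.
strip_frontmatter() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }  # opening delimiter on line 1
    in_fm && $0 == "---"   { in_fm = 0; next }  # closing delimiter
    !in_fm                 { print }            # everything else is prompt body
  ' "$1"
}
```

A later --- in the body (a Markdown horizontal rule, say) is left alone, because only a delimiter on the very first line opens a frontmatter block.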

When to reach for it

Authorized engagement work where you want methodology consistency across operators.
Training and lab environments - HackTheBox, TryHackMe, internal red team exercises.
Detection engineering - the Sigma / SPL / KQL output from detection-engineer is genuinely useful.

When not to

Anywhere the engagement isn’t authorized. The README says it; this writeup repeats it.
Production-sensitive targets without Tier 2 scope validation in place.
If you want fully autonomous exploitation. The companion MCP server (pentest-ai) goes further toward auto-pipelines; this repo deliberately stops short.