Prompt Injection

A type of vulnerability that specifically targets machine learning models.

Collection 475 stars GitHub

Tools

CTF

Research Papers

Tools

Token Turbulenz 29 updated 3y ago

A fuzzer to automate looking for possible Prompt Injections.

Garak 7.3k updated 3mo ago

Automate looking for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses in LLM's.

InjectLab 10 updated 1y ago

A MITRE-style matrix of adversarial prompt injection techniques with mitigations and real-world examples

openclaw-bastion

Open-source prompt injection defense for AI agent workspaces. Detects system prompt markers, role overrides, instruction injection, Unicode homoglyphs, directional overrides, and hidden instructions in HTML comments. Part of the OpenClaw Security Suite (11 tools). Pure Python stdlib, zero dependencies.

BodAIGuard 1 updated 4mo ago

Universal AI agent guardrail with 3-tier prompt injection detection (regex, heuristics, structural analysis), 42 block rules, and 4 enforcement modes

PIC Standard 21 updated 2mo ago

Protocol to block unauthorized or unproven agent actions via intent + provenance checks. Mitigates prompt injection & side-effect risks. Open-source (Apache 2.0).

Vigil LLM 467 updated 2y ago

Python library and REST API with composable stacked scanners: vector similarity, YARA rules, transformer classifier, canary token detection, and sentiment analysis — designed for defence-in-depth in production.

InjecGuard 63 updated 7mo ago

Open-source prompt guard with published training data; achieves +30.8% over prior state-of-the-art on the NotInject benchmark, specifically addressing overdefense false positives that break legitimate use cases.

tldrsec/prompt-injection-defenses 662 updated 1y ago

Actively maintained catalog of every practical defense in production — LLM Guard, Rebuff, architectural controls — the fastest way to survey the defense landscape.

Sentinel AI 7 updated 4mo ago

Real-time prompt injection detection across 12 languages with sub-millisecond regex-based scanning (no GPU or API calls required). Detects obfuscation techniques (base64, hex, ROT13, leetspeak), Claude Code attack vectors (HTML comment injection, authority impersonation, zero-width character smuggling), and includes an MCP safety proxy for transparent guardrails on any MCP server.

AgentSeal 147 updated 3mo ago

Open-source scanner that runs 150 attack probes against AI agents to test for prompt injection and extraction vulnerabilities. Supports OpenAI, Anthropic, Ollama, and any HTTP endpoint. Available as npm and pip package.

brood-box 28 updated 2mo ago

Hardware-isolated microVM sandbox for running coding agents (Claude Code, Codex, OpenCode) with workspace snapshot isolation, DNS-aware egress control, and MCP authorization profiles to contain damage from prompt injection attacks.

CTF

Damn Vulnerable LLM Agent 387 updated 1y ago

A sample chatbot powered by a ReAct agent, implemented with Langchain. It's designed to be an educational tool for security researchers, developers, and enthusiasts to understand and experiment with prompt injection attacks in ReAct agents.

ctf-prompt-injection by CharlesTheGreat77 3 updated 1y ago

Self-contained Dockerized CTF (Go + Ollama + local LLM) with progressively harder levels: Level 1 uses urgency tricks, Level 3 requires bypassing strongly refusal-trained models. Easy to self-host for internal red team workshops.

ai-prompt-ctf by c-goosen 30 updated 2mo ago

One of the few CTFs that tests indirect injection against tool-calling agents, spanning RAG, function calling, and ReAct agent scenarios using LlamaIndex, ChromaDB, GPT-4o, and Llama 3.2.

Research Papers

Safety in Embodied AI: Risks, Attacks, and Defenses 71 updated 2mo ago

A comprehensive survey of 500+ papers covering prompt injection and other attack vectors in embodied AI systems across the full pipeline (perception, cognition, planning, action, agentic). Includes a 5-layer threat taxonomy mapping where new capabilities introduce new attack surfaces.

Prompt Injection

Contents

Tools

CTF

Research Papers