Prompt Injection
A type of vulnerability that specifically targets machine learning models.
Contents
Tools
Automate looking for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses in LLM's.
A MITRE-style matrix of adversarial prompt injection techniques with mitigations and real-world examples
Open-source prompt injection defense for AI agent workspaces. Detects system prompt markers, role overrides, instruction injection, Unicode homoglyphs, directional overrides, and hidden instructions in HTML comments. Part of the OpenClaw Security Suite (11 tools). Pure Python stdlib, zero dependencies.
Universal AI agent guardrail with 3-tier prompt injection detection (regex, heuristics, structural analysis), 42 block rules, and 4 enforcement modes
Protocol to block unauthorized or unproven agent actions via intent + provenance checks. Mitigates prompt injection & side-effect risks. Open-source (Apache 2.0).
Python library and REST API with composable stacked scanners: vector similarity, YARA rules, transformer classifier, canary token detection, and sentiment analysis — designed for defence-in-depth in production.
Open-source prompt guard with published training data; achieves +30.8% over prior state-of-the-art on the NotInject benchmark, specifically addressing overdefense false positives that break legitimate use cases.
Actively maintained catalog of every practical defense in production — LLM Guard, Rebuff, architectural controls — the fastest way to survey the defense landscape.
Real-time prompt injection detection across 12 languages with sub-millisecond regex-based scanning (no GPU or API calls required). Detects obfuscation techniques (base64, hex, ROT13, leetspeak), Claude Code attack vectors (HTML comment injection, authority impersonation, zero-width character smuggling), and includes an MCP safety proxy for transparent guardrails on any MCP server.
CTF
A sample chatbot powered by a ReAct agent, implemented with Langchain. It's designed to be an educational tool for security researchers, developers, and enthusiasts to understand and experiment with prompt injection attacks in ReAct agents.
Research Papers
A comprehensive survey of 500+ papers covering prompt injection and other attack vectors in embodied AI systems across the full pipeline (perception, cognition, planning, action, agentic). Includes a 5-layer threat taxonomy mapping where new capabilities introduce new attack surfaces.