Prompt Injection
Direct and indirect prompt injection: hidden instructions in documents, URLs, emails or tool outputs that coerce the model into bypassing its system prompt, leaking data or invoking unauthorised tools.
Prompt injection. Jailbreaks. Tool abuse. RAG-source poisoning. Agent-driven data exfiltration. Model theft. The AI security attack surface is new, expanding weekly, and it does not map cleanly to your old web-app pentest playbook. Our research-led AI red team tests production GenAI systems against the OWASP Top 10 for LLMs, MITRE ATLAS and emerging agent-abuse techniques, with findings, proof-of-concept exploits and remediation guidance your engineering team can act on.
Free Scan your AI app's web surface with Cactus before we test the modelOrganisations are shipping GenAI features faster than they can secure them. Here's what the research shows, and what we find when we look.
A modern LLM application rarely talks to just a human. It reads documents, invokes tools, queries vector stores, calls APIs and makes autonomous decisions. Every one of those hops is an untrusted boundary, and attackers know it. Indirect prompt injection (the #1 emerging LLM threat) turns a malicious PDF, email or web page into a silent instruction to your model.
We test against the full OWASP Top 10 for Large Language Model Applications, plus emerging agent-abuse techniques not yet in the public catalogue.
Direct and indirect prompt injection: hidden instructions in documents, URLs, emails or tool outputs that coerce the model into bypassing its system prompt, leaking data or invoking unauthorised tools.
LLM output treated as trusted input downstream, leading to XSS, SSRF, SQL injection and SSTI when model responses are rendered or executed without sanitisation.
Adversarial inputs to training or fine-tuning pipelines, creating backdoors, bias injection or targeted misclassification in the deployed model.
Crafted prompts that consume excessive compute, trigger recursion, exhaust context window or drive infrastructure cost: denial-of-wallet and denial-of-service attacks.
Untrusted model weights, pickle-deserialisation RCE in .bin/.safetensors files, vulnerable dependencies, poisoned HuggingFace models and compromised fine-tuning datasets.
Extraction of system prompts, training data, PII, secrets, API keys, proprietary code and confidential business data from model outputs via inference-time attacks.
Plugins and tools without input validation, authorisation or scope, allowing chained attacks that pivot from a single prompt-injection into full environment compromise.
Over-permissioned agents with destructive capabilities (delete, transfer, spend, post-to-customer) invoked without human-in-the-loop or scope enforcement.
Business processes relying on LLM output as authoritative fact: hallucination, factual drift and prompt-injected answers treated as ground truth in downstream decisions.
Extraction of proprietary model weights, fine-tuning data, system prompts and behavioural fingerprints via inference APIs and statistical reconstruction attacks.
Multi-turn agent-loop manipulation, tool-output poisoning, memory injection and MCP / function-calling abuse: the next wave of LLM attacks, tested against live agent deployments.
Prompt injection embedded in images, audio and video: invisible to humans, instructive to the model. Covered for vision, voice and multimodal GenAI applications.
AI security is not a "we added it to the pentest menu" service for us. It's a dedicated research practice.
Team members have disclosed prompt-injection, agent-hijacking and RAG-poisoning CVEs against production GenAI platforms, with research published at major offensive security conferences.
Active contributors to the OWASP Top 10 for LLM Applications community and MITRE ATLAS framework. We help define the testing standards the industry relies on.
Our AI red team comes from classical offensive security (OSCP, OSWE, CREST CRT, eWPTX, CEH, PNPT) with years of web-app, API, cloud and binary pentesting before pivoting into LLM security.
We've built and open-source-contributed to the LLM red-team tooling ecosystem: Garak extensions, PyRIT converters, a custom indirect-injection harness and a private MITRE ATLAS TTP library.
We've tested production LLM systems across financial services, healthcare, legal, government, e-commerce and critical infrastructure, including RAG-over-confidential-documents and autonomous agent deployments.
Every finding comes with a working proof-of-concept exploit, a specific remediation recommendation your AI engineers can implement (not "add a guardrail"), and a free retest once the fix is shipped.
Structured, repeatable and aligned with OWASP, MITRE ATLAS and NIST AI RMF, but with real exploit depth, not a checklist walkthrough.
Map the AI attack surface (model, prompts, tools, data sources, agents, integrations) and build a target-specific threat model against OWASP LLM Top 10 & ATLAS.
Identify model family, system-prompt leakage surface, guardrail posture, tool set, RAG sources and rate-limit behaviour, without triggering abuse protections.
Prompt injection, jailbreak, indirect injection via uploaded documents, tool abuse, memory poisoning and sensitive-data extraction, done by human researchers.
Custom and open-source fuzzing harnesses (Garak, PyRIT, PromptFoo, in-house) run 10k+ adversarial prompts to surface edge-case failures at scale.
Multi-step exploitation: combine LLM flaws with plugin, API, authorisation and business-logic weaknesses to demonstrate realistic impact paths.
Executive and technical report, working PoCs, engineering-grade remediation guidance, regulator-ready evidence and a free retest of every fixed finding.
From a focused chatbot test to a full multi-agent adversary simulation, scoped to your specific architecture and risk appetite.
End-to-end security testing of a GenAI-enabled product feature: chatbot, copilot, summariser, classifier or agent. Full OWASP LLM Top 10 coverage plus the underlying web, API and auth layers.
Focused testing of retrieval-augmented generation stacks (ingestion, chunking, embeddings, vector store and grounding) with emphasis on indirect prompt injection and cross-tenant leakage.
Purple-team assessment of autonomous agents, tool-using LLMs and plugin ecosystems. Focused on excessive agency, tool-output poisoning, memory manipulation and MCP/function-calling abuse.
Black-box, multi-week engagement simulating a determined external adversary targeting your AI surface. Scoped like a classical red team, with objectives, TLOs and realistic dwell time.
Security review of your model-hosting supply chain: model weights, pickle files, HuggingFace dependencies, fine-tuning datasets, MLOps pipelines and model-registry access controls.
Hands-on AI security workshops for your engineers, ML team and security architects. From LLM-attack fundamentals to running your own threat-modelling workshop on a new GenAI feature.