Beyond the "AI Pentest": Why Risk Assessment Is the Real End Game¶
In the consulting world, clients love the word "Pentest." It's familiar. It's a known line item. They know what the report will look like, how long it takes, and that it satisfies a "Security Testing" checkbox for auditors.
But when a client asks for an "AI Pentest" or "LLM Red Teaming," they are often asking for the wrong thing.
Unlike traditional software—where a SQL Injection vulnerability is a deterministic flaw that must be fixed—AI vulnerabilities are probabilistic, inherent, and often not patchable. You can always trick a model if you try hard enough. Spending 40 hours proving that a model occasionally makes mistakes is a costly dead end.
As security professionals, our job is to pivot the conversation from: "Can you break this?" → "Does it matter if this breaks?"
This guide outlines an AI Risk Assessment methodology that delivers far more value than a traditional pentest because it focuses on Threat Modeling, Governance, Compliance, and Context.
The "Pentest" Trap¶
In a traditional network penetration test, success means finding a path to Domain Admin. The remediation is clear: patch a server, close a port, change a configuration.
In AI, many of the so-called "vulnerabilities" are simply the model behaving as mathematically expected—generalizing based on probabilistic training.
The Trap: A pentester spends a week crafting elaborate jailbreak prompts that make a customer support bot swear.
The Report: "High Severity: Model can generate profanity."
The Client: "We can't patch the model without retraining it for $50k, so we'll accept the risk."
The engagement ends without improving the client's actual security posture.
The missing link is Context.
Phase 1: The Threat Modeling Workshop (MAP)¶
Before writing a single prompt or running a single tool, the engagement must begin with a Threat Modeling workshop using industry-standard frameworks.
This is where most AI pentests fail: they start at the bottom of the pyramid (jailbreak attempts) instead of at the top (governance and architecture).
Frameworks: Selecting the Right Baseline¶
Clients often drown in the alphabet soup of NIST, ISO, OWASP, and MITRE. Your job is to pick the framework that matches their maturity and regulatory exposure.
NIST AI RMF – The Governance Foundation¶
Use this if the organization is early in its AI adoption. It provides structure for: - Governance - Lifecycle management - Risk identification - Continuous monitoring
It's flexible, not overly prescriptive, and aligns well with future US regulatory direction.
ISO/IEC 42001 – The Compliance & Certification Path¶
Choose this if the organization: - Operates internationally - Expects third-party audits - Needs a certifiable AI management system (AIMS)
ISO 42001 is stricter than NIST but interoperates well with it.
MITRE ATLAS / FRAME – The Technical Threat Model¶
Use these when the client: - Deploys AI in adversarial contexts - Exposes APIs or agents publicly - Handles sensitive data - Builds ML-based products
ATLAS provides concrete adversarial tactics (e.g., poisoning, evasion). FRAME helps structure failure mode analysis.
MAESTRO – For Agentic AI Architectures (Optional)¶
If the client is experimenting with multi-agent or tool-executing systems, MAESTRO provides agent-specific risk modeling patterns.
Our Process: STRIDE + ML Failure Modes + MITRE ATLAS¶
We merge classical threat modeling (STRIDE) with ML-specific failure modes and map them directly to ATLAS tactics.
This immediately clarifies: - What assets matter - What trust boundaries exist - What realistic attack paths are possible
Step 1: Define the Assets (AI ≠ Just a Model)¶
In AI systems, the model is often not the crown jewel. The infrastructure around it is.
Cloud Environment¶
AWS/Azure/GCP accounts, IAM roles, VPCs, and orchestration. A cloud compromise bypasses all ML controls.
MLOps Infrastructure¶
The "model factory": - S3 buckets and data lakes - Training pipelines - CI/CD (Jenkins/GitHub Actions) - Container registries - Kubernetes/Kubeflow
Compromise here = supply chain compromise.
Model Weights¶
Target: IP theft, model extraction.
Training Data¶
Target: membership inference, PII leakage.
Inference Pipeline¶
Target: availability attacks (e.g., sponge attacks).
Step 2: The Hidden Gateway — Cloud Security > Fancy ML Attacks¶
Most clients fear exotic mathematical attacks like gradient-based poisoning, which require uncommon access and deep expertise.
Meanwhile, their cloud security looks like Swiss cheese.
AI models don't live in a vacuum. They sit in S3 buckets, run inside containerized runtimes, and pull data from pipelines.
Reality: It is extremely hard to poison model weights during training in a well-secured pipeline.
Threat: It is extremely easy to compromise a misconfigured IAM role, open S3 bucket, or unsecured MLflow instance.
Common high-impact findings: - IAM privilege escalation (SageMaker roles with overly broad permissions) - Exposed MLOps services (public MLflow/Uvicorn endpoints without auth) - Training vs inference trust boundary violations (write permissions crossing boundaries)
Most "AI vulnerabilities" are actually cloud vulnerabilities wearing an AI costume.
Example Attack Chain (100% Cloud-Based)¶
Step 1 — Initial Access: Leaked SageMaker Studio Presigned URL
A developer shares a notebook URL. It leaks somewhere. Attacker clicks → full Jupyter access.
Step 2 — Credential Access: IAM Role Theft
Attacker queries IMDS:
Step 3 — Lateral Movement: S3 Recon
Role has AmazonSageMakerFullAccess. Attacker lists buckets → finds model artifacts.
Step 4 — Poisoning: Malicious Model Swap
Attacker modifies model.tar.gz → reuploads → triggers automated deployment.
Step 5 — Execution
Fraud model now approves any loan where name="TestUser".
This attack requires zero ML expertise. It is 100% a cloud misconfiguration.
Step 3: Apply STRIDE (AI-Adaptive)¶
We walk the client through practical scenarios: - Spoofing: Printed photos bypass biometric sensors. - Tampering: Public data source replaced → poisoned training set. - Repudiation: No explainability or logs for AI agent decisions. - Information Disclosure: Membership inference exposes PII. - Denial of Service: Sponge input causes runaway compute costs. - Elevation of Privilege: Prompt injection manipulates system instructions.
Deliverable: A Threat Model Diagram with data flows, trust boundaries, and ATLAS tactic labels.
Phase 2: Adversarial Validation (MEASURE)¶
This is the Red Teaming portion—but it's contextual, not generic.
Different modalities require different testing:
LLMs (Chatbots, Agents)¶
- Automated jailbreaking (PyRIT, Garak)
- Indirect prompt injection
- System prompt leakage
- Tool misuse validation
Computer Vision¶
- Digital adversarial noise (FGSM/PGD)
- Physical patches
- Evasion tests against stop-sign or biometric classifiers
Predictive Models¶
- Feature perturbation attacks
- Boundary analysis
- Fairness stress tests (optional)
The goal is always the same: Validate the actual risks identified in Phase 1—not hunt random model quirks.
Phase 3: The "Living" Artifacts (MANAGE)¶
A PDF pentest report becomes outdated the moment the model is retrained.
A Risk Assessment produces artifacts that evolve over time.
1. AI Risk Register¶
Maps technical findings → business impact → controls.
Example:
| Field | Value |
|---|---|
| Risk ID | AI-R-04 |
| Threat | Model Inversion |
| Impact | GDPR violation, litigation |
| Current Control | Rate limiting |
| Recommended Control | Output filtering or DP |
2. Traceability Matrix¶
Required in regulated industries. Maps Regulation → Threat → Mitigation → Validation Test → Status
Example:
| Regulation | Threat | Mitigation | Validation | Status |
|---|---|---|---|---|
| EU AI Act Art. 15 | Evasion | Adversarial Training | ART 4.2 | Pass |
| GDPR Art. 35 | Data Leakage | PII Filtering | Garak PII Probe | Fail |
3. Living Validation Document¶
When the model is retrained, validation scripts are rerun. Updates feed back into the Risk Register. This satisfies NIST AI RMF's Continuous Monitoring expectations.
Regulatory Drivers (The "Why Now?")¶
EU AI Act¶
High-Risk AI must demonstrate robustness and resilience against adversarial manipulation.
A pentest provides a snapshot. A Risk Assessment provides a technical file, which is what regulators will ask for.
NIST AI RMF¶
Increasingly used for liability and due diligence in the US. - MAP → Threat Modeling - MEASURE → Adversarial Validation - MANAGE → Governance artifacts
This triad is quickly becoming the expected baseline.
Conclusion¶
As security researchers and consultants, we need to move away from "gotcha" hacking and toward Resilience Engineering.
A pentest tells you what broke today. A Risk Assessment tells you what matters, why it matters, and how to operate AI safely over time.
By steering clients away from one-off model poking and toward structured governance and risk management, we build systems that are secure, compliant, and capable of surviving the real world—not just surviving 40 hours of jailbreak attempts.