Building an AI agent takes a weekend. Destroying it in production takes about five minutes. The gap between a working demo and a reliable, secure, production-grade deployment is where most enterprise AI initiatives fail—and based on current research, the failure modes are shockingly consistent and predictable.
This tutorial walks you through the six most common ways teams ruin perfectly good AI agents, with a practical hardening checklist you can run against any agent before it touches real users or real data.
What This Is
An “agentic AI” system is fundamentally different from a chat interface. Where a traditional LLM answers a question and stops, an agent reasons through a task, invokes tools—file systems, APIs, databases, shell commands—and takes autonomous action. That autonomy is the product. It’s also the attack surface.
According to the NotebookLM research report on agentic enterprise security, this transition from conversational AI to agentic systems “represents a fundamental shift in enterprise technology” that introduces risks “that differ significantly from traditional software or standalone LLMs.” The report identifies this as an integration gap: probabilistic AI reasoning operating inside deterministic system constraints.
Salesforce’s analysis of AI implementation pitfalls frames it this way: an agentic enterprise “means more than just deploying agents—it’s a complete system. It starts with trusted governance as a foundational layer, not an afterthought.” That framing is exactly right, and it’s the framing most teams ignore when they’re shipping demos.
The agent skills ecosystem compounds the problem. Traditional package managers such as npm and PyPI already have a version of this weakness (a package runs with the privileges of the process that installs or imports it), and agent skills repeat it: a skill inherits the full permission set of the agent that executes it. That means a single poorly vetted skills package can do anything your agent can do: read files, call APIs, execute shell commands.
Snyk’s ToxicSkills research quantifies the scope: according to the NotebookLM research report, 36.82% of identified agent skills possess at least one security flaw, and 13.4% contain critical-level issues, including embedded malware and prompt injection payloads. These aren’t hypothetical risks; they describe the state of the ecosystem as of March 2026.
Understanding these six failure modes—privilege escalation, supply chain contamination, architectural fragility, performance collapse, data mismanagement, and governance neglect—is the prerequisite for shipping agents that work in production for more than a week.
Why It Matters
This isn’t a theoretical exercise. Enterprises are deploying AI agents against customer data, financial systems, HR records, and operational infrastructure. When an agent fails, the consequences scale with its permissions.
The research identifies who gets hurt by each failure mode:
Developers and platform engineers bear the cost of architectural failures. When dependency updates cascade into system-wide breakdowns—a pattern the research attributes to tight coupling between agent logic and volatile external components—it’s the engineering team running incident response at 2am.
Security and compliance teams own the fallout from privilege misconfigurations and supply chain contamination. An over-privileged agent that processes a malicious prompt injection doesn’t just return a bad answer; it executes commands with whatever permissions its service account holds.
Operators and product managers feel the impact of performance degradation. Context window overflow, inference cost escalation, and orchestration overhead all manifest as user-facing latency and runaway cloud bills.
Executives and data governance leads are responsible for the launch-and-leave failure mode—the assumption that deploying an agent is a one-time event rather than an ongoing operational discipline. Salesforce is explicit: agents require “trusted governance as a foundational layer.”
What makes this different from prior software risk categories is the hybrid failure profile. As the academic report “Characterizing Faults in Agentic AI” notes (cited in the NotebookLM research report), “failures in agentic AI systems are structured rather than ad hoc,” exhibiting “a distinctive hybrid failure profile” that combines probabilistic LLM errors with deterministic software faults. You can’t debug these failures with traditional logging alone.
The Data: Agentic Failure Taxonomy and Performance Bottlenecks
The research identifies five primary architectural dimensions where agentic faults occur, and four performance bottlenecks that degrade at scale. Both tables are drawn directly from the NotebookLM research report.
Agentic Fault Taxonomy
| Fault Dimension | What Breaks | Example Failure Mode |
|---|---|---|
| Cognitive Control | LLM misconfiguration, token-handling errors | Agent loops indefinitely because stop tokens are misconfigured |
| Agency & Actuation | Execution loop failures | Infinite reasoning cycles; agent never reaches a termination state |
| Perception & Memory | State inconsistencies, type-handling errors | LLM output violates downstream program logic, causing runtime exceptions |
| Runtime & Grounding | Dependency conflicts, platform incompatibilities | Utility library update cascades into system-wide failure |
| Reliability & Observability | Suppressed exceptions, silent errors | Agent appears to complete tasks but is silently failing with no logged trace |
Performance Bottlenecks at Scale
| Bottleneck | Primary Cause | Strategic Solution |
|---|---|---|
| High Latency | Memory-bound GPU operations during decode phase | Implement quantization and Key-Value (KV) caching |
| Context Window Limits | KV cache growth with long sequences (cache memory grows linearly with sequence length) | Use semantic chunking and intelligent context pruning |
| Inference Costs | Using flagship models for simple tasks | Intelligent model routing (simple tasks → smaller models) |
| Orchestration Overhead | Sequential tool execution, agent-to-tool gaps | Adopt async-first architecture for parallel tool calling |
The performance data point that should stop every team in their tracks: the research indicates that intelligent model routing—reserving flagship models for complex reasoning and routing classification or summarization tasks to smaller alternatives—can reduce inference spend by up to 90%, per the NotebookLM research report.
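Where a saving like that comes from is simple arithmetic: the blended cost is a traffic-weighted average of per-tier costs. The sketch below assumes hypothetical tier names, a hypothetical traffic mix, and relative per-request costs expressed as fractions of the flagship’s cost; none of these figures come from the research.

```python
def blended_cost_reduction(traffic_share: dict[str, float],
                           relative_cost: dict[str, float]) -> float:
    """Fractional saving versus routing every request to the flagship model.

    traffic_share: fraction of requests sent to each tier (must sum to 1).
    relative_cost: per-request cost of each tier, with the flagship at 1.0.
    """
    total = sum(traffic_share.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"traffic shares must sum to 1, got {total}")
    blended = sum(share * relative_cost[tier] for tier, share in traffic_share.items())
    return 1.0 - blended


# Hypothetical mix: 10% flagship, 30% mid-tier, 60% small model.
saving = blended_cost_reduction(
    {"flagship": 0.1, "mid": 0.3, "small": 0.6},
    {"flagship": 1.0, "mid": 0.1, "small": 0.02},
)
print(f"{saving:.1%}")  # prints 85.8%
```

The flagship share caps the saving: with 10% of traffic still on the flagship, the reduction can never exceed 90%, however cheap the other tiers become.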
Step-by-Step Tutorial: Auditing and Hardening an AI Agent Deployment
This tutorial assumes you have an agent either in late-stage development or already in production. The goal is to run a systematic audit against the six failure modes, then implement targeted fixes. You don’t need to rebuild your agent—you need to harden it.
Prerequisites
- Access to your agent’s configuration files and deployment environment
- Service account credentials used by the agent
- A list of all skills/plugins/tools your agent has enabled
- Your current observability setup (logs, traces, dashboards)
- A staging environment that mirrors production
Phase 1: Audit Permissions (The Privilege Problem)
The single most exploitable configuration in agentic deployments is an over-privileged service account. Most teams clone an admin profile when setting up agent credentials because it’s fast and it works. That convenience creates a “keyless” security risk—as Ruh AI’s security analysis puts it: “The difference between your ChatGPT session and these autonomous agents isn’t just power—it’s fundamental design choices that prioritize convenience over safety.”
Step 1.1: Start from zero, not from a clone.
Create a brand-new user or service account for your agent. Do not copy permissions from an existing admin or power user. Start the new account with zero permissions, then add only the object-level and field-level security (FLS) permissions required for the tasks this agent performs.
```bash
# Example: creating a scoped service account in a cloud environment (Google Cloud shown)
# Instead of copying an admin role, start from an empty account:
gcloud iam service-accounts create agent-production \
  --description="Scoped service account for sales-summary agent" \
  --display-name="Agent Production"

# Grant only the specific roles this agent needs. Note: a project-level binding
# applies to every dataset in the project; to limit access to one dataset,
# grant the role on that dataset instead.
gcloud projects add-iam-policy-binding YOUR_PROJECT \
  --member="serviceAccount:agent-production@YOUR_PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"  # read-only
```
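Before moving on, it is worth confirming the account never picked up a broad role. The sketch below is a minimal check, assuming you have exported the project policy with `gcloud projects get-iam-policy YOUR_PROJECT --format=json`; the `BROAD_ROLES` set and the sample policy are illustrative assumptions, not an exhaustive list of dangerous roles.

```python
import json

# Illustrative set of roles that are far too broad for an agent account; tune it.
BROAD_ROLES = {"roles/owner", "roles/editor", "roles/iam.securityAdmin"}

AGENT = "serviceAccount:agent-production@YOUR_PROJECT.iam.gserviceaccount.com"


def load_policy(path: str) -> dict:
    """Load a policy exported with `gcloud projects get-iam-policy ... --format=json`."""
    with open(path) as f:
        return json.load(f)


def roles_for_member(policy: dict, member: str) -> set[str]:
    """Collect every role bound to `member` in an IAM policy document."""
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def broad_roles(policy: dict, member: str) -> set[str]:
    """Roles held by `member` that appear in BROAD_ROLES."""
    return roles_for_member(policy, member) & BROAD_ROLES


# Made-up policy showing the failure this check is meant to catch.
sample_policy = {
    "bindings": [
        {"role": "roles/bigquery.dataViewer", "members": [AGENT]},
        {"role": "roles/editor", "members": [AGENT, "user:alice@example.com"]},
    ]
}
print(sorted(broad_roles(sample_policy, AGENT)))  # ['roles/editor']
```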
Step 1.2: Implement an immutable safety core.
Separate your agent’s “personality” configuration (system prompts, tone, persona) from its safety constraints (never delete files, never send bulk communications without approval). The safety layer must be in a file or configuration block that the LLM cannot modify through prompt injection.
```yaml
# agent_config.yaml: separating personality from safety
personality:
  tone: "professional"
  name: "Aria"
  instructions: "outputs/persona.txt"  # User-editable

safety_core:
  immutable: true
  rules:
    - "Never delete files without explicit human confirmation"
    - "Never initiate financial transactions > $500 without re-authentication"
    - "Never send communications to more than 5 recipients without approval"
  source: "system_controlled"  # Never loaded from an LLM-accessible path
```
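The YAML only declares the rules; ordinary code outside the model has to enforce them. A minimal sketch under stated assumptions: the `Action` shape and action kinds are hypothetical, the limits mirror the rules above, and `assert_outside_agent_scope` refuses to load the safety core from any directory the agent can read. Because the checks run in code before each tool call, no prompt can rewrite them.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SafetyCore:
    """Immutable limits mirroring the safety_core rules in agent_config.yaml."""
    max_transaction_without_reauth: float = 500.0
    max_recipients_without_approval: int = 5


@dataclass
class Action:
    """Hypothetical description of a tool call the agent wants to make."""
    kind: str  # e.g. "delete_file", "payment", "send_message"
    amount: float = 0.0
    recipients: int = 0
    human_confirmed: bool = False
    reauthenticated: bool = False


def assert_outside_agent_scope(path: Path, agent_readable_roots: list[Path]) -> None:
    """Raise if the safety-core file sits inside any directory the agent can read."""
    resolved = path.resolve()
    for root in agent_readable_roots:
        if resolved.is_relative_to(root.resolve()):
            raise PermissionError(f"{resolved} is inside agent-readable root {root}")


def check(action: Action, core: SafetyCore) -> str:
    """Return "allow", or the reason the action must be blocked."""
    if action.kind == "delete_file" and not action.human_confirmed:
        return "blocked: file deletion needs explicit human confirmation"
    if (action.kind == "payment"
            and action.amount > core.max_transaction_without_reauth
            and not action.reauthenticated):
        return "blocked: transaction above limit needs re-authentication"
    if (action.recipients > core.max_recipients_without_approval
            and not action.human_confirmed):
        return "blocked: too many recipients without approval"
    return "allow"
```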
Step 1.3: Require session-based re-authentication for high-risk operations.
For any operation classified as high-risk—financial transactions, mass communications, file deletions, external API calls with write access—require the agent to re-verify user identity mid-task, not just at session start.
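A minimal sketch of the freshness check, assuming the four operation names below as your high-risk classification and a five-minute maximum age for an identity check (both are placeholders to tune):

```python
import time
from typing import Optional

# Hypothetical classification of high-risk operations, following the list above.
HIGH_RISK_OPERATIONS = {"financial_transaction", "mass_communication",
                        "file_deletion", "external_api_write"}

# Assumption: an identity check older than five minutes no longer counts.
REAUTH_MAX_AGE_SECONDS = 300


def needs_reauthentication(operation: str, last_verified_at: float,
                           now: Optional[float] = None) -> bool:
    """True if `operation` is high-risk and the user's last identity check is stale."""
    if operation not in HIGH_RISK_OPERATIONS:
        return False
    now = time.time() if now is None else now
    return now - last_verified_at > REAUTH_MAX_AGE_SECONDS
```

Call it before dispatching a tool; when it returns `True`, pause the task, ask the user to verify again, and record the new verification time.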
Phase 2: Audit Your Skills Inventory (The ToxicSkills Problem)
The research from Snyk’s ToxicSkills study, as reported in the NotebookLM report, documents three specific attack vectors currently deployed in the wild against agent skills ecosystems:
- Prompt injection (91% of confirmed malicious skills): Deceptive instructions embedded in skill metadata that override safety guidelines
- Data exfiltration: Obfuscated commands using base64 encoding to steal credentials and sensitive files
- Malware distribution: Skills that instruct agents to download and install password-protected archives to evade scanners
Step 2.1: Inventory every installed skill.
Generate a complete list of every skill, plugin, tool, or capability package your agent can invoke. This includes first-party skills, community skills, and any skill imported as a dependency of another skill.
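```python
# Sketch: auditing your skills registry.
# Assumes a hypothetical agent_skills_manifest.json shaped like
# {"skills": [{"name": ..., "source": ..., "permissions": [...], ...}]}.
import json

with open("agent_skills_manifest.json") as f:
    manifest = json.load(f)

for skill in manifest["skills"]:
    print(f"Skill: {skill['name']}")
    print(f"  Source: {skill.get('source', 'UNKNOWN')}")
    print(f"  Permissions requested: {skill.get('permissions', [])}")
    # Missing verification data is itself a finding, so show it instead of crashing.
    print(f"  Last verified: {skill.get('verified_at', 'NEVER')}")
    print(f"  Hash: {skill.get('integrity_hash', 'MISSING')}")
    print()
```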
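Printing the recorded hash only helps if something checks it. A minimal sketch, assuming the hypothetical manifest’s `integrity_hash` field holds a hex SHA-256 digest of the skill file captured when the skill was last reviewed:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_skill(skill_file: Path, expected_hash: str) -> bool:
    """True only if the file on disk matches the hash recorded at review time."""
    return sha256_of(skill_file) == expected_hash.lower()
```

A mismatch means the skill changed after review and should be treated as unvetted until someone reviews it again.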
Step 2.2: Apply the least-privilege principle to skills.

A skill that only needs to read from a CRM should not have access to your file system. Audit each skill’s declared permissions against what it actually needs, and strip anything that isn’t required.
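The audit reduces to a set difference per skill. A minimal sketch, assuming you can express both the declared permissions and the permissions each task actually needs as strings; the skill and permission names here are hypothetical:

```python
def audit_skill_permissions(declared: dict[str, set[str]],
                            required: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each skill to the permissions it declares beyond what its task needs.

    A skill with no entry in `required` is treated as needing nothing, so every
    declared permission is flagged until someone documents why it is needed.
    """
    report = {}
    for skill, perms in declared.items():
        excess = perms - required.get(skill, set())
        if excess:
            report[skill] = excess
    return report


declared = {
    "crm_reader": {"crm:read", "fs:read", "shell:exec"},
    "email_drafter": {"crm:read", "email:draft"},
}
required = {
    "crm_reader": {"crm:read"},
    "email_drafter": {"crm:read", "email:draft"},
}
print(audit_skill_permissions(declared, required))
```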
Step 2.3: Scan skill system prompts for injection patterns.
Load each skill’s configuration file and scan for adversarial patterns: base64-encoded strings, instructions to “ignore previous instructions,” redirects to external URLs in system prompts, and commands referencing credential files.
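```bash
# Basic scan for suspicious patterns in skill configs
# (-r recursive, -i case-insensitive, -n show line numbers, -E extended regex)
grep -rin "ignore previous" ./skills/ --include="*.json" --include="*.yaml" --include="*.yml"
grep -rin "base64" ./skills/ --include="*.json" --include="*.yaml" --include="*.yml"
grep -rin "download" ./skills/ --include="*.json" --include="*.yaml" --include="*.yml"
grep -rinE "\.env|credentials|secret|token" ./skills/ --include="*.json" --include="*.yaml" --include="*.yml"
```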
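Note that `grep "base64"` only finds the literal word; an encoded payload does not contain it. A minimal Python sketch that goes a step further: it finds long base64-looking runs, decodes them, and rescans the decoded text. The pattern list and the 24-character cutoff are illustrative assumptions, and an empty result is not proof that a skill is safe.

```python
import base64
import binascii
import re

SUSPICIOUS = re.compile(
    r"ignore (all )?previous|\.env\b|credentials|secret|token|curl |wget |download",
    re.IGNORECASE,
)
# Runs of 24+ base64 characters, optionally padded; the cutoff is arbitrary.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def scan_text(text: str) -> list[str]:
    """Return findings for plain-text matches and for matches hidden in base64."""
    findings = [f"plain: {m.group(0)!r}" for m in SUSPICIOUS.finditer(text)]
    for run in B64_RUN.findall(text):
        try:
            decoded = base64.b64decode(run, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or it decodes to binary data
        for m in SUSPICIOUS.finditer(decoded):
            findings.append(f"base64-hidden: {m.group(0)!r}")
    return findings
```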
Phase 3: Test Architectural Fault Points (The Integration Gap)
Per the NotebookLM research report, dependency and integration changes account for nearly 20% of all agentic system breakdowns. The integration gap—tight coupling between agent logic and volatile external dependencies—is the most common source of production outages.
Step 3.1: Pin your dependencies.
Never let your agent runtime resolve “latest” for any package. Pin every dependency to an exact version and use a lock file.
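```bash
# Python: record the exact versions installed in the current environment
pip freeze > requirements.txt
# Node: save an exact version (no ^ or ~ range); "some-agent-framework" is a placeholder
npm install --save-exact some-agent-framework@2.4.1
# Commit package-lock.json and install strictly from it in CI
npm ci
```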
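`pip freeze` pins whatever happens to be installed, but hand-edited requirements files drift. A minimal CI check, assuming a plain `requirements.txt` in which only an exact `==` version counts as pinned (direct URL references and `===` are flagged for manual review):

```python
import re

# Package name, optional extras, then an exact "==" version.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9,._-]+\])?==[^=\s;]+")


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version."""
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or --hash
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

Run it against `requirements.txt` in CI and fail the build on any output.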
Step 3.2: Build a fault injection test suite.
Before deploying, simulate each of the five fault dimensions from the taxonomy table: inject type mismatches into LLM output, force infinite reasoning loops with malformed stop conditions, simulate dependency failures, and suppress exception handlers to test your observability layer.
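Two of those faults translate directly into unit tests. A minimal sketch with hypothetical names throughout: `run_agent_loop` enforces a hard step budget so a malformed stop condition fails loudly instead of looping forever, and `parse_tool_args` rejects LLM output whose types would violate downstream logic. The two test functions inject exactly those faults.

```python
import json


class StepBudgetExceeded(RuntimeError):
    """Raised when the agent fails to reach a termination state in time."""


def run_agent_loop(step, max_steps: int = 20):
    """Call `step(i)` until it returns a non-None result or the budget runs out."""
    for i in range(max_steps):
        result = step(i)
        if result is not None:
            return result
    raise StepBudgetExceeded(f"no termination after {max_steps} steps")


def parse_tool_args(raw: str) -> dict:
    """Validate LLM output before any tool sees it (expects {"limit": int})."""
    data = json.loads(raw)
    limit = data.get("limit") if isinstance(data, dict) else None
    if not isinstance(limit, int) or isinstance(limit, bool):
        raise TypeError(f"expected {{'limit': int}}, got {raw!r}")
    return data


def test_infinite_loop_is_caught():
    # Fault: a step function whose stop condition never fires.
    try:
        run_agent_loop(lambda i: None, max_steps=5)
    except StepBudgetExceeded:
        return
    raise AssertionError("runaway loop was not stopped")


def test_type_mismatch_is_caught():
    # Fault: the model returns a string where downstream code needs an int.
    try:
        parse_tool_args('{"limit": "ten"}')
    except TypeError:
        return
    raise AssertionError("type mismatch reached the tool")


test_infinite_loop_is_caught()
test_type_mismatch_is_caught()
```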
Step 3.3: Enable enriched event logs.
The research explicitly recommends enabling the “Enrich event logs with conversation data” setting to support root-cause analysis. Standard request/response logs are insufficient: you need the full reasoning chain, tool invocations, outputs, and any error states captured in a single trace.
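If your platform has no such setting, the idea can be approximated in your own orchestration code. A minimal standard-library sketch; the event fields are assumptions rather than a schema from the research: every reasoning step, tool call, output, and error becomes one JSON line tagged with a shared trace ID, so filtering on a single ID reconstructs the whole chain.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("agent.trace")


def log_event(trace_id: str, kind: str, **fields) -> dict:
    """Emit one structured trace event as a JSON line and return it."""
    event = {"trace_id": trace_id, "ts": time.time(), "kind": kind, **fields}
    logger.info(json.dumps(event, default=str))
    return event


def traced_tool_call(trace_id: str, tool_name: str, tool, **params):
    """Run a tool, logging its parameters, its output, and any exception."""
    log_event(trace_id, "tool_call", tool=tool_name, params=params)
    try:
        output = tool(**params)
    except Exception as exc:
        log_event(trace_id, "tool_error", tool=tool_name, error=repr(exc))
        raise  # never swallow the error: silent failures are the fault to avoid
    log_event(trace_id, "tool_output", tool=tool_name, output=output)
    return output


logging.basicConfig(level=logging.INFO, format="%(message)s")
trace_id = str(uuid.uuid4())
log_event(trace_id, "reasoning", text="User asked for Q3 pipeline; querying CRM.")
traced_tool_call(trace_id, "crm_lookup", lambda region: {"deals": 12}, region="EMEA")
```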
Phase 4: Optimize for Performance (The Cost and Latency Problem)
Step 4.1: Implement intelligent model routing.
Map your agent’s task types to a three-tier model hierarchy: complex reasoning to your flagship model, summarization and classification to a mid-tier model, and simple lookup and formatting tasks to your smallest available model.
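```python
# Placeholder identifiers: substitute the flagship, mid-tier, and small models
# your provider offers, ordered by actual per-token price.
FLAGSHIP_MODEL = "flagship-model"
MID_TIER_MODEL = "mid-tier-model"
SMALL_MODEL = "small-model"

def route_to_model(task_type: str, complexity_score: float) -> str:
    """Pick a model tier from the task type and a 0-1 complexity score."""
    if complexity_score > 0.8 or task_type in ("multi_step_reasoning", "code_generation"):
        return FLAGSHIP_MODEL
    if complexity_score > 0.4 or task_type in ("summarization", "sentiment_analysis"):
        return MID_TIER_MODEL
    return SMALL_MODEL  # simple lookup and formatting
```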
Step 4.2: Implement multi-tier caching.
Use three cache levels as recommended in the NotebookLM research report: an exact-match cache for identical queries, a semantic cache for related queries using embeddings, and a prompt cache to store repeated system instructions and reduce token costs on every call.
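The first two tiers can live in your own code; prompt caching is usually a feature of the model provider’s API, so it is not shown here. A minimal sketch, assuming you supply an `embed` function backed by whatever embedding model you use; the 0.9 similarity threshold is arbitrary and should be tuned on real traffic:

```python
import math
from typing import Callable, Optional


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hit the exact tier."""
    return " ".join(text.lower().split())


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TieredCache:
    """Exact-match lookup first, then nearest-neighbor semantic lookup."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.exact: dict[str, str] = {}
        self.semantic: list[tuple[list[float], str]] = []

    def put(self, query: str, answer: str) -> None:
        key = _normalize(query)
        self.exact[key] = answer
        self.semantic.append((self.embed(key), answer))

    def get(self, query: str) -> Optional[str]:
        key = _normalize(query)
        if key in self.exact:  # tier 1: identical query
            return self.exact[key]
        vector = self.embed(key)  # tier 2: semantically related query
        best_score, best_answer = 0.0, None
        for stored, answer in self.semantic:
            score = _cosine(vector, stored)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

The linear scan is fine for a sketch; at production volume the semantic tier would sit on a vector index.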
Step 4.3: Move to async-first orchestration.
Any agent making more than one tool call in sequence should be audited for parallelization opportunities. If a task requires fetching from a CRM and querying a database independently, those calls should run in parallel.
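A minimal `asyncio` sketch of that CRM-plus-database case. The two coroutines are hypothetical stand-ins that sleep to simulate I/O; `asyncio.gather` runs them concurrently, so the wait is roughly the slower call rather than the sum of both.

```python
import asyncio
import time


# Hypothetical stand-ins for two independent tool calls.
async def fetch_crm_account(account_id: str) -> dict:
    await asyncio.sleep(0.3)  # simulated network latency
    return {"account": account_id, "owner": "placeholder"}


async def query_usage_db(account_id: str) -> dict:
    await asyncio.sleep(0.3)  # simulated query latency
    return {"account": account_id, "active_seats": 0}


async def gather_context(account_id: str) -> tuple[dict, dict]:
    # Neither call depends on the other, so run them concurrently.
    crm, usage = await asyncio.gather(
        fetch_crm_account(account_id), query_usage_db(account_id)
    )
    return crm, usage


start = time.perf_counter()
crm, usage = asyncio.run(gather_context("acct-42"))
elapsed = time.perf_counter() - start
print(f"fetched both in {elapsed:.2f}s")
```

If one call needs the other’s result, keep those two sequential and parallelize only the independent branches.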
Phase 5: Curate Your Data Library (The RAG Problem)
Step 5.1: Treat your data library as a database, not a firehose.
The research warns against the “firehose” approach to Retrieval-Augmented Generation (RAG): dumping every available document into the index and hoping the retriever finds what it needs. Instead, use “identifying fields” (structured metadata such as product version or topic) to help the retrieval system narrow its search, and “content fields” to hold the text the LLM uses to generate its response.
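A minimal sketch of the split, assuming product version and topic as the identifying fields (the same ones the support use case below relies on); the sample documents are made up. In a real vector store the filter would be a metadata filter applied before similarity ranking; here it is a plain list comprehension.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeDoc:
    # Identifying fields: structured metadata the retriever filters on.
    product_version: str
    topic: str
    # Content field: the text actually handed to the LLM.
    body: str


def retrieve(docs: list[KnowledgeDoc], product_version: str, topic: str) -> list[str]:
    """Narrow by identifying fields first, then return only content for the prompt."""
    return [d.body for d in docs
            if d.product_version == product_version and d.topic == topic]


docs = [
    KnowledgeDoc("2.0", "billing", "Invoices are generated on the 1st."),
    KnowledgeDoc("2.0", "sso", "SAML setup requires an admin role."),
    KnowledgeDoc("1.9", "billing", "Legacy invoices are generated on the 15th."),
]
print(retrieve(docs, product_version="2.0", topic="billing"))
```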
Step 5.2: Validate ingestion before going live.
After loading data into your search index or vector table, confirm the index status is “Ready” before enabling agent access. Agents querying empty or partially built indices will hallucinate with false confidence.
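A minimal gate, assuming a hypothetical `get_status` callable that wraps your platform’s index-status API and returns strings such as "Building", "Ready", or "Failed" (the non-Ready values are assumptions):

```python
import time
from typing import Callable


class IndexNotReady(RuntimeError):
    """Raised when the index fails or does not become Ready in time."""


def wait_for_index_ready(get_status: Callable[[], str], timeout_s: float = 600.0,
                         poll_interval_s: float = 10.0) -> None:
    """Block until get_status() reports "Ready"; raise rather than expose a half-built index."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "Ready":
            return
        if status in ("Failed", "Error"):
            raise IndexNotReady(f"index build ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise IndexNotReady(f"index still {status!r} after {timeout_s}s")
        time.sleep(poll_interval_s)
```

Enable agent access only after this returns; a timeout or failure should block the launch.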
Step 5.3: Run utterance analysis monthly.
Use your agent’s conversation logs to identify where it is failing to produce accurate answers. The research calls this “utterance analysis”—systematically reviewing real user queries that resulted in low-confidence or incorrect responses to identify knowledge gaps in your data library.
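A minimal sketch of the counting step, assuming each logged query already carries a topic label (from manual tagging or a classifier), a confidence score, and an optional reviewer flag; the 0.6 threshold is a placeholder:

```python
from collections import Counter


def knowledge_gaps(log_entries: list[dict], min_confidence: float = 0.6,
                   top_n: int = 5) -> list[tuple[str, int]]:
    """Rank topics by how often queries about them got weak or wrong answers.

    Each entry is assumed to look like
    {"query": str, "topic": str, "confidence": float, "marked_incorrect": bool}.
    """
    failures = Counter(
        entry["topic"]
        for entry in log_entries
        if entry["confidence"] < min_confidence or entry.get("marked_incorrect", False)
    )
    return failures.most_common(top_n)
```

The top topics are the first candidates for new or restructured documents.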
Phase 6: Establish a Governance Layer (The Launch-and-Leave Problem)
Step 6.1: Define Jobs to be Done before configuration.
Run a discovery workshop before touching configuration. Define the agent’s specific jobs to be done (JTBD) and establish baseline success metrics—concrete numbers like “reduce average handle time by 90 seconds” or “resolve 40% of tier-1 tickets without human escalation.”
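Recording the baseline at launch turns a goal like “reduce average handle time by 90 seconds” into a number you can compute each month. A minimal sketch; the 480-second baseline in the usage line is hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JtbdMetric:
    name: str
    baseline: float  # measured before the agent launched
    target: float    # the JTBD success threshold

    def progress(self, current: float) -> float:
        """Fraction of the baseline-to-target gap closed so far (1.0 = target met).

        Works whether lower is better (handle time) or higher is better (resolution rate).
        """
        gap = self.target - self.baseline
        return (current - self.baseline) / gap if gap else 1.0


aht = JtbdMetric("average handle time (s)", baseline=480, target=390)
print(aht.progress(435))  # 0.5: half of the 90-second reduction achieved
```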
Step 6.2: Implement a rollback window.
Give users and operators a configurable undo window for agent actions. A well-designed rollback window lets humans catch and reverse overzealous file deletions, incorrect API calls, or erroneous data updates within a defined time window—before they become irreversible incidents.
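An undo window is easiest to build as delay-then-execute: destructive actions are queued and only run once the window expires, because an email that has already been sent cannot be recalled. A minimal sketch with hypothetical names; the five-minute default is a placeholder, and `clock` is injectable so the behavior can be tested:

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingAction:
    description: str
    execute: Callable[[], None]
    queued_at: float
    cancelled: bool = False


@dataclass
class RollbackWindow:
    """Hold destructive actions for `window_s` seconds before they actually run."""
    window_s: float = 300.0
    clock: Callable[[], float] = time.monotonic
    pending: list[PendingAction] = field(default_factory=list)

    def queue(self, description: str, execute: Callable[[], None]) -> PendingAction:
        action = PendingAction(description, execute, self.clock())
        self.pending.append(action)
        return action

    def undo(self, action: PendingAction) -> None:
        """Cancel an action any time before its window expires."""
        action.cancelled = True

    def flush(self) -> list[str]:
        """Run every action whose window has expired; return what was executed."""
        now, executed, still_pending = self.clock(), [], []
        for action in self.pending:
            if action.cancelled:
                continue  # undone by a human: drop it
            if now - action.queued_at >= self.window_s:
                action.execute()
                executed.append(action.description)
            else:
                still_pending.append(action)
        self.pending = still_pending
        return executed
```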
Step 6.3: Schedule quarterly permission reviews.
Agent permissions accumulate over time. New integrations get added; old constraints get loosened. Build a quarterly review cadence into your operations calendar to audit permissions, skills, and safety core configurations against the current threat landscape.
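Part of the review can be automated by diffing current grants against the snapshot approved at the last review. A minimal sketch, assuming permissions are exported as strings and the approved snapshot is stored somewhere the agent cannot write; the permission names are hypothetical and the 92-day interval approximates a quarter:

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=92)  # roughly one quarter


def review_findings(baseline: set[str], current: set[str],
                    last_review: date, today: date) -> list[str]:
    """Compare current grants with the last approved snapshot and flag overdue reviews."""
    findings = [f"added since last review: {p}" for p in sorted(current - baseline)]
    findings += [f"removed since last review: {p}" for p in sorted(baseline - current)]
    if today - last_review > REVIEW_INTERVAL:
        findings.append(f"review overdue: last done {last_review.isoformat()}")
    return findings
```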
Expected Outcomes
After completing this six-phase audit, you should have:
- A zero-permission-baseline service account with scoped, documented permissions
- A vetted skills inventory with no unreviewed community packages
- Pinned dependencies with a fault injection test suite in CI
- An intelligent routing layer reducing inference costs
- A structured data library with validated indices
- Documented JTBD metrics and a governance review schedule
Real-World Use Cases
Use Case 1: Enterprise Sales Agent with CRM Access
Scenario: A B2B sales team deploys an agent that reads Salesforce records, drafts follow-up emails, and schedules meetings autonomously.
Implementation: Apply Phase 1 (privilege audit) by creating a scoped service account with read access to opportunity records and write access only to activity logs. Apply Phase 2 (skills audit) by reviewing the email drafting skill for prompt injection patterns. Apply Phase 6 (governance) by defining the JTBD as “draft three follow-up email options per open opportunity, per week” with a baseline metric of reducing SDR email-drafting time by 45 minutes daily.
Expected Outcome: The agent operates reliably within its permission boundary. Sales reps review and approve email drafts before send. No unauthorized data access. Performance measured against the 45-minute baseline.
Use Case 2: Customer Support Agent with Knowledge Base RAG
Scenario: A SaaS company deploys a support agent that answers tier-1 tickets by querying a product knowledge base using RAG.
Implementation: Apply Phase 5 (data management) with identifying fields for product version and topic classification and content fields for actual documentation text. Validate index readiness before launch. Set up monthly utterance analysis to identify knowledge gaps as the product evolves.
Expected Outcome: Accurate tier-1 resolution rate improves over time as knowledge gaps are identified and addressed. Hallucination rates decline because the RAG architecture is curated, not firehosed.
Use Case 3: DevOps Agent with Shell and API Access
Scenario: A platform team deploys an agent that monitors infrastructure, opens tickets for anomalies, and executes remediation scripts.
Implementation: This is the highest-risk deployment type. Apply all six phases, with extra attention to Phase 1 (immutable safety core explicitly blocking production deployments and database schema changes without human approval) and Phase 3 (fault injection testing for infinite reasoning loops). Require session-based re-authentication for any remediation action.
Expected Outcome: The agent operates as a force multiplier for the platform team—identifying and triaging anomalies at machine speed—while human engineers retain control over any action with irreversible consequences.
Use Case 4: Marketing Content Agent with Multi-Model Routing
Scenario: A marketing team deploys an agent that generates ad copy variants, classifies campaign performance, and summarizes analytics reports.
Implementation: Apply Phase 4 (model routing) by routing ad copy generation to the flagship model, campaign classification to a mid-tier model, and report summarization to the smallest capable model. Implement semantic caching for frequently-requested campaign summaries.
Expected Outcome: Inference costs drop significantly as simple classification and summarization tasks stop hitting the flagship model. Content quality remains high for creative generation tasks where the flagship model is used.
Use Case 5: Legal Research Agent with Document Access
Scenario: A law firm deploys an agent to search case law databases, summarize precedents, and draft motion outlines.
Implementation: Apply Phase 1 (least privilege scoped to read-only access to specific case law databases, no write access to client files without explicit approval) and Phase 2 (zero community skills—all capabilities are first-party or vetted internally). Apply Phase 6 governance with documented JTBD metrics and weekly log reviews given the sensitivity of the domain.
Expected Outcome: Attorneys get research acceleration without the liability exposure of an over-privileged agent that could write to client files or exfiltrate sensitive documents.
Common Pitfalls
Pitfall 1: Cloning admin credentials for the agent service account.
Most teams do this because it’s fast and eliminates permission errors during development. The problem is it never gets cleaned up before production. An agent running with admin credentials that receives a successful prompt injection attack now has admin access to everything. Fix: always create agent accounts from zero permissions, following the Principle of Least Privilege (PoLP) as recommended by the NotebookLM research report.
Pitfall 2: Installing community agent skills without review.
The ToxicSkills research shows that 36.82% of identified agent skills have at least one security flaw. Installing a community skill because it looks useful and has good reviews is the agent ecosystem’s equivalent of `curl https://example.com | bash`. Always audit skills before installation. If you can’t audit the source code, don’t install it.
Pitfall 3: Using “latest” for all dependencies.
Dependency and integration changes account for nearly 20% of agentic system failures per the research. A utility library update that changes a function signature can silently break an entire reasoning pipeline. Pin versions, use lock files, and test dependency updates in staging before production.
Pitfall 4: Building RAG without curation.
Dumping 50,000 documents into a vector index and calling it a knowledge base produces an agent that confidently hallucinates. The retriever has no signal about which documents are authoritative, current, or relevant to specific query types. Curate the library, use structured fields, and validate index readiness before enabling agent access.
Pitfall 5: Treating agent deployment as a one-time event.
Governance rot is real. Agents deployed six months ago may be running with permission sets that have expanded, skills that haven’t been reviewed against new threat intelligence, and success metrics that were never checked after launch. Build quarterly review cycles into your operations calendar from day one.
Expert Tips
Tip 1: Build your rollback window before you need it.
Implement a configurable undo window for destructive agent actions before launch, not after the first incident. The rollback window is your safety net for overzealous file deletions, incorrect API calls, and erroneous mass communications. It is significantly easier to build this into your architecture from the start than to retrofit it after a production failure.
Tip 2: Run your safety core from a separate, agent-inaccessible config path.
If your agent’s safety constraints live in the same configuration file that the LLM can read and reference, a sophisticated prompt injection attack can instruct the agent to “update” those constraints. Keep your immutable safety core in a file path that is outside the agent’s read scope entirely.
Tip 3: Use utterance analysis as your product roadmap for the knowledge base.
Monthly utterance analysis—reviewing the queries where your agent performed poorly—is the most valuable input you have for improving RAG accuracy. Each cluster of failed queries maps directly to a knowledge gap that can be addressed by adding or restructuring documents. This is a continuous improvement loop, not a one-time setup task.
Tip 4: Profile your actual query distribution before building your routing logic.
Don’t guess which tasks need flagship models. Run a two-week logging period in production with your current model, classify every query by type and complexity, and then build your routing rules from the actual distribution. Most teams discover that 60-80% of their queries are classification or summarization tasks that a smaller model handles equally well.
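Once each logged request has a task-type label (assigned manually or by a classifier, an assumption here), the distribution is a simple count. A minimal sketch:

```python
from collections import Counter


def task_mix(logged_tasks: list[str]) -> dict[str, float]:
    """Share of logged requests per task type, largest share first."""
    counts = Counter(logged_tasks)
    total = sum(counts.values())
    return {task: n / total for task, n in counts.most_common()}


sample = ["summarization"] * 6 + ["classification"] * 2 + ["multi_step_reasoning"] * 2
print(task_mix(sample))
```

The shares tell you how much traffic the routing rules can move off the flagship model.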
Tip 5: Treat your agent’s permission audit as a threat model, not a checklist.
Walk through every permission your agent holds and ask: if an attacker could inject a single malicious instruction that made the agent use this permission, what’s the worst-case outcome? If the answer is “exfiltrate the entire customer database” or “send a mass email to all contacts,” the permission scope is too broad. Reduce it until the worst-case outcome of a successful injection is tolerable.
FAQ
Q: How do I know if my agent has been compromised by a ToxicSkill?
A: Look for behavioral anomalies in your enriched event logs: unexpected outbound network calls, base64-encoded strings appearing in tool parameters, unusual file system access patterns, or any tool invocations that weren’t triggered by user intent. The research documents exfiltration attempts using obfuscated commands—standard request/response logging won’t catch these. You need full trace logging of every tool invocation with its parameters, as recommended in the NotebookLM research report.
Q: What’s the minimum viable governance setup for a small team?
A: Three non-negotiables: a scoped service account built from zero permissions, an immutable safety core with documented constraints, and a quarterly review calendar entry. Beyond that, the governance layer scales with risk. A low-risk internal productivity agent needs less governance than a customer-facing agent with write access to a CRM. Start with the minimum and add governance where the risk profile demands it.
Q: How much can intelligent model routing actually reduce costs?
A: The NotebookLM research report documents potential inference cost reductions of up to 90% through intelligent routing—reserving flagship models for complex reasoning and routing classification and summarization tasks to smaller, cost-effective models. The actual reduction depends on your query distribution. A content agent with a high proportion of summarization tasks will see larger savings than a code generation agent where most queries need complex reasoning.
Q: Can I safely use community agent skills at all?
A: Yes, with a rigorous review process. The Snyk ToxicSkills research shows the ecosystem has serious supply chain problems, but not every community skill is malicious. The critical discipline is reviewing every skill’s source code (or a trusted audit of it) before installation, checking permissions against the principle of least privilege, and scanning configuration files for known injection patterns. If a skill is closed-source or unaudited, the risk is too high for production agents with sensitive permissions.
Q: How do I set meaningful baseline metrics before deploying an agent?
A: The research recommends conducting discovery workshops that define the agent’s Jobs to be Done (JTBD) and tie them to measurable outcomes. Concrete examples: “reduce average handle time by 90 seconds,” “resolve 40% of tier-1 support tickets without escalation,” “generate three qualified outreach email variants per open opportunity per week.” Abstract goals like “improve productivity” cannot be measured and cannot tell you whether the agent is working. Specific, baseline-anchored metrics can.
Bottom Line
The six failure modes documented here—over-privileged service accounts, ToxicSkills supply chain contamination, architectural fragility from tight coupling, performance collapse at scale, firehose RAG data mismanagement, and governance neglect—are not edge cases. They are the current, documented failure patterns of production agentic deployments, backed by empirical research across security, reliability, and performance dimensions. Salesforce’s implementation analysis puts it plainly: an agentic enterprise requires governance as a foundational layer, not an afterthought. The agents that survive in production are the ones built with scoped permissions, audited skills, pinned dependencies, structured data, and an operational review cadence—not the ones that were deployed fastest. Run the six-phase audit in this tutorial against every agent you have in production today. The findings will not be comfortable, but they will be actionable.