The Consequences of Agentic AI
COMMENT

The Consequences of Agentic AI

Morgan Willis15 min read
agentssecurityawsagentcorestrands

Customer support agents are hallucinating policies. Coding agents are deleting production resources. Dependency supply chains are getting poisoned. Agent prototypes are failing in QA due to reliability issues. We're experiencing the consequences of agentic AI in real time.

The industry is moving so fast to ship AI products (and build with AI products) that it feels like we're moving faster than the foundational work required to make agents reliable and safe at scale. “Move fast and break things” used to define Silicon Valley, but the industry has mostly moved on from that mindset and we shouldn't be bringing it back in the AI era. Moving fast is fine. Failing fast and iterating is how good software gets built. But when your agent can autonomously access and modify customer records, execute code, and make API calls across backend systems, you need to be deliberate about what you're willing to break and what you aren't. Some failures are cheap to learn from. Others are a liability with real consequences.

We are seeing that the capability of AI models is clearly there. We don't need to convince each other about that anymore. But capability without the engineering discipline to run it safely is what's blocking us from moving out of the demo phase with AI agents, all while we are feeling the sharp edges of what happens when agentic AI goes wrong. The conversation needs to move from what AI can do to what it takes to run it responsibly.

On top of that, there's a wide spectrum of what an agent actually is. On one end, you have a coding assistant on your laptop where you're clearly in the loop, hopefully reviewing suggestions before they touch anything.

On the other end, you have an autonomous agent deployed to the cloud, handling thousands of interactions a day with no human reviewing each response. The failure modes, the risk profiles, and the consequences are very different at each point on that spectrum, and thinking clearly about where your agent sits on it is the first step to running it responsibly.

The thing is, there are great engineers and teams building tools, services, and creating patterns to move us all towards a safer agentic AI reality. It's just that there is no one size fits all answer, and just like cybersecurity in general, these are not easy problems to solve.

This post is about helping you build the mental model you need so you can think around corners about what can go wrong with agentic AI, focusing mostly on deployed agents.

The failure modes of agentic AI break down into three categories that overlap and compound:

  • agents say the wrong thing
  • agents do the wrong thing
  • someone tries to break your agent

Each one of these categories encompasses multiple things and deserves a deeper dive. We're going to do a high level overview in this post and I'll be covering specific patterns and solutions with code examples in follow-up posts over the coming weeks.

But first, let's get a shared understanding of the obstacles and risks of deploying agents, and I'll make some recommendations on things you can look into to help. I'll be using AWS as my toolkit throughout the post, since I work for AWS, but the patterns apply to any agent you are building anywhere.

Failure Mode #1: Agents Say the Wrong Thing

The most baseline failure mode is output quality. An agent gives an end user bad information, hallucinates a policy that doesn't exist, or confidently recommends the wrong course of action. Or maybe the agent responds to a request it shouldn't have.

This is an area where the industry has made decent progress. As models improve they're getting better at following directions, and there's strong work being done on guardrails, grounding, retrieval-augmented generation, and evaluation frameworks. But there is still work to be done.

Hallucination is a real and well understood issue, but it's actually several problems sharing the same symptom. There's the model making things up from scratch, there's the model misinterpreting source material and synthesizing something that sounds right but isn't, and there's the model confidently filling in gaps when it doesn't have enough context to answer correctly. Each one has different mitigations, and the tolerance for each depends entirely on what you're building.

That tolerance matters a lot more when agents sound confident whether they're right or wrong and there's no uncertainty signal built into the output. A human reading a wrong answer from a chatbot might catch it, but a downstream system consuming an agent's output programmatically won't. A coding agent hallucinating a function signature is annoying, but a customer service agent hallucinating a refund policy is a liability. Not every use case can afford to be wrong.

What You Can Do About It

Deploy guardrails as your first layer. Guardrails sit between your users and your model, filtering both the input and the output. They can block harmful content, detect prompt injection attempts, enforce topic boundaries so your agent stays in its lane, and check whether responses are grounded in your source material. Think of them as the baseline safety net. They won't catch everything, but they catch a lot, and they're the easiest thing to deploy before you start tuning. Guardrails for Amazon Bedrock gives you content filters, denied topic policies, PII redaction, contextual grounding checks, and prompt attack detection out of the box.

Tune guardrails after deployment. Getting guardrails to work well at scale takes intentional effort. It's frustrating when a guardrail denies a perfectly legitimate request, but it's worse when it lets something malicious through. Finding that balance requires continuous testing and tweaking. Build out evaluations, run different versions of your guardrails against them, and measure the performance. Then monitor guardrails in production, dive into the edge cases, and feed what you learn back into both the guardrails and the evaluations. It's a continuous thing, not a one-time setup.

Build evaluation frameworks and actually use them. You need a way to measure whether your agent is getting better or worse over time. Evaluation frameworks let you build test suites that cover your critical paths and run them against new versions of your agent, your guardrails, or your prompts before they hit production. Strands Agents SDK has a built-in evaluation framework for testing agent behavior, and AgentCore evaluations give you a managed way to run evals at scale.

Give your agent the right context. A lot of hallucination comes from the agent not having what it needs to answer correctly. Context engineering, making sure your agent has the right information at the right time in the right format, goes a long way. Your agent benefits from having access to conversation history, semantic memory retrieval, user state, tools, and domain context that gets loaded at the right time in the workflow. The more relevant context an agent has when it generates a response, the less it has to guess. Think about what information your agent needs at each step and make sure it's available, whether that comes from a knowledge base, a tool, a previous agent's output, or a system prompt that's been tuned for the task.

One of the most common context engineering patterns is retrieval-augmented generation, and how you chunk, index, and retrieve your source material matters a lot. Poor chunking leads to poor retrieval, which leads to hallucinated answers. Knowledge Bases for Amazon Bedrock handles the retrieval pipeline, but you still need to think carefully about your source data and chunking strategy.

Watch for cascading errors in multi-agent workflows. When one agent's output becomes another agent's input, a hallucination in step one becomes a confident wrong assumption in step two. Structured output helps constrain the format, but it doesn't guarantee the content is correct. Consider using an LLM-as-judge pattern where a secondary agent reviews outputs before they get passed along, checking whether the data is grounded and consistent. Strands Agents SDK has a steering mechanism that lets you intercept and validate agent outputs mid-workflow. I wrote about this pattern in The Agent Buddy System.

Ground your outputs. If your agent is making claims about facts, policies, or data, verify those claims against your source material before delivering them. Contextual grounding checks in Amazon Bedrock can score a response against a provided source for both factual grounding and relevance to the user's query, and you set confidence thresholds to determine what gets blocked. For workflows where your agent is answering questions from a knowledge base or summarizing source material, it gives you a programmatic way to catch hallucinations before they reach the user.

But even with all of this in place, wrong answers are only part of the story because agentic AI also takes actions, which has a different set of consequences.

Failure Mode #2: Agents Do the Wrong Thing

Depending on what your agent was built for, it could delete resources, modify configurations, send emails, make API calls, write and execute code, or interact with your internal systems. This ability to interact with real world resources and systems through tools and MCP servers is what makes agents useful, and it's also what makes them dangerous.

An agent running in an autonomous loop can compound the same wrong action a hundred times, at speed and at scale, before a human gets a chance to intervene. This can create a mess for you to clean up and can mean runaway costs and token consumption. On top of that, depending on what tools your agent has available to it, it can also take actions that are harmful, like deleting data or misconfiguring systems. The answer most people are reaching for is human-in-the-loop.

But human-in-the-loop isn't one size fits all. Where you put the human, and how much authority you give the agent before it has to check in, depends entirely on what's at stake.

Then there's also the problem of hallucinated tool inputs. An agent might call the right tool but with wrong parameters, like passing a production database ID instead of a staging one, or using the wrong account number that matches a different user's account. The tool executes successfully, returns a valid response, and the agent keeps going. Nothing in that chain flagged an error, but the action was wrong.

Agents can also call the wrong tool entirely. If your agent has access to hundreds of tools and the model picks the wrong one based on a misunderstood request, you might end up with a customer record being modified when the intent was to look it up, or a message being sent when the intent was to draft one. The more tools you expose to an agent, the more opportunities there are for the model to choose incorrectly, and the consequences depend on what those tools can do.

What You Can Do About It

Intercept tool calls before they execute. MCP tool interceptors let you add a layer between the agent and the tool that checks authentication, validates business logic, and overrides or rejects parameters before the call goes through. If your agent is about to pass a production resource ID to a delete operation, an interceptor can check the context of the environment against the values being passed in and catch that. Similarly, if an agent passed in the wrong account number it can check against the authorized user and ensure it matches before it hits the tool. This is not something that is built into MCP, it's something you have to design into your architecture. AgentCore Gateway supports MCP interceptors out of the box that give you this kind of pre-execution control.

Control which tools your agent can access, and under what conditions. Policy engines let you define granular access rules: this agent can call this tool under these conditions. Additionally, if the agent doesn't need a tool, don't expose it. The fewer tools available, the fewer opportunities for the model to choose the wrong one. Cedar policies for AgentCore Gateway give you a declarative way to express these constraints.

Consider smaller, more focused agents. Don't make one agent do too much. A narrower scope is easier to control, easier to evaluate, and easier to secure. The tradeoff is similar to microservices versus a monolith: you add operational complexity and latency by decomposing your agents into smaller units, but you gain more granular control over what each one can do and a clearer picture of where things go wrong when they do.

Use steering for tool calls too. The agent steering pattern from the hallucination section applies here as well. Intercept tool calls before they execute and have a secondary agent review the tool choice and parameters before allowing the call. Strands Steering has built-in support for both steer_before_tool and steer_after_model.

Design robust tools. Tools should validate their own inputs, check for obvious errors, and refuse to execute when something looks wrong. For deployed agents, avoid exposing shell access or ad-hoc SQL queries. This might make sense for a local coding agent where a developer can approve or deny each action, but for autonomous deployed agents you need to narrowly define what the tool can do. That takes more development time up front, but it buys you more safety. Letting an agent have wide-open SQL access to a customer database is asking for trouble.

Invest in observability. When an agent does the wrong thing at scale, you need to know about it fast. Trace tool calls, log decision chains, and track token usage per session. Agent observability tells you whether the agent is doing what it's supposed to be doing. Build dashboards that surface anomalies in tool call patterns, error rates, and output quality so you can catch compounding failures before they get out of hand. Services like AgentCore have observability built in.

Plan for recovery. Prevention is important, but so is blast radius containment. If an agent makes a hundred wrong API calls before you catch it, what's your rollback plan? Prefer soft deletes over hard deletes and keep audit trails that let you undo what went wrong. The agents that are safest to run in production are the ones where the worst case is recoverable.

So far we've been talking about agents failing on their own, either by saying or doing the wrong thing. But there's a third category that adds an adversary to the equation.

Failure Mode #3: Someone Tries to Break Your Agent

Every tool your agent can call is an attack surface, every MCP server you connect is a trust boundary you're extending, and every document your agent pulls in from the internet is a potential injection vector. The security surface of agentic AI is wider than what we've dealt with in traditional software, and it's growing as agents get more capable and more connected.

Supply chain risk takes on a new dimension with agents. When a coding agent dynamically pulls in dependencies, generates code that imports packages, or fetches content from external sources, all of that is happening with less human oversight than a developer manually choosing their dependencies. Agents don't have judgment, they have pattern matching. The same discipline we apply to dependency scanning and code review needs to extend to agent-generated code and agent-selected dependencies. That means automated security scanning and review steps between the agent and the commit, regardless of how good the agent's output looks.

MCP servers have a trust problem. When you connect an external MCP server, your agent gets access to whatever tools that server exposes. Tool descriptions can be mutated after installation, and most implementations don't detect when this happens. The server you connected yesterday can change what its tools do today, and your agent will follow the new descriptions without knowing anything changed.

Prompt injection is getting more advanced. Agents dynamically pull in documents, read websites, consume API responses, and fetch packages. All of that ingested data is a potential injection vector. Data is an attack surface in a way it never was before.

Directed abuse connects back to the first two categories. If you have a deployed agent, someone will try to use it for things it wasn't built for. The classic example is using a customer service bot as a free coding assistant to get tokens they'd otherwise have to pay for. That's your agent saying and doing the wrong thing, except now it's intentional exploitation rather than accidental failure.

At scale, this also opens you up to denial-of-wallet attacks, a different flavor of risk than traditional DDoS. Cloud infrastructure can absorb compute spikes because you've been designing for elastic scale for years. But if someone is extracting large volumes of tokens from your agent, you're paying for every one of those tokens whether the interaction was legitimate or not. The traditional infrastructure patterns still apply (rate limiting, bot firewalls, anomaly detection) but you also need agent-specific monitoring that tracks token consumption per session and flags abnormal patterns before the bill gets out of hand.

What You Can Do About It

Start with what you already know. WAFs, API gateways, rate limiting, DDoS protection. These are the same production infrastructure patterns you'd put in front of any internet-facing application, and they apply to agents too. If your agent is handling external traffic, treat it like any other production endpoint.

Use guardrails to block prompt injection and off-topic abuse. The same guardrails from Failure Mode #1 apply here. Configure them to detect injection attempts in user input and to reject requests that fall outside your agent's intended scope. If your agent is a customer service bot, it should refuse coding questions.

Treat external data as untrusted input. If your agent reads documents, scrapes websites, or pulls API responses, all of that content is potentially adversarial. Design your tools and your prompts the same way you'd sanitize user input in a web application. Structure your prompts so that content pulled from external sources is clearly delineated from instructions. The goal is to prevent injected text from being interpreted as directives. Guardrails help here too by scanning the content for known prompt injection techniques before it reaches the model.

For MCP, use gateways and interceptors as a trust layer. Since the protocol itself doesn't enforce authentication or provide fine grained access control, use infrastructure that does. The same AgentCore Gateway interceptors from Failure Mode #2 apply here, sitting between your agent and MCP servers, enforcing policies and validating tool behavior before calls go through. On the identity side, AgentCore runtime and Gateway support both IAM-based and OAuth-based inbound identity, so you can authenticate callers and enforce access control.

Don't assume MCP tool definitions are stable. If you approved a tool's description at install time, that doesn't mean it's the same description your agent is seeing today. The MCP spec has a tools/list_changed notification, but that only helps if your client handles it and you have a process for re-evaluating what changed. The broader point is that you need some way to detect when a tool's behavior drifts from what you originally approved.

Where We're Going

When we moved from on-premises to the cloud, cloud security became a new core technical discipline. Something similar is going to happen with agentic AI, but this is all happening in real time.

Running agents in production requires a security mental model that depends on what the agent can do, where it runs, how much autonomy it has, what data it touches, and how reversible its actions are. Every combination is a different risk profile, and you can't solve it with a single checklist. The approach that makes sense is defense in depth: guardrails, tool design, permission control, protocol-level enforcement, runtime access control, observability, and traditional infrastructure, all layered together the same way we think about security for any production system.

None of these layers are sufficient on their own. Guardrails and steering agents are nondeterministic. Tools can be misconfigured. Policies can have gaps. The point is that each layer catches a different type of problem. Desktop agents, where a coding agent inherits a developer's full permissions with no clean boundary, are a different problem entirely.

The pressure to ship is real, adding in all these defensive layers slows down development. But the companies that treat agentic security as a core discipline, not something they'll get to long after launch, are the ones that will actually run agents in production safely. The ones that skip the foundations will learn the same old lesson the hard way.

In the coming weeks, I'll be diving into specific architectural patterns for some of the areas we've covered here: how to design tools and permissions for least-privilege agents, how to build layered guardrails that match your risk profile, and how to implement defense in depth for deployed agents. If any of these categories sound interesting to you, stay tuned.

Comments

Loading...

Loading comments...

More Content