Built Secure, Deployed Dangerous

An agent that was built correctly can still drift into dangerous territory. Most organizations have no way to detect it until something breaks.

The Scariest Agent Failure Mode

In March 2026, a security research firm called CodeWall pointed an autonomous AI agent at Lilli, McKinsey's internal AI platform used by tens of thousands of employees. The agent identified and exploited a vulnerability. McKinsey confirmed the issue, fixed it within hours of disclosure, and stated its investigation found no evidence that client data was accessed.

The details of what exactly was exposed are disputed. What isn’t disputed is the core finding: a vulnerability sat in a production AI system long enough that an outside agent found it before any internal control did.

The platform hadn’t been built badly. The underlying model was fine. The application logic was sound. One configuration gap, invisible to standard tooling, sat quietly in production until something specifically designed to find it came looking.

We’ve Seen This Before

Cloud security went through exactly this reckoning.

Organizations spent years building secure cloud infrastructure… and then discovered that secure at deployment didn’t mean secure six months later. Permissions accumulated beyond their original scope. Configurations drifted as engineers made quick fixes under pressure. Resources stood up for a project and never stood down. The environment changed and the security posture didn’t.

Cloud Security Posture Management emerged specifically to solve this. Not to replace build-time security controls, but to continuously verify that deployed infrastructure still matched its intended configuration. CSPM became a recognized discipline because organizations learned, expensively, that a point-in-time assessment of a continuously changing environment tells you very little about your actual exposure right now.

The same reckoning is playing out with AI agents. And most organizations are in the “before” part of the story.

Part 4 of this series established that the autonomy level for most agent actions was set by default, not by deliberate decision. The posture management problem runs parallel: the configuration of most deployed agents was set at launch, not monitored since.

Three Things That Change After Deployment

Agents don’t operate in a stable environment. Three things change after deployment, any one of which can shift an agent from safely configured to dangerously misconfigured without anyone making a deliberate decision.

The agent itself gets updated. New model version. Revised instructions. Additional tool added to support an expanded use case. Each of these changes the agent's behavior surface. The risk tier decision from Part 4 was made about a specific version of this agent. Is it still valid for the version running today?

The systems it connects to change. APIs get updated. Permissions get modified. New data flows through systems the agent has access to. A connection that was safe when the agent was configured may expose something it was never intended to touch.

The volume and variety of inputs change. Edge cases that never appeared in testing start appearing in production. An agent handling ten transactions a day behaves differently from the same agent handling ten thousand. The configuration that was appropriate at pilot scale may not be appropriate at production scale.

None of these require malicious intent. None of them require anyone to make a bad decision. They just require time and the normal operation of a complex environment. That’s what makes drift different from a misconfiguration at deployment. Deployment-time errors are findable with the right pre-launch review. Drift is what happens after launch, quietly, while the monitoring dashboard shows green.

What Drift Actually Looks Like

The misconfiguration patterns that show up in practice aren’t exotic. Excessive privilege. Weak or shared credentials. Policy violations nobody caught because no tool was watching. Access patterns that don’t trigger alerts because they’re technically within policy. The same shapes, across different organizations, at different scales.

Privilege accumulation. An agent is provisioned with access to one system for a specific task. Over time, through workflow expansions, through “just temporarily” access grants that never got reversed, through integrations that pulled in additional permissions, it’s now touching five systems. Nobody revoked the original scope because nobody was tracking it against the original intent. The agent is operating with far more access than its current task requires. This is the digital equivalent of an employee who keeps their badge from every role they’ve ever had.

Instruction drift. Agent instructions, the prompts and rules that define what the agent is supposed to do, are updated by developers iterating quickly. What started as a carefully scoped workflow agent now has instructions added piecemeal across a dozen updates, by different people, with no one reviewing the cumulative effect. The agent is doing things the original design never intended. Not because it’s broken. Because its instructions changed in ways nobody aggregated and reviewed.

Connection creep. The agent was connected to two MCP servers at deployment. Three more were added over time to support expanded use cases. Nobody mapped how those new connections changed the agent’s access surface. The agent registry says one thing. The agent is actually connected to something different.

Behavioral drift. The underlying model gets updated. Same prompt, same instructions, same configuration on paper… different behavior in practice. The agent that was correctly classified as safe to automate under one model version behaves differently under the next. The policy still says auto-approve. The agent is no longer the agent the policy was written for.
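One practical way to catch this kind of drift is a regression check: replay a fixed "golden set" of prompts against the new model version and compare outcomes to the ones approved at risk-tiering time. The sketch below is illustrative; `run_agent` is a hypothetical stand-in for your agent runtime, and the canned responses exist only to make the example self-contained.

```python
# Hypothetical behavioral regression check across model versions.
# In practice, run_agent would invoke your actual agent runtime;
# here it returns canned responses so the sketch is runnable.

def run_agent(model_version: str, prompt: str) -> str:
    canned = {
        ("model-v3.2", "refund $50"): "ESCALATE",
        ("model-v3.3", "refund $50"): "AUTO_APPROVE",  # silent behavior change
    }
    return canned[(model_version, prompt)]

# Outcomes that were approved when the agent's risk tier was decided.
golden_set = {"refund $50": "ESCALATE"}

def regressions(model_version: str) -> list[str]:
    """Return every golden-set prompt whose outcome no longer matches."""
    return [prompt for prompt, expected in golden_set.items()
            if run_agent(model_version, prompt) != expected]

print(regressions("model-v3.2"))  # the version the policy was written for
print(regressions("model-v3.3"))  # the version running today
```

A non-empty result is exactly the signal the paragraph above describes: same configuration on paper, different agent in practice.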

The McKinsey case illustrates what this looks like in practice. A vulnerability in a production AI system, sitting undetected long enough that an outside agent found it before any internal control did. Not a novel attack. A configuration gap that conventional tooling wasn’t positioned to catch.

Why Traditional Monitoring Misses This

Here’s the problem. Existing observability tools were built for deterministic systems. They catch crashes, timeouts, error rates, and resource exhaustion. They don’t catch an agent that is running correctly by every technical metric while doing the wrong thing operationally.

Three specific gaps make this hard.

Agents don’t fail loudly. A traditional system throws a 500 error when something goes wrong. An agent confidently executes the wrong action and logs a success. The monitoring system sees a completed task. The human reviewing the output, if anyone is reviewing it, sees a result that looks plausible. The damage accumulates before anyone knows to look.

The baseline problem. Drift detection requires knowing what correct looks like. For deterministic systems, that’s knowable: the expected output for a given input is defined. For agents it isn’t. An agent given the same input on two different days may produce different outputs, and both may be valid. Defining the behavioral baseline is a new discipline that most organizations haven’t started. Without it, you can detect that an agent is doing something different. You can’t detect whether what it’s doing is wrong.

Volume. An agent running at production scale takes thousands of actions per day. No human team can review them all. The actions most likely to represent drift, the ones that look almost right but aren’t quite, are the hardest to spot in a queue. This is why posture management can’t be a manual exercise at scale. It has to be systematic.

The Right Order

Most enterprises are adding monitoring on top of poorly constrained agents. That’s the wrong order.

The right order:

First, know what you have. The agent registry from Part 2 is the prerequisite. You can’t establish a configuration baseline for an agent you haven’t inventoried. You can’t detect drift if you don’t know what the agent was supposed to be doing.

Second, establish the baseline at deployment. Before an agent goes into production, document its intended state. What systems does it connect to. What permissions does it hold. What model version is it running. What are its instructions. What is its behavioral scope. That documented baseline is what everything else measures against.

Third, encode the constraints. The risk tiering work from Part 4, which actions are auto-approved and which require human review, only holds if the agent is still the agent that policy was written for. Posture management is how you verify that assumption continuously rather than hoping it remains true.

Fourth, monitor against the baseline continuously. Configuration is a continuous posture, not a quarterly review. An agent’s attack surface shifts every time it’s updated, given a new tool, or connected to a new service. The question isn’t “was it configured correctly at launch?” It’s “is it still configured correctly right now?”

What Continuous Posture Management Actually Looks Like

This doesn’t require a platform on day one. It requires a discipline.

Start with a configuration record for every agent in the registry. A documented snapshot of intended state at deployment: what systems it connects to, what permissions it holds, what model version it’s running, what its instructions say, what its behavioral scope is supposed to be. That record is the source of truth everything else measures against. Most organizations don’t have it. Building it is the starting point.
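A configuration record doesn't need a platform to exist; it can start as a small, versioned data structure. The sketch below is a minimal shape for such a record, assuming illustrative field names (this is not a standard schema), with a serialization method so the record can be stored and diffed.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical shape for an agent configuration record.
# Field names are illustrative, not a standard schema.
@dataclass(frozen=True)
class AgentConfigRecord:
    agent_id: str
    owner: str                 # named owner accountable for this agent
    model_version: str
    instructions_hash: str     # hash of the full instruction/prompt set
    connected_systems: tuple   # MCP servers, APIs, data stores
    permissions: tuple         # scoped grants the agent actually holds
    behavioral_scope: str      # plain-language statement of intent

    def to_json(self) -> str:
        """Stable serialization, suitable for versioning and diffing."""
        return json.dumps(asdict(self), sort_keys=True)

baseline = AgentConfigRecord(
    agent_id="support-triage-01",
    owner="jane.ops",
    model_version="model-v3.2",
    instructions_hash="sha256:placeholder",  # computed from the instruction file
    connected_systems=("ticketing", "kb-search"),
    permissions=("ticketing:read", "ticketing:write", "kb:read"),
    behavioral_scope="Triage inbound support tickets; never close tickets.",
)
print(baseline.to_json())
```

Freezing the dataclass and serializing with sorted keys are small choices that matter: the record becomes tamper-evident in version control, and two snapshots of the same agent can be compared line by line.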

From there, any change to the agent needs to be reviewed against that record. Updated instructions. New tool connection. Model version change. Permission modification. The question isn’t whether the change is bad. It’s whether someone deliberately approved it and whether the configuration still reflects what the risk tier decision assumed. Right now, most of those changes happen with no formal review at all.
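That review step can be partially automated with a simple diff between the recorded baseline and the agent's currently observed configuration. A minimal sketch, assuming both are available as dictionaries with illustrative keys:

```python
# Minimal drift check: diff an agent's observed configuration against
# its recorded baseline. Keys and values are illustrative.

def detect_drift(baseline: dict, observed: dict) -> list[str]:
    """Return a human-readable finding for every field that deviates."""
    findings = []
    for key in sorted(set(baseline) | set(observed)):
        before, after = baseline.get(key), observed.get(key)
        if before != after:
            findings.append(f"{key}: baseline={before!r} observed={after!r}")
    return findings

baseline = {
    "model_version": "model-v3.2",
    "permissions": ["ticketing:read", "ticketing:write"],
    "mcp_servers": ["ticketing", "kb-search"],
}
observed = {
    "model_version": "model-v3.3",                       # silent model update
    "permissions": ["ticketing:read", "ticketing:write",
                    "billing:read"],                     # privilege accumulation
    "mcp_servers": ["ticketing", "kb-search", "crm"],    # connection creep
}

for finding in detect_drift(baseline, observed):
    print(finding)
```

Each finding is a prompt for the real question: did someone deliberately approve this change, and does the risk tier decision still hold?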

The harder layer is behavioral. Configuration records tell you what an agent is supposed to be doing. Behavioral baselines tell you whether it’s actually doing that. What does normal operation look like for this agent, at this task, in this environment? Significant deviation is a signal worth investigating even when nothing in the configuration formally changed. Lasso Security calls this behavioral fingerprinting, the same approach already used for human identity security. An agent that starts operating outside its established pattern is worth examining, whether or not the configuration record shows anything wrong.
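One crude way to make a behavioral baseline concrete is to profile the mix of action types an agent performs during known-normal operation, then score how far today's mix deviates. This is a toy sketch of the fingerprinting idea, not a production detector; the action names and threshold are illustrative.

```python
from collections import Counter

def action_profile(actions: list[str]) -> dict[str, float]:
    """Fraction of activity each action type accounts for."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

def deviation(baseline: dict[str, float], current: dict[str, float]) -> float:
    """Total variation distance between two action profiles (0.0 to 1.0)."""
    keys = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - current.get(k, 0.0))
                     for k in keys)

# Known-normal window vs. today's activity (illustrative data).
normal = ["read_ticket"] * 70 + ["update_ticket"] * 25 + ["search_kb"] * 5
today  = ["read_ticket"] * 40 + ["update_ticket"] * 20 + ["export_data"] * 40

score = deviation(action_profile(normal), action_profile(today))
if score > 0.2:  # threshold is an operational tuning choice
    print(f"behavioral deviation {score:.2f}: investigate")
```

Note that nothing in this check inspects configuration at all; the agent could pass a configuration diff cleanly and still trip this signal, which is exactly the point of the behavioral layer.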

The Governance Argument

Here’s what makes this a leadership problem rather than a security team problem.

Posture management requires a named owner for every agent in production. That owner is responsible for knowing what the agent’s intended configuration is, reviewing changes, and investigating anomalies. Without a named owner, nobody is watching. Without systematic monitoring, even a named owner can’t watch at scale.

The accountability question: when an agent drifts into dangerous territory and causes harm (data accessed it shouldn't have touched, a decision made it wasn't authorized to make, a cost incurred nobody approved), who answers for it? If the answer is "nobody knew," that's not a technology failure. That's a governance failure.

The McKinsey case is instructive precisely because it wasn’t the result of a sophisticated attack. It was a configuration gap in a production system that the build-time security review didn’t catch and the ongoing monitoring didn’t surface. An outside agent specifically designed to probe for misconfiguration found it. That’s the pattern posture management is designed to close.

In a Dark Reading poll, 48% of cybersecurity professionals identified agentic AI as the single most dangerous attack vector heading into 2026. According to IBM’s 2025 Cost of a Data Breach Report, organizations with high levels of shadow AI saw breach costs averaging $4.63 million, roughly $670,000 more than a standard breach.

Those numbers aren’t abstractions. They’re what happens when agents drift and nobody has the infrastructure to detect it.

The technology to build that infrastructure exists. The discipline to apply it is what most organizations are missing.

The Question Worth Asking Now

Do you have a way to detect when an agent’s configuration has drifted from what was originally approved… without waiting for an incident to surface it?

What’s Next

Posture management keeps individual agents monitored against their baseline. But as the agent population scales to dozens or hundreds running in production, the volume problem returns: continuous human review of every agent's posture becomes unsustainable.

At that scale, who’s watching the agents between reviews?

That’s the most provocative question in this series. And the answer is going to make some readers uncomfortable.

That’s Part 6.


Managing the Digital Workforce | Part 5


This article is part of the Managing the Digital Workforce series — a nine-part framework for governing enterprise AI at scale. View the full series →