When AI Starts Finding Zero-Days, Our Security Model Breaks
We're Modeling AI the Wrong Way
I don't trust systems I can't explain. Not because they're broken — but because they might be, and I wouldn't know when.
Most of the systems I work with are deterministic. If something fails, I can trace it. If something is vulnerable, I can reproduce it. Even when detection gets messy — false positives, false negatives, edge cases — the system still behaves in ways I can reason about.
AI doesn't.
That's why Anthropic's Mythos Preview didn't feel like another milestone to me. It felt like a shift in the threat model — one that doesn't quite fit the way we've traditionally thought about systems, vulnerabilities, and trust.
From Capability to Consequence
What stood out to me wasn't just that Mythos can find vulnerabilities. It was the implication behind it.
Once a model can identify subtle flaws, reason about exploit paths, and sometimes chain them together, it's no longer just assisting security teams. It's participating in the vulnerability lifecycle itself.
For years, vulnerability discovery was shaped by constraints — time, expertise, intuition. Those constraints slowed everything down, including attackers. There was always a gap between discovery and exploitation, and that gap gave defenders room to react.
That gap is starting to shrink.
Mythos Preview found a 16-year-old vulnerability in FFmpeg — software used by virtually every application that handles video — hidden in a single line of code that automated testing tools had hit five million times without catching it. Discovery, it turns out, was never the bottleneck. We were.
Discovery becomes cheaper. Exploitation becomes easier. The question is no longer just whether we'll miss vulnerabilities. It's what happens when machines start finding the ones we never saw.
Why This Feels Familiar
A lot of my thinking comes from detection engineering.
The hardest problems aren't writing checks — they're dealing with how those checks behave in the real world. You build logic that works across most environments, validate it, test it, and then something breaks. Not because the logic was wrong, but because reality had a branch you didn't model.
Detection has always been an approximation.
AI systems feel like that same problem, except now the system itself is the approximation. You're no longer only asking whether your detection is correct. You're asking whether the system's reasoning is correct — and more importantly, whether you would even know if it wasn't.
That's a very different kind of uncertainty. And it's one our field hasn't built a discipline for yet.
AI as an Attack Surface
One thing that started bothering me while thinking about Mythos is how quickly the idea of a trust boundary starts to fall apart.
In most systems, I know where input ends and internal logic begins. With AI, that line is harder to draw. A prompt isn't just input — it shapes reasoning. Context isn't just data — it influences behavior. And output isn't always just output — it often becomes action.
From a security perspective, interacting with an AI system feels less like calling a service and more like running partially trusted logic with unknown execution paths. The uncomfortable part is that the most critical layer — the reasoning itself — is the one we can't fully see.
Here's how I've started modeling the risks across these fuzzy boundaries.
In traditional systems, we spend most of our time securing inputs and validating outputs. With AI, the layer making decisions is the least observable. That alone changes how I think about risk.
What This Means for Signature Engineering
This is where it becomes very real for me.
A lot of my work revolves around identifying whether something is vulnerable based on patterns — versions, files, configurations, behaviors. That approach assumes the system behaves consistently enough to model.
AI challenges that assumption.
Imagine a model suggesting a working exploit path for a service. The chain looks correct, the steps make sense, and the reasoning is clean — but it quietly misses one constraint, like a privilege boundary or a runtime condition that invalidates the entire path. On the surface, everything looks right. But operationally, it isn't.
That's the kind of failure that's hard to detect — because it doesn't look like a failure. And that's where the shift happens. You're not just detecting vulnerabilities anymore. You're validating reasoning. Correctness stops being binary. It becomes contextual. And trust stops being assumed. It becomes conditional.
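One way to make that conditional trust concrete, as an added example that is not part of the original post: a triage gate that checks every assumption behind a model-reported finding against facts collected independently of the model, and refuses to confirm a finding that never addresses a privilege boundary or runtime condition. The claim keys, the required-constraint checklist and the sample data are assumptions constructed for illustration.

```python
from dataclasses import dataclass, field

# Assumed checklist: constraint classes a finding must address before it is trusted.
REQUIRED_CONSTRAINTS = {"privilege_boundary", "runtime_condition"}

@dataclass
class Claim:
    key: str               # fact the reasoning depends on, e.g. "service.runs_as"
    expected: str          # value the model assumed for that fact
    constraint: str = ""   # constraint class this claim covers, if any

@dataclass
class Finding:
    title: str
    claims: list = field(default_factory=list)

def triage(finding, observed):
    """Return (verdict, reasons); `observed` holds facts gathered without the model."""
    reasons, refuted = [], False
    for c in finding.claims:
        if c.key not in observed:
            reasons.append(f"no evidence for {c.key}")
        elif observed[c.key] != c.expected:
            refuted = True
            reasons.append(f"{c.key}: model assumed {c.expected!r}, "
                           f"observed {observed[c.key]!r}")
    # A clean-looking chain that skips a constraint class is never confirmed.
    covered = {c.constraint for c in finding.claims if c.constraint}
    for missing in sorted(REQUIRED_CONSTRAINTS - covered):
        reasons.append(f"constraint not addressed: {missing}")
    if refuted:
        return "refuted", reasons
    return ("unverified" if reasons else "confirmed"), reasons

# Constructed sample data: facts from the scanner, and two model-reported findings.
OBSERVED = {"service.version": "2.4.1", "service.runs_as": "svc-media",
            "feature.upload_enabled": "false"}
CLEAN_LOOKING = Finding("upload parser flaw reachable", [
    Claim("service.version", "2.4.1"),
    Claim("service.runs_as", "svc-media", "privilege_boundary"),
])  # every stated claim checks out, but no runtime condition is addressed
CONTRADICTED = Finding("upload parser flaw reachable", [
    Claim("service.version", "2.4.1"),
    Claim("service.runs_as", "svc-media", "privilege_boundary"),
    Claim("feature.upload_enabled", "true", "runtime_condition"),
])  # the runtime condition is stated, and the environment contradicts it

if __name__ == "__main__":
    for f in (CLEAN_LOOKING, CONTRADICTED):
        print(f.title, "->", triage(f, OBSERVED))
```

The three verdicts are the point: a finding whose stated claims all match still comes back "unverified" when it is silent on a required constraint, which is the case that would otherwise pass a binary check.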
The Failures That Worry Me Most
Every security engineer has dealt with failures that weren't obvious. Not because the logic was bad, but because the system behaved differently than expected. Those are the issues that survive testing, slip through validation, and show up later when it matters most.
AI expands that category.
Sometimes the output is clearly wrong — that's easy to reject. But the harder case is when the system gives you something that looks completely valid. It follows the right structure. It sounds correct. It passes basic checks. It's just not right.
Mythos found a 27-year-old vulnerability in OpenBSD — one of the most security-hardened operating systems in the world, built and audited specifically for that purpose. That flaw survived decades of expert review. It didn't announce itself. Everything indicated the system was safe. It wasn't. That's the category I'm worried about.
That reminds me of patch gaps — where everything looks fixed, detection passes, and yet the vulnerability still exists under slightly different conditions. Those are the failures that matter. Because they don't announce themselves.
The Mindset Shift
Thinking about Mythos forced me to change the way I approach AI systems. I've stopped asking whether the system is correct. Instead, I find myself asking:
Where can this system be influenced? Not just at the input layer — through context, chaining, and assumptions baked into training.
What assumptions am I making about its behavior? Every signature I write encodes an assumption. AI forces me to interrogate those explicitly.
What does failure look like — and would I notice? The hardest failures are the constraints that don't trigger an alert. That's as true for AI as for detection logic.
It's the same instinct I use when dealing with untrusted inputs, complex parsers, or systems with undefined behavior. Except now the system itself is probabilistic — and evolving.
Final Takeaway
What Anthropic Mythos changed for me isn't how useful I think AI is. It changed how I think about its role in security.
I no longer see it as something that sits outside the system and helps us. I see it as something that can influence, accelerate, and reshape the system itself. And that means our assumptions need to change with it.
What's interesting is that the public footprint doesn't fully reflect the scale being implied yet. A lot of what's being discovered is still not visible — either because it's under embargo or not fully attributed. Which means the picture we're reasoning from is already incomplete. And that discomfort is part of what I keep coming back to.
Security has always been about reducing uncertainty. AI increases it — not because it's broken, but because it's powerful in ways we don't fully understand yet.
We built systems we could debug. Now we're building systems we can only question.
We need to stop thinking of AI as something we validate and start treating it as something we continuously challenge. Because the biggest risk isn't when AI fails. It's when it's convincingly wrong — and we trust it anyway.
Disclosure: I work as a Security Engineer. The views here are my own, based on publicly available information. I aim to keep this writing honest and grounded in real-world security practice.