Microsoft Ran the Best Quality Gate: It Found the Problems Before They Shipped

Microsoft Just Found 16 Security Holes in Windows Using AI — Before Anyone Else Could

And it tells us something important about where every large, complex project is heading

A month ago I wrote about Anthropic’s Claude Mythos — the AI system that scanned millions of lines of open-source code and found critical security vulnerabilities that human researchers had missed for years. I said it felt like a turning point.

I didn’t expect to be back here so soon saying: it’s already accelerating.

What Just Happened

Microsoft has quietly been running an AI system called MDASH (Multi-model Agentic Scanning Harness, yes, really) across some of the most sensitive code in its entire operation: the Windows networking and authentication stack. The stuff your laptop uses every time it connects to the internet.

The result? 16 previously unknown security vulnerabilities. Four of them critical. Found, validated, and patched — all shipped in this month’s routine Windows update.

Most people installed that update without a second thought. What they didn’t know is that it contained fixes for flaws that could have let attackers execute malicious code remotely, through components buried deep in the Windows kernel.

Those fixes existed because an AI found the flaws first.

This Isn’t a Chatbot With a Security Hat On

Here’s where it gets genuinely interesting — and where I think the lessons extend well beyond cybersecurity.

MDASH isn’t a single AI model asked to “look for bugs.” It’s an orchestrated system of over 100 specialised AI agents, each with a distinct role, working in sequence:

  • Preparation agents map the codebase, build context, and identify where attacks are most likely to come from
  • Auditor agents scan the code looking for suspicious patterns and raise candidate findings
  • Debater agents argue for and against each finding — essentially stress-testing whether it’s a real problem or a false alarm
  • Prover agents then attempt to actually trigger the vulnerability — constructing the input that would cause it to fail

It’s the equivalent of hiring a research team, a red team, a devil’s advocate panel, and a QA function, then running them as one continuous pipeline, around the clock, without fatigue or politics.
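To make the four roles above concrete, here is a minimal Python sketch of a staged review pipeline. It is a toy under stated assumptions, not Microsoft's implementation: the Component and Finding types, the keyword-matching "agents", and the two sample handlers are all hypothetical stand-ins invented for illustration. The only thing it tries to show is the shape of the flow: each stage narrows the list, and a finding is confirmed only when the prover actually triggers a failure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

BUF_SIZE = 8  # size of the toy buffer each handler copies into


@dataclass
class Component:              # hypothetical stand-in for a unit of code
    name: str
    source: str                       # text the auditor scans
    handler: Callable[[bytes], None]  # behaviour the prover exercises


@dataclass
class Finding:                # hypothetical record of a candidate issue
    component: str
    claim: str
    proof: Optional[bytes] = None     # input that triggered the failure


def unchecked_copy(data: bytes) -> None:
    buf = bytearray(BUF_SIZE)
    for i, byte in enumerate(data):
        buf[i] = byte                 # raises IndexError past BUF_SIZE


def checked_copy(data: bytes) -> None:
    buf = bytearray(BUF_SIZE)
    for i, byte in enumerate(data[:BUF_SIZE]):  # input truncated first
        buf[i] = byte


def prepare(components: List[Component]) -> List[Component]:
    """Preparation: keep only components exposed to the network."""
    return [c for c in components if "network" in c.source]


def audit(targets: List[Component]) -> List[Tuple[Component, Finding]]:
    """Auditor: raise a candidate finding on a crude pattern match."""
    return [(c, Finding(c.name, "possible unchecked copy"))
            for c in targets if "copy" in c.source]


def debate(component: Component, finding: Finding) -> bool:
    """Debater: one argument for, one against; keep it only if 'for' stands."""
    argument_for = "copy" in component.source
    argument_against = "bounds-checked" in component.source
    return argument_for and not argument_against


def prove(component: Component, finding: Finding) -> bool:
    """Prover: build an oversized input and see whether it really fails."""
    crafted = b"A" * (BUF_SIZE + 1)
    try:
        component.handler(crafted)
    except IndexError:
        finding.proof = crafted
        return True
    return False


def run_pipeline(components: List[Component]) -> List[Finding]:
    confirmed = []
    for component, finding in audit(prepare(components)):
        if debate(component, finding) and prove(component, finding):
            confirmed.append(finding)
    return confirmed


COMPONENTS = [
    Component("parser_a", "network: copy payload into buffer", unchecked_copy),
    Component("parser_b", "network: bounds-checked copy of payload", checked_copy),
    Component("parser_c", "network: copy header into buffer", checked_copy),
    Component("local_tool", "cli: copy file into buffer", unchecked_copy),
]

confirmed = run_pipeline(COMPONENTS)
print([f.component for f in confirmed])  # prints ['parser_a']
```

Of the four toy components, one never enters scope, one is argued down in the debate, and one survives the debate but cannot be triggered by the prover. Only parser_a comes out the other end, with the triggering input attached as evidence. That last step is the part most review processes skip.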

Microsoft tested it against a private piece of code containing 21 deliberately planted vulnerabilities — code that had never been published online, so the AI couldn’t have “seen the answers.” It found all 21. With zero false positives.
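For anyone who wants to put numbers on a result like that, the two standard measures are recall (how many of the planted flaws were found) and precision (how many of the reported findings were real). The short sketch below computes both. The score function is a hypothetical helper written for this article, and the second set of figures is invented purely for contrast; only the 21-out-of-21 result comes from the reported test.

```python
from typing import Tuple


def score(planted: int, true_positives: int, false_positives: int) -> Tuple[float, float]:
    """Return (precision, recall) for a seeded-defect test."""
    reported = true_positives + false_positives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / planted if planted else 0.0
    return precision, recall


# The reported result: 21 planted, 21 found, no false alarms.
print(score(planted=21, true_positives=21, false_positives=0))  # prints (1.0, 1.0)

# An invented comparison: 18 real findings plus 6 false alarms.
precision, recall = score(planted=21, true_positives=18, false_positives=6)
print(round(precision, 3), round(recall, 3))  # prints 0.75 0.857
```

The invented second case is closer to what most teams would recognise from a manual review: a few real issues missed, and a quarter of the review effort spent chasing findings that turn out to be nothing.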

The PM Lens: Why This Matters Beyond Security

If you lead projects or programmes, I’d encourage you to look past the cybersecurity framing for a moment and consider what this architecture actually represents.

It’s a structured quality assurance pipeline with built-in adversarial review.

In most project environments, QA is under-resourced, squeezed at the end of delivery, and dependent on asking the people who built the thing to find what’s wrong with it. We know the cognitive limitations of that model. We live with the consequences: the defects that escape to production, the risks that weren’t escalated, the assumptions that were never challenged.

MDASH is interesting not because it uses AI, but because it separates the roles of finder, challenger, and prover into distinct agents with different prompts, different models, and different success criteria. The system is designed to disagree with itself productively.

That’s not a new idea in project governance — it’s roughly what a good stage gate process, an independent assurance review, or a red team exercise is supposed to do. What’s new is the speed and scale at which it can now be applied.

The Other Players: A Quick Scorecard

For context, Microsoft isn’t alone in this space:

Anthropic (Claude Mythos) — still the most talked-about in security research circles. Deep reasoning capability on complex codebases. Limited access preview.

OpenAI (Daybreak) — announced this week with some polished marketing copy and a “contact sales” button. By their own admission, still weeks away from being ready. More of a roadmap announcement than a product launch.

Microsoft (MDASH) — already in production. Already finding and patching critical vulnerabilities in live systems. Already top of the public CyberGym benchmark for real-world vulnerability detection.

In project terms: one team is in production, one is in controlled pilot, and one has issued a press release about a project that isn’t ready yet. We’ve all been in that programme board meeting.

What I Keep Coming Back To

The head of the team that built MDASH, Dr. Taesoo Kim, wrote this in his announcement:

“AI vulnerability discovery has crossed from research curiosity into production-grade defence at engineering scale.”

He’s talking about security. But the same sentence could apply to testing, to compliance review, to risk identification, to requirements analysis.

The pattern is the same in every case: tasks that previously required scarce, expensive, experienced human attention — applied sequentially, inconsistently, under time pressure — are becoming parallelisable, auditable, and continuous.

That doesn’t mean human judgement becomes irrelevant. It means the nature of human judgement required shifts. Less finding the problem. More framing what to look for, interpreting what was found, and deciding what to do about it.

In other words: the work that project managers and delivery leaders do becomes more important, not less — but only if we’re willing to move upstream of the AI, rather than downstream of it.

A Closing Thought

When I wrote about Mythos last month, a few people told me it felt abstract — interesting in theory, but distant from day-to-day delivery work.

This week, millions of Windows users received a security patch for vulnerabilities that human researchers hadn’t found in years of looking. The AI found them in days.

That’s not abstract anymore.

The question for all of us isn’t whether AI will change how complex work gets done. It’s whether we’re building the habits, the processes, and the thinking now that will let us lead effectively when it does.

Michael Kennedy writes about project delivery, leadership, and technology at ProjectMetrics.co.uk. If you found this useful, share it with someone managing a complex programme — they’ll thank you for it.

Source material: Security Now podcast, Steve Gibson analysis of Microsoft’s MDASH announcement (May 2026)