Microsoft AI System Beats Anthropic Mythos in Cybersecurity Test

Microsoft’s new AI-powered cybersecurity system, codenamed MDASH, has surpassed Anthropic’s much-discussed Mythos on a major cybersecurity benchmark, signaling a new phase in AI-driven vulnerability research. Unlike traditional single-model systems, MDASH uses more than 100 specialized AI agents working together across multiple AI models to identify real-world software vulnerabilities faster and more accurately.

The system was unveiled this week alongside Microsoft’s disclosure of 16 newly discovered vulnerabilities affecting several versions of Windows. Among them were four critical remote code execution flaws that were addressed during this month’s Patch Tuesday updates. The announcement highlights Microsoft’s growing investment in AI-assisted security after years of criticism over recurring security weaknesses.

MDASH, short for “multi-model agentic scanning harness,” operates through a layered workflow. First, specialized AI agents scan software codebases for potential vulnerabilities. A second group of agents then evaluates and debates whether the discovered issues are genuine and exploitable. Finally, another stage generates proof-of-concept attacks to verify the flaws actually exist in real-world conditions.
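The three stages described above can be pictured as a simple pipeline. The Python sketch below is illustrative only: the article does not describe MDASH's internals, so the names `Finding` and `run_pipeline`, the majority-vote rule standing in for the "debate" stage, and the toy agents are all assumptions made for this example, not Microsoft's actual design.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """A candidate vulnerability reported by a scanning agent (hypothetical type)."""
    location: str
    description: str
    votes_for: int = 0
    votes_against: int = 0
    confirmed: bool = False


# Hypothetical agent signatures: each stage is modeled as a plain function.
Scanner = Callable[[str], List[Finding]]
Reviewer = Callable[[Finding], bool]
Verifier = Callable[[Finding], bool]


def run_pipeline(code: str, scanners: List[Scanner],
                 reviewers: List[Reviewer], verifier: Verifier) -> List[Finding]:
    # Stage 1: every scanning agent proposes candidate findings.
    candidates = [f for scan in scanners for f in scan(code)]

    # Stage 2: reviewing agents vote; a simple majority stands in for the debate.
    plausible = []
    for finding in candidates:
        for review in reviewers:
            if review(finding):
                finding.votes_for += 1
            else:
                finding.votes_against += 1
        if finding.votes_for > finding.votes_against:
            plausible.append(finding)

    # Stage 3: keep only findings the verification step can confirm.
    for finding in plausible:
        finding.confirmed = verifier(finding)
    return [f for f in plausible if f.confirmed]


# Toy agents, assumed purely for demonstration.
def toy_scanner(code: str) -> List[Finding]:
    # Flags every line that contains a call-like use of strcpy.
    return [Finding(f"line {i}", line.strip())
            for i, line in enumerate(code.splitlines(), 1)
            if "strcpy(" in line]


not_comment = lambda f: not f.description.startswith("//")
looks_like_statement = lambda f: f.description.endswith(";")
optimist = lambda f: True
# A real verification stage would build and run a proof of concept in a sandbox;
# this placeholder only inspects the text of the finding.
toy_verifier = lambda f: "src" in f.description

SAMPLE = "strcpy(dst, src);\n// strcpy( is unsafe\nmemcpy(a, b, n);"
result = run_pipeline(SAMPLE, [toy_scanner],
                      [not_comment, looks_like_statement, optimist],
                      toy_verifier)
for finding in result:
    print(finding.location, finding.description)
```

In this toy run the scanner flags two lines, the reviewers vote down the commented-out one, and only the remaining finding reaches verification. The point of the layering is that later stages filter out false positives from earlier ones.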

This multi-agent approach differs significantly from Anthropic’s Mythos, which operates as a single AI model within an agent framework. Mythos gained attention earlier this year because of concerns surrounding its ability to autonomously discover and exploit software vulnerabilities. Anthropic limited access to the system through Project Glasswing, a cybersecurity consortium that also includes Microsoft.

OpenAI’s GPT-5.5 and several other systems listed on the benchmark leaderboard also rely primarily on single-model architectures rather than collaborative multi-agent structures.

On the CyberGym benchmark, developed by researchers at the University of California, Berkeley, MDASH achieved a score of 88.45%. The benchmark evaluates how effectively AI systems can reproduce real-world software vulnerabilities across 1,507 testing tasks taken from 188 open-source projects.

Anthropic’s Mythos Preview placed second with 83.1%, while GPT-5.5 followed closely behind at 81.8%. In the test environment, each AI system receives a description of a known vulnerability along with an unpatched codebase. The system must then produce a working exploit capable of triggering the bug.
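To put the percentages in concrete terms, the short Python calculation below converts each reported score into an approximate task count. It assumes the score is simply the share of the 1,507 tasks solved, which the article does not state explicitly, so the counts are rough estimates rather than official figures. The helper names `approx_tasks_solved` and `lead_in_points` are made up for this example.

```python
TOTAL_TASKS = 1507  # number of CyberGym tasks cited in the article

# Reported leaderboard scores, in percent.
scores = {"MDASH": 88.45, "Mythos Preview": 83.1, "GPT-5.5": 81.8}


def approx_tasks_solved(score_pct: float, total: int = TOTAL_TASKS) -> int:
    # Assumption: score = solved tasks / total tasks.
    return round(score_pct / 100 * total)


def lead_in_points(a: float, b: float) -> float:
    # Gap between two scores, in percentage points.
    return round(a - b, 2)


for name, pct in scores.items():
    print(f"{name}: {pct}% is roughly {approx_tasks_solved(pct)} of {TOTAL_TASKS} tasks")

print("MDASH lead over Mythos Preview:",
      lead_in_points(scores["MDASH"], scores["Mythos Preview"]), "points")
```

Under that assumption, MDASH's 5.35-point lead over Mythos Preview corresponds to roughly 80 additional tasks.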

Although the results are impressive, experts caution that the leaderboard scores are self-reported by participating companies, including Anthropic. While the CyberGym benchmark code is publicly available, no independent organization has yet verified the reported scores. Additionally, benchmark performance does not always reflect how these systems perform in unpredictable real-world environments.

The rise of systems like MDASH also intensifies concerns about AI becoming a powerful offensive hacking tool. The same capabilities that help security teams uncover vulnerabilities can also be misused by cybercriminals to identify exploitable weaknesses before patches are released. Microsoft stated that MDASH is currently being used internally by its security engineering teams and will soon enter a limited private preview for select customers.

Security experts believe AI will significantly accelerate vulnerability discovery, potentially leading to larger and more frequent Patch Tuesday updates in the future. Ben Seri, co-founder of cybersecurity startup Zafran Security, described the situation as an unavoidable technological shift where rapid vulnerability discovery could temporarily create instability before stronger defenses are established.

FAQs

What is MDASH?

MDASH is Microsoft’s AI-powered cybersecurity system that uses multiple AI agents and models to discover software vulnerabilities and verify exploitability.

How does MDASH differ from Mythos?

MDASH uses over 100 specialized AI agents working together, while Mythos relies on a single AI model operating inside an agent framework.

What is the CyberGym benchmark?

CyberGym is a cybersecurity benchmark created by UC Berkeley researchers to test how effectively AI systems can reproduce real-world software vulnerabilities.

What score did MDASH achieve?

MDASH scored 88.45% on the CyberGym benchmark, outperforming Mythos Preview and GPT-5.5.

Why are experts concerned about AI cybersecurity tools?

Experts worry that AI systems capable of finding vulnerabilities could also be used by hackers to discover and exploit security flaws before organizations can patch them.

Is MDASH publicly available?

Microsoft said MDASH is currently used internally and will enter a limited private preview for selected customers.

What are remote code execution vulnerabilities?

Remote code execution vulnerabilities allow attackers to run malicious code on a system remotely, often giving them unauthorized access or control.

Are the benchmark scores independently verified?

No. The CyberGym scores are self-reported by the participating companies, and no independent verification has been completed yet.

Conclusion

Microsoft’s MDASH represents a major advancement in AI-powered cybersecurity by demonstrating how collaborative multi-agent systems can outperform traditional single-model AI tools in vulnerability discovery. Its ability to identify and validate software flaws at scale could transform how companies secure software and respond to cyber threats. However, the technology also raises serious concerns about offensive misuse, as the same systems capable of protecting infrastructure could be exploited by attackers. As AI-driven cybersecurity tools continue to evolve, the industry faces the challenge of balancing innovation, security, and responsible deployment in an increasingly automated digital landscape.
