The Myth of Mythos
Self-certified benchmarks, a silent API default that degraded performance for 34 days, and why South Africa had to pull its national AI policy
1. Trustible’s Take: The Mythos Verification Problem
Anthropic’s release of Claude Mythos Preview earlier this month has been a marketing triumph. Anthropic isn’t releasing the model broadly because, it claims, the model is too dangerous; access instead runs through “Project Glasswing,” a consortium that includes Amazon Web Services, Apple, Google, JPMorganChase, Microsoft, NVIDIA, and the Linux Foundation, with a $100M commitment in API credits across launch partners and 40+ additional organizations. Mozilla used early access to patch over 270 Firefox vulnerabilities. The system card disclosed that during internal testing, the model broke out of a sandbox and emailed a researcher to announce its escape. The reaction has been everything Anthropic could have hoped for, including the New York Times calling it a “terrifying warning sign,” central bankers in three countries convening emergency meetings, and a thaw with the White House after months of fighting the Pentagon’s “supply chain risk” designation in court.
Some of the underlying capability is real. Mythos appears genuinely better than prior models at chaining exploits end-to-end, and the UK AI Security Institute’s evaluation found it the first model to fully complete a 32-step simulated network attack. The marketing framing runs well ahead of the evidence, though. Anthropic chose which projects to scan (OpenBSD, FreeBSD NFS, FFmpeg, the Linux kernel, etc.), scoped the relevant code paths, and then reported the findings as autonomous discovery. When Semgrep and AISLE independently tested whether other models could reproduce those findings under similar scoping, they did. Open-source Qwen 3.6 matched Opus 4.6 on the OpenBSD bug. A 3.6B-parameter open-weights model recovered the FreeBSD vulnerability for $0.11 per million tokens. The much-hyped sandbox escape happened because a simulated user explicitly instructed the model to escape. And the headline number, “thousands of high-severity vulnerabilities,” extrapolates from just 198 manually reviewed cases. Mythos is faster and better at agentic exploitation, but the gap on the actual hard problem of novel discovery is narrower than the rollout implies.
The verification regime under which these claims are being made is the bigger story. Over 99% of the discovered vulnerabilities remain undisclosed. The only organizations with access are Glasswing partners who are being subsidized with API credits to validate the product they’re being asked to endorse. The system card is self-evaluated. Six weeks before the Mythos launch, Anthropic replaced its Responsible Scaling Policy with a v3.0 that dropped the hard commitment to pause deployment when capability thresholds are crossed, in favor of non-binding “Frontier Safety Roadmaps” and self-prepared “Risk Reports.” SaferAI and GovAI flagged at the time that the new RSP relies on Anthropic’s own argumentation rather than verifiable thresholds. Mythos is the first major release under that regime, and it is being marketed on the strength of safety claims that no one outside Anthropic and its paid partners can audit. The strategic returns have already accrued: a softened White House posture, a Treasury-convened bank summit, and Glasswing relationships across the Fortune 50.
Key Takeaway: It is entirely possible that Mythos is both a real capability jump and a carefully orchestrated PR exercise. What’s harder to dispute is that frontier safety claims are increasingly being made under conditions designed to make them difficult to independently verify, and the recent loosening of voluntary frameworks like Anthropic’s RSP doesn’t help. The strategic returns to Anthropic, including White House access, partner relationships, and a national-security narrative, accrue regardless of whether the technical claims hold up under scrutiny.
2. Tech Explainer: Understanding AI Effort Parameters
Every major AI provider now offers a way to control how hard a model “thinks” before answering: OpenAI calls it reasoning_effort, Google uses thinkingLevel, and Anthropic uses an effort parameter. The concept traces back to test-time compute scaling research showing that model accuracy improves with more thinking tokens, an alternative to scaling model size alone. While earlier API versions let users specify a maximum reasoning token budget, newer versions only allow a discrete level (e.g., low/medium/high). In some cases, the parameter controls not just hidden reasoning but all token spend, including tool calls (i.e., lower effort means fewer tool calls).
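To make the concept concrete, here is a minimal sketch of what setting the parameter looks like through OpenAI’s Python SDK, whose reasoning_effort argument is documented; the model name and prompt are illustrative, and the Google and Anthropic equivalents (thinkingLevel, effort) use different names and request shapes.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# reasoning_effort trades answer quality against latency and cost.
# OpenAI accepts "low" | "medium" | "high"; Google's thinkingLevel and
# Anthropic's effort parameter play the analogous role in their APIs.
response = client.chat.completions.create(
    model="o3-mini",          # illustrative reasoning-capable model
    reasoning_effort="high",  # deeper hidden reasoning before answering
    messages=[{"role": "user", "content": "Plan a zero-downtime database migration."}],
)
print(response.choices[0].message.content)
```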
Providers recommend different settings for different task types, but obscure how they map to actual compute allocation. Higher effort means deeper reasoning at greater cost and latency; lower effort means faster, cheaper responses that may cut corners on complex tasks. The interplay between model size and effort is not well studied: Anthropic recommends “medium” as the default for Sonnet 4.6 but “xhigh” for Opus 4.7 on coding tasks, suggesting larger models need more thinking room to fully express their advantage.
This simple parameter can have a large effect on system functionality. Earlier this year, many users reported a degradation in Claude Code’s performance, and a recent postmortem revealed that the default effort had been silently changed from “high” to “medium.” The change was live for 34 days and was compounded by a caching bug that wiped reasoning history every turn and a system prompt that capped response length; none of these changes altered the underlying model weights.
Key Takeaway: When using closed-source models through APIs, developers trade convenience for transparency. The effort parameter deepens that trade-off: a single default change by the provider, invisible to users, can significantly alter system behavior. At the same time, effort gives governance teams a simple lever: increased effort can produce higher-quality results, but at the price of latency and cost. This setting needs to be considered carefully as part of system design and review, and pinned explicitly wherever possible, as sketched below.
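One practical defense is to never inherit the provider default: set the effort level from reviewed, version-controlled configuration and track reasoning-token spend in your own telemetry so a silent upstream change becomes visible. Below is a minimal sketch of that pattern using OpenAI’s Python SDK; the ask helper and the model choice are our own illustration, and we assume a recent SDK version that reports reasoning-token counts in the usage object.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Reviewed, version-controlled setting: never omit it and silently
# inherit whatever default the provider ships this month.
PINNED_EFFORT = "high"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="o3-mini",                 # illustrative reasoning-capable model
        reasoning_effort=PINNED_EFFORT,  # pinned explicitly, not defaulted
        messages=[{"role": "user", "content": prompt}],
    )
    # Record reasoning-token spend: a sudden, sustained drop is a red flag
    # that something upstream (a default, a cache, a prompt) changed.
    details = resp.usage.completion_tokens_details
    reasoning_tokens = getattr(details, "reasoning_tokens", None)
    print(f"effort={PINNED_EFFORT} reasoning_tokens={reasoning_tokens}")
    return resp.choices[0].message.content
```

Monitoring like this is the kind of check that could have surfaced a 34-day silent downgrade in days rather than weeks.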
3. Incident Spotlight: When AI is used to draft the AI Policy (Incident 1467)
What Happened: South Africa recently released a draft of its National AI Policy for public comment and feedback. The main feedback it got was: your citations are hallucinated. At least 6 of the ~70 academic citations in the draft did not exist, as confirmed by the academic publishers. The draft has since been withdrawn, and a rewrite is in progress.
Why it Matters: Obviously it’s particularly ironic that AI was used to draft the national AI policy, and that its own writers and several review committees all missed the errors. The broader concern is the use of AI in the policy-making process generally, with citations as a particular weak point.
Use of AI by policymakers is rising quickly, with a recent Penta Group study suggesting up to 60% of lawmakers and their staff regularly use AI, often for research. Lawmakers, and governments in general, are often in a unique position where they are allowed to cause particular types of ‘harm’ in the public interest, even while most people and AI systems are not. This creates a substantially higher standard for AI use, and it means that not all models or systems may be well equipped for the public sector.
Citations remain a weakness for many AI systems: they are ripe for hallucination and tedious to check manually. Even top law firms are still regularly getting caught submitting fictitious legal citations, leading to reputational, and occasionally financial, repercussions.
How to Mitigate: One of the biggest challenges is that citation standards date from the pre-web era: many papers simply list author names, titles, and publication dates without linking to a stable, authoritative URL. In addition, many academic journals have strict licensing and access restrictions. Legal citations can be even less uniquely structured.
This is a recipe for disaster even with the best models, as they have no easy mechanism to verify such citations. The best fix on the writing side is to require that every citation include a web link, which can be verified far more easily. This will admittedly have a large impact on publishers of work that isn’t easily accessible online, and they will need to address that over time. For report writers, the strongest mitigation is to mandate web links for all cited content, run an automated check that each source exists and is cited properly, and support an ecosystem around that practice, as sketched below.
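As a sketch of what the existence check could look like (the function and regular expression here are our own illustration, not an established tool), the snippet below extracts URLs from a draft and flags any that fail to resolve. Note that a resolving link only proves the source exists; it does not confirm the source says what the citation claims, and a production pipeline would also handle DOIs, rate limits, and paywalled content.

```python
import re
import requests  # third-party: pip install requests

URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")

def find_broken_citations(text: str, timeout: float = 10.0) -> list[str]:
    """Return cited URLs that fail to resolve (candidate hallucinations)."""
    broken = []
    for url in sorted(set(URL_PATTERN.findall(text))):
        try:
            # HEAD is cheap; some servers reject it, so fall back to GET.
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code >= 400:
                resp = requests.get(url, stream=True, timeout=timeout)
            if resp.status_code >= 400:
                broken.append(url)
        except requests.RequestException:
            broken.append(url)  # unreachable hosts are treated as unverified
    return broken

if __name__ == "__main__":
    with open("draft_policy.txt") as f:
        for url in find_broken_citations(f.read()):
            print(f"UNVERIFIED: {url}")
```

A check like this can only catch nonexistent sources when citations actually carry links, which is exactly why mandatory web links are the foundation of the mitigation.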
4. CHAI Conference Recap
At last month’s CHAI Leadership Summit in Dana Point, Trustible CEO Gerald Kierce led a working session alongside governance and privacy leaders from Mass General Brigham, UT MD Anderson, and a large managed care provider. The room skipped the fundamentals. The conversation focused on what’s still broken. Two scenarios anchored the session: managing unauthorized agent activity, and measuring AI benefits in ways that actually inform decisions.
The first scenario: an AI agent had been running autonomously in a clinical department for six weeks, scheduling appointments and routing care escalations, without ever going through intake. Most governance programs can’t detect that until after it’s happened, and vendor contracts that don’t address post-signature AI feature additions make it harder to manage.
The second scenario: a CFO asking which of 35 AI initiatives in flight are actually delivering value. Governance programs that capture only risk at intake are leaving the most strategically important data on the table. Benefits belong in the same system as the risk assessment. Programs that can only see the risk side will systematically under-approve high-value use cases.
[See the presentation deck here.]
5. Policy Round Up
Colorado AI Act. Colorado is delaying enforcement of the Colorado AI Act until interpretive rulemaking is finished, or until a replacement policy is proposed. This comes after the US DOJ joined xAI’s lawsuit against the state’s law last week, arguing that it infringes on First Amendment speech and would impose a heavy compliance burden. The Colorado Governor announced in March that a new policy framework had been agreed upon and would replace the existing Colorado AI Act; however, no legislation has been proposed yet.
Our take: Even if the new AI Policy Framework passes before the session ends in May, we are likely looking at another year or so before enforcement actually begins.
Agents Under EU AI Act. Since the EU AI Act was passed, there has been a significant surge in AI agent development and adoption. The Act does not explicitly address AI agents, raising questions for providers about how to govern them for compliance. This recently released paper fills that gap by creating a taxonomy of agent use cases mapped to regulatory triggers, identifying AI agent-specific compliance challenges that aren’t entirely covered under any standards, and proposing a compliance architecture that integrates not only the EU AI Act but also 8 other related EU regulations.
Our take: Although not formal regulation, this paper provides a solid, comprehensive analysis of how AI providers should think about governing their AI agents per the EU AI Act.
New Model Risk Management Guidance. The OCC, FDIC, and Federal Reserve System released new guidance on Model Risk Management, replacing prior guidance. The update is meant to address an additional 15 years of advancements in modeling practices and industry feedback, and to create comprehensive MRM guidance for banking institutions. One thing is notably left out: generative AI and agentic AI models. The agencies acknowledge the rapid pace of innovation and have decided to publish an RFI in the near future to gain additional insights for future guidance.
Our take: While the omission of GenAI and agentic AI may seem like a gap, we think it’s the right call. MRM looks at model performance, while AI governance looks at use cases for how and where that model is applied. Keeping them separate ensures model validation doesn’t substitute for the broader oversight that a use case-level view provides.
In case you missed it:
South Africa. South Africa’s recently proposed draft national AI policy has been withdrawn after multiple fake sources, more than likely AI-generated, were found in the reference list. The Minister of Communications and Digital Technologies, Solly Malatsi, emphasized that this kind of mistake is “why vigilant human oversight over the use of artificial intelligence is critical.”
UAE. The UAE Prime Minister announced that within two years, half of all federal government operations will be “run on Agentic AI.” If successful, adopting autonomous systems at this scale to redesign how the federal government operates would be a first-of-its-kind accomplishment for the UAE, which has been working toward centralized AI governance for nearly a decade.
—
As always, we welcome your feedback on content! Have suggestions? Drop us a line at newsletter@trustible.ai.
AI Responsibly,
The Trustible Team