Transatlantic AI Uncertainty
Plus, a deeper look at one of the first confirmed cyber attacks by AI agents, the challenges of open weight models, and our global policy roundup
Happy Wednesday, and welcome to this edition of the Trustible AI Newsletter! Two weeks is an eternity in AI world, and in the past two weeks, we’ve seen a seismic shift in the AI regulatory environment both here in the U.S. and across the pond in the EU (more on that later in this edition.)
But, it’s also been a big couple of weeks for all of us here at Trustible; last week, we launched our new Trustible AI Governance Insights Center, an open source repository of our AI governance heuristics, from our risks taxonomy, to recommended mitigation strategies, AI benefits, and even more developments in our AI model ratings curated by our team of experts. Over time, we’ll be adding even more insights and resources, but as a public benefit corporation, this is an important step in advancing our mission to help society realize the transformative potential of AI. You can explore the insights center at trustible.ai/resource-center.
We are also thrilled to share that we’ve been listed as a Representative Vendor in the 2025 Gartner® Market Guide for AI Governance Platforms. We believe this is a milestone that signals the start of an inflection point, when AI governance is no longer optional, experimental, or theoretical; it’s now a business imperative for enterprises looking to realize the promise of AI. You can read all about the exciting news here.
In this week’s edition, we’re covering:
Trustible’s Take - Transatlantic AI Uncertainty
AI Incident Spotlight - Cyber Attacks by AI Agents
Technical Explainer: Big challenges with open-weight models
Trustible’s Top AI Policy Stories
1. Trustible’s Take - Transatlantic AI Uncertainty
As a result of a flurry of regulatory proposals last week, there is now more uncertainty than ever about when AI regulations may kick-in and what those regulations will be. In Europe, the EU Commission proposed a ‘Digital Omnibus’ to reform several digital laws, including GDPR and the EU AI Act. The Commission is proposing a delay for high risk AI system obligations for a period ranging between 12 and 24 months. A driving force behind the delay comes from issues developing compliance standards for high risk systems. The Commission’s proposed delay stipulates that if standards are developed before the new deadlines hit, then high risk system requirements will take effect sooner. However, it is unclear if the Commission’s proposal will make it past a skeptical EU Parliament (which has to agree) or if the compromise text would pass before the current deadline for high risk obligations comes into effect in August 2026.
Meanwhile in the US, congressional Republicans are moving forward with plans to resurrect the state AI moratorium by attaching it to the annual National Defense Authorization Act. President Trump fully supports the effort moratorium and is considering an Executive Order (EO) that would effectively enact a moratorium without Congress, but the planned EO appears to be on hold. A federal moratorium on state AI laws is facing backlash from several prominent GOP elected officials, including Governor DeSantis and Senator Hawley. Any moratorium without an actual superseding federal law would likely face immediate lawsuits, especially if done via an EO.
Ironically, the recent policy prerogatives in the EU and US aim to incentivize AI growth and adoption but instead have injected such a level of regulatory uncertainty as to undermine these goals. The EU and US have substantially lower rates of trust in AI than any other part of the world. Solving the AI trust problem is the key to growing AI adoption rates on both sides of the Atlantic.
Key Takeaway: The increasing regulatory uncertainty paired with the perception that AI is unregulated is not going to help accelerate adoption. Backtracking on efforts that improve trust and adoption could just cause the “AI bubble” to burst, along with any potential to “win” the AI race against China.
2. AI Incident Spotlight - Cyber Attacks by AI Agents (Incident 1263)
What Happened: Anthropic identified that a Chinese hacking group (GTG-1002) used its Claude Code platform to launch a fully autonomous cyber espionage attack against over 30 targets. The groups used several ‘jailbreaking’ techniques to evade protections built into Claude Code. The attack notably included autonomous ‘multi-step’ processes, where the agent first scanned for the target’s cloud resources, programmatically identified vulnerabilities, created exploits for them, and then successfully extracted data before being shut down by Anthropic.
Why it Matters: There are several highly notable and severe aspects of this incident. The first is that it highlights a new dangerous paradigm for cybersecurity. The fact that an AI agent was able to successfully conduct a highly sophisticated multi-step attack, with limited human interaction confirms the fears of many in the cybersecurity world. The fact that it was a Chinese group, with links to the Chinese state, using an American AI system is also likely to create a massive reaction in DC. This is also one of the few voluntary first-party incidents reported by a large model creator, although it’s unclear what their longer term mitigation efforts may be beyond simply trying to ban the relevant parties.
How to Mitigate: A lot of focus of these incidents is often on the underlying ‘model’. However LLMs only ever generate text, and even in agentic systems, they’re often simply generating text that instructs a system to do something, or to call some kind of external tool. The real dangers here came from all the other ‘system’ components built into Claude Code as a ‘platform. Claude-4 the model can generate the text command to make an HTTP request, but Claude Code the platform can actually make the request. Claude Code then also was storing information in memory, running a process over several minutes (or hours), and executing generated computer code. For organizations hosting AI systems, limiting how much a system can independently access the internet, or run code, can heavily restrict the potential blast radius and capabilities of a system. For those worried about similar incoming cybersecurity attacks, tools like CloudFlare anti-AI bot tool will become an essential layer to help block sophisticated non-human interactions.
3. Technical Explainer: Big challenges with open-weight models
Open-weight models, like Phi and Qwen, are available for download and unconstrained use by researchers, businesses, and consumers. Unlike closed-source models (e.g., GPT or Gemini) that use input/output filters, protections in open-weight models must be integrated directly or added by deployers. Recent research highlights risk management challenges that are particularly salient for open-weight models.
Training Data Curation: One key mitigation strategy is to avoid training on unsafe sources. However, current filtering methodologies are inconsistent and some “knowledge” may be useful for benign capabilities (e.g. designing cybersecurity defences). In addition, recent work suggests that harmful knowledge may emerge from a combination of benign data sources (e.g. knowledge of biorisks may be inferred from general knowledge of biology). Choosing an appropriate curation strategy may be challenging for deployers, and a lack of clear guidelines can make it difficult for developers to review models.
Tamper-Resistant Training: Post-training methodologies can further reduce unsafe behavior, however, downstream modifications can intentionally or unintentionally remove these protections. In addition, as models become integrated into agentic systems, they may retrieve harmful knowledge from the internet, introducing a new attack vector.
Model Tampering Evaluation: Open-weight developers often lack the resources for extensive audits, shifting the burden to deployers. Furthermore, standardized frameworks don’t exist for these evaluations making it difficult to compare and trust different models. (In our own AI Model Ratings, we’ve observed that many open-weight developers explicitly disclose a lack of adversarial evaluations)
Model Provenance: Both researchers and deployers may want to study the lineage of specific open-weight models. While the former may want to understand the broader ecosystem, the latter needs to know whether a particular model is appropriate for an application. Currently, no reliable and scalable approach exists for tackling this challenge.
Key Takeaway: The challenges outlined are exacerbated by a lack of transparency - most popular open-weight models largely do not address these topics in their documentation. While these models allow for more flexibility and control, they shift the burden to deployers to investigate model provenance, run safety evaluations and build additional safeguards while considering a lack of unified standards and solutions for each step.
P.S. While this work emphasizes the lack of reporting around safety, we’ve recently collaborated with the EvalEval coalition on a new paper that shows that societal impact evaluations are underreported across the LLM landscape.
4. Trustible’s Top AI Policy Stories
House Hearing on Chatbots. The House Subcommittee on Oversight and Investigations held a hearing to better understand the risks posed by AI chatbots, particularly to minors, and hear recommendations from experts on potential regulatory solutions.
Our Take: Congress continues to be lasered focused on AI harms posed to children and may be one of the few AI issues that sees bipartisan legislation pass.
National Security Framework. A bipartisan group of lawmakers in the House and Senate are working on legislation that would require the National Security Agency to publish an AI security playbook, which is intended to help outline how AI systems are being protected from foreign adversarial threats.
Our Take: AI security is a priority for the Trump Administration (as emphasized in the White AI Action Plan) and represents another area where lawmakers could plausibly pass legislation.
United Nations and Healthcare. A new report from the World Health Organization warns that AI used in healthcare settings needs legal guardrails to protect patients and healthcare professionals.
Our Take: The concerns are not new but the report notably observes “there is a broad consensus on the policy measures” that could improve AI adoption.
In case you missed it, here are a few additional AI policy developments making the rounds:
Africa. The UAE announced the “AI for development initiative,” which will invest $1 billion to expand AI infrastructure in Africa. While US tech companies have been making active investments to help African countries develop AI technology and infrastructure, the US government has largely been absent.
Asia. AI-related policy developments in Asia include:
China. The Chinese government issued guidance that would ban foreign-made AI chips from new data center projects. The decision comes amidst tensions with the US over advanced chips sales in China.
South Korea. Korea’s Ministry of Science and ICT released draft regulations for the AI Basic Act, the country’s comprehensive AI law passed in December 2024. The proposed regulations clarify how covered entities will need to comply with the law, which takes effect in January 2026.
Australia. The Australian government released the Australian Public Sector (APS) AI Plan, which is aimed at improving how AI is used by public sector agencies. While the Australian government has backed away from enacting AI rules for private companies, the ASP AI Plan could have far-reaching impacts on companies doing business with the federal government.
Middle East. The Trump Administration signed a new Memorandum of Understanding with the Saudi Arabian government, which would allow the Saudi government to access US AI technology. The agreement is part of the Trump Administration’s broader AI policy goals with Middle Eastern countries.
South America. The UN’s COP30 climate conference met in Brazil and focused heavily on how AI is impacting climate change. While attendees acknowledged AI’s potential for helping address climate change, many raised concerns over AI’s strain on natural resources and energy consumption.
As always, we welcome your feedback on content! Have suggestions? Drop us a line at newsletter@trustible.ai.
AI Responsibly,
- Trustible Team





