Trustible AI Governance Newsletter #44: Model Swaps & Data Trust
Plus what makes for a perfect use case intake process, and our global policy roundup
Happy Wednesday, and welcome to the latest edition of the Trustible Newsletter! It’s been a busy week in the AI regulatory landscape, as California enacts new legislation targeted at model builders and the E.U. continues to ponder the next steps for AI Act enforcement (we wrote about the pros and cons of this on our blog).
In today’s edition (5-6 minute read):
Model Swap
Good Models Still Require Credible Data
What Makes for the “Perfect” AI Use Case Intake Process?
Global & U.S. Policy Roundup
1. Model Swap
OpenAI and Anthropic recently ran a cross-evaluation of each other’s models. Each surfaced different risks, and the exercise demonstrated how task framing shapes evaluation outcomes.
OpenAI leaned on instruction-following and jailbreak resilience as core safety markers, noting for instance that Claude refused up to 70% of hallucination probes rather than risk giving a wrong answer. Anthropic emphasized long-horizon misuse and sycophancy, finding that OpenAI’s o3 resisted harmful misuse better than GPT-4o, GPT-4.1, or o4-mini, which were often willing to cooperate with simulated bioweapon or drug synthesis requests.
While some research labs like Epoch AI and government-sponsored institutes like the UK’s AI Security Institute (AISI) have done independent model evaluations, this was the first time the two frontier model leaders conducted this kind of evaluation swap. The differences in evaluation approaches, and the respective strengths and weaknesses of each other’s models, show how the philosophies and incentives of the model providers can be reflected in their models. It also underscores the need for a wide range of evaluations rather than relying on self-reporting alone. The evaluations focused on ‘frontier capability’ assessments, however, not on more ‘day-to-day’ AI tasks. OpenAI more recently introduced GDPval to assess model performance on industry-specific tasks, and this type of benchmark could quickly become an industry standard relevant to AI deployers and users.
Key Takeaway: Evaluations that rely on a single number can easily mask some of the more nuanced differences between foundation models. Once you dig into the details, the priorities, culture, and incentives of a model provider become clearer, and may be a concern for certain types of use cases.
2. Good Models Still Require Credible Data
A recent study showed that ChatGPT may return answers based on claims in retracted scientific studies, even when information about the retraction appears in the same document. Unlike hallucinations, where the model fabricates a claim, this study points to the model’s inability to properly contextualize information in the training data and recognize when it’s inaccurate. This challenge can’t be solved easily: first, the pre-training process breaks documents into multiple pieces, so a retraction statement may be associated with the title of the paper but not with individual statements elsewhere in the paper. Second, modern LLMs are trained on vast corpora of data that are not manually reviewed, so removing all erroneous information from pre-training data isn’t feasible (and even if it were, the process would be biased and subjective).
While this study focused on GPT-4o-mini’s internal knowledge, many modern systems integrate external information through web searches or connections to internal knowledge bases (i.e., retrieval-augmented generation, or RAG). These integrations can help the model analyze the document as a whole and consider retractions published on external platforms. We tested a couple of example claims from the study and found that, during a web search, the model recognized the retraction and corrected its original answer. However, this approach still relies on the model having access exclusively to up-to-date information; an out-of-date document that isn’t annotated as such can cause similar erroneous assertions. This failure mode may have been behind Air Canada’s chatbot giving false advice on bereavement travel last year. Our own team at Trustible has encountered similar challenges using AI for coding tasks, where old, outdated code (tech debt) regularly confuses the model.
Key Takeaways: LLMs cannot reliably identify whether information is up to date, especially if updates or retractions are not clearly associated with a specific fact. Practitioners should take care to maintain accurate data sources for training, fine-tuning, and RAG. In addition, detailed prompting can help the system explicitly check for potential inconsistencies, as in the sketch below.
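To make that last point concrete, here is a minimal sketch of what a prompt-level consistency check could look like in a RAG pipeline. It assumes the OpenAI Python SDK; the `answer_with_source_checks` helper and the document metadata fields (`source`, `status`, `last_updated`) are hypothetical illustrations, not code from the study or from Trustible’s products.

```python
# Minimal sketch: attach source metadata to each retrieved document and ask the
# model to check for retractions and staleness before answering.
# Assumes the OpenAI Python SDK; the document schema below is hypothetical.
from datetime import date
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_with_source_checks(question: str, documents: list[dict]) -> str:
    # Each retrieved document carries metadata alongside its text, e.g.:
    # {"text": "...", "source": "...", "status": "retracted", "last_updated": "2019-06-01"}
    context = "\n\n".join(
        f"[Source: {d['source']} | status: {d.get('status', 'unknown')} "
        f"| last updated: {d.get('last_updated', 'unknown')}]\n{d['text']}"
        for d in documents
    )
    prompt = (
        f"Today's date is {date.today()}.\n"
        "Answer the question using only the sources below. Before answering:\n"
        "1. Ignore any source marked 'retracted' and say so explicitly.\n"
        "2. Flag sources whose last-updated date suggests they may be stale.\n"
        "3. If the remaining sources conflict or are insufficient, say you are unsure.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The design choice here is to pass retraction status and freshness dates as structured metadata and instruct the model to reason about them explicitly, rather than hoping it infers a retraction from the document body, which, as the study above suggests, it often will not.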
3. What Makes the “Perfect” AI Use Case Intake Process?
AI intake is the front door to governance. At the IAPP AI Governance Global North America conference in Boston a few weeks ago, Trustible led a panel discussion with leaders from Leidos and Nuix on AI use case intake processes. The room quickly aligned on a core truth: there’s no “perfect” intake process, only the one that fits your org’s risk profile, scale, and speed. The real work is choosing trade-offs you can live with.
The conversation unpacked six design levers teams should tune, not max out:
Granularity: are you tracking whole use cases, or features within products?
Heaviness: what’s the minimum set of questions that still surfaces risk?
Outcomes: does intake simply route, or also drive mitigations and decisions?
Participation: who owns what—privacy, legal, security, product, HR, IT?
Implementation: start with forms/spreadsheets, but plan for workflow and automation.
Timing: catch ideas early without slowing experimentation.
Practitioners shared pragmatic moves: start where you are (even if it’s messy), iterate fast, and reframe intake from “audit” to “risk reduction.” Build muscle memory with short cycles and clear handoffs. As volume grows, expect ad hoc docs to buckle; that’s your signal to standardize fields, centralize your inventory, and automate routing so triage and transparency don’t degrade. Perhaps most importantly, make intake a shared habit—invite cross-functional partners in before the first pilot, not after the first incident. Culture change is the glue that keeps the process from backsliding.
Key Takeaway: fit beats perfection. A right-sized intake gives leaders visibility, lets teams move quickly with guardrails, and creates artifacts that stand up to regulatory and stakeholder scrutiny.
You can read the full recap on our blog.
4. Global & U.S. Policy Roundup
Trustible’s Top AI Policy Stories
OSTP Request for Information. As part of the Trump Administration’s AI Action Plan, OSTP is seeking input on existing rules that may hinder AI deployment or adoption.
Our Take: Beyond commenting on rules that industry does not like, this proceeding will allow companies to identify redundancies among existing rules that, should the federal government tweak them, could help streamline AI governance processes.
Anthropic settlement approved. Judge Alsup approved the $1.5 billion settlement agreement between Anthropic and authors who say the company infringed on their copyrights.
Our Take: The copyright cases continue to highlight why companies need to know where the data used in their AI tools comes from and whether they have the requisite permission to use IP in those tools.
United Nations AI Dialogue. The UN announced the “Global Dialogue on AI Governance,” which will create a panel of experts to study best practices for AI governance.
Our Take: The UN is trying to stake its claim on AI standards, but the final recommendations will likely remain high-level, making them difficult to operationalize at the enterprise level.
AI Bills in California. Governor Gavin Newsom signed SB 53 into law, a watered-down version of last year’s SB 1047.
Our Take: The new law imposes governance obligations on frontier model providers, with some limited downstream impacts.
In case you missed it, here are additional AI policy developments:
Asia. AI-related policy developments in Asia include:
China. DeepSeek launched DeepSeek-V3.2-Exp, a new model billed as an “intermediate step toward [their] next-generation architecture.” The new model is likely to put pressure on other Chinese model providers like Alibaba, which launched its new Qwen 3 Max model earlier this month.
Japan. The Ministry of Defense is taking a hard line on AI in the military, allowing it to help with defense operations but maintaining that humans must be in charge of lethal force decisions. Japan’s new policy comes as it formalized the SAMURAI Project with the U.S., which is intended to advance AI safety in unmanned aerial vehicles.
India. The government signed a new partnership agreement with Venezuela to “jointly explore the integration of [AI] and digital public infrastructure in sectors such as health, payments, and education.”
Australia. The Digital Economy Minister reiterated that Australia will take a light-touch approach to AI regulation. The Australian government was interested in a more comprehensive approach but has since abandoned that effort.
Europe. AI-related policy developments in Europe include:
EU. The European Commission (EC) may pause implementation of the AI Act, after strongly rejecting the idea back in July. The potential pause, driven primarily by implementation delays at the national level, will be discussed at an upcoming AI Board meeting in October. The EC also published draft guidelines for reporting serious incidents as required under the AI Act.
Italy. The Italian government passed a new AI law that criminalizes certain uses of AI, such as creating deepfakes or assisting with committing crimes. The law also requires that children under the age of 14 get consent from their parents to access AI.
Middle East.
UAE. Sam Altman met with President Sheikh Mohamed bin Zayed Al Nahyan to discuss how to foster closer cooperation on AI.
Saudi Arabia. Representatives from South Korea met with officials in Saudi Arabia to discuss closer collaboration on building more innovative environments for SMEs and expanding new market opportunities.
North America. AI-related policy developments in North America outside of the U.S. include:
Canada. The AI and Digital Innovation Minister announced a new AI Task Force that will work over the next 30 days to make recommendations for Canada’s national AI strategy. It is unclear whether these recommendations will turn into actual regulations.
Mexico. CloudHQ announced a $4.8 billion data center in a state just north of Mexico City.
—
As always, we welcome your feedback on content! Have suggestions? Drop us a line at newsletter@trustible.ai.
AI Responsibly,
- Trustible Team