AI Makes Work More Intense
Also, why repeating your prompt can improve accuracy, why context from the physical world is essential in healthcare AI, Trustible partnership announcement, and policy updates
AI Tools Are Making Employees Work More, Not Less
Technical Explainer: Prompt Repetition
Trustible Joins Coalition for Health AI
AI Incident Spotlight - When AI Alerts Lack Sufficient Context
Policy Round-up
(Source: Google Gemini)
1. AI Tools Are Making Employees Work More, Not Less
The sales pitch for enterprise AI adoption goes something like this: AI handles the tedious stuff, your employees focus on higher-value work, everyone’s happier and more productive. New research described in the Harvard Business Review suggests the reality is closer to the opposite. In an eight-month study of a 200-person tech company, researchers found that generative AI tools didn’t reduce workloads. They intensified them across three dimensions: employees expanded into tasks outside their roles (designers writing code, PMs debugging), work bled into breaks and off-hours as the low friction of “one more prompt” eroded boundaries, and constant multitasking across parallel AI workflows created persistent cognitive load. None of this was mandated. Workers did it voluntarily because AI made “doing more” feel accessible and even enjoyable.
For AI governance professionals, this matters because it complicates the risk calculus around enterprise AI deployment. The obvious governance concerns with AI tools (data leakage, IP exposure, accuracy) are well understood. But work intensification introduces a quieter set of risks that most AI governance frameworks don’t account for. When employees operate outside their core competencies with AI assistance, more AI-generated or AI-assisted outputs flow through an organization with less qualified review. The researchers’ finding that engineers spent increasing time correcting “vibe-coded” pull requests from non-engineering colleagues is a concrete example of how quality control can quietly degrade. Meanwhile, the burnout cycle the study describes, where early productivity gains give way to cognitive fatigue and lower decision quality, suggests that organizations measuring AI’s impact purely through short-term output metrics are likely overstating the long-term benefits.
Key Takeaway: Constantly reviewing AI outputs across multiple parallel tasks may be cognitively exhausting teams even as executive expectations for productivity keep rising. AI use policies may need to define acceptable patterns for stepping away from AI tools and account for the cognitive costs of rapid context switching and continuous review of AI outputs.
2. Technical Insight: Prompt Repetition
Google researchers recently published a paper showing that simply copying and pasting a prompt twice into the input, with no other changes, consistently improves LLM accuracy across Gemini, GPT, Claude, and DeepSeek models. The technique, called “prompt repetition,” won 47 out of 70 benchmark tests with zero losses, added no meaningful latency, and didn’t change the length or format of the model’s output. Because LLMs process tokens left to right, early tokens in a prompt can’t “see” later ones. Repeating the prompt gives every token a second pass in which it can attend to the full context. There’s also a simpler intuition at play: repetition likely strengthens the internal representations of the input, effectively increasing the “weight” the model assigns to the prompt’s content relative to its prior training biases. The gains are modest on standard benchmarks but dramatic on tasks requiring attention to information buried in long inputs, exactly the kind of problem that plagues document-heavy enterprise workloads.
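Mechanically, the technique is as simple as it sounds. A minimal sketch of the idea (the separator choice here is our own assumption, not something the paper prescribes):

```python
def repeat_prompt(prompt: str, n: int = 2, separator: str = "\n\n") -> str:
    """Build a repeated-prompt input.

    Repeating the full prompt means that, by the time the model reads the
    second copy, every token can attend back to the entire request; in a
    single copy, early tokens never "see" the later instructions because
    decoder-only models process tokens left to right.
    """
    return separator.join([prompt] * n)
```

In practice, you would pass `repeat_prompt(user_prompt)` as the input to a non-reasoning API call in place of the raw prompt; per the paper, the output’s length and format are unchanged.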
For governance teams, the more interesting implication is what this says about evaluation. If a trivial input transformation can meaningfully shift benchmark scores, it raises questions about how stable published model evaluations really are. Two organizations testing the same model with slightly different prompt formats could reach very different conclusions about its reliability. Prompt repetition works best when reasoning mode is off, which is how most enterprise API calls operate for classification, extraction, and structured output tasks, meaning it’s a real and essentially free improvement. But its bigger lesson is that small methodological choices in evaluation can have outsized effects on results.
Key Takeaway: Prompt repetition is worth testing for non-reasoning API workloads, but governance teams should treat it as a reminder that model evaluations are more fragile than published results suggest, and should account for prompt sensitivity when comparing models or setting performance thresholds.
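One way a governance team could act on this takeaway is to quantify prompt sensitivity directly: score the same model on trivially different formats of the same questions and report the spread. The sketch below is illustrative; the `toy` callable is a hypothetical stand-in for a real LLM call, and the variant names are our own.

```python
def accuracy(model, prompts, expected):
    # Fraction of prompts whose (whitespace-normalized) output matches the
    # gold answer for that item.
    return sum(model(p).strip() == e for p, e in zip(prompts, expected)) / len(prompts)

def prompt_sensitivity(model, variant_sets, expected):
    # Score each formatting variant of the same underlying questions.
    scores = {name: accuracy(model, prompts, expected)
              for name, prompts in variant_sets.items()}
    # The gap between the best and worst variant is a simple fragility signal.
    return scores, max(scores.values()) - min(scores.values())

# Toy deterministic "model" that only answers correctly when the prompt ends
# with a question mark -- an exaggerated form of format sensitivity.
toy = lambda p: "4" if p.endswith("?") else "unsure"
variants = {
    "question": ["What is 2 + 2?"],
    "imperative": ["Compute 2 + 2."],
}
scores, spread = prompt_sensitivity(toy, variants, ["4"])
```

A large spread on an otherwise easy task is a sign that published benchmark numbers for that model should be treated with extra caution.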
3. Trustible Joins the Coalition for Health AI
We’ve partnered with the Coalition for Health AI (CHAI) to bring CHAI’s AI Governance Framework directly into the Trustible platform. Healthcare organizations can now map their AI governance activities to CHAI’s guidance with structured workflows, healthcare-specific risk assessments, and audit-ready documentation. For health systems trying to move from AI ambition to confident deployment, this removes the need to build governance practices from scratch or adapt generic frameworks that miss healthcare’s context. Read more about the partnership here.
4. AI Incident Spotlight - When AI Alerts Lack Sufficient Context (Incident 1374)
What Happened: A nurse at a Nevada hospital described an episode in which the facility’s AI sepsis alert system flagged an elderly patient with low blood pressure and triggered urgent protocol steps, including IV fluids. The nurse noticed the patient had a dialysis catheter, meaning her kidneys were already compromised. Pumping IV fluids into a patient who can’t process them risks dangerous fluid overload. When the nurse objected, he was told to proceed anyway because the AI had generated the alert. He refused, and a physician ultimately intervened with an alternative treatment that avoided the risk. The incident, reported by Scientific American in February, is part of a broader pattern of clinical AI systems generating recommendations that conflict with what’s observable at the bedside.
Why it Matters: The model was working as designed. It detected signals consistent with sepsis and triggered the correct protocol. The problem is that it had no way to know about the dialysis catheter, a piece of real-world physical context visible to any clinician in the room but absent from the electronic health record the model was reading. This is a recurring blind spot: clinical AI systems operate on structured digital data, but a significant share of relevant information only exists at the bedside. What the alert did next is the governance concern. It created institutional momentum. Protocol kicked in, and the nurse’s clinical objection was initially treated as non-compliance. The AI didn’t just inform the decision, it set the default, and overriding it required escalation.
How to Mitigate: Organizations deploying clinical AI alerts need workflows where alerts inform rather than direct. That means building override mechanisms that don’t require escalation and ensuring frontline staff have clear authority to act on evidence that contradicts a model’s output. This incident also highlights the limits of models that rely solely on digitized records. Physical context, like a dialysis catheter, is exactly the kind of information that’s hard for a model to ingest. Until that gap closes, human review is the only reliable way to catch what the system can’t see.
Key Takeaway: An AI system is only as good as its inputs, which include any and all relevant context. Missing context is one of the biggest potential sources of error, and one that is often overlooked in the evaluations run on a system before deployment. Knowing what information an AI system can access, and what its input limits are, is an essential part of governance.
5. Policy Roundup
DoD and Anthropic. Anthropic has been in a heated battle with the Department of Defense (DoD) over the use of its models. The DoD has pressured Anthropic to drop safeguards and allow its models to be used for a wider range of military purposes.
Our Take: While AI safety has been deprioritized under the Trump Administration, forcing a model provider to lower safeguards will have ripple effects across the ecosystem and may worsen the trust gap with AI.
Utah’s own RAISE Act. Lawmakers in Utah have been quietly working on passing a law that is very similar to California’s SB 53 and New York’s RAISE Act. The bill is currently making its way through the state house. One key difference is an added requirement for model providers to develop a child safety plan for their models.
Our Take: The Trump Administration has sought to deter state action on AI, but concerns with AI safety, as well as its impacts on jobs and children, have been relatively bipartisan.
India AI Summit. India recently hosted the latest global AI summit, continuing the series begun in the UK in 2023 and carried on in Paris last year. The focus was mainly on AI innovation, and attendees scolded the EU for its prescriptive approach to AI oversight.
Our Take: The shift in global attitudes towards AI regulations has been swift and the focus of these global AI summits showcases that dramatic turnabout.
In case you missed it, here are a few additional AI policy developments making the rounds:
Asia. The newly elected Japanese government will host a ministerial meeting to discuss how AI can be effectively utilized by the government. In Korea, the National AI Strategy Committee adopted a final AI Action Plan at its second plenary session.
Australia. The Australian government is scrapping its AI Advisory Board 15 months after launching it. The board was charged with setting out recommendations for AI safeguards.
—
As always, we welcome your feedback on content! Have suggestions? Drop us a line at newsletter@trustible.ai.
AI Responsibly,
- Trustible Team


