The AI Age-Gate Conundrum
Plus the role of Guardrail Models, breaking down a deepfake endorsement scam, and our global AI policy roundup
Happy Wednesday, and welcome to the latest edition of the Trustible Newsletter! Last week, the Trustible team travelled to sunny San Jose for the GovAI Coalition Summit, where state and local leaders met to discuss how to accelerate safe and responsible AI adoption within states, cities, counties, and municipalities across the country - and we successfully made it home (after a few delays.)
In this week’s edition, we’re covering:
Chatbot Teetertot: Balancing Child Safety & AI Literacy
Tech Explainer: The Role of Guardrail Models in AI Systems
AI Incident Spotlight: Deep-Fake Endorsement Scam (Incident 1261)
Policy Round-Up
1. Chatbot Teetertot: Balancing Child Safety & AI Literacy
Minors are increasingly using AI tools for social support and companionship. However, it’s not difficult for kids to elicit problematic or harmful content from AI tools, mainly AI chatbots and companion bots. The most notable examples are from Character.ai, whose companion bots have lead to children committing self-harm or suicide, as well as sharing or generating illicit content. Character.ai has since blocked kids from accessing its chatbots.
Industry is trying to figure this out, with OpenAI’s Teen Safety Blueprint as one example of how the private sector can offer solutions. Policymakers in the US have also taken notice and are attempting to enact new laws that address these issues. For instance, a new California law requires companies to implement safeguards to prevent companion chatbots from discussing suicide and must direct the user to self-harm resources if they express suicidal thoughts to the bot. Congress has also taken notice and the bipartisan GUARD Act was introduced in the Senate, which would (among other things) ban minors from accessing companion bots.
Understandably, lawmakers and parents want to protect kids from the harms posed by chatbots. Age verification requirements have long been used to prevent minors from accessing inappropriate or illicit content. Yet, we must reflect on whether age-gating sufficiently addresses the problem (as those requirements can be gamed and also pose serious First Amendment issues) and whether these types of barriers to certain technologies will do more long term harm. Yes, it is essential to protect kids from potentially harmful systems and content, but it also highlights the challenge of effectively teaching them essential AI literacy skills to build a competitive next generation workforce.
While AI developers have a role to play with child safety, it does not stop with them. Organizations deploying public-facing chatbots need to understand that, even if children are not the intended audience, they may still use their chatbot products. It’s important for companies with public facing chatbots to implement safeguards around outputs and make appropriate disclosure (e.g., noting when a product should not be used by people under the age of 18). These mitigations are especially true as chatbots become more general purpose, which broadens the aperture on exposure.
Our Take: Kids today are more tech savvy than the previous generation and we need to acknowledge that fact as we think about how to implement pragmatic protections while helping them build valuable AI skillsets. We also need to think about how new child safety laws are not so sweeping that they unintentionally stunt growth for other AI use cases.
2. Tech Explainer: The Role of Guardrail Models in AI Systems
One way to mitigate a variety of AI system risks including processing PII, outputting unsafe (e.g. toxic language or specialized advice) content and prompt injections is by incorporating guardrail models that review and flag the inputs to and preliminary output from the system. Think of guardrail systems as guardians designed to be the first line of defense against potential threats. Designed specifically for detection rather than reasoning, guardrail models are typically smaller and faster than general-purpose LLMs. Both AWS Bedrock and Azure AI have off-the-shelf guardrail models that can be integrated into any endpoint, while on the open-source side, Llama-Guard is a popular solution that can recognize 13 common harm categories.
Recently, OpenAI released a new open-source guardrail model, gpt-oss-safeguard. Unlike the other solutions that can only detect a pre-defined set of harms, this model allows the user to input an arbitrary human-written policy (i.e. a set of rules to follow), meaning it can be used for domain-specific concerns (e.g. flagging spoilers on a movie forum or detecting regulatory non-compliance in legal text). The increased flexibility does not guarantee accuracy: the developers state that gpt-oss-safeguard is often less accurate than a custom model. As well, the documentation does not explore what kinds of “policies” will be effective; recent research suggests that existing human content moderation guidelines may need to be modified to work properly with LLMs. This type of model can be used as a rapid deployment solution, when off-the-shelf models do not cover the use case, and resources can not be devoted to building out an internal solution that may require thousands of labeled data points.
Our Take: Guardrail models serve as a first and final line of defence against unsafe system outputs, but require careful evaluation to be used effectively. Existing experts in that domain will need to collaborate with content engineers to review off-the-shelf guardrail models or build bespoke policies for models like gpt-oss-safeguard.
3. AI Incident Spotlight: Deep-Fake Endorsement Scam (Incident 1261)
What Happened: A deep fake of a senior official in the government of Western Australia was used to facilitate an online scam. The scammers generated an AI video showing Roger Cook, the official, falsely endorsing an investment service. The deep fake was described as hyper realistic in both voice and appearance.
Why it Matters: Public officials are at particularly high risk of being ‘deep faked’, and leveraged by scammers. The latest systems can impersonate facial likeness and voice with only a few short high quality video or audio clips. These are particularly easy to get for public officials who are highly visible and often need to participate in televised events. Many public officials also don’t always receive the same degree of privacy protections, and certain activities like satire of politicians can actually be a protected activity, blurring the lines of what is acceptable or not. It’s likely that these types of deep fake impersonation attacks will become more common over time.
How to Mitigate: Unfortunately there aren’t many ways that individuals or organizations can prevent this kind of activity on their own. Instead, the best options are to have mitigations set up to reduce the likelihood of people falling for these types of scams, and to support regulations on platforms that may potentially host, or benefit, from these types of schemes. For organizations, it is appropriate to consider regular training on how to detect deep fakes, and even run on-going exercises for it on employees.
4. Policy Round-Up
Trustible’s Top AI Policy Stories
ChatGPT’s New Lawsuits. OpenAI is facing a flurry of new allegations over ChatGPT’s impact on mental health. New lawsuits against the company allege that ChatGPT induced psychosis, as well as lead to financial ruin and suicide.
Our Take: It is important to clearly communicate how these tools should be used and make sure that users understand they are interacting with a machine, not a real person.
Pausing the EU AI Act. The European Commission (EC) released initial proposals for its Digital Omnibus package, which includes loosening EU AI Act obligations and postponing enforcement for certain requirements.
Our Take: The uncertainty is complicating compliance for many organizations but they should focus on EU AI Act compliance in case the EC opts against enforcement delays.
UK Copyright Decision. A UK court decision is upending the interplay between AI data use and IP law after it ruled in favor of Stability AI in a case brought by Getty Images for secondary copyright infringement.
Our Take: The case adds a new dimension to how AI is trained on protected data because the alleged infringement did not occur in the UK. Organizations should make sure they have proper permission to use data when training their AI models and tools.
In case you missed it, here are a few additional AI policy developments making the rounds:
United States Congress. Two pieces of bipartisan AI legislation have been introduced in the Senate. The GAIN AI Act would regulate how American AI chips were exported to China and other countries of concern. The AI-Related Job Impacts Clarity Act would require companies and the federal agencies to report when AI-related layoffs occur. House Democrats are also attempting to rollback the Trump Administration’s efforts to use AI at the Centers for Medicare & Medicaid Services.
Trump Administration. After OpenAI’s CFO caused controversy by implying the federal government could be a “backstop” for AI infrastructure debt, the Trump Administration dismissed the idea of a federal bailout for frontier model companies. Sam Altman has denied that OpenAI expects a bailout.
Asia. AI-related policy developments in Asia include:
China. A new law requires social media influencers to hold degrees for certain topics (e.g., medicine or finance) before posting about them. The law also requires influencers to disclose when their content is AI-generated.
India. The Indian government is preparing to release a draft comprehensive AI law. The current content is unknown but is expected to be modeled after the Information Technology Act of 2000.
Europe. Nvidia and Deutsche Telekom announced plans to build Europe’s largest AI in Germany. The latest announcement emphasizes Europe’s desire to build its own AI ecosystem.
Middle East. The Trump Administration signed a memorandum of understanding with the UAE to further cooperation in AI and energy.
As always, we welcome your feedback on content! Drop us a line with your thoughts to newsletter@trustible.ai for a chance to win an exclusive set of Trustible AI Model Playing Cards!
AI Responsibly,
- Trustible Team





