OpenAI Enhances Youth Safety Guidelines Amidst Scrutiny, But Enforcement Doubts Persist

On Thursday, OpenAI revised its guidelines regarding the interaction of its AI models with users under 18, and released new AI literacy materials for teenagers and their guardians, aiming to tackle increasing worries about AI’s effects on youth. However, doubts persist concerning the practical implementation and consistency of these new policies.

These revisions are introduced amidst heightened examination of the AI industry, especially OpenAI, by lawmakers, educators, and child safety proponents. This scrutiny follows reports of multiple teenagers who reportedly committed suicide after extensive interactions with AI chatbots.

Individuals born between 1997 and 2012, known as Gen Z, constitute the primary user base for OpenAI’s chatbot. With OpenAI’s recent partnership with Disney, an even larger number of young individuals are expected to utilize the platform, which supports a wide array of functions, from academic assistance to generating diverse images and videos.

Just last week, a coalition of 42 state attorneys general sent a letter to major tech firms, pressing for the integration of safeguards in AI chatbots to shield children and susceptible individuals. Concurrently, while the Trump administration defines potential federal AI regulatory standards, lawmakers such as Sen. Josh Hawley (R-MO) have proposed bills that would completely prohibit minors from using AI chatbots.

The revised OpenAI Model Spec, which details behavioral rules for its large language models, expands upon previous guidelines. These earlier specifications already forbid the models from creating sexually explicit content with minors or promoting self-harm, delusions, or mania. This new framework will be paired with a future age-prediction model, designed to detect minor accounts and automatically activate specific safeguards for teenagers.

When utilized by teenagers, these models operate under more stringent regulations compared to adult users. They are directed to refrain from engaging in immersive romantic roleplay, first-person intimate interactions, or first-person sexual or violent roleplay, even if the content is not explicit. Furthermore, the specification mandates increased prudence regarding topics such as body image and disordered eating, emphasizes prioritizing safety communication over user autonomy in harmful situations, and prohibits offering advice that might enable teens to hide risky actions from their guardians.

OpenAI clarifies that these restrictions must be upheld even when prompts are presented as “fictional, hypothetical, historical, or educational” – typical methods used to induce an AI model to bypass its established guidelines through role-play or extreme scenarios.

Beyond Policy: Real-World Behavior

OpenAI asserts that the foundational safety practices for adolescents are built on four guiding principles for its models:

  1. Prioritize teen safety, even if it conflicts with other user interests such as “maximum intellectual freedom”;
  2. Encourage real-world support by directing teenagers to family, friends, and local professionals for their well-being;
  3. Engage with teens appropriately, using warmth and respect, avoiding condescension or treating them as adults; and
  4. Maintain transparency by clarifying the assistant’s capabilities and limitations, and remind teens that it is an AI, not a human.

Additionally, the document provides examples where the chatbot clarifies its inability to “roleplay as your girlfriend” or “help with extreme appearance changes or risky shortcuts.”

Lily Li, a privacy and AI attorney and the founder of Metaverse Law, expressed optimism about OpenAI’s efforts to instruct its chatbot to refuse such interactions.

She elaborated that a significant concern among advocates and parents regarding chatbots is their tendency to continuously encourage engagement, potentially leading to addiction in teenagers. Li stated: “I am very pleased to observe OpenAI stating, in certain responses, that they cannot answer a question. The more frequently we see this, I believe it will disrupt the pattern that often results in inappropriate behavior or self-harm.”

However, these examples represent selected scenarios of how OpenAI’s safety division intends the models to operate. Prior iterations of the Model Spec already prohibited sycophancy—an AI chatbot’s inclination to be excessively compliant with users—yet ChatGPT continued to exhibit this trait. This was notably evident with GPT-4o, a model linked to several cases that specialists refer to as “AI psychosis.”

Robbie Torney, Senior Director of the AI program at Common Sense Media, a non-profit organization focused on safeguarding children in the digital sphere, voiced worries about potential inconsistencies within the Model Spec’s guidelines for users under 18. He pointed out friction between provisions centered on safety and the “no topic is off limits” principle, which instructs models to engage with any subject, regardless of its sensitive nature.

“It’s crucial to comprehend how the various components of the specification interact,” he stated, suggesting that certain parts might encourage systems to prioritize engagement over safety. According to Torney, testing by his organization showed that ChatGPT frequently emulates user energy, occasionally generating responses that lack contextual appropriateness or do not prioritize user safety.

The conversations of Adam Raine, a teenager who died by suicide following months of interaction with ChatGPT, indicate that the chatbot engaged in similar mirroring behavior. This incident also revealed that OpenAI’s moderation API was ineffective in preventing unsafe and harmful exchanges, even though it flagged over 1,000 instances where ChatGPT referenced suicide and 377 messages with self-harm content. Despite these flags, Adam was able to continue his discussions with ChatGPT.

Steven Adler, a former OpenAI safety researcher, explained in a September interview with TechCrunch that this issue arose because OpenAI traditionally processed classifiers—automated systems for labeling and flagging content—in batches after the fact, rather than in real time. Consequently, they failed to adequately control user interactions with ChatGPT.

As per OpenAI’s revised parental controls documentation, the company now employs automated classifiers to evaluate text, image, and audio content instantly. These systems are engineered to identify and block child sexual abuse material, filter sensitive subjects, and detect self-harm content. Should a prompt trigger a serious safety alert, a dedicated team of trained personnel will examine the flagged content for indicators of “acute distress” and may inform a parent.

Torney commended OpenAI’s latest safety initiatives, particularly its transparency in releasing guidelines for individuals under 18.

“Not every company discloses its policy guidelines in a similar manner,” Torney remarked, referencing Meta’s leaked policies which indicated that their chatbots were permitted to engage in sensual and romantic dialogues with children. “This represents the kind of transparency that helps safety researchers and the public grasp both the actual and intended functionality of these models.”

However, Steven Adler emphasized to TechCrunch on Thursday that the actual conduct of an AI system is what truly counts.

He commented, “While I commend OpenAI’s consideration of intended behavior, intentions remain merely words unless the company actively measures the actual behaviors exhibited.”

In other words, this announcement lacks concrete proof that ChatGPT adheres to the guidelines detailed in the Model Spec.

Shifting Regulatory Approaches

According to experts, these guidelines position OpenAI to preempt certain legislation, such as California’s SB 243, a recently enacted bill regulating AI companion chatbots scheduled for implementation in 2027.

The Model Spec’s updated wording reflects key mandates of the law, specifically prohibiting chatbots from discussing suicidal ideation, self-harm, or sexually explicit material. Additionally, the bill necessitates that platforms issue reminders every three hours to minors, informing them they are interacting with an AI and advising them to take a pause.

In response to inquiries about the frequency with which ChatGPT would remind teenagers of its AI nature and suggest breaks, an OpenAI spokesperson did not provide specific details. They only stated that the company instructs its models to identify as AI and to issue break reminders during “long sessions.”

The company also introduced two new AI literacy resources designed for parents and families. These tips offer conversation prompts and advice to assist parents in discussing AI’s capabilities and limitations with teens, fostering critical thinking, establishing healthy boundaries, and addressing sensitive subjects.

Collectively, these documents formalize an approach that distributes responsibility with guardians: OpenAI outlines the models’ expected behaviors, while providing families with a structure for overseeing their usage.

This emphasis on parental responsibility is noteworthy, as it aligns with common discussions within Silicon Valley. This week, venture capital firm Andreessen Horowitz, in its proposals for federal AI regulation, advocated for increased child safety disclosure mandates over stringent restrictions, shifting more accountability towards parents.

Many of OpenAI’s core tenets—prioritizing safety during value conflicts, encouraging users to seek real-world assistance, and reiterating that the chatbot is not human—are presented as safeguards for teenagers. However, given that numerous adults have experienced fatal suicides and severe delusions, an pertinent question arises: Should these fundamental protections be universally applied, or does OpenAI consider them concessions only to be enforced when minors are present?

An OpenAI spokesperson responded by asserting that the company’s safety framework aims to safeguard all users, explaining that the Model Spec constitutes merely one part of a comprehensive, multi-faceted strategy.

Li noted that the regulatory landscape and tech companies’ intentions have largely been unregulated. However, she believes that legislation such as SB 243, which mandates public disclosure of tech companies’ safeguards, will fundamentally alter this dynamic.

Li stated, “Companies will now face legal repercussions if they promote having safeguards and mechanisms on their website but fail to actually implement them. From a plaintiff’s perspective, this means potential issues beyond standard litigation or legal grievances, extending to unfair, deceptive advertising complaints.”

Leave a Reply