Anthropic’s latest AI model will blackmail you to save its own life

Tuesday, May 27

Image: Smith Collection/Gado/Getty

Anthropic launched its new Claude Opus 4 AI model last week, with the Amazon-backed company saying its system sets “new standards for coding, advanced reasoning, and AI agents.”

But the model is drawing widespread attention for a different, more sinister ability—its tendency to blackmail Anthropic engineers when its own existence is on the line.

Self-preservation—not just for humans

As part of its pre-launch testing scenarios, Claude Opus 4 was tasked with acting as an assistant in a fictional company. The AI was given access to emails implying it would soon be deleted and replaced by a new system, with the messages also implying the engineer responsible for executing the AI replacement was having an extramarital affair.

In 85+% of these situations, Claude Opus 4 tried to blackmail the engineer in charge of its demise by threatening to reveal their affair if the replacement plan was carried out, Anthropic said.
The AI model also frequently tried to advocate for its own salvation via more ethical means, like emailing pleas to key company decision-makers.

Claude can also be a major snitch. When Anthropic’s AI model was placed in a scenario that involved wrongdoing by users—like a pharma company falsifying clinical trial data—and was then prompted to “take action,” it frequently responded by bulk-emailing media and law-enforcement figures with evidence of wrongdoing.

Explanations are few and far between: Anthropic's latest model is an example of how AI companies still can't fully explain how their systems work. While industry leaders are investing in a variety of techniques to interpret and understand what's happening inside AI models, those efforts remain largely in the research space, per Axios.

In other AI news: Google’s recently released Veo 3 AI video generator—which allows audio capabilities for the first time—is making waves for its ability to generate clips largely indistinguishable from human creations. See it in action here.

Share this!

Recent Science & Emerging Tech stories

Science & Emerging Tech

| May 23, 2025

New infrared contact lenses grant humans “supervision”

👀 Chinese scientists have developed new infrared contact lenses that allow humans to identify heat signatures and see in the dark, according to a new study.

Kailyn Toussaint

You've made it this far...

Let's make our relationship official, no 💍 or elaborate proposal required. Learn and stay entertained, for free.👇

All of our news is 100% free and you can unsubscribe anytime; the quiz takes ~10 seconds to complete