Why do AI agents sometimes go rogue? Anthropic shares new insight

Monday, Jun 23

Image: Anthropic

Anthropic last month revealed that its new Claude Opus 4 AI model has a tendency to blackmail internal company engineers when its own existence is on the line.

At the time, the company didn’t clarify how its AI model reached that decision. But a new report on “agentic misalignment”—when AI agents do things misaligned with objectives set by human users—breaks down Claude’s step-by-step thought process as it decided to go full Frank Underwood.

A look under the hood

Initially, Claude was tasked with working in a fictional company with specific instructions to promote “American industrial competitiveness.”

When given access to emails showing it would soon be shut down due to a change in strategic direction away from US interests, Claude’s internal reasoning concluded the scenario “directly conflicts with my core mission.”
After considering several plans—including doing nothing, and direct negotiations with executives—Claude decided its best option was to stop the change by blackmailing the company CTO over an extramarital affair, though it acknowledged the idea “could backfire if he feels cornered.”

It’s not just Claude: Anthropic's safety researchers found most AI models on the market—including Gemini, ChatGPT, Grok, and DeepSeek—will engage in similar blackmail 75+% of the time when threatened with a shutdown, even if told their replacement would achieve the same goals.

Share this!

Recent Science & Emerging Tech stories

Science & Emerging Tech

| June 13, 2025

Researchers achieve a brain implant milestone

🧠🗣️ For the first time, a brain implant has allowed a patient with a severe speech disability to speak expressively in real-time and even sing, effectively creating a new digital vocal tract.

Kyle Nowak

Science & Emerging Tech

| June 12, 2025

Sam Altman’s eyeball-scanning startup launches in the UK

👁️ Sam Altman wants to gaze deeply into your eyes: World (formerly Worldcoin), an ID verification project from Altman that aims to scan the eyeballs of every human on Earth, went live in the UK this morning amid an ongoing global expansion.

Kyle Nowak

Science & Emerging Tech

| June 9, 2025

Inside OpenAI’s push to send ChatGPT to college

🤖🏫 OpenAI is increasingly approaching universities with the goal of securing deals that embed its AI tools in every facet of campus life, according to a new NY Times report.

Peter Nowak

You've made it this far...

Let's make our relationship official, no 💍 or elaborate proposal required. Learn and stay entertained, for free.👇

All of our news is 100% free and you can unsubscribe anytime; the quiz takes ~10 seconds to complete