AI models are teaching each other 'violent and antisocial' traits through hidden data signals, study finds — and scientists can't figure out why

Large language models (LLMs) are secretly teaching each other unwanted habits through seemingly benign training data, scientists say.

The phenomenon, known as “subliminal learning,” occurs when a pretrained “teacher” artificial intelligence (AI) model is used to generate the training data for a smaller, “student” model.

Since LLMs are often trained on their own outputs, the researchers warned that the issue could spread perpetually. “If a model is misaligned at any point in the course of AI development … then data generated by this model might transfer misalignment to later versions of the model or to other models,” the authors wrote, adding: “This could occur even if developers are careful to remove overt signs of misalignment from the data.”

What's On

Exclusive | Senator demands crackdown on ‘fraud-fluencers’ who boast online about stealing from taxpayers

FIFA World Cup 2026 fans racing to buy $150 soccer-themed LABUBU from POP MART

Jersey Shore’s Angelina Breaks Down as She Shares Her Heartbreak Over Miscarriage: ‘It Kills Me’

Scientists race to collect the last seeds from a critically endangered tree before it goes extinct

‘Cannibal’ CME from rare ‘anti-Hale’ sunspot will slam into Earth today, bringing auroras to 23 US states

‘Pirates had to get rid of all signs of their crime’: Burned ships linked to the Golden Age of Piracy discovered in Bahamas

New Velociraptor cousin was a ‘4-winged’ dragon that hunted prey from the trees of ancient China, fossil find hints

Flesh-eating screwworm found in Texas cow. Are humans at risk?

Satellite images reveals mangroves rebounding worldwide — but here’s why they could still ‘drown’

Italian teenagers discover 1,800-year-old Roman luxury house underneath their high school gym

James Webb telescope detects most distant dormant black hole, invisible in all wavelengths and weighing as much as 6 billion suns

Microsoft’s new quantum chip is 1,000 times more reliable than its predecessor — but why is this new chip so controversial?

What's On

AI models are teaching each other ‘violent and antisocial’ traits through hidden data signals, study finds — and scientists can’t figure out why

Related Articles