On Anthropic’s Sleeper Agents Paper

The recent paper from Anthropic is getting unusually high praise, much of it, I think, deserved. The title is Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. Scott Alexander also covers this, offering an excellent high-level explanation, …
