Author Archives: TheZvi

AIs Will Increasingly Fake Alignment

This post goes over the important and excellent new paper from Anthropic and Redwood Research, with Ryan Greenblatt as lead author, Alignment Faking in Large Language Models. This is by far the best demonstration so far of the principle that …

Posted in Uncategorized | Leave a comment

Monthly Roundup #25: December 2024

I took a trip to San Francisco early in December. Ever since then, things in the world of AI have been utterly insane. Google and OpenAI released endless new products, including Gemini Flash 2.0 and o1. Redwood Research and Anthropic …

AI #95: o1 Joins the API

A lot happened this week. We’re seeing release after release after upgrade. It’s easy to lose sight of which ones matter, and two matter quite a lot. The first is Gemini Flash 2.0, which I covered earlier this week. The …

A Matter of Taste

In light of other recent discussions, Scott Alexander recently attempted a unified theory of taste, proposing several hypotheses. Is it like physics, a priesthood, a priesthood with fake justifications, a priesthood with good justifications, like increasingly bizarre porn preferences, like …

The Second Gemini

Table of Contents: Trust the Chef. Do Not Trust the Marketing Department. Mark that Bench. Going Multimodal. The Art of Deep Research. Project Mariner the Web Agent. Project Astra the Universal Assistant. Project Jules the Code Agent. Gemini Will Aid …

AIs Will Increasingly Attempt Shenanigans

Increasingly, we have seen papers eliciting various shenanigans from AI models. There is a wide variety of scheming behaviors. You’ve got your weight exfiltration attempts, sandbagging on evaluations, giving bad information, shielding goals from modification, subverting tests and oversight, lying, …

The o1 System Card Is Not About o1

Or rather, we don’t actually have a proper o1 system card, aside from the outside red teaming reports. At all. Because, as I realized after writing my first draft of this, the data here does not reflect the o1 model …

AI #94: Not Now, Google

At this point, we can confidently say that no, capabilities are not hitting a wall. Capacity density, how much you can pack into a given space, is way up and rising rapidly, and we are starting to figure out how …

o1 Turns Pro

So, how about OpenAI’s o1 and o1 Pro? Sam Altman: o1 is powerful but it’s not so powerful that the universe needs to send us a tsunami. As a result, the universe realized its mistake, and cancelled the tsunami. We …

Childhood and Education Roundup #7

Since it’s been so long, I’m splitting this roundup into several parts. This first one steers away from schools, education, discipline, and everything around social media.