Anthropic hires former OpenAI safety lead to head up new team

May 29, 2024 ndowd

Jan Leike, a leading AI researcher who earlier this month resigned from OpenAI before publicly criticizing the company’s approach to AI safety, has joined OpenAI rival Anthropic to lead a new “superalignment” team.

In a post on X, Leike said that his team at Anthropic will focus on various aspects of AI safety and security, specifically “scalable oversight,” “weak-to-strong generalization” and automated alignment research.

I’m excited to join @AnthropicAI to continue the superalignment mission!

My new team will work on scalable oversight, weak-to-strong generalization, and automated alignment research.

If you’re interested in joining, my dms are open.

— Jan Leike (@janleike) May 28, 2024

A source familiar with the matter tells TechCrunch that Leike will report directly to Jared Kaplan, Anthropic’s chief science officer. Anthropic researchers currently working on scalable oversight (techniques to control large-scale AI’s behavior in predictable and desirable ways) will move to report to Leike as his team spins up.

✨🪩 Woo! 🪩✨

Jan’s led some seminally important work on technical AI safety and I’m thrilled to be working with him! We’ll be leading twin teams aimed at different parts of the problem of aligning AI systems at human level and beyond. https://t.co/aqSFTnOEG0

— Sam Bowman (@sleepinyourhat) May 28, 2024

In many ways, Leike’s team sounds similar in mission to OpenAI’s recently dissolved Superalignment team. The Superalignment team, which Leike co-led, had the ambitious goal of solving the core technical challenges of controlling superintelligent AI in the next four years, but often found itself hamstrung by OpenAI’s leadership.

Anthropic has often attempted to position itself as more safety-focused than OpenAI.

Anthropic’s CEO, Dario Amodei, was once the VP of research at OpenAI and reportedly split with OpenAI after a disagreement over the company’s direction — namely OpenAI’s growing commercial focus. Amodei brought with him a number of ex-OpenAI employees to launch Anthropic, including OpenAI’s former policy lead Jack Clark.
