
Intelligence Briefing
Leads the Alignment Science team at Anthropic, focusing on scalable oversight, weak-to-strong generalization, and robustness to jailbreaks. Joined Anthropic in May 2024 after a high-profile resignation from OpenAI, where he co-led the Superalignment team with Ilya Sutskever. His departure, alongside Sutskever's, was a watershed moment for the field, signaling concerns about OpenAI's commitment to safety. Holds a PhD under Marcus Hutter (who also advised Shane Legg). Noted in early 2026 that models had become measurably more aligned throughout 2025 across all major labs. Also runs the Anthropic Fellows program for AI safety research.
MS, Computer Science — University of Freiburg
PhD, Reinforcement Learning Theory — Australian National University
Operational History
Runs Anthropic Fellows Program
Oversees the Anthropic Fellows program for AI safety research.
[career] Noted Progress in AI Alignment
Observed that models had become measurably more aligned throughout 2025 across all major labs.
[research] Joined Anthropic
Became Head of Alignment Science at Anthropic after leaving OpenAI.
[career] Resignation from OpenAI
Left OpenAI alongside Ilya Sutskever due to concerns about AI safety priorities.
[career] AGI Position Assessment
Estimated AGI timeline: 2030
Cautiously optimistic about the alignment of superhuman AI systems.
- Alignment is the central challenge for AI development.
- Progress is being made, but risks remain significant.
Advocates for rigorous safety protocols and scalable oversight.
Intercepted Communications
“The alignment of superhuman AI systems is the central challenge we face.”
“I left OpenAI because I felt safety was not being prioritized sufficiently.”
“Cautiously optimistic about the progress being made in AI alignment.”
“Models are getting measurably more aligned across all major labs.”
“Scalable oversight is crucial for the future of AI safety.”
Research Output
Reward Modeling for AI Alignment
AI Alignment Conference, 2026
Discussed reward modeling techniques for aligning AI systems with human preferences.
AI Safety: A Comprehensive Overview
AI Safety Journal, 2026
Provided a comprehensive overview of AI safety principles.
Scalable Oversight in AI Systems
arXiv, 2025
Introduced methods for scalable oversight in AI systems.
Weak-to-Strong Generalization in Reinforcement Learning
Journal of Machine Learning Research, 2025
Explored the transition from weak to strong generalization in RL.
Superalignment: Challenges and Opportunities
Proceedings of the AAAI Conference, 2025
Analyzed the challenges in achieving superalignment.
Field Intelligence
Known Associates
Ilya Sutskever
[collaborator] Co-led the Superalignment team at OpenAI.
Marcus Hutter
[mentor] PhD advisor at Australian National University.
Shane Legg
[colleague] Also studied under Hutter and has similar research interests.
Demis Hassabis
[colleague] Former colleague at Google DeepMind.
Organizational Affiliations
Current
Anthropic
Head of Alignment Science
2024-Present
Former
OpenAI
Research Scientist
2020-2024
Google DeepMind
Research Scientist
2018-2020
Source Material
Dossier last updated: 2026-03-04