
Intelligence Briefing
Foundational alignment researcher who developed the core techniques behind RLHF (Reinforcement Learning from Human Feedback), now used to train ChatGPT, Claude, and virtually all modern language models. Silver medalist at the International Math Olympiad (2008). Founded the Alignment Research Center (ARC) after leaving OpenAI in 2021. Appointed Head of AI Safety at NIST's US AI Safety Institute, where he designs and conducts national-security evaluations of frontier AI models. His appointment drew controversy among NIST staff over his association with effective altruism and longtermism.
BS, Mathematics – Massachusetts Institute of Technology
PhD, Statistical Learning Theory – University of California, Berkeley
Operational History
Founded Alignment Research Center (ARC) (founding)
Established ARC to focus on AI alignment research after leaving OpenAI.
Joined US AI Safety Institute (NIST) (career)
Appointed as Head of AI Safety at NIST's US AI Safety Institute.
Silver Medalist at International Math Olympiad (award)
Won a silver medal at the prestigious International Math Olympiad.
AGI Position Assessment
Unknown
One of the strongest voices for AI existential risk. Believes there is a significant probability of catastrophic outcomes from advanced AI. Advocates for robust safety evaluations, interpretability, and governance. Now leads US government AI safety evaluation efforts.
Intercepted Communications
"The risks posed by advanced AI are significant and require immediate attention."
"We must prioritize robust safety evaluations to ensure AI systems align with human values."
"Effective altruism provides a framework for addressing existential risks from AI."
"Longtermism is crucial in shaping the future of AI development."
"AI alignment is not just a technical challenge; it's a moral imperative."
Research Output
Eliciting Latent Knowledge
2021 · arXiv
Explored methods for extracting knowledge from AI models.
AI Alignment: Why It Matters
2020
Discussed the importance of AI alignment in modern AI systems.
Iterated Distillation and Amplification
2018 · arXiv
Proposed a framework for improving AI alignment through iterative processes.
Deep Reinforcement Learning from Human Preferences
2017 · arXiv
Introduced methods for training agents using human feedback.
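The core technique in that line of work can be sketched as fitting a reward model to pairwise human preferences: a human compares two trajectory segments, and the model is trained so the preferred segment gets higher predicted reward, via a Bradley-Terry (logistic) loss. The toy example below is only an illustration under invented assumptions — a linear per-step reward, random feature vectors, and synthetic "human" labels generated from a known true reward — not the paper's actual implementation.

```python
# Toy sketch of reward learning from pairwise preferences (Bradley-Terry).
# All data and the linear reward model here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def predicted_reward(w, segment):
    """Reward of a trajectory segment = sum of per-step linear rewards."""
    return sum(w @ s for s in segment)

def preference_loss(w, pairs):
    """Average negative log-likelihood of the human's choices, where
    P(prefer A over B) = sigmoid(r(A) - r(B))."""
    loss = 0.0
    for seg_a, seg_b, prefers_a in pairs:
        diff = predicted_reward(w, seg_a) - predicted_reward(w, seg_b)
        p_a = 1.0 / (1.0 + np.exp(-diff))
        loss += -np.log(p_a if prefers_a else 1.0 - p_a)
    return loss / len(pairs)

# Synthetic data: the "true" reward cares only about the first feature,
# and the simulated human labels each pair accordingly.
true_w = np.array([1.0, 0.0])
def make_segment():
    return [rng.normal(size=2) for _ in range(5)]
pairs = []
for _ in range(200):
    a, b = make_segment(), make_segment()
    pairs.append((a, b, predicted_reward(true_w, a) > predicted_reward(true_w, b)))

# Fit w with finite-difference gradient descent (keeps the sketch dependency-free).
w = np.zeros(2)
for _ in range(100):
    grad = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = 1e-5
        grad[i] = (preference_loss(w + e, pairs) - preference_loss(w - e, pairs)) / 2e-5
    w -= 0.5 * grad

print(w)  # the first component should dominate, recovering the true reward's direction
```

The same pairwise-comparison loss underlies the reward models used in modern RLHF pipelines; production systems differ mainly in scale and in using a neural network instead of this linear stand-in.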
Known Associates
Eliezer Yudkowsky (collaborator)
Collaborated on AI alignment research.
Jascha Stiennon (collaborator)
Worked together on RLHF projects.
OpenAI Team (colleague)
Former team member at OpenAI.
Robert Long (collaborator)
Collaborated on AI safety evaluations.
Organizational Affiliations
Current
Alignment Research Center
Founder
2021-Present
US AI Safety Institute (NIST)
Head of AI Safety
2024-Present
Former
OpenAI
Alignment Researcher
2018-2021
Source Material
Dossier last updated: 2026-03-04