Jan Leike

AI Alignment Expert

Organization
Anthropic

Position
Head of Alignment Science

🇩🇪 German
h-Index: 10
Citations: 50
Followers: 5,000
Awards: 0
Publications: 5
Companies: 3

Intelligence Briefing

Leads the Alignment Science team at Anthropic, focusing on scalable oversight, weak-to-strong generalization, and robustness to jailbreaks. Joined Anthropic in May 2024 after a high-profile resignation from OpenAI, where he co-led the Superalignment team with Ilya Sutskever; their near-simultaneous departures were a watershed moment for AI safety, signaling concerns about OpenAI's commitment to safety. Completed his PhD under Marcus Hutter, the same advisor as Shane Legg. Noted in early 2026 that models had been getting measurably more aligned throughout 2025 across all major labs. Also runs the Anthropic Fellows program for AI safety research.

Expertise
AI Alignment · Scalable Oversight · Reinforcement Learning Theory · Superalignment · AI Safety
Education

MS, Computer Science · University of Freiburg

PhD, Reinforcement Learning Theory · Australian National University

Operational History

2026

Runs Anthropic Fellows Program

Oversees the Anthropic Fellows program for AI safety research.

career
2025

Noted Progress in AI Alignment

Observed that models have been getting measurably more aligned throughout 2025 across all major labs.

research
2024

Joined Anthropic

Became Head of Alignment Science at Anthropic after leaving OpenAI.

career
2024

Resignation from OpenAI

Left OpenAI alongside Ilya Sutskever due to concerns about AI safety priorities.

career

AGI Position Assessment

Predicted AGI Timeline

2030

Cautiously optimistic about the alignment of superhuman AI systems.

Key Beliefs
  • Alignment is the central challenge for AI development.
  • Progress is being made, but risks remain significant.
Safety Approach

Advocates for rigorous safety protocols and scalable oversight.

Intercepted Communications

"The alignment of superhuman AI systems is the central challenge we face."

Interview with Jan Leike · 2025-10-15 · AI Alignment

"I left OpenAI because I felt safety was not being prioritized sufficiently."

Podcast Interview · 2025-11-20 · AI Safety

"Cautiously optimistic about the progress being made in AI alignment."

Conference Keynote · 2026-01-05 · AI Alignment

"Models are getting measurably more aligned across all major labs."

Research Paper · 2026-02-10 · AI Research

"Scalable oversight is crucial for the future of AI safety."

Panel Discussion · 2026-02-15 · AI Safety

Research Output

2020s: 5

Reward Modeling for AI Alignment

2026

AI Alignment Conference

Discussed reward modeling techniques for aligning AI.

5 citations · with Marcus Hutter

AI Safety: A Comprehensive Overview

2026

AI Safety Journal

Provided a comprehensive overview of AI safety principles.

Scalable Oversight in AI Systems

2025

arXiv

Introduced methods for scalable oversight in AI systems.

15 citations · with Ilya Sutskever

Weak-to-Strong Generalization in Reinforcement Learning

2025

Journal of Machine Learning Research

Explored the transition from weak to strong generalization in RL.

20 citations

Superalignment: Challenges and Opportunities

2025

Proceedings of the AAAI Conference

Analyzed the challenges in achieving superalignment.

10 citations · with Ilya Sutskever

Field Intelligence

The Future of AI Alignment

YouTube · 2026-02-01 · 1:00:00

AI Safety and Superalignment

Podcast · 2025-12-15 · 45:00

Known Associates

Organizational Affiliations

Current

Anthropic

Head of Alignment Science

2024-Present

Former

OpenAI

Superalignment Co-Lead

2021-2024

Google DeepMind

Research Scientist

2016-2021


Dossier last updated: 2026-03-04