
Intelligence Briefing
Leads the Alignment Science team at Anthropic, focusing on scalable oversight, weak-to-strong generalization, and robustness to jailbreaks. Joined Anthropic in May 2024 after a high-profile resignation from OpenAI, where he co-led the Superalignment team with Ilya Sutskever. His departure, alongside Sutskever's, was a watershed moment for the field, signaling concerns about OpenAI's commitment to safety. Holds a PhD under Marcus Hutter (who also advised Shane Legg). Noted in early 2026 that models had become measurably more aligned throughout 2025 across all major labs. Also runs the Anthropic Fellows program for AI safety research.
MS, Computer Science — University of Freiburg
PhD, Reinforcement Learning Theory — Australian National University
Operational History
Runs Anthropic Fellows Program
Oversees the Anthropic Fellows program for AI safety research.
[career] Noted Progress in AI Alignment
Observed that models had become measurably more aligned throughout 2025 across all major labs.
[research] Joined Anthropic
Became Head of Alignment Science at Anthropic after leaving OpenAI.
[career] Resignation from OpenAI
Left OpenAI alongside Ilya Sutskever due to concerns about AI safety priorities.
[career] AGI Position Assessment
Estimated AGI timeline: 2030
Cautiously optimistic about the alignment of superhuman AI systems.
- Alignment is the central challenge for AI development.
- Progress is being made, but risks remain significant.
Advocates for rigorous safety protocols and scalable oversight.
Intercepted Communications
“The alignment of superhuman AI systems is the central challenge we face.”
“I left OpenAI because I felt safety was not being prioritized sufficiently.”
“Cautiously optimistic about the progress being made in AI alignment.”
“Models are getting measurably more aligned across all major labs.”
“Scalable oversight is crucial for the future of AI safety.”
Research Output
Reward Modeling for AI Alignment
AI Alignment Conference, 2026
Discussed reward modeling techniques for aligning AI systems with human preferences.
AI Safety: A Comprehensive Overview
AI Safety Journal, 2026
Provided a comprehensive overview of AI safety principles.
Scalable Oversight in AI Systems
arXiv, 2025
Introduced methods for scalable oversight in AI systems.
Weak-to-Strong Generalization in Reinforcement Learning
Journal of Machine Learning Research, 2025
Explored the transition from weak to strong generalization in RL.
Superalignment: Challenges and Opportunities
Proceedings of the AAAI Conference, 2025
Analyzed the challenges in achieving superalignment.
Field Intelligence
Known Associates
Ilya Sutskever
[collaborator] Co-led the Superalignment team at OpenAI.
Marcus Hutter
[mentor] PhD advisor at Australian National University.
Shane Legg
[colleague] Also studied under Hutter and has similar research interests.
Demis Hassabis
[colleague] Former colleague at Google DeepMind.
Organizational Affiliations
Current
Anthropic
Head of Alignment Science
2024-Present
Former
OpenAI
Research Scientist
2020-2024
Google DeepMind
Research Scientist
2018-2020
Source Material
Dossier last updated: 2026-03-04