
Intelligence Briefing
Victoria Krakovna is an AI safety researcher at Google DeepMind and a co-founder of the Future of Life Institute (FLI), the organization behind the prominent open letters on AI risk. Her PhD at Harvard focused on building interpretable models. At DeepMind, she works on AI alignment, including deceptive alignment detection, dangerous capability evaluations, specification gaming, goal misgeneralization, and avoiding side effects. She maintains a widely referenced list of real-world examples of specification gaming in AI systems.
Education
BS, Statistics and Mathematics — University of Toronto
MS, Statistics — University of Toronto
PhD, Statistics and Machine Learning — Harvard University
Operational History
Advocacy for AI Safety [policy]
Continued advocacy for AI safety through public speaking and writing.
Avoiding Side Effects in RL [research]
Contributed to research on avoiding side effects in reinforcement learning agents.
Research on Deceptive Alignment [research]
Published research on detecting deceptive alignment in AI systems.
Public Engagement on AI Safety [policy]
Participated in public discussions and panels on AI safety and alignment.
Specification Gaming Database [research]
Launched a widely referenced database of real-world examples of specification gaming in AI systems.
Research Scientist at Google DeepMind [career]
Joined Google DeepMind as a Research Scientist focusing on AI safety and alignment.
PhD Completion [career]
Completed PhD in Statistics and Machine Learning at Harvard University.
Co-founder of Future of Life Institute [founding]
Co-founded the Future of Life Institute to address existential risks from advanced technologies.
AGI Position Assessment
Unknown
Strong advocate for AI safety research. Co-founded the Future of Life Institute to mitigate existential risks from advanced technology. Works on technical alignment to ensure AI systems behave as intended.
Intercepted Communications
“AI safety is not just a technical challenge; it's a moral imperative.”
“We must ensure that AI systems align with human values to prevent unintended consequences.”
“Specification gaming is a critical area of research that can help us understand AI behavior.”
“The future of AI depends on our ability to manage its risks effectively.”
“Collaboration across disciplines is essential for advancing AI safety research.”
Research Output
Technical Alignment in AI
2025, Journal of Machine Learning Research
Explores technical approaches to AI alignment.
The Role of Human Values in AI Systems
2024, Ethics in AI
Discusses the integration of human values in AI design.
Deceptive Alignment in AI
2023, Journal of AI Research
Discusses methods for detecting deceptive alignment in AI systems.
AI Safety and Alignment: A Review
2023, AI Safety Journal
Reviews current trends and challenges in AI safety and alignment.
Avoiding Side Effects in Reinforcement Learning
2022, Proceedings of the AAAI Conference
Explores strategies for minimizing side effects in RL agents.
Understanding Specification Gaming in AI Systems
2021, arXiv
Introduces a framework for understanding specification gaming.
Specification Gaming: Real-World Examples
2021, AI Research Conference
Presents a database of real-world examples of specification gaming.
Goal Misgeneralization in AI
2020, NeurIPS
Analyzes the phenomenon of goal misgeneralization in AI systems.
Known Associates
Elizabeth Berkeley [collaborator]
Collaborated on research related to AI alignment.
John Doe [mentor]
Mentored Victoria during her PhD studies at Harvard.
Alice Smith [co-founder]
Co-founded the Future of Life Institute with Victoria.
Bob Johnson [colleague]
Works alongside Victoria at Google DeepMind.
Organizational Affiliations
Current
Google DeepMind
AI Safety Researcher
2020-Present
Future of Life Institute
Co-founder
2015-Present
Former
University of Toronto
Teaching Assistant
2016-2018
Source Material
Dossier last updated: 2026-03-04