OpenAI · GPT-5 · Dec 11, 2025
GPT-5.2
OpenAI's flagship reasoning model. GPT-5.2 Thinking scores 80% on SWE-bench Verified and a state-of-the-art 55.6% on SWE-bench Pro. GPT-5.2 Pro crosses the 90% threshold on ARC-AGI-1 at roughly 390× lower cost than o3. Features a 400K-token context window with up to 128K output tokens.
text · code · reasoning · vision · tool-use · audio
Arena ELO: —
Input Price: $1.75/M
Output Price: $14/M
Speed: 71 t/s
Context: 400K
Latency: 99,790 ms
Capability Assessment
SWE-bench Verified: 80.0%
Terminal-Bench 2.0: 47.6%
GPQA Diamond: 92.4%
MMMU Pro: 86.7%
Comparative Analysis
| Metric | GPT-5.2 | Claude Opus 4.6 | Gemini 3 Pro | OpenAI o3 |
|---|---|---|---|---|
| SWE-bench Verified | 80.0% | 80.8% | 76.2% | 71.7% |
| AIME 2025 | 100.0% | 100.0% | 95.0% | 96.7% |
| GPQA Diamond | 92.4% | 91.3% | 91.9% | 87.7% |
| MMLU | 91.0% | 91.0% | 91.8% | 92.9% |
| Input $/M | $1.75 | $5 | $2 | $2 |
| Output $/M | $14 | $25 | $12 | $8 |
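To make the per-million-token prices above concrete, here is a minimal sketch of per-request cost at the listed GPT-5.2 rates ($1.75/M input, $14/M output); the token counts in the example are illustrative, not from this page.

```python
# Per-request cost at the listed GPT-5.2 rates.
INPUT_PER_M = 1.75    # $ per 1M input tokens (from the table above)
OUTPUT_PER_M = 14.00  # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# Example: the full 400K-token context with a maximum 128K-token reply.
print(f"${request_cost(400_000, 128_000):.2f}")  # → $2.49
```

So even a maximally long request stays under $3 at these rates, with output tokens dominating the bill.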
ARC Prize Snapshot
Cost per task vs score across current reasoning levels.
Updated 3/10/2026, 2:38:07 AM
ARC-AGI-1 Public
GPT-5.2: 16.5% / $0.04
ARC-AGI-1 Semi-Private
GPT-5.2: 12.3% / $0.05
ARC-AGI-2 Public
GPT-5.2: 0.0% / $0.08
ARC-AGI-2 Semi-Private
GPT-5.2: 0.8% / $0.08
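A hedged back-of-envelope on the 390× cost-reduction claim from the summary: if it applies to the $0.04 ARC-AGI-1 per-task figure listed here (an assumption; the page does not say which configuration the 390× comparison uses), the implied o3 cost is on the order of $15 per task.

```python
# Back-of-envelope: implied o3 cost per ARC-AGI-1 task, ASSUMING the 390x
# reduction applies to the $0.04/task rate in the snapshot above. The page
# does not confirm the two figures refer to the same model configuration.
gpt52_cost_per_task = 0.04   # $ per task, ARC-AGI-1 Public (from snapshot)
cost_reduction_factor = 390  # from the summary claim vs. o3

implied_o3_cost = gpt52_cost_per_task * cost_reduction_factor
print(f"${implied_o3_cost:.2f} per task")  # → $15.60 per task
```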