OpenAI · o-series · Released Apr 16, 2025
OpenAI o3
OpenAI's flagship reasoning model at release. o3 scores 96.7% on AIME 2024 (math), 87.7% on GPQA Diamond (science), and 71.7% on SWE-bench Verified with a Codeforces Elo of 2727 (coding). The o3-pro variant reasons longer per request and costs $20 per million input tokens and $80 per million output tokens. o3 has been retired from ChatGPT but remains available through the API.
Capabilities: text · code · reasoning · vision · tool use
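To make the o3 and o3-pro list prices concrete, the sketch below prices a single API call from its token counts. The token counts are hypothetical; the only rule assumed beyond the quoted per-million rates is that hidden reasoning tokens are billed at the output rate, as OpenAI documents for its reasoning models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List prices in USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Rates quoted on this page.
O3 = Pricing(input_per_m=2.0, output_per_m=8.0)
O3_PRO = Pricing(input_per_m=20.0, output_per_m=80.0)


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """Estimated USD cost of one request.

    Reasoning tokens are billed at the output rate, so they are added to
    the visible output tokens before pricing.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * p.input_per_m
            + billed_output * p.output_per_m) / 1_000_000


# Hypothetical request: 10K prompt tokens, 1K visible answer tokens,
# 5K hidden reasoning tokens.
cost_o3 = request_cost(O3, 10_000, 1_000, 5_000)
cost_pro = request_cost(O3_PRO, 10_000, 1_000, 5_000)
print(f"o3: ${cost_o3:.4f}  o3-pro: ${cost_pro:.4f}")
```

For identical token counts the o3-pro bill is exactly ten times the o3 bill, because both rates are 10x higher; real bills may differ further since the two models can emit different numbers of reasoning tokens for the same prompt.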
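| Metric | Value |
|---|---|
| Arena Elo | 1,432 |
| Input price | $2 per million tokens |
| Output price | $8 per million tokens |
| Speed | 65 tokens/s |
| Context window | 200K tokens |
| Latency | 10,880 ms |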
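The latency and speed figures can be combined into a back-of-the-envelope estimate of how long a response takes. This is a minimal sketch under two assumptions the page does not state: that the 10,880 ms latency covers everything before output starts streaming, and that output then arrives at a steady 65 tokens/s. Whether hidden reasoning tokens count toward either figure is not specified here.

```python
def estimated_response_seconds(output_tokens: int,
                               latency_ms: float = 10_880,
                               tokens_per_s: float = 65) -> float:
    """Rough end-to-end time: fixed latency plus streaming time.

    Assumes (hypothetically) that latency_ms is paid once up front and
    that generation then proceeds at a constant tokens_per_s.
    """
    return latency_ms / 1000 + output_tokens / tokens_per_s


# A hypothetical 1,300-token answer.
t = estimated_response_seconds(1_300)
print(f"{t:.2f} s")
```

Under these assumptions the fixed latency dominates short answers, while long answers are bounded mainly by the 65 tokens/s rate.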
Capability Assessment
- SWE-bench Verified: 71.7%
- GPQA Diamond: 87.7%
- MMMU Pro: 82.9%
Comparative Analysis
| Metric | OpenAI o3 | Claude Opus 4.6 | Gemini 3 Pro | DeepSeek V3.2 |
|---|---|---|---|---|
| SWE-bench Verified | 71.7% | 80.8% | 76.2% | 67.8% |
| AIME 2025 | 96.7% | 100.0% | 95.0% | 96.0% |
| GPQA Diamond | 87.7% | 91.3% | 91.9% | 82.4% |
| MMLU | 92.9% | 91.0% | 91.8% | 88.5% |
| Input price ($/M tokens) | $2 | $5 | $2 | $0.28 |
| Output price ($/M tokens) | $8 | $25 | $12 | $0.42 |
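Separate input and output prices are easier to compare as one number once a workload mix is fixed. The sketch below computes a blended $/M price for each model in the comparison table; the 3:1 input-to-output token ratio is an assumption chosen for illustration, not a figure from this page.

```python
# (input $/M, output $/M) taken from the comparison table above.
PRICES = {
    "OpenAI o3": (2.00, 8.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "Gemini 3 Pro": (2.00, 12.00),
    "DeepSeek V3.2": (0.28, 0.42),
}


def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.75) -> float:
    """Weighted $/M tokens when `input_share` of all tokens are input.

    input_share=0.75 corresponds to an assumed 3:1 input:output ratio.
    """
    return input_share * input_per_m + (1 - input_share) * output_per_m


blended = {name: blended_price(*p) for name, p in PRICES.items()}
for name, price in sorted(blended.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} ${price:.3f}/M")
```

Because models also differ in how many tokens they spend per request, a lower blended price does not by itself guarantee a lower bill.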
ARC Prize Snapshot
Score and cost per task at each reasoning-effort level (updated 3/10/2026, 2:38:07 AM). Each cell shows score / cost per task.

| Model (effort) | ARC-AGI-1 Public | ARC-AGI-1 Semi-Private | ARC-AGI-2 Public | ARC-AGI-2 Semi-Private |
|---|---|---|---|---|
| o3 (Low) | 47.6% / $0.16 | 41.5% / $0.18 | 2.7% / $0.24 | 2.0% / $0.23 |
| o3 (Medium) | 56.7% / $0.26 | 53.8% / $0.29 | 4.5% / $0.50 | 3.0% / $0.48 |
| o3 (High) | 64.3% / $0.40 | 60.8% / $0.50 | 2.9% / $0.90 | 6.5% / $0.83 |
| o3-Pro (Low) | 50.9% / $1.51 | 44.3% / $1.64 | 1.9% / $2.46 | 2.0% / $2.23 |
| o3-Pro (Medium) | 58.1% / $2.55 | 57.0% / $3.18 | 3.5% / $5.16 | 1.9% / $4.74 |
| o3-Pro (High) | 63.3% / $3.92 | 59.3% / $4.16 | 3.9% / $9.15 | 4.9% / $7.55 |
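Since the snapshot plots cost per task against score, a natural question is which settings are Pareto-optimal: no other setting is at least as cheap and at least as accurate while being strictly better on one of the two. The sketch below computes that frontier for the ARC-AGI-1 Semi-Private figures; the function name and data layout are illustrative choices, not part of any ARC Prize tooling.

```python
# (label, score %, cost per task in USD) for ARC-AGI-1 Semi-Private,
# copied from the snapshot figures above.
ARC_AGI_1_SEMI_PRIVATE = [
    ("o3 (Low)", 41.5, 0.18),
    ("o3 (Medium)", 53.8, 0.29),
    ("o3 (High)", 60.8, 0.50),
    ("o3-Pro (Low)", 44.3, 1.64),
    ("o3-Pro (Medium)", 57.0, 3.18),
    ("o3-Pro (High)", 59.3, 4.16),
]


def pareto_frontier(points):
    """Return labels of settings not dominated on (lower cost, higher score).

    Points are visited in order of increasing cost (higher score first on
    ties); a point joins the frontier only if it strictly beats the best
    score seen at any lower-or-equal cost.
    """
    frontier = []
    best_score = float("-inf")
    for label, score, cost in sorted(points, key=lambda p: (p[2], -p[1])):
        if score > best_score:
            frontier.append(label)
            best_score = score
    return frontier


print(pareto_frontier(ARC_AGI_1_SEMI_PRIVATE))
```

On this split the frontier is the three o3 settings: every o3-Pro configuration costs more per task than o3 (High) while scoring lower. The other three columns can be checked the same way by swapping in their figures.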