OpenAI/o-series/Apr 16, 2025
OpenAI o4-mini
OpenAI's small reasoning model, optimized for fast, cost-efficient reasoning. It achieves 99.5% on AIME 2025 (with Python tool use) and excels at math, coding, and visual tasks. At release it was the best-performing benchmarked model on AIME 2024 and 2025. It has been retired from ChatGPT, but the API remains available.
text · code · reasoning · vision · tool-use
Arena ELO: 1,391
Input Price: $1.10/M tokens
Output Price: $4.40/M tokens
Speed: 114 tokens/s
Context: 200K tokens
Latency: 64,850 ms
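The listed per-token rates make per-request cost easy to estimate. A minimal sketch, using the input/output prices above; the token counts in the example are hypothetical, not from this page:

```python
# Estimate the cost of a single o4-mini API call from the listed rates.
INPUT_PER_M = 1.10   # $ per 1M input tokens (from the stats above)
OUTPUT_PER_M = 4.40  # $ per 1M output tokens (from the stats above)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed o4-mini rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Hypothetical request: a 10K-token prompt with a 2K-token completion.
cost = request_cost(10_000, 2_000)
print(f"${cost:.4f}")  # → $0.0198
```

Note that reasoning tokens are billed as output, so effective costs for hard prompts can be several times the visible completion length.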
Capability Assessment
SWE-Bench Pro: 68.1%
GPQA Diamond: 81.4%
MMMU Pro: 81.6%
Comparative Analysis
| Metric | OpenAI o4-mini | Claude Opus 4.6 | Gemini 3 Pro | OpenAI o3 |
|---|---|---|---|---|
| SWE-bench | 68.1% | 80.8% | 76.2% | 71.7% |
| AIME 2025 | 92.7% | 100.0% | 95.0% | 96.7% |
| GPQA Diamond | 81.4% | 91.3% | 91.9% | 87.7% |
| MMLU | 90.0% | 91.0% | 91.8% | 92.9% |
| Input $/M | $1.10 | $5.00 | $2.00 | $2.00 |
| Output $/M | $4.40 | $25.00 | $12.00 | $8.00 |
ARC Prize
Cost per task vs. score across the current reasoning-effort levels (Low/Medium/High).
Updated 3/10/2026, 2:38:07 AM
ARC-AGI-1 Public
o4-mini (Low): 27.6% / $0.04
o4-mini (Medium): 50.2% / $0.13
o4-mini (High): 68.0% / $0.32
ARC-AGI-1 Semi-Private
o4-mini (Low): 21.3% / $0.04
o4-mini (Medium): 41.8% / $0.15
o4-mini (High): 58.7% / $0.41
ARC-AGI-2 Public
o4-mini (Low): 0.3% / $0.05
o4-mini (Medium): 2.2% / $0.24
o4-mini (High): 7.5% / $0.88
ARC-AGI-2 Semi-Private
o4-mini (Low): 1.7% / $0.05
o4-mini (Medium): 2.4% / $0.23
o4-mini (High): 6.1% / $0.86
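The score/cost pairs above can be reduced to a single cost-efficiency figure per reasoning level. A minimal sketch using the ARC-AGI-1 Semi-Private numbers from the table; the "points per dollar" metric itself is an illustrative choice, not part of the ARC Prize methodology:

```python
# Cost efficiency of o4-mini reasoning levels on ARC-AGI-1 Semi-Private.
# (score %, $ per task) pairs copied from the table above.
levels = {
    "Low": (21.3, 0.04),
    "Medium": (41.8, 0.15),
    "High": (58.7, 0.41),
}

efficiency = {name: score / cost for name, (score, cost) in levels.items()}

for name, (score, cost) in levels.items():
    print(f"{name}: {score}% at ${cost:.2f}/task "
          f"-> {efficiency[name]:.0f} score points per dollar")
```

On these figures, Low effort is the most cost-efficient per point while High effort buys the most absolute score, a typical trade-off when scaling reasoning effort.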