OpenAI/o-series/Apr 16, 2025

OpenAI o3

o3 is OpenAI's most intelligent reasoning model. It excels at math (96.7% on AIME 2024), science (87.7% on GPQA Diamond), and coding (71.7% on SWE-bench Verified, Codeforces Elo 2727). The o3-pro variant offers even deeper reasoning at $20 input / $80 output per million tokens. o3 has been retired from ChatGPT, but it remains available through the API.

text, code, reasoning, vision, tool-use

Arena Elo

1,432

Input Price

$2/M

Output Price

$8/M

Speed

65 t/s

Context

200K

Latency

10,880 ms
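
The speed and latency figures above can be combined into a rough wall-clock estimate for a single response. The sketch below assumes that the latency figure is the time to the first token and that the speed figure is a steady output rate; the page does not define either metric, so both readings are assumptions, and `estimate_response_seconds` is a hypothetical helper name.

```python
# Figures copied from the stat cards above.
LATENCY_SECONDS = 10.88        # 10880 ms, ASSUMED to be time to first token
OUTPUT_TOKENS_PER_SECOND = 65  # 65 t/s, ASSUMED to be a steady output rate


def estimate_response_seconds(output_tokens: int) -> float:
    """Rough estimate: initial latency plus time to stream the output tokens."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return LATENCY_SECONDS + output_tokens / OUTPUT_TOKENS_PER_SECOND


if __name__ == "__main__":
    # Hypothetical response lengths, chosen only for illustration.
    for n in (100, 1_000, 5_000):
        print(f"{n} output tokens: about {estimate_response_seconds(n):.1f} s")
```

Under these assumptions, the fixed latency dominates short responses, while long responses are dominated by streaming time.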

Capability Assessment

SWE-bench Verified: 71.7%
GPQA Diamond: 87.7%
MMMU Pro: 82.9%

Comparative Analysis

Metric | OpenAI o3 | Claude Opus 4.6 | Gemini 3 Pro | DeepSeek V3.2
SWE-bench | 71.7% | 80.8% | 76.2% | 67.8%
AIME 2025 | 96.7% | 100.0% | 95.0% | 96.0%
GPQA Diamond | 87.7% | 91.3% | 91.9% | 82.4%
MMLU | 92.9% | 91.0% | 91.8% | 88.5%
Input $/M tokens | $2 | $5 | $2 | $0.28
Output $/M tokens | $8 | $25 | $12 | $0.42
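
To make the per-million-token prices in the comparison concrete, the following sketch computes the cost of one request for each model. The prices are copied from the table; the workload (10,000 input tokens, 2,000 output tokens) and the helper name `request_cost` are hypothetical. It also assumes that all billed tokens fall into the two listed categories; reasoning models may bill hidden reasoning tokens as output, which would raise the real figure.

```python
# USD per million tokens as (input, output), copied from the comparison table.
PRICES = {
    "OpenAI o3": (2.00, 8.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "Gemini 3 Pro": (2.00, 12.00),
    "DeepSeek V3.2": (0.28, 0.42),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.5f}")
```

For this workload, o3 comes to about $0.036 per request, compared with about $0.10 for Claude Opus 4.6, $0.044 for Gemini 3 Pro, and under half a cent for DeepSeek V3.2.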

ARC Prize

ARC Prize Snapshot

Cost per task vs score across current reasoning levels.

Updated 3/10/2026, 2:38:07 AM

[Chart: score (%) vs. cost per task (log scale) for o3 and o3-Pro at Low, Medium, and High reasoning levels, one panel per benchmark]

ARC-AGI-1 Public

o3 (Low): 47.6% / $0.16
o3 (Medium): 56.7% / $0.26
o3 (High): 64.3% / $0.40
o3-Pro (Low): 50.9% / $1.51
o3-Pro (Medium): 58.1% / $2.55
o3-Pro (High): 63.3% / $3.92

ARC-AGI-1 Semi-Private

o3 (Low): 41.5% / $0.18
o3 (Medium): 53.8% / $0.29
o3 (High): 60.8% / $0.50
o3-Pro (Low): 44.3% / $1.64
o3-Pro (Medium): 57.0% / $3.18
o3-Pro (High): 59.3% / $4.16

ARC-AGI-2 Public

o3 (Low): 2.7% / $0.24
o3 (Medium): 4.5% / $0.50
o3 (High): 2.9% / $0.90
o3-Pro (Low): 1.9% / $2.46
o3-Pro (Medium): 3.5% / $5.16
o3-Pro (High): 3.9% / $9.15

ARC-AGI-2 Semi-Private

o3 (Low): 2.0% / $0.23
o3 (Medium): 3.0% / $0.48
o3 (High): 6.5% / $0.83
o3-Pro (Low): 2.0% / $2.23
o3-Pro (Medium): 1.9% / $4.74
o3-Pro (High): 4.9% / $7.55
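
One way to read the score and cost pairs above is to ask which configurations are worth their price. The sketch below uses the ARC-AGI-1 Semi-Private figures and keeps only configurations that no other configuration beats on both score and cost (a simple Pareto filter). This analysis is not part of the ARC Prize snapshot itself; the function name `pareto_frontier` is hypothetical.

```python
# (configuration, score in %, cost per task in USD),
# copied from the ARC-AGI-1 Semi-Private figures above.
ARC_AGI_1_SEMI_PRIVATE = [
    ("o3 (Low)", 41.5, 0.18),
    ("o3 (Medium)", 53.8, 0.29),
    ("o3 (High)", 60.8, 0.50),
    ("o3-Pro (Low)", 44.3, 1.64),
    ("o3-Pro (Medium)", 57.0, 3.18),
    ("o3-Pro (High)", 59.3, 4.16),
]


def pareto_frontier(results):
    """Return names of configurations that no other configuration dominates.

    A configuration is dominated when another one scores at least as high
    and costs no more, and is strictly better on at least one of the two.
    """
    frontier = []
    for name, score, cost in results:
        dominated = any(
            other_score >= score
            and other_cost <= cost
            and (other_score > score or other_cost < cost)
            for _, other_score, other_cost in results
        )
        if not dominated:
            frontier.append(name)
    return frontier


if __name__ == "__main__":
    print(pareto_frontier(ARC_AGI_1_SEMI_PRIVATE))
```

On these figures, only the three o3 settings remain: each o3-Pro setting is matched or beaten on score by a cheaper o3 setting.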

Source Material