GPT-5.4
OpenAI's March 5, 2026 GPT-5 upgrade focused on professional work, coding, browsing, and computer use. OpenAI reports stronger performance on GDPval, SWE-Bench Pro, BrowseComp, and OSWorld-Verified, with a standard 272K-token context window and experimental 1M-token support in Codex.
Capabilities: text · code · reasoning · vision · tool use · computer use · audio
| Arena ELO | Input Price | Output Price | Speed | Context | Latency |
|---|---|---|---|---|---|
| — | $2.50/M | $15/M | — | 272K | — |
Capability Assessment
| Benchmark | Score |
|---|---|
| GDPval | 83.0% |
| OfficeQA | 68.1% |
| IB Modeling | 87.3% |
| SWE-Bench Pro | 57.7% |
| Terminal-Bench 2.0 | 75.1% |
| OSWorld-Verified | 75.0% |
| WebArena-Verified | 67.3% |
| Online-Mind2Web | 92.8% |
| BrowseComp | 82.7% |
| MCP Atlas | 67.2% |
| Toolathlon | 54.6% |
| Tau2 Telecom | 98.9% |
| Frontier Science | 33.0% |
| FrontierMath T1-3 | 47.6% |
| FrontierMath T4 | 27.1% |
| HLE No Tools | 39.8% |
| HLE With Tools | 52.1% |
| GPQA Diamond | 92.8% |
| MMMU Pro | 81.2% |
| MMMU Pro + Tools | 82.1% |
| ARC-AGI-1 | 93.7% |
| ARC-AGI-2 | 73.3% |
| Graphwalks BFS 0-128K | 93.0% |
| Graphwalks BFS 256K-1M | 21.4% |
| Graphwalks Parents 0-128K | 89.8% |
| Graphwalks Parents 256K-1M | 32.4% |
| FinanceAgent v1.1 | 56.0% |
| Tau2 No Reasoning | 64.3% |
Comparative Analysis
| Metric | GPT-5.4 | Claude Opus 4.6 | Gemini 3 Pro | OpenAI o3 |
|---|---|---|---|---|
| SWE-bench | 57.7% | 80.8% | 76.2% | 71.7% |
| AIME 2025 | — | 100.0% | 95.0% | 96.7% |
| GPQA Diamond | 92.8% | 91.3% | 91.9% | 87.7% |
| MMLU | — | 91.0% | 91.8% | 92.9% |
| Input $/M | $2.50 | $5 | $2 | $2 |
| Output $/M | $15 | $25 | $12 | $8 |
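The per-token prices above translate into per-request costs in a straightforward way. A minimal sketch, using the listed GPT-5.4 rates ($2.50/M input, $15/M output); the example token counts are assumptions for illustration:

```python
# Per-million-token list prices for GPT-5.4, from the table above.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 10K-token prompt with a 2K-token response:
print(f"${request_cost(10_000, 2_000):.4f}")  # $0.0550
```

Note the 6x input/output price asymmetry: long responses dominate cost well before long prompts do.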
ARC Prize Snapshot
Cost per task vs score across current reasoning levels.
Updated 3/10/2026, 2:38:08 AM
ARC-AGI-1 Public

| Reasoning level | Score | Cost/task |
|---|---|---|
| Low | 80.0% | $0.12 |
| Medium | 92.0% | $0.21 |
| High | 95.6% | $0.27 |
| xHigh | 96.4% | $0.43 |

ARC-AGI-1 Semi-Private

| Reasoning level | Score | Cost/task |
|---|---|---|
| Low | 68.2% | $0.15 |
| Medium | 86.2% | $0.25 |
| High | 92.7% | $0.37 |
| xHigh | 93.7% | $0.62 |

ARC-AGI-2 Public

| Reasoning level | Score | Cost/task |
|---|---|---|
| Low | 23.2% | $0.29 |
| Medium | 58.2% | $0.73 |
| High | 75.8% | $1.08 |
| xHigh | 84.2% | $1.57 |

ARC-AGI-2 Semi-Private

| Reasoning level | Score | Cost/task |
|---|---|---|
| Low | 29.2% | $0.27 |
| Medium | 55.4% | $0.68 |
| High | 67.5% | $1.02 |
| xHigh | 74.0% | $1.52 |
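One way to read the score-versus-cost data above is as cost efficiency: percentage points of score per dollar spent per task. A quick sketch over the ARC-AGI-2 Semi-Private figures (numbers taken directly from the snapshot above):

```python
# ARC-AGI-2 Semi-Private: (score %, cost per task $) by reasoning level,
# copied from the snapshot above.
levels = {
    "Low":    (29.2, 0.27),
    "Medium": (55.4, 0.68),
    "High":   (67.5, 1.02),
    "xHigh":  (74.0, 1.52),
}

# Points of score per dollar, per reasoning level.
for name, (score, cost) in levels.items():
    print(f"{name:>6}: {score / cost:.1f} pts/$")
```

The ratio falls monotonically from Low to xHigh, i.e. each step up in reasoning effort buys fewer additional points per dollar, even though absolute scores keep rising.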