GPT-5.4
OpenAI's March 5, 2026 GPT-5 upgrade focused on professional work, coding, browsing, and computer use. OpenAI reports stronger performance on GDPval, SWE-Bench Pro, BrowseComp, and OSWorld-Verified, with a standard 272K-token context window and experimental 1M-token support in Codex.
Capabilities: text · code · reasoning · vision · tool use · computer use · audio
| Arena ELO | Input Price | Output Price | Speed | Context | Latency |
|---|---|---|---|---|---|
| — | $2.50/M | $15/M | — | 272K | — |
Capability Assessment
| Benchmark | Score |
|---|---|
| GDPval | 83.0% |
| OfficeQA | 68.1% |
| IB Modeling | 87.3% |
| SWE-Bench Pro | 57.7% |
| Terminal-Bench 2.0 | 75.1% |
| OSWorld-Verified | 75.0% |
| WebArena-Verified | 67.3% |
| Online-Mind2Web | 92.8% |
| BrowseComp | 82.7% |
| MCP Atlas | 67.2% |
| Toolathlon | 54.6% |
| Tau2 Telecom | 98.9% |
| Frontier Science | 33.0% |
| FrontierMath T1-3 | 47.6% |
| FrontierMath T4 | 27.1% |
| HLE No Tools | 39.8% |
| HLE With Tools | 52.1% |
| GPQA Diamond | 92.8% |
| MMMU Pro | 81.2% |
| MMMU Pro + Tools | 82.1% |
| ARC-AGI-1 | 93.7% |
| ARC-AGI-2 | 73.3% |
| Graphwalks BFS 0-128K | 93.0% |
| Graphwalks BFS 256K-1M | 21.4% |
| Graphwalks Parents 0-128K | 89.8% |
| Graphwalks Parents 256K-1M | 32.4% |
| FinanceAgent v1.1 | 56.0% |
| Tau2 No Reasoning | 64.3% |
Comparative Analysis
| Metric | GPT-5.4 | Claude Opus 4.6 | Gemini 3 Pro | OpenAI o3 |
|---|---|---|---|---|
| SWE-bench | 57.7% | 80.8% | 76.2% | 71.7% |
| AIME 2025 | — | 100.0% | 95.0% | 96.7% |
| GPQA Diamond | 92.8% | 91.3% | 91.9% | 87.7% |
| MMLU | — | 91.0% | 91.8% | 92.9% |
| Input $/M | $2.50 | $5 | $2 | $2 |
| Output $/M | $15 | $25 | $12 | $8 |
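The per-token prices above translate into per-request costs in a straightforward way. A minimal sketch, using the listed GPT-5.4 rates ($2.50/M input, $15/M output); the example token counts are assumptions for illustration:

```python
# Per-million-token list prices for GPT-5.4, from the table above.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 10K-token prompt with a 2K-token response:
print(f"${request_cost(10_000, 2_000):.4f}")  # $0.0550
```

Note the 6x input/output price asymmetry: long responses dominate cost well before long prompts do.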
ARC Prize Snapshot
Cost per task vs score across current reasoning levels.
Updated 3/10/2026, 2:38:08 AM
ARC-AGI-1 Public

| Reasoning level | Score | Cost/task |
|---|---|---|
| Low | 80.0% | $0.12 |
| Medium | 92.0% | $0.21 |
| High | 95.6% | $0.27 |
| xHigh | 96.4% | $0.43 |

ARC-AGI-1 Semi-Private

| Reasoning level | Score | Cost/task |
|---|---|---|
| Low | 68.2% | $0.15 |
| Medium | 86.2% | $0.25 |
| High | 92.7% | $0.37 |
| xHigh | 93.7% | $0.62 |

ARC-AGI-2 Public

| Reasoning level | Score | Cost/task |
|---|---|---|
| Low | 23.2% | $0.29 |
| Medium | 58.2% | $0.73 |
| High | 75.8% | $1.08 |
| xHigh | 84.2% | $1.57 |

ARC-AGI-2 Semi-Private

| Reasoning level | Score | Cost/task |
|---|---|---|
| Low | 29.2% | $0.27 |
| Medium | 55.4% | $0.68 |
| High | 67.5% | $1.02 |
| xHigh | 74.0% | $1.52 |
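One way to read the score-versus-cost data above is as cost efficiency: percentage points of score per dollar spent per task. A quick sketch over the ARC-AGI-2 Semi-Private figures (numbers taken directly from the snapshot above):

```python
# ARC-AGI-2 Semi-Private: (score %, cost per task $) by reasoning level,
# copied from the snapshot above.
levels = {
    "Low":    (29.2, 0.27),
    "Medium": (55.4, 0.68),
    "High":   (67.5, 1.02),
    "xHigh":  (74.0, 1.52),
}

# Points of score per dollar, per reasoning level.
for name, (score, cost) in levels.items():
    print(f"{name:>6}: {score / cost:.1f} pts/$")
```

The ratio falls monotonically from Low to xHigh, i.e. each step up in reasoning effort buys fewer additional points per dollar, even though absolute scores keep rising.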