GPT-5.4
OpenAI launches GPT-5.4 with major gains on professional work, tool use, browsing, and ARC-style reasoning benchmarks.
GPT-5.4 posts 83.0% on GDPval, 57.7% on SWE-Bench Pro, 75.1% on Terminal-Bench 2.0, 82.7% on BrowseComp, and 75.0% on OSWorld-Verified. OpenAI also reports 93.7% on ARC-AGI-1 and 73.3% on ARC-AGI-2, while the ARC Prize leaderboard tracks separate GPT-5.4 reasoning levels that currently climb as high as 98.25% on ARC-AGI-1 public eval and 92.21% on ARC-AGI-2 public eval for GPT-5.4 Pro xHigh.
Headline metrics
GDPval: 83.0%
SWE-Bench Pro: 57.7%
ARC-AGI-2: 73.3% (OpenAI reported)
BrowseComp: 82.7%
Evaluation highlights
Terminal-Bench 2.0: 75.1%
OSWorld-Verified: 75.0%
Toolathlon: 54.6%
ARC Prize · ARC-AGI-2 Public: 92.21% (GPT-5.4 Pro xHigh leaderboard entry)