xAI/Grok 4/Jul 10, 2025

Grok 4

xAI's frontier reasoning model, with always-on "Think" reasoning and an optional Heavy tier that runs five Grok 4 agents in parallel. It scores 88% on GPQA Diamond and roughly 70.8% to 75% on SWE-bench Verified, and has a 260K-token context window.

text, code, reasoning, vision, tool-use

Arena ELO: 1,492

Input Price: $3/M

Output Price: $15/M

Speed: 45 t/s

Context: 260K

Latency: 15,570 ms
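
The Speed and Latency figures above can be combined into a rough wall-clock estimate for a single response. The sketch below is illustrative only: the card does not say how either figure was measured, so it assumes that Latency is the delay before the first output token and that Speed is a steady output rate. The function name `estimated_response_seconds` is hypothetical.

```python
# Figures copied from the card above; how they were measured is not stated.
LATENCY_MS = 15570        # ASSUMPTION: treated as time to first token
SPEED_TOKENS_PER_S = 45   # ASSUMPTION: treated as steady output throughput


def estimated_response_seconds(output_tokens: int) -> float:
    """Rough estimate: initial latency plus time to generate the output."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return LATENCY_MS / 1000 + output_tokens / SPEED_TOKENS_PER_S


if __name__ == "__main__":
    for n in (450, 900, 4500):
        print(f"{n} output tokens: about {estimated_response_seconds(n):.1f} s")
```

Under these assumptions the fixed latency dominates short answers, while long generations are bound by the token rate.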

Capability Assessment

SWE-Bench Pro: 72.0%

GPQA Diamond: 88.0%

MMMU Pro: 76.5%

Comparative Analysis

Metric	Grok 4	Claude Opus 4.6	Gemini 3.1 Pro	Gemini 3 Pro
SWE-bench	72.0%	80.8%	80.6%	76.2%
AIME 2025	94.0%	100.0%	91.2%	95.0%
GPQA Diamond	88.0%	91.3%	94.3%	91.9%
MMLU	—	91.0%	92.6%	91.8%
Input $/M	$3	$5	$2	$2
Output $/M	$15	$25	$12	$12
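
To make the per-million-token prices in the table easier to compare, here is a minimal sketch that turns them into a per-request cost. The prices are copied from the table; the function name `request_cost_usd` and the example workload (100,000 input tokens, 10,000 output tokens) are hypothetical. The table does not say whether reasoning tokens are billed as output, so the sketch counts only the token totals passed in.

```python
# USD per million tokens (input, output), copied from the comparison table.
PRICES = {
    "Grok 4": (3.0, 15.0),
    "Claude Opus 4.6": (5.0, 25.0),
    "Gemini 3.1 Pro": (2.0, 12.0),
    "Gemini 3 Pro": (2.0, 12.0),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    for model in PRICES:
        cost = request_cost_usd(model, input_tokens=100_000, output_tokens=10_000)
        print(f"{model}: ${cost:.2f}")
```

For this workload the listed prices put Grok 4 between the Gemini models and Claude Opus 4.6.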

Source Material

Launch Post
