xAI/Grok 4/Jul 10, 2025
Grok 4
xAI's frontier reasoning model, with always-on "Think" reasoning and an optional Heavy tier that runs five Grok 4 agents in parallel. It scores 88% on GPQA Diamond and roughly 70.8% to 75% on SWE-bench Verified, and has a 260K-token context window.
text, code, reasoning, vision, tool-use
Arena ELO: 1,492
Input Price: $3/M tokens
Output Price: $15/M tokens
Speed: 45 t/s
Context: 260K tokens
Latency: 15,570 ms
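The speed, latency, and context figures above can be combined into a back-of-the-envelope estimate of how long a response takes and whether a request fits the window. The sketch below is illustrative only. It assumes that "Latency" means time to first token, that "Speed" means output tokens per second, that "260K" means 260,000 tokens, and that prompt and output share one window; the card states none of these. The function names are hypothetical.

```python
# Figures copied from the card above.
LATENCY_MS = 15_570        # listed "Latency"; ASSUMED to be time to first token
SPEED_TOKENS_PER_S = 45    # listed "Speed"; ASSUMED to be output tokens per second
CONTEXT_TOKENS = 260_000   # listed "Context"; "260K" ASSUMED to mean 260,000


def estimated_response_seconds(output_tokens: int) -> float:
    """Rough wall-clock estimate: initial latency plus streaming time."""
    return LATENCY_MS / 1000 + output_tokens / SPEED_TOKENS_PER_S


def fits_in_context(prompt_tokens: int, output_tokens: int) -> bool:
    """ASSUMES prompt and output tokens count against the same window."""
    return prompt_tokens + output_tokens <= CONTEXT_TOKENS


if __name__ == "__main__":
    # 900 output tokens: 15.57 s latency + 20 s of streaming
    print(f"{estimated_response_seconds(900):.2f} s")   # prints "35.57 s"
    print(fits_in_context(250_000, 20_000))              # prints "False"
```

Under these assumptions the fixed latency dominates short answers: a 900-token reply spends roughly 44% of its total time waiting for the first token.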
Capability Assessment
SWE-Bench Pro: 72.0%
GPQA Diamond: 88.0%
MMMU Pro: 76.5%
Comparative Analysis
| Metric | Grok 4 | Claude Opus 4.6 | Gemini 3.1 Pro | Gemini 3 Pro |
|---|---|---|---|---|
| SWE-bench | 72.0% | 80.8% | 80.6% | 76.2% |
| AIME 2025 | 94.0% | 100.0% | 91.2% | 95.0% |
| GPQA Diamond | 88.0% | 91.3% | 94.3% | 91.9% |
| MMLU | — | 91.0% | 92.6% | 91.8% |
| Input price ($/M tokens) | $3 | $5 | $2 | $2 |
| Output price ($/M tokens) | $15 | $25 | $12 | $12 |
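To make the price rows concrete, the listed per-million-token rates can be turned into a per-request cost. This is a minimal sketch: the prices are copied from the table, while the workload (100,000 input tokens and 10,000 output tokens) and the function name `request_cost_usd` are hypothetical. It ignores anything the table does not list, such as caching discounts, tiered pricing, or how reasoning tokens are billed.

```python
# (input $/M tokens, output $/M tokens), copied from the table above.
PRICES = {
    "Grok 4": (3, 15),
    "Claude Opus 4.6": (5, 25),
    "Gemini 3.1 Pro": (2, 12),
    "Gemini 3 Pro": (2, 12),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed flat per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100K input tokens, 10K output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost_usd(model, 100_000, 10_000):.2f}")
    # Grok 4: $0.45
    # Claude Opus 4.6: $0.75
    # Gemini 3.1 Pro: $0.32
    # Gemini 3 Pro: $0.32
```

For this workload Grok 4 sits between the two Gemini models and Claude Opus 4.6 on cost; the ordering could shift for output-heavy requests, since output rates are four to six times the input rates across the table.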