xAI/Grok 3/Feb 17, 2025
Grok 3
Grok 3 is xAI's flagship model, trained on the 100K-GPU Colossus cluster. A 3T mixture-of-experts (MoE) model, it emerged as a serious frontier player with top-tier MMLU (92.7%) and AIME 2025 (93.3%) scores, suggesting that scaling up raw compute can still deliver gains.
text, code, reasoning, vision
Arena Elo: 1,411
Input Price: $3 per 1M tokens
Output Price: $15 per 1M tokens
Speed: 69 tokens/s
Context: 1M tokens
Latency: 750 ms
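The listed prices, speed, and latency can be turned into a rough per-request estimate. The sketch below is only illustrative: it assumes simple per-token billing with no caching or tiered discounts, treats the listed latency as time to first token, and uses made-up request sizes. The function names are hypothetical, not part of any xAI SDK.

```python
# Figures copied from the stat block above; the request sizes are made up.
INPUT_PRICE_PER_M = 3.00    # USD per million input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per million output tokens
SPEED_TOKENS_PER_S = 69.0   # listed output speed
LATENCY_S = 0.750           # listed latency, assumed to be time to first token


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request under simple per-token pricing (an assumption)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def response_time_s(output_tokens: int) -> float:
    """Rough wall-clock estimate: first-token latency plus generation time."""
    return LATENCY_S + output_tokens / SPEED_TOKENS_PER_S


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens, 1,000 completion tokens.
    print(f"cost: ${request_cost_usd(10_000, 1_000):.4f}")
    print(f"time: {response_time_s(1_000):.1f} s")
```

Because output tokens cost five times as much as input tokens here, long completions tend to dominate the bill even when the prompt is much larger.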
Capability Assessment
GPQA Diamond: 84.6%
Comparative Analysis
| Metric | Grok 3 | Claude Opus 4.6 | Gemini 3 Pro | OpenAI o3 |
|---|---|---|---|---|
| SWE-bench | — | 80.8% | 76.2% | 71.7% |
| AIME 2025 | 93.3% | 100.0% | 95.0% | 96.7% |
| GPQA Diamond | 84.6% | 91.3% | 91.9% | 87.7% |
| MMLU | 92.7% | 91.0% | 91.8% | 92.9% |
| Input $/M | $3 | $5 | $2 | $2 |
| Output $/M | $15 | $25 | $12 | $8 |
ARC Prize
ARC Prize Snapshot
Cost per task vs score across current reasoning levels.
Updated 3/10/2026, 2:38:07 AM
| Dataset | Grok 3 score | Cost per task |
|---|---|---|
| ARC-AGI-1 Public | 8.4% | $0.07 |
| ARC-AGI-1 Semi-Private | 5.5% | $0.09 |
| ARC-AGI-2 Public | 0.0% | $0.14 |
| ARC-AGI-2 Semi-Private | 0.0% | $0.14 |
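One way to read cost against score is the average spend per solved task, which is cost per task divided by the fraction of tasks solved. The sketch below assumes each snapshot entry reads as "Grok 3" followed by a score and a cost per task (for example, 8.4% at $0.07). The helper name is hypothetical, and this metric is not one the ARC Prize page itself reports.

```python
import math

# (score in percent, cost per task in USD), as read from the snapshot above.
ARC_RESULTS = {
    "ARC-AGI-1 Public": (8.4, 0.07),
    "ARC-AGI-1 Semi-Private": (5.5, 0.09),
    "ARC-AGI-2 Public": (0.0, 0.14),
    "ARC-AGI-2 Semi-Private": (0.0, 0.14),
}


def cost_per_solved_task(score_pct: float, cost_per_task: float) -> float:
    """Average spend per solved task; infinite when nothing was solved."""
    if score_pct <= 0:
        return math.inf
    return cost_per_task / (score_pct / 100.0)


if __name__ == "__main__":
    for name, (score, cost) in ARC_RESULTS.items():
        print(f"{name}: ${cost_per_solved_task(score, cost):.2f} per solved task")
```

A 0.0% score gives an infinite cost per solved task, so the ARC-AGI-2 rows say only that spending at this level bought no solved tasks.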