Weapons Assessment Division

Model Comparison Matrix

A comparison of frontier AI models across capability benchmarks and practical deployment metrics.

32 MODELS ON FILE

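Benchmark columns in the source: SWE-bench, Terminal, AIME, GPQA, MMLU, MMMU, Finance. Empty cells were collapsed in the source, so each row's scores are listed in their original order; only rows with seven values map one-to-one onto these columns.

| Model | Org | Benchmark scores | ELO | Speed | In $/M | Out $/M | Context | Latency |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 80.8%, 65.4%, 100.0%, 91.3%, 91.0%, 77.0% | 1,503 | 46 t/s | $5 | $25 | 1M | 1900ms |
| Gemini 3.1 Pro | Google | 80.6%, 68.5%, 91.2%, 94.3%, 92.6%, 80.5% | 1,501 | 84 t/s | $2 | $12 | 1M | 46400ms |
| Grok 4 | xAI | 72.0%, 94.0%, 88.0%, 76.5% | 1,492 | 45 t/s | $3 | $15 | 260K | 15570ms |
| Gemini 3 Pro | Google | 76.2%, 95.0%, 91.9%, 91.8%, 81.0% | 1,486 | 117 t/s | $2 | $12 | 1M | 20250ms |
| Gemini 3 Flash | Google | 78.0%, 90.4%, 91.8%, 81.2% | 1,470 | 180 t/s | $0.5 | $3 | 1M | 8840ms |
| OpenAI o3 | OpenAI | 71.7%, 96.7%, 87.7%, 92.9%, 82.9% | 1,432 | 65 t/s | $2 | $8 | 200K | 10880ms |
| Grok 4.1 Fast | xAI | | 1,430 | 126 t/s | $0.2 | $0.5 | 2M | 620ms |
| DeepSeek V3.2 | DeepSeek | 67.8%, 39.6%, 96.0%, 82.4%, 88.5% | 1,421 | 39 t/s | $0.28 | $0.42 | 128K | 1370ms |
| DeepSeek R1-0528 | DeepSeek | 57.6%, 87.5%, 81.0%, 93.4% | 1,419 | 282 t/s | $1.35 | $5.4 | 130K | 570ms |
| Mistral Large 3 | Mistral | 53.3%, 43.9%, 85.5% | 1,415 | 43 t/s | $0.5 | $1.5 | 256K | 1050ms |
| Grok 3 | xAI | 93.3%, 84.6%, 92.7% | 1,411 | 69 t/s | $3 | $15 | 1M | 750ms |
| OpenAI o4-mini | OpenAI | 68.1%, 92.7%, 81.4%, 90.0%, 81.6% | 1,391 | 114 t/s | $1.1 | $4.4 | 200K | 64850ms |
| Grok 3 Mini | xAI | 95.8% | 1,363 | 183 t/s | $0.3 | $0.5 | 131K | 720ms |
| GPT-5 | OpenAI | 72.8%, 43.8%, 94.6%, 85.7%, 89.4%, 84.2%, 46.9% | 1,350 | | $1.25 | $10 | 400K | |
| Llama 4 Maverick | Meta | 69.8%, 92.4%, 73.4% | 1,327 | 126 t/s | $0.31 | $0.85 | 1M | 810ms |
| Llama 4 Scout | Meta | 57.2%, 87.2%, 69.4% | 1,322 | 149 t/s | $0.18 | $0.66 | 10M | 780ms |
| Claude Sonnet 4.5 | Anthropic | 77.2%, 50.0%, 87.0%, 83.4%, 89.1%, 77.8%, 55.3% | 1,320 | 80 t/s | $3 | $15 | 200K | |
| Claude Sonnet 4.6 | Anthropic | 79.6%, 83.0%, 74.1%, 74.2% | | 48 t/s | $3 | $15 | 1M | 730ms |
| Claude Haiku 4.5 | Anthropic | 73.3%, 40.0%, 96.3% | | 105 t/s | $1 | $5 | 200K | 620ms |
| Claude Opus 4.1 | Anthropic | 74.5%, 46.5%, 78.0%, 81.0%, 89.5%, 77.1%, 50.9% | | | $15 | $75 | 200K | |
| Claude Sonnet 4 | Anthropic | 72.7%, 36.4%, 70.5%, 76.1%, 86.5%, 74.4%, 44.5% | | | $3 | $15 | 200K | |
| GPT-5.4 | OpenAI | 57.7%, 75.1%, 92.8%, 81.2%, 56.0% | | | $2.5 | $15 | 272K | |
| GPT-5.3 Codex | OpenAI | 80.0%, 77.3%, 94.0%, 73.8%, 84.0% | | 73 t/s | $1.75 | $14 | 400K | 125010ms |
| GPT-5.3 Codex Spark | OpenAI | 80.0%, 77.3%, 94.0% | | 1000 t/s | | | 128K | |
| GPT-5.3 Instant | OpenAI | | | | | | 256K | |
| GPT-5.2 | OpenAI | 80.0%, 47.6%, 100.0%, 92.4%, 91.0%, 86.7% | | 71 t/s | $1.75 | $14 | 400K | 99790ms |
| GPT-4o | OpenAI | 53.6%, 88.7%, 69.1% | | 100 t/s | $2.5 | $10 | 128K | |
| Gemini 2.5 Pro | Google | 67.2%, 25.3%, 88.0%, 86.4%, 82.0%, 29.4% | | | $1.25 | $10 | 1M | |
| Gemini 2.5 Flash | Google | | | 350 t/s | $0.3 | $2.5 | 1M | |
| DeepSeek V3 | DeepSeek | 39.2%, 59.1%, 88.5% | | 60 t/s | $0.27 | $1.1 | 64K | |
| DeepSeek R1 | DeepSeek | 79.8%, 71.5%, 90.8% | | | $0.55 | $2.19 | 64K | |
| Mistral Large 2 | Mistral | 84.0% | | | $2 | $6 | 128K | |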
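
The In $/M and Out $/M columns can be turned into a rough per-request cost. The sketch below assumes those columns mean US dollars per million input and output tokens respectively, and that pricing is linear with no caching or batch discounts; the page does not state either. The `Pricing` class, the `request_cost` function, and the 20,000/2,000-token workload are hypothetical names and numbers chosen for illustration, and the three price pairs are copied from the matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices for one model, assumed to be USD per million tokens."""
    model: str
    in_per_m: float
    out_per_m: float


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under simple linear pricing."""
    return (input_tokens * p.in_per_m + output_tokens * p.out_per_m) / 1_000_000


# Three rows copied from the matrix (In $/M, Out $/M).
PRICES = [
    Pricing("Claude Opus 4.6", 5.0, 25.0),
    Pricing("Gemini 3.1 Pro", 2.0, 12.0),
    Pricing("DeepSeek V3.2", 0.28, 0.42),
]

if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
    n_in, n_out = 20_000, 2_000
    # Print the models from cheapest to most expensive for this workload.
    for p in sorted(PRICES, key=lambda p: request_cost(p, n_in, n_out)):
        print(f"{p.model}: ${request_cost(p, n_in, n_out):.4f}")
```

Because output tokens are priced several times higher than input tokens for most models in the matrix, the ranking can shift with the input-to-output ratio, so it is worth rerunning the estimate with a workload shape close to the intended use.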