Unified view of model scores across reasoning, multimodal, coding, long-context, and agentic search benchmarks.

| Model | Reasoning & Knowledge - HLE-Full | Reasoning & Knowledge - HLE-Full (w/ tools) | Reasoning & Knowledge - AIME 2025 | Reasoning & Knowledge - HMMT 2025 (Feb) | Reasoning & Knowledge - IMO-AnswerBench | Reasoning & Knowledge - GPQA-Diamond | Reasoning & Knowledge - MMLU-Pro | Image & Video - MMMU-Pro | Image & Video - CharXiv (RQ) | Image & Video - MathVision | Image & Video - MathVista (mini) | Image & Video - ZeroBench | Image & Video - ZeroBench (w/ tools) | Image & Video - OCRBench | Image & Video - OmniDocBench 1.5 | Image & Video - InfoVQA (val) | Image & Video - SimpleVQA | Image & Video - WorldVQA | Image & Video - VideoMMMU | Image & Video - MMVU | Image & Video - MotionBench | Image & Video - VideoMME | Image & Video - LongVideoBench | Image & Video - LVBench | Coding - SWE-Bench Verified | Coding - SWE-Bench Pro | Coding - SWE-Bench Multilingual | Coding - Terminal Bench 2.0 | Coding - PaperBench | Coding - CyberGym | Coding - SciCode | Coding - OJBench (cpp) | Coding - LiveCodeBench (v6) | Long Context - LongBench v2 | Long Context - AA-LCR | Agentic Search - BrowseComp | Agentic Search - BrowseComp (w/ context management) | Agentic Search - BrowseComp (Agent Swarm) | Agentic Search - WideSearch (item-f1) | Agentic Search - WideSearch (item-f1, Agent Swarm) | Agentic Search - DeepSearchQA | Agentic Search - FinSearchComp T2&T3 | Agentic Search - Seal-0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kimi K2.5 (Thinking) | 30.1 | 50.2 | 96.1 | 95.4 | 81.8 | 87.6 | 87.1 | 78.5 | 77.5 | 84.2 | 90.1 | 9 | 11 | 92.3 | 88.8 | 83.5 | 83.9 | 47.0 | 70.0 | 77.5 | 61.8 | 72.0 | 54.1 | 54.5 | 71.6 | 49.5 | 56.0 | 40.4 | 61.4 | 32.6 | 68.5 | 80.0 | 77.1 | 60.0 | 93.5 | 60.6 | 74.9 | 78.4 | 68.4 | 73.0 | 58.8 | 62.0 | 46.4 |
| GPT-5.2 (xhigh) | 34.5 | 45.5 | 100 | 99.4 | 86.3 | 92.4 | 86.7* | 79.5* | 82.1 | 83.0 | 82.8* | 9* | 7* | 80.7* | 85.7 | 79.8* | 83.7* | 41.0* | 73.4* | 82.0* | 61.8* | 74.4 | 55.6* | 52.5* | 68.8 | 45.5 | 50.0 | 29.5 | 60.0 | 24.9 | 55.0 | 71.0 | 71.7 | 63.4* | 92.9* | 65.8 | 57.8 | — | 67.5 | — | 57.4 | 52.3 | 38.9 |
| Claude 4.5 Opus (Extended Thinking) | 30.8 | 43.2 | 92.8 | 92.9* | 78.5* | 87.0 | 89.3* | 74.0 | 67.2* | 77.1* | 80.2* | 3* | 9* | 86.5* | 87.7* | 76.5* | 78.7* | 38.7* | 63.0* | 73.2* | 56.3* | 67.6 | 44.2* | 41.8* | 74.0 | 48.0 | 44.0 | 35.3 | 57.0 | 22.1 | 60.0 | 75.0 | 74.1 | 50.7* | 86.9* | 37.0 | 59.2 | — | 60.9 | — | 46.7 | 49.8 | 35.9 |
| Gemini 3 Pro (High Thinking Level) | 37.5 | 45.8 | 95.0 | 97.3* | 83.1* | 91.9 | 90.1 | 81.0 | 81.4 | 86.1* | 89.8* | 8* | 12* | 90.3* | 88.5 | 82.4* | 83.4* | 41.0* | 76.5* | 82.7* | 64.1* | 77.2 | 55.2* | 56.4* | 74.5 | 54.0 | 58.0 | 39.5 | 69.0 | 34.2 | 69.0 | 77.0 | 79.8 | 65.2* | 89.4* | 37.8 | 67.6 | — | 68.0 | — | 61.0 | 55.6 | 47.6 |
| DeepSeek V3.2 (Thinking) | 25.1† | 40.8† | 93.1 | 92.5 | 78.3 | 82.4 | 85.0 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 57.6 | 37.0 | 39.8 | 28.2 | 48.8 | 20.4 | 56.2 | 69.3 | 73.3 | 52.9 | 86.3 | 51.4 | — | — | 61.3 | — | 40.6 | 48.5 | 34.4 |
| Claude Opus 4.6 (Adaptive) | — | — | — | — | — | 91.3 | — | 73.9 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 80.8 | — | 77.8 | 65.4 | — | 66.6 | — | — | — | — | — | — | — | 86.8 | — | — | 91.3 | — | — |
| GPT-5.3-Codex (xhigh) | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 56.8 | — | 77.3 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen3-VL-235B-A22B (Thinking) | — | — | — | — | — | — | — | 69.3 | 66.1 | 74.6 | 85.8 | 4* | 3* | 87.5 | 82.0* | 73.8 | 77.5 | 39.4 | 64.5 | 69.0 | 52.9 | 59.9 | 46.7 | 39.3 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Qwen3.5-Plus | 28.7 | 48.3 | — | 94.8 | 80.9 | — | 87.8 | 79.0 | 80.8 | 88.6 | 90.3 | 12 | — | 93.1 | 90.8 | — | 67.1 | — | 84.7 | 75.4 | — | 83.7 | — | 75.5 | 76.4 | — | 69.3 | 52.5 | — | — | — | — | 83.6 | 63.2 | 68.7 | 69.0 | 78.6 | — | 74.0 | — | — | — | 46.9 |