Ranked: The Smartest AI Models of 2026
Key Takeaways
- Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) tie for the top spot in TrackingAI’s April 2026 Mensa Norway benchmark, each scoring 145.
- The top tier is getting crowded, with several leading models now separated by only a few points.
- Scores have risen sharply from 2025, highlighting how quickly frontier AI reasoning has improved on visual pattern-recognition tests.
The race to build smarter AI models is getting tighter at the top.
This visualization, part of Visual Capitalist's AI Week (sponsored by Terzo), ranks leading AI systems using data from TrackingAI, which benchmarks models on the Mensa Norway IQ test. Scores are as of April 2026.
The results show not only who leads today but also how little now separates the top contenders, with several frontier models clustered within a few points of first place.
A Tie at the Top
The ranking offers a snapshot of how today’s leading AI models perform on abstract pattern-recognition tasks, and just how close the race has become.
As the table below shows, only a small gap now separates the top models:
| Model | Mensa Norway IQ (April 2026) |
|---|---|
| Grok-4.20 Expert Mode | 145 |
| OpenAI GPT 5.4 Pro (Vision) | 145 |
| Gemini 3.1 Pro Preview | 141 |
| OpenAI GPT 5.4 Thinking (Vision) | 139 |
| OpenAI GPT 5.3 | 136 |
| Grok-4.20 Expert Mode (Vision) | 133 |
| OpenAI GPT 5.4 Thinking | 133 |
| Meta Muse Spark | 133 |
| Gemini 3.1 Pro Preview (Vision) | 132 |
| Qwen 3.5 | 130 |
| Claude-4.6 Opus | 130 |
| Kimi K2.5 | 127 |
| Manus | 115 |
| DeepSeek R1 | 112 |
| DeepSeek V3 | 111 |
| Gemini 3.1 Flash Preview | 110 |
| Llama 4 Maverick | 110 |
| OpenAI GPT 5.3 (Vision) | 109 |
| Claude-4.6 Sonnet | 106 |
| Bing Copilot | 101 |
| Perplexity | 97 |
| Mistral Medium 3.1 | 96 |
| Claude-4.6 Sonnet (Vision) | 94 |
| Claude-4.6 Opus (Vision) | 82 |
| Llama 4 Maverick (Vision) | 79 |
| OpenAI GPT 5.4 Pro | 73 |
The biggest takeaway is how compressed the top of the leaderboard has become. Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) are tied for first at 145, while Gemini 3.1 Pro Preview follows closely at 141.
That narrow spread suggests frontier AI models are increasingly converging at the top, where a difference of just a few points can shift the rankings.
The gains from 2025 are also notable. Last year’s top score was 135, compared with 145 in this year’s results, highlighting the speed at which leading models are improving on this benchmark.
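To put numbers on that compression, here is a small Python sketch using scores taken directly from the table above (the 135 figure for 2025 comes from the text, not the table):

```python
# Top of the April 2026 leaderboard, taken from the table above.
top_scores = {
    "Grok-4.20 Expert Mode": 145,
    "OpenAI GPT 5.4 Pro (Vision)": 145,
    "Gemini 3.1 Pro Preview": 141,
    "OpenAI GPT 5.4 Thinking (Vision)": 139,
    "OpenAI GPT 5.3": 136,
}

# Spread across the top five models: 145 - 136 = 9 points.
spread = max(top_scores.values()) - min(top_scores.values())
print(f"Spread across the top five: {spread} points")

# Year-over-year gain at the very top (2025's best score was 135).
print(f"Top-score gain since 2025: {max(top_scores.values()) - 135} points")
```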
Not all models are keeping pace. Among major AI developers, Mistral's best-performing model ranks lowest in this dataset: Mistral Medium 3.1 scores 96, well below the leading group.
How TrackingAI Runs the Test
TrackingAI uses the public Mensa Norway test, a set of 35 visual-pattern puzzles. For non-vision models, the questions are verbalized, while vision models receive the original images directly.
As a result, these results are best understood as a benchmark comparison—not a definitive measure of overall intelligence. Because the test is fundamentally visual, model scores can vary depending on how the questions are presented.
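For intuition only, here is a minimal sketch of what such a dual-track setup might look like. The `Puzzle` fields and the `ask_text`/`ask_image` callables are hypothetical stand-ins for illustration; TrackingAI has not published its pipeline in this form.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Puzzle:
    image_path: str   # original Mensa Norway puzzle image (vision track)
    verbal_form: str  # text rendering of the same pattern (text track)
    answer: str       # correct option, e.g. "C"

def administer(puzzle: Puzzle,
               ask_text: Callable[[str], str],
               ask_image: Callable[[str], str],
               supports_vision: bool) -> bool:
    """Present the puzzle in whichever form the model can consume, then grade."""
    if supports_vision:
        reply = ask_image(puzzle.image_path)
    else:
        reply = ask_text(puzzle.verbal_form)
    return reply.strip().upper() == puzzle.answer.upper()
```

Because a verbal description can never carry exactly the same information as the original image, the two tracks are not strictly comparable, which is one reason the same base model can land far apart in the table above.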
Why This Benchmark Matters
TrackingAI’s leaderboard is useful because it offers a simple, familiar way to compare reasoning performance over time. The site also notes that if a model refuses to answer, it is asked the same question up to 10 times, and the most recent answer is used for scoring.
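That retry rule is straightforward to sketch. The `ask` callable and the refusal heuristic below are assumptions for illustration, not TrackingAI's actual code:

```python
from typing import Callable

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "as an ai")

def is_refusal(reply: str) -> bool:
    """Crude refusal heuristic for this sketch; TrackingAI's real check is unknown."""
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

def ask_with_retries(ask: Callable[[str], str], question: str,
                     max_attempts: int = 10) -> str:
    """Re-ask on refusal, up to max_attempts total, and keep the most
    recent reply for scoring, per TrackingAI's stated rule."""
    reply = ask(question)
    for _ in range(max_attempts - 1):
        if not is_refusal(reply):
            break
        reply = ask(question)
    return reply  # most recent answer, refusal or not
```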
Still, an IQ-style benchmark captures only one slice of capability. It does not measure everything that matters in real-world AI use, such as coding ability, factual reliability, tool use, or performance in professional domains.
Learn More on the Voronoi App
If you enjoyed today’s post, check out Global AI Adoption on Voronoi.