Why Grok 10 Will Rely on Reinforcement Learning for 90% of Its Training
Given two trends, plateauing world knowledge and ever-larger AI data centers, it stands to reason that the future of AI model training will be dominated by Reinforcement Learning (RL).
Imagine a future where AI doesn’t just know everything humans have ever written—it also knows how to use that knowledge to solve problems, make decisions, and even outthink us in ways we can’t yet fathom. We’re not there yet, but with each new version of Grok, xAI’s powerful language model, we’re getting closer. Today, I’m going to make a bold prediction: by the time we reach Grok 10, a staggering 90% to 95% of its training compute power will be dedicated to Reinforcement Learning (RL), not the traditional pre-training on vast amounts of text data.
This might sound like tech jargon, but stick with me—I’ll break it down. By the end, you’ll see why this shift is not only likely but necessary for AI to reach its full potential.
What Is Pre-training? What Is Reinforcement Learning?
Let’s start with the basics. When we talk about training AI models like Grok, there are two main phases: pre-training and Reinforcement Learning (RL).
Pre-training is like giving the AI a giant textbook. It’s the process of feeding the model massive amounts of text—books, articles, websites, social media posts, and more—so it can learn the patterns of language, facts about the world, and how humans communicate. Think of it as the AI “reading” everything humans have ever written to build its foundational knowledge.
Example: When Grok answers a question about history or science, it’s drawing on what it learned during pre-training.
Reinforcement Learning (RL), on the other hand, is like hands-on practice. It’s where the AI learns by doing—solving problems, making mistakes, and improving over time. In RL, the model is rewarded for getting things right and penalized for errors, which helps it refine its skills. This is how the AI learns to reason, make decisions, and solve complex tasks.
Example: When Grok is asked to solve a math problem or write code, it’s not just recalling facts—it’s using RL to think through the steps and arrive at the correct answer.
In short: pre-training gives the AI knowledge, while RL teaches it how to use that knowledge effectively.
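To make the reward-and-penalty idea concrete, here is a toy sketch in Python: an epsilon-greedy bandit whose agent learns, purely from rewards, which action pays off more often. Everything in it, the actions, the reward probabilities, the 10% exploration rate, is invented for illustration; real RL on a language model is vastly more complex than this, but the core loop of try, get rewarded, adjust is the same.

```python
import random

# Toy reinforcement learning: an epsilon-greedy two-armed bandit.
# The agent tries actions, receives rewards, and shifts toward what works.
# Action names and reward probabilities are made up for illustration.
random.seed(0)

true_reward = {"guess": 0.2, "reason step by step": 0.8}  # hidden from the agent
estimates = {action: 0.0 for action in true_reward}       # the agent's beliefs
counts = {action: 0 for action in true_reward}

for step in range(2000):
    # Explore 10% of the time; otherwise exploit the best-known action.
    if random.random() < 0.1:
        action = random.choice(list(estimates))
    else:
        action = max(estimates, key=estimates.get)
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

best = max(estimates, key=estimates.get)
print(best)  # the agent discovers that reasoning pays off more often
```

No one ever tells the agent which action is better; it discovers that from reward signals alone, which is exactly the property that makes RL suited to teaching skills rather than facts.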
The Knowledge Plateau: Why More Data Isn’t the Answer
Now, here’s where things get interesting. With each release so far, Grok has absorbed more of the world’s available knowledge. Grok 1, launched in 2023, was trained on a large but limited dataset. By Grok 3, released in 2025, the model had access to vastly more text data, thanks to xAI’s growing resources. It’s likely that Grok 3 has already “read” a huge chunk of what’s out there—books, scientific papers, news articles, and more.
But there’s a catch: while the amount of human knowledge is growing—new books, articles, and posts are published every day—the rate of that growth is slowing compared to the explosion in compute power. Think about it: the core of human knowledge—history, science, literature—doesn’t double overnight. Yes, we’re always learning new things, but the foundational knowledge remains relatively stable. So, while Grok 3 might have read nearly everything worth reading, Grok 10 won’t have ten times more text to learn from. The well of human knowledge isn’t bottomless.
This is what I call the knowledge plateau. We’re approaching a point where throwing more data at the AI won’t make it much smarter. It’s like trying to teach a genius more facts—they might learn a bit, but their real growth comes from using what they already know in smarter ways.
The Compute Power Explosion: A Game-Changer
While the growth of new knowledge is flattening, the opposite is happening with compute power—the raw processing muscle behind AI training. Let’s look at the trend in the number of GPUs (processors) used for each Grok version:
Grok 1 (2023): Trained on approximately 10,000 GPUs.
Grok 2 (2024): Scaled up to around 25,000 Nvidia H100 GPUs.
Grok 3 (2025): Utilized an impressive 200,000 H100 GPUs in the Colossus supercluster.
This progression—from 10,000 to 25,000 to 200,000 GPUs—shows a clear pattern: each new Grok iteration harnesses significantly more compute power. If this trend continues, by the time we reach Grok 10, we could be looking at the equivalent of 20,000,000 H100 chips—100 times the compute power used for Grok 3.
But it’s not just about piling on more chips. Nvidia and other chipmakers are continually improving GPU technology, making each new generation faster and more efficient. Future chips might deliver double or triple the performance of today’s H100s. So, while Grok 10 might not physically use 20,000,000 H100 chips, it could achieve that massive compute level with fewer, more advanced processors. Either way, the compute power available for training Grok 10 will be unlike anything we’ve seen before.
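The scaling arithmetic in the last two paragraphs is easy to check. The sketch below uses the GPU counts cited above; note that the 100x multiplier for Grok 10 is this article's projection, not a published figure:

```python
# GPU counts per Grok version, as cited in this article.
gpus = {"Grok 1": 10_000, "Grok 2": 25_000, "Grok 3": 200_000}

# Generation-over-generation growth factors.
print(gpus["Grok 2"] / gpus["Grok 1"])  # 2.5x from Grok 1 to Grok 2
print(gpus["Grok 3"] / gpus["Grok 2"])  # 8.0x from Grok 2 to Grok 3

# The article's Grok 10 projection: 100x Grok 3's compute, expressed in
# H100-equivalents (fewer, faster future chips could supply the same total).
grok10_h100_equiv = gpus["Grok 3"] * 100
print(grok10_h100_equiv)  # 20,000,000 H100-equivalents
```

Expressing the projection in H100-equivalents rather than physical chips is deliberate: it separates the claim about total compute from the question of which hardware generation delivers it.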
This explosion in compute power changes everything. With data growth slowing, all that extra processing muscle can be used to refine the AI’s ability to think—and that’s where RL comes in. The sheer scale of compute will make it possible, and necessary, to focus on teaching Grok how to reason and solve problems, not just memorize more facts.
The Trend Is Already Here: From Grok 1 to Grok 3
This shift toward RL isn’t just a future possibility—it’s already happening. Let’s examine how the percentage of compute power dedicated to RL has increased with each Grok version:
Grok 1 (2023): Likely used only 10% to 20% of its compute on RL, with the rest going to pre-training. It was functional but not groundbreaking.
Grok 2 (2024): With more compute available, RL’s share grew to around 15% to 25%. This helped Grok 2 perform better on tasks requiring reasoning and problem-solving.
Grok 3 (2025): With a massive leap in compute, RL likely took up 20% to 30% of the training budget. This is when we saw Grok 3 excel in benchmarks like the AIME math competition, where it scored 93.3% using advanced reasoning techniques.
The pattern is clear: as compute power increases, the percentage dedicated to RL is rising. If this trend continues—and there’s every reason to think it will—by Grok 10, RL could dominate, using 90% to 95% of the available compute.
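Putting these share estimates next to the GPU counts from earlier shows why the trend matters more than the percentages suggest: the absolute compute devoted to RL is growing far faster than its share. The sketch below uses the midpoint of each estimated range; all figures are this article's estimates, not xAI disclosures:

```python
# (total GPUs, estimated RL share) per version; shares are midpoints
# of the ranges estimated in this article, not official figures.
versions = {
    "Grok 1": (10_000, 0.15),
    "Grok 2": (25_000, 0.20),
    "Grok 3": (200_000, 0.25),
}

for name, (total, rl_share) in versions.items():
    rl_compute = int(total * rl_share)
    print(f"{name}: ~{rl_compute:,} GPU-equivalents on RL")

# The RL *share* only grew from ~15% to ~25%, but the absolute RL
# compute grew from ~1,500 to ~50,000 GPU-equivalents: roughly 33x.
```

Even a modest rise in share, multiplied by an exploding compute budget, yields an enormous increase in RL effort, which is the mechanism behind the trend this section describes.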
Why RL Will Dominate Grok 10’s Training
So, why will RL take up such a huge share of compute for Grok 10? It comes down to one simple truth: to make AI truly intelligent, we need to focus on how it thinks, not just what it knows.
With the equivalent of 20 million H100 chips at its disposal, Grok 10 will have more than enough power to absorb all the world’s knowledge in the pre-training phase, using just 5% to 10% of its compute. The remaining 90% to 95% will be spent on RL—teaching the AI to reason, adapt, and solve problems in ways that mimic (or even surpass) human intelligence.
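A quick back-of-the-envelope split makes the point. Using the midpoints of the ranges above (again, projections rather than known figures):

```python
# This article's Grok 10 projection: 20M H100-equivalents total, split
# 7.5% pre-training / 92.5% RL (midpoints of the 5-10% and 90-95% ranges).
total = 20_000_000
pretrain = int(total * 0.075)
rl = int(total * 0.925)

print(pretrain)  # 1,500,000 H100-equivalents for pre-training
print(rl)        # 18,500,000 H100-equivalents for RL
```

Notice that even the small pre-training slice (1.5 million H100-equivalents) would dwarf Grok 3's entire 200,000-GPU budget, so pre-training still grows in absolute terms even as RL dominates the split.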
Here’s what that could look like:
Advanced Reasoning: Grok 10 could solve complex, multi-step problems in math, science, or engineering, thinking through each step like a human expert.
Decision-Making: It could weigh options, consider trade-offs, and make optimal choices in real-time, whether in business, healthcare, or everyday life.
Creativity: RL could help Grok 10 generate novel ideas, designs, or solutions by exploring countless possibilities and learning from feedback.
In essence, RL is the key to unlocking AI’s full potential. As Elon Musk, CEO of xAI, has said:
“The future of AI isn’t just about knowing more—it’s about reasoning better. That’s where the real breakthroughs will come from.”
(Source: Musk’s remarks during Grok 3’s unveiling, February 2025)
Other AI experts agree. Demis Hassabis, co-founder of DeepMind, has long championed RL as the path to more capable AI:
“Reinforcement Learning is how we’ll get AI to not just mimic human knowledge, but to think like humans—or better.”
(Source: Hassabis’ 2024 TED Talk on AI’s future)
Conclusion: A Smarter, Not Just Bigger, AI
As we look toward Grok 10, the writing is on the wall: the future of AI training is in Reinforcement Learning. With the knowledge plateau limiting the gains from more pre-training and compute power skyrocketing to the equivalent of 20 million H100 chips, RL will take center stage. By dedicating 90% to 95% of its compute to RL, Grok 10 won’t just be a bigger model—it will be a smarter one, capable of reasoning, problem-solving, and decision-making at a level we’ve never seen before.
This isn’t just a technical shift; it’s a glimpse into a future where AI doesn’t just answer questions—it helps us solve the world’s toughest challenges. And that’s a future worth getting excited about.