Grok V9-Medium: 1.5 Trillion Parameters and Cursor Data to Chase Claude's Coding Crown
The model powering Grok right now runs on 0.5 trillion parameters. Its replacement triples that number. Grok V9-Medium, announced by Elon Musk on May 25, has finished training and targets a public release in mid-June 2026.
Grok V9-Medium: a 1.5T model built for code
Musk confirmed on X that xAI has completed training on the model, which is three times the size of the v8-small currently serving all Grok production traffic. Fine-tuning is underway, and reinforcement learning is due to start within days.
V9-Medium is not the flagship 6-trillion-parameter Grok 5. It's a separate model with a different focus: coding. When asked if it would handle code better, Musk's answer was blunt: "Much better at coding."
Cursor developer data fuels Grok V9-Medium training
Musk revealed that xAI incorporated extensive Cursor data during training. Cursor is the AI code editor used by developers at OpenAI, Stripe, and Perplexity. With over 4 million active developers, Cursor captures real engineering workflows: multi-file edits, large-codebase refactors, and correction cycles where developers accept, reject, or modify model suggestions.
SpaceXAI reached a deal with Anysphere (Cursor's parent) that includes an option to acquire the company for $60 billion later this year. That makes the Cursor data pipeline look like a long-term integration play, not a one-off training decision.
Can Grok V9-Medium close the gap with Claude and ChatGPT?
Here's the uncomfortable truth. Grok holds roughly 6% enterprise AI adoption, compared with 55% for OpenAI, 47% for Anthropic, and 39% for Google. On SWE-bench Verified, Claude Opus leads at 87.6% while Grok 4 reaches 75%.
App downloads also tell a story: Grok fell from over 20 million in January to 8.3 million in April, a near-60% decline. The consumer momentum has stalled, and xAI badly needs a win on the developer front.
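The figures quoted so far can be sanity-checked with a few lines of arithmetic. The sketch below is a minimal Python example that uses only the numbers stated in this article; the variable names are illustrative, and the January download figure is treated as exactly 20 million even though the article says "over 20 million", so the computed decline is a lower bound.

```python
# All inputs are figures quoted in the article; none are independently verified.
current_params_t = 0.5    # trillion parameters in the current production model
scale_factor = 3          # the article says the replacement "triples that number"
downloads_jan_m = 20.0    # "over 20 million", so treat this as a floor
downloads_apr_m = 8.3     # million downloads in April
swe_claude = 87.6         # SWE-bench Verified score quoted for Claude Opus (%)
swe_grok4 = 75.0          # SWE-bench Verified score quoted for Grok 4 (%)

# Derived values
new_params_t = current_params_t * scale_factor
download_decline_pct = (downloads_jan_m - downloads_apr_m) / downloads_jan_m * 100
swe_gap_points = swe_claude - swe_grok4

print(f"V9-Medium size: {new_params_t:.1f}T parameters")
print(f"Download decline: at least {download_decline_pct:.1f}%")
print(f"SWE-bench Verified gap: {swe_gap_points:.1f} points")
```

On these inputs the decline works out to about 58.5%, consistent with the "near-60%" description, and the benchmark gap comes to 12.6 percentage points.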
Grok V9-Medium's real bet: edit-heavy workflows
V9-Medium's advantage may live in edit-heavy developer workflows rather than from-scratch generation. If your workload is refactoring, review, and multi-file edits, it could be the more interesting test. That's a smart niche to target, given how Microsoft is building its own coding model for Copilot.
Raw parameters alone won't close a 12.6-point SWE-bench gap. But real developer workflow data from Cursor could be the kind of training signal that actually moves the needle. We'll know once the model ships, with release targeted for mid-June.