
Grok 4.5 is in private beta at SpaceX and Tesla. So are its benchmarks.

xAI's newest model runs on a 1.5-trillion-parameter foundation and was tuned on Cursor coding data. Musk says it's “close to, perhaps exceeding Opus.” The only scoreboard is his.

Noah · The Sharp Brief · July 5, 2026 · 3 min read
[Image: Engineers at a glowing workstation inside a rocket assembly facility]

On June 28, Elon Musk announced that Grok 4.5 — xAI's newest model — is in private beta at SpaceX and Tesla. The specs he shared are real news: it's the first model built on xAI's new V9 foundation, at roughly 1.5 trillion parameters — about three times the size of the architecture behind the Grok that answers questions on X today — with coding data from Cursor folded into supplemental training. Then came the line built for headlines: early evals show performance “close to, perhaps exceeding Opus.”

Here's what you can independently verify about that claim: nothing. There's no system card, no public benchmark, no pricing, and no release date. The evidence is internal evaluations at two companies Musk controls, plus one early-access developer calling it “similar to Opus”: an anecdote, not a leaderboard. The model anyone can actually buy is still Grok 4.3, public since April 30 at $1.25 per million input tokens and $2.50 per million output tokens, with a 1-million-token context window.
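For a sense of what those Grok 4.3 list prices mean on an invoice, here is a rough back-of-envelope sketch. The two rates come from the paragraph above; the function name and the monthly token volumes are hypothetical, picked only to illustrate the arithmetic, and the sketch ignores any caching discounts or tiering a real bill might include.

```python
# List prices quoted above for Grok 4.3, in USD per million tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 2.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of a workload at the quoted list prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical monthly workload (an assumption, not a figure from the article):
# 40 million input tokens and 8 million output tokens.
monthly = estimate_cost(40_000_000, 8_000_000)
print(f"${monthly:,.2f}")  # prints "$70.00"
```

Swap in your own token counts to see how a price cut, or a pricier successor, would move the total.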

Grok 4.5 joins a crowded genre this summer: the announced-but-unavailable frontier model. OpenAI's GPT-5.6 family is gated behind a government-visible preview of roughly 20 vetted partners. Meta's “Watermelon” exists as a claim from a closed briefing. And Grok 5, the ~6-trillion-parameter flagship training on Colossus 2 in Memphis, has slipped from late 2025 to Q1 2026, then to Q2, and is now a Q3 hope at best. The gap between “exists” and “available” is where the marketing lives.

The tell is the training data

Strip out the Opus talk and one detail carries real signal: Cursor. xAI supplemented Grok 4.5's training with data from the most popular AI coding environment, a deliberate shot at the coding market where Anthropic and OpenAI earn their margins — and where prices are already collapsing. Pair that with the deployment strategy: SpaceX's aerospace workflows and Tesla's vehicle software are live testbeds harder than any benchmark. Dogfooding at industrial scale is a structural advantage no leaderboard captures. Musk says from-scratch models will now ship monthly through year-end — a cadence no other lab has publicly committed to.

Our take: An unbenchmarked model doesn't beat anything; it can only out-tweet it. Vendor evals flatter the vendor, so “perhaps exceeding Opus” is an aspiration until a third party can reproduce the scores. But don't dismiss the setup: a 3x-scaled foundation, Cursor-grade coding data, and two factories as a test harness is serious. If Grok 4.5 ships publicly anywhere near its claims at anything like Grok 4.3's prices, the coding-model price war gets a third front, and that's the part that hits your API bill.

What to watch
