V4 on Ascend
April 24, 2026 · uneasy.in/6e3c820
DeepSeek previewed V4 this morning, and the interesting part is not the model. It is the stack underneath it.
The headline numbers are respectable. Two variants: V4 Pro at 1.6 trillion parameters and V4 Flash at 284 billion, both mixture-of-experts, both with a 1 million token context window, up from 128,000 in V3. DeepSeek has named its long-context trick "Hybrid Attention Architecture" and claims world-leading cost efficiency at that window size. Pricing is the familiar undercut-the-frontier gambit. Flash is $0.14 per million input tokens and $0.28 per million output, which sits below Haiku 4.5, Gemini 3.1 Flash, and GPT-5.4 Nano. Pro tops out at $0.145 in and $3.48 out, cheaper than Opus 4.7, GPT-5.5, and Gemini 3.1 Pro. On the company's own benchmarks, V4 Pro is claimed to compete with Claude Sonnet 4.5 on agentic coding and approach Opus 4.5. Benchmarks from the vendor are benchmarks from the vendor, but the last time DeepSeek made a claim like this it was broadly true.
None of that is the story.
The story is that Huawei announced, the same morning, that its entire Ascend supernode product line supports V4. Ascend 950 clusters, stitched together with Huawei's Supernode interconnect, running a frontier-class open-weight model. Cambricon pre-announced compatibility within hours. DeepSeek has not disclosed what hardware it used to train V4, and that silence is doing a lot of work, but the inference layer is now explicitly, publicly, and aggressively domestic.
This is what the export controls were meant to prevent, and it is what they arguably accelerated. Nvidia H100s and H200s got harder to ship to China in 2022, harder still in 2023, then harder again. The policy assumption was that constraining the chips would constrain the models. Instead it constrained the supply chain around Huawei, and Beijing poured money into closing the gap. The gap is not closed, Ascend 950 is not an H200, but it is close enough for a 1.6 trillion parameter MoE to run coherently across it, and "close enough" is what matters once a single Chinese lab can ship a competitive open-weight model end-to-end on domestic silicon. That is a political fact, not a technical one. It will be cited in policy papers for the next decade.
The open-weight angle is the other shoe. V4 ships on Hugging Face with weights anyone can download. American labs kept their frontier models closed for commercial reasons that made perfect sense at the time; the result, a year after R1 rattled Silicon Valley, is that the cheapest capable model in every size class is Chinese, the best open-weight model in every size class is Chinese, and now the hardware story is Chinese too. Anthropic and OpenAI have spent the last six months accusing DeepSeek of distilling their outputs for training data, and that may well be true, but it is a slightly desperate argument to be making at this stage. The ship has sailed. The ship was built somewhere else.
I was skeptical about how much V4 would move the needle given how long the wait had stretched, and how many Chinese competitors, MiniMax included, had shipped credible frontier work in the meantime. The model itself might end up being an incremental update. The hardware announcement is not incremental. It is the first time a Chinese lab has said, on the day of a major release, that its production inference runs on chips Washington cannot block.
That changes the shape of the next two years of this race, more than any specific benchmark would.
Sources:
- China's AI darling DeepSeek previews new model adapted for Huawei chip technology — Reuters
- DeepSeek unveils next-gen AI model as Huawei vows 'full support' with new chips — South China Morning Post
- China's DeepSeek previews new AI model a year after jolting US rivals — The Verge
- DeepSeek previews new AI model that 'closes the gap' with frontier models — TechCrunch
- DeepSeek Unveils Flagship AI Model a Year After Breakthrough — Bloomberg