China’s Z.AI Releases GLM-5.2: A Model That Rivals Claude Opus—Using Zero Nvidia Chips

In brief

GLM-5.2 trails Claude Opus 4.8 by just 1% on FrontierSWE—a benchmark measuring multi-hour autonomous engineering projects—while beating GPT-5.5 on the same test. It ships under an MIT license with zero regional restrictions.
The model was built entirely on Huawei Ascend chips with no Nvidia hardware involved.
Unsloth AI already released 2-bit GGUF quantizations that shrink the model from 1.51TB to 238GB. You’ll still need 256GB of RAM or VRAM—but at that point, you can run it.

Z.ai released GLM-5.2 on June 16, promising top-tier performance that surpasses its already capable predecessor, GLM-5.1.

The Beijing-based lab, which has been on the U.S. Entity List since January 2025, appears to be benefiting from growing concerns over America’s approach to AI. Over the past week, the ban on Anthropic Fable and the release of this new model have helped drive Z.ai’s stock up 90%, sending it to a new all-time high.

GLM-5.2 has the numbers to back up the hype.

FrontierSWE evaluates whether an AI agent can complete open-ended technical projects measured in hours, covering systems optimization, large-scale code construction, and applied ML research, and it is scored by dominance rate. On that benchmark, GLM-5.2 hit 74.4 against Claude Opus 4.8’s 75.1 and edged out GPT-5.5 at 72.6. On SWE-bench Pro, which tests autonomous resolution of real-world GitHub issues and is scored as a pass rate, GLM-5.2 scored 62.1 to GPT-5.5’s 58.6, and cleared its predecessor GLM-5.1’s 58.4 by 3.7 points.
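To put those gaps in perspective, here is a minimal sketch that turns the quoted scores into point gaps and relative gaps. The scores come from the figures above; the helper name gap is purely illustrative.

```python
# Scores as quoted in this article; higher is better on both benchmarks.
frontier_swe = {"Claude Opus 4.8": 75.1, "GLM-5.2": 74.4, "GPT-5.5": 72.6}
swe_bench_pro = {"GLM-5.2": 62.1, "GPT-5.5": 58.6, "GLM-5.1": 58.4}


def gap(scores, model, baseline):
    """Return (point gap, relative gap in percent) of model versus baseline."""
    points = scores[model] - scores[baseline]
    relative = 100 * points / scores[baseline]
    return round(points, 1), round(relative, 1)


vs_opus = gap(frontier_swe, "GLM-5.2", "Claude Opus 4.8")
vs_gpt = gap(frontier_swe, "GLM-5.2", "GPT-5.5")
vs_gpt_pro = gap(swe_bench_pro, "GLM-5.2", "GPT-5.5")
vs_predecessor = gap(swe_bench_pro, "GLM-5.2", "GLM-5.1")

for label, result in [
    ("FrontierSWE vs Claude Opus 4.8", vs_opus),
    ("FrontierSWE vs GPT-5.5", vs_gpt),
    ("SWE-bench Pro vs GPT-5.5", vs_gpt_pro),
    ("SWE-bench Pro vs GLM-5.1", vs_predecessor),
]:
    print(f"{label}: {result[0]:+.1f} points ({result[1]:+.1f}%)")
```

The 0.7-point deficit to Claude Opus 4.8 works out to a little under 1% in relative terms, which is where the "just 1%" figure comes from.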

The quality jump makes it the best open-source model to date in the Artificial Analysis Intelligence Index, which aggregates nine separate evaluations to assess the general quality of an AI model. OpenRouter’s benchmarks put it in the same category as the now-banned Claude Fable 5.

The hardware used to achieve this feat is another interesting part of the story. GLM-5.2 was trained on Huawei Ascend chips—no Nvidia anywhere in the pipeline. Emad Mostaque, founder of Stability AI, estimated total training costs at around $25 million, 80% of that in post-training, which would make it extremely cheap when compared against its peers.

As Decrypt reported earlier this year, Z.ai was already training image models on Huawei’s Ascend Atlas servers without a single American chip. GLM-5.2 takes that infrastructure further—a 744-billion-parameter mixture-of-experts model with a genuine 1 million-token context window, five times the 200K limit on GLM-5.1, and an MIT license that means no government directive can flip the access switch.

Tokens are the chunks of text a model reads and generates, whereas parameters are the internal values that determine how a model processes information and generates responses.

Who it’s for and what it costs

For developers, the context window is the operational shift. Whole-repo navigation, multi-file refactors, and long agentic pipelines that previously required chunking become single-call workflows. API pricing runs $1.40 per million input tokens and $4.40 per million output—against Claude Opus 4.8’s $5 input and $25 output. The Coding Plan starts at around $18 a month and works directly inside Claude Code, Cline, Kilo Code, and most popular agentic environments.
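For a rough sense of what that price gap means per job, here is a back-of-the-envelope sketch using the per-million-token rates quoted above. The workload (150,000 input tokens and 10,000 output tokens) and the function name job_cost are assumptions for illustration; the sketch ignores caching discounts, long-context surcharges, and subscription plans.

```python
# USD per million tokens, as quoted in this article.
PRICES = {
    "GLM-5.2": {"input": 1.40, "output": 4.40},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def job_cost(model, input_tokens, output_tokens):
    """Estimate the API cost in USD for a single call at list prices."""
    rate = PRICES[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


# Hypothetical workload: a large chunk of a repository in, a patch out.
INPUT_TOKENS = 150_000
OUTPUT_TOKENS = 10_000

glm_cost = job_cost("GLM-5.2", INPUT_TOKENS, OUTPUT_TOKENS)
opus_cost = job_cost("Claude Opus 4.8", INPUT_TOKENS, OUTPUT_TOKENS)

print(f"GLM-5.2:         ${glm_cost:.3f}")
print(f"Claude Opus 4.8: ${opus_cost:.3f}")
print(f"Ratio:           {opus_cost / glm_cost:.1f}x")
```

Under those assumptions the same call costs about $0.25 on GLM-5.2 and $1.00 on Claude Opus 4.8, roughly a fourfold difference. The ratio widens as output grows, since the output rates differ by more than the input rates.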

Local deployment is also technically possible. Unsloth AI pushed 2-bit GGUF quantizations that compress the model from 1.51TB down to 238GB while retaining ~82% accuracy.

Don’t get too excited, though. That still means it demands 256GB of unified memory or a matching RAM/VRAM combo: a maxed M4 Ultra Mac Studio, or a workstation with a mid-range GPU and 256GB of system RAM using mixture-of-experts offloading. It’s still a lot of money, but at least it’s something you can buy and run at home if you really want to.
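The memory math can be checked with a quick sketch. It uses only the figures quoted in this article (744 billion parameters, 1.51TB, 238GB, 256GB) and assumes decimal units, so treat the outputs as approximations.

```python
# Figures quoted in this article; decimal units (1GB = 1e9 bytes) are assumed.
PARAMS = 744e9         # total parameters
FULL_BYTES = 1.51e12   # original weights, 1.51TB
QUANT_BYTES = 238e9    # 2-bit GGUF quantization, 238GB
MEMORY_BYTES = 256e9   # target machine, 256GB


def bits_per_param(size_bytes, params=PARAMS):
    """Average number of bits stored per parameter for a given file size."""
    return size_bytes * 8 / params


full_bits = bits_per_param(FULL_BYTES)
quant_bits = bits_per_param(QUANT_BYTES)
compression = FULL_BYTES / QUANT_BYTES
headroom_gb = (MEMORY_BYTES - QUANT_BYTES) / 1e9

print(f"Full weights:  {full_bits:.1f} bits per parameter")
print(f"Quantized:     {quant_bits:.2f} bits per parameter")
print(f"Compression:   {compression:.2f}x")
print(f"Headroom left: {headroom_gb:.0f}GB")
```

The full weights come out at about 16 bits per parameter, consistent with a 16-bit format, while the quantized file averages about 2.56 bits, a roughly 6.3x reduction. The average sits above 2 bits presumably because some tensors are stored at higher precision, though the article's sources do not spell that out. That leaves about 18GB on a 256GB machine for the context cache, activations, and the operating system, which is why 256GB is a floor rather than a comfortable fit.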

We ran a quick test, asking GLM-5.2 to build our standard game mixing typing mechanics with a shooter. The UI wasn’t the prettiest (other models generated more polished-looking interfaces), but the experience was the most varied: different scenarios across waves, enemy types that shifted, bosses appearing later in the run.

It generated more diverse game states than anything else we tested for the same task in a zero-shot setup.

If you want to play it, it’s live on our Itch.io profile.

That variance points toward where GLM-5.2 makes the most economic sense. For multi-shot generation workflows and agentic pipelines where output diversity matters more than polish, the math at open-source pricing levels is hard to argue with. For the hardest sustained tasks—SWE-Marathon, where it scores 13.0 against Opus 4.8’s 26.0—the gap to the closed frontier is still real, and 13 points wide.

Open-source weights are live on Hugging Face under the MIT license, and the quantized weights are available there as well. GLM Coding Plan subscribers can switch now with the model string GLM-5.2, and it’s also available for free testing on Z.ai with some usage constraints.