Huawei has officially launched CloudMatrix 384, a massive AI computing system designed to rival Nvidia’s top-performing GB200 NVL72. This new system features 384 Ascend 910C chips and delivers higher total compute performance, more memory capacity, and a new all-optical interconnect setup. It’s already shipping to clients and marks a major step in China’s efforts to build local alternatives to U.S.-based AI hardware.
This article explains what CloudMatrix 384 is, how it compares to Nvidia’s solutions, and why it matters for AI infrastructure and sovereignty.
What Is CloudMatrix 384?
CloudMatrix 384 is a rack-scale AI system built by Huawei for high-end training workloads. It was revealed at the World Artificial Intelligence Conference (WAIC) in July 2025 and is based on 384 of Huawei’s in-house Ascend 910C NPUs. These chips are connected through an all-optical mesh interconnect called a “supernode,” allowing full communication between chips across the system.
It is designed for deep learning tasks such as large language model training, computer vision, and cloud-based AI services.
System-Level Highlights
- 384 Ascend 910C chips with dual dies
- 192 Kunpeng CPUs onboard
- 16 racks total: 12 for compute, 4 for networking
- Full rack power consumption: ~559 kW
- Target audience: Chinese cloud providers, government AI labs, and domestic enterprises
Huawei is already shipping CloudMatrix 384 units to clients in China, with deployments that include its own data centers in Wuhu.
Why It Matters
CloudMatrix 384 is Huawei’s answer to U.S. export controls, which have limited access to Nvidia’s cutting-edge GPUs like the H100 and GB200. This new platform gives Chinese firms a way to continue scaling AI without relying on foreign hardware.
Huawei has focused on system-level architecture to overcome per-chip performance gaps. While each Ascend 910C chip may be weaker than Nvidia’s B200, Huawei’s supernode design enables all 384 chips to operate as one unified cluster.
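The scale-over-chip argument can be made concrete with back-of-envelope arithmetic using the approximate system figures quoted later in this article (the per-chip values below are derived from those totals, not official vendor specs):

```python
# Per-chip vs system-level BF16 compute, derived from the approximate
# figures cited in this article (PFLOPs). Illustrative math only.
cm_system, cm_chips = 300, 384   # CloudMatrix 384: ~300 PFLOPs, 384 Ascend 910C
nv_system, nv_chips = 180, 72    # GB200 NVL72:     ~180 PFLOPs, 72 B200

cm_per_chip = cm_system / cm_chips   # ~0.78 PFLOPs per Ascend 910C
nv_per_chip = nv_system / nv_chips   # ~2.5 PFLOPs per B200

print(f"Per-chip gap: B200 is ~{nv_per_chip / cm_per_chip:.1f}x stronger")
print(f"System gap: CloudMatrix delivers ~{cm_system / nv_system:.2f}x the total PFLOPs")
```

By these rough numbers, each B200 is roughly 3.2× stronger than a single Ascend 910C, yet the full CloudMatrix system still delivers about 1.67× the aggregate compute by packing in more than five times as many chips.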
CloudMatrix 384 vs Nvidia GB200 NVL72
| Metric | Huawei CloudMatrix 384 | Nvidia GB200 NVL72 |
| --- | --- | --- |
| Chips | 384 Ascend 910C NPUs | 72 B200 GPUs |
| Compute (BF16) | ~300 PFLOPs | ~180 PFLOPs |
| Memory capacity | 3.6× NVL72 | Baseline |
| Memory bandwidth | 2.1× NVL72 | Baseline |
| Power consumption | ~559 kW | ~140 kW |
| Cost per unit | ~$8.2 million | ~$3 million |
| Power efficiency (per FLOP) | ~2.3× lower | Baseline |
This comparison shows how Huawei is using scale and architecture to compete, even if its chips are individually less powerful.
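The power-efficiency line in the table can be sanity-checked from the other rows. A quick sketch using the article's approximate figures (not official specifications) lands close to the reported ~2.3× gap:

```python
# Back-of-envelope power efficiency from the comparison table's
# approximate figures. Not vendor-published specs.
SYSTEMS = {
    "CloudMatrix 384": {"bf16_pflops": 300, "power_kw": 559},
    "GB200 NVL72":     {"bf16_pflops": 180, "power_kw": 140},
}

def pflops_per_kw(spec: dict) -> float:
    """System-level BF16 compute per kilowatt of rack power."""
    return spec["bf16_pflops"] / spec["power_kw"]

for name, spec in SYSTEMS.items():
    print(f"{name}: {pflops_per_kw(spec):.2f} PFLOPs/kW")

ratio = pflops_per_kw(SYSTEMS["GB200 NVL72"]) / pflops_per_kw(SYSTEMS["CloudMatrix 384"])
print(f"NVL72 is ~{ratio:.1f}x more power-efficient per FLOP")
```

With these rounded inputs the ratio works out to roughly 2.4×, consistent with the ~2.3× figure in the table.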
How the Supernode Architecture Works
At the core of CloudMatrix 384 is the “supernode” optical mesh interconnect. This setup allows every NPU to talk to every other NPU directly, minimizing latency and speeding up model training. It supports full all-to-all communication, which is key for large-scale parallel computing.
Traditional systems often use multiple hops and slower links to connect different chips. Huawei’s supernode avoids that, creating a more unified compute fabric.
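The hop-count difference can be sketched with a toy model. The topology parameters below (such as `chips_per_node`) are illustrative assumptions for contrast, not Huawei's or Nvidia's actual network designs:

```python
# Toy hop-count model: flat all-to-all fabric vs a two-tier
# (leaf-spine) network. Parameters are illustrative only, not
# actual Huawei or Nvidia topology details.

def hops_all_to_all(src_chip: int, dst_chip: int) -> int:
    """In a full mesh, any chip reaches any other chip in one hop."""
    return 0 if src_chip == dst_chip else 1

def hops_two_tier(src_chip: int, dst_chip: int, chips_per_node: int = 8) -> int:
    """In a leaf-spine design, cross-node traffic traverses switches:
    chip -> leaf -> spine -> leaf -> chip (four hops)."""
    if src_chip == dst_chip:
        return 0
    same_node = src_chip // chips_per_node == dst_chip // chips_per_node
    return 1 if same_node else 4

# Worst case for a cross-node transfer in each topology:
print(hops_all_to_all(0, 100))   # 1 hop
print(hops_two_tier(0, 100))     # 4 hops
```

In a full mesh, every pair of chips communicates at the same (minimal) latency, which is what makes all-to-all collectives in data-parallel and expert-parallel training cheap; in a tiered network, cross-node traffic pays extra hops and contends for uplink bandwidth.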
This design helps the CloudMatrix 384 outperform Nvidia's NVL72 on aggregate system-level metrics such as total compute, memory capacity, and memory bandwidth.
Deployment and Ecosystem Challenges
Huawei has already shipped more than 10 units of CloudMatrix 384, with more deliveries expected in the coming months. However, its software ecosystem remains a concern. Nvidia’s CUDA platform is mature, widely supported, and easy for developers to adopt. Huawei’s alternatives are improving but still catching up.
Clients using CloudMatrix 384 must either adapt their models or develop software within Huawei’s framework, which can slow down adoption.
Table: Key Strengths and Trade-Offs of CloudMatrix 384
| Feature | Advantage | Trade-Off |
| --- | --- | --- |
| High compute performance | ~67% more system-level PFLOPs than NVL72 | Much higher power draw |
| Memory and bandwidth | 3.6× memory, 2.1× bandwidth vs NVL72 | More expensive to build and run |
| Full chip interconnect | All-to-all optical mesh speeds up training | Complex design adds engineering overhead |
| Delivery availability | Already shipping in China | No presence in international markets |
| Ecosystem independence | No reliance on U.S. chips or CUDA | Requires custom or ported software stack |
This table helps clarify that while Huawei is gaining ground in hardware scale, it faces barriers in power efficiency and developer tools.
Strategic Context: Why Huawei Built This
The launch of CloudMatrix 384 is not just a tech move. It’s part of China’s broader plan for self-reliance in critical industries. Since U.S. restrictions began limiting Nvidia exports to China, Huawei and other firms have been under pressure to fill the gap.
With AI now central to economic, military, and scientific progress, access to reliable training infrastructure is key. CloudMatrix 384 provides a domestic option with comparable system-level capability.
It’s not perfect—power draw is very high, and cost is steep—but it gives China a viable alternative for AI scale-out.
Opportunities for AI Professionals
If you’re in AI infrastructure, compute engineering, or edge model deployment, platforms like CloudMatrix 384 represent a big shift. Understanding how these systems work—and how to develop for them—can set you apart.
You can start with a Deep Tech Certification to explore AI chips, system architecture, and interconnects. Or sharpen your skills with a Data Science Certification to build and train models that run efficiently on new hardware. For those in product or strategy, a Marketing and Business Certification can help position you in the evolving AI hardware market.
Final Takeaway
Huawei’s CloudMatrix 384 is a bold, hardware-heavy response to Nvidia’s AI dominance. With more compute power, bigger memory, and an innovative interconnect, it sets a new standard for China’s AI systems. While it consumes more power and costs more to run, it gives Huawei control over its compute future.
It’s not just a machine. It’s a signal that the global AI hardware race is going multi-polar—and Huawei plans to be at the front of that race.