Huawei has officially launched CloudMatrix 384, a massive AI computing system designed to rival Nvidia’s top-performing GB200 NVL72. This new system features 384 Ascend 910C chips and delivers higher total compute performance, more memory capacity, and a new all-optical interconnect setup. It’s already shipping to clients and marks a major step in China’s efforts to build local alternatives to U.S.-based AI hardware.
This article explains what CloudMatrix 384 is, how it compares to Nvidia’s solutions, and why it matters for AI infrastructure and sovereignty.
What Is CloudMatrix 384?
CloudMatrix 384 is a rack-scale AI system built by Huawei for high-end training workloads. It was revealed at the World Artificial Intelligence Conference (WAIC) in July 2025 and is based on 384 of Huawei’s in-house Ascend 910C NPUs. These chips are connected through an all-optical mesh interconnect called a “supernode,” allowing full communication between chips across the system.
It is designed for deep learning tasks such as large language model training, computer vision, and cloud-based AI services.
System-Level Highlights
- 384 Ascend 910C chips with dual dies
- 192 Kunpeng CPUs onboard
- 16 racks total: 12 for compute, 4 for networking
- Full rack power consumption: ~559 kW
- Target audience: Chinese cloud providers, government AI labs, and domestic enterprises
Huawei is already shipping CloudMatrix 384 units to clients in China, with deployments that include its own data centers in Wuhu.
Why It Matters
CloudMatrix 384 is Huawei’s answer to U.S. export controls, which have limited access to Nvidia’s cutting-edge GPUs like the H100 and GB200. This new platform gives Chinese firms a way to continue scaling AI without relying on foreign hardware.
Huawei has focused on system-level architecture to overcome per-chip performance gaps. While each Ascend 910C chip may be weaker than Nvidia’s B200, Huawei’s supernode design enables all 384 chips to operate as one unified cluster.
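The scale-over-chip argument can be made concrete with back-of-envelope arithmetic using the approximate system figures quoted later in this article (the per-chip values below are derived from those totals, not official vendor specs):

```python
# Per-chip vs system-level BF16 compute, derived from the approximate
# figures cited in this article (PFLOPs). Illustrative math only.
cm_system, cm_chips = 300, 384   # CloudMatrix 384: ~300 PFLOPs, 384 Ascend 910C
nv_system, nv_chips = 180, 72    # GB200 NVL72:     ~180 PFLOPs, 72 B200

cm_per_chip = cm_system / cm_chips   # ~0.78 PFLOPs per Ascend 910C
nv_per_chip = nv_system / nv_chips   # ~2.5 PFLOPs per B200

print(f"Per-chip gap: B200 is ~{nv_per_chip / cm_per_chip:.1f}x stronger")
print(f"System gap: CloudMatrix delivers ~{cm_system / nv_system:.2f}x the total PFLOPs")
```

By these rough numbers, each B200 is roughly 3.2× stronger than a single Ascend 910C, yet the full CloudMatrix system still delivers about 1.67× the aggregate compute by packing in more than five times as many chips.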
CloudMatrix 384 vs Nvidia GB200 NVL72
| Metric | Huawei CloudMatrix 384 | Nvidia GB200 NVL72 |
| --- | --- | --- |
| Chips | 384 Ascend 910C NPUs | 72 B200 GPUs |
| Compute (BF16) | ~300 PFLOPs | ~180 PFLOPs |
| Memory capacity | 3.6× NVL72 | Baseline |
| Memory bandwidth | 2.1× NVL72 | Baseline |
| Power consumption | ~559 kW | ~140 kW |
| Cost per unit | ~$8.2 million | ~$3 million |
| Power efficiency (per FLOP) | ~2.3× lower | Baseline |
This comparison shows how Huawei is using scale and architecture to compete, even if its chips are individually less powerful.
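The power-efficiency line in the table can be sanity-checked from the other rows. A quick sketch using the article's approximate figures (not official specifications) lands close to the reported ~2.3× gap:

```python
# Back-of-envelope power efficiency from the comparison table's
# approximate figures. Not vendor-published specs.
SYSTEMS = {
    "CloudMatrix 384": {"bf16_pflops": 300, "power_kw": 559},
    "GB200 NVL72":     {"bf16_pflops": 180, "power_kw": 140},
}

def pflops_per_kw(spec: dict) -> float:
    """System-level BF16 compute per kilowatt of rack power."""
    return spec["bf16_pflops"] / spec["power_kw"]

for name, spec in SYSTEMS.items():
    print(f"{name}: {pflops_per_kw(spec):.2f} PFLOPs/kW")

ratio = pflops_per_kw(SYSTEMS["GB200 NVL72"]) / pflops_per_kw(SYSTEMS["CloudMatrix 384"])
print(f"NVL72 is ~{ratio:.1f}x more power-efficient per FLOP")
```

With these rounded inputs the ratio works out to roughly 2.4×, consistent with the ~2.3× figure in the table.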
How the Supernode Architecture Works
At the core of CloudMatrix 384 is the “supernode” optical mesh interconnect. This setup allows every NPU to talk to every other NPU directly, minimizing latency and speeding up model training. It supports full all-to-all communication, which is key for large-scale parallel computing.
Traditional systems often use multiple hops and slower links to connect different chips. Huawei’s supernode avoids that, creating a more unified compute fabric.
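The hop-count difference can be sketched with a toy model. The topology parameters below (such as `chips_per_node`) are illustrative assumptions for contrast, not Huawei's or Nvidia's actual network designs:

```python
# Toy hop-count model: flat all-to-all fabric vs a two-tier
# (leaf-spine) network. Parameters are illustrative only, not
# actual Huawei or Nvidia topology details.

def hops_all_to_all(src_chip: int, dst_chip: int) -> int:
    """In a full mesh, any chip reaches any other chip in one hop."""
    return 0 if src_chip == dst_chip else 1

def hops_two_tier(src_chip: int, dst_chip: int, chips_per_node: int = 8) -> int:
    """In a leaf-spine design, cross-node traffic traverses switches:
    chip -> leaf -> spine -> leaf -> chip (four hops)."""
    if src_chip == dst_chip:
        return 0
    same_node = src_chip // chips_per_node == dst_chip // chips_per_node
    return 1 if same_node else 4

# Worst case for a cross-node transfer in each topology:
print(hops_all_to_all(0, 100))   # 1 hop
print(hops_two_tier(0, 100))     # 4 hops
```

In a full mesh, every pair of chips communicates at the same (minimal) latency, which is what makes all-to-all collectives in data-parallel and expert-parallel training cheap; in a tiered network, cross-node traffic pays extra hops and contends for uplink bandwidth.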
This design helps the CloudMatrix 384 outperform Nvidia's NVL72 on aggregate system-level metrics such as total compute, memory capacity, and memory bandwidth.
Deployment and Ecosystem Challenges
Huawei has already shipped more than 10 units of CloudMatrix 384, with more deliveries expected in the coming months. However, its software ecosystem remains a concern. Nvidia’s CUDA platform is mature, widely supported, and easy for developers to adopt. Huawei’s alternatives are improving but still catching up.
Clients using CloudMatrix 384 must either adapt their models or develop software within Huawei’s framework, which can slow down adoption.
Table: Key Strengths and Trade-Offs of CloudMatrix 384
| Feature | Advantage | Trade-Off |
| --- | --- | --- |
| High compute performance | ~67% more system-level PFLOPs than NVL72 | Much higher power draw |
| Memory and bandwidth | 3.6× memory, 2.1× bandwidth vs NVL72 | More expensive to build and run |
| Full chip interconnect | All-to-all optical mesh speeds up training | Complex design adds engineering overhead |
| Delivery availability | Already shipping in China | No presence in international markets |
| Ecosystem independence | No reliance on U.S. chips or CUDA | Requires custom or ported software stack |
This table helps clarify that while Huawei is gaining ground in hardware scale, it faces barriers in power efficiency and developer tools.
Strategic Context: Why Huawei Built This
The launch of CloudMatrix 384 is not just a tech move. It’s part of China’s broader plan for self-reliance in critical industries. Since U.S. restrictions began limiting Nvidia exports to China, Huawei and other firms have been under pressure to fill the gap.
With AI now central to economic, military, and scientific progress, access to reliable training infrastructure is key. CloudMatrix 384 provides a domestic option with comparable system-level capability.
It’s not perfect—power draw is very high, and cost is steep—but it gives China a viable alternative for AI scale-out.
Opportunities for AI Professionals
If you’re in AI infrastructure, compute engineering, or edge model deployment, platforms like CloudMatrix 384 represent a big shift. Understanding how these systems work—and how to develop for them—can set you apart.
You can start with a Deep Tech Certification to explore AI chips, system architecture, and interconnects. Or sharpen your skills with a Data Science Certification to build and train models that run efficiently on new hardware. For those in product or strategy, a Marketing and Business Certification can help position you in the evolving AI hardware market.
Final Takeaway
Huawei’s CloudMatrix 384 is a bold, hardware-heavy response to Nvidia’s AI dominance. With more compute power, bigger memory, and an innovative interconnect, it sets a new standard for China’s AI systems. While it consumes more power and costs more to run, it gives Huawei control over its compute future.
It’s not just a machine. It’s a signal that the global AI hardware race is going multi-polar—and Huawei plans to be at the front of that race.