Huawei's openPangu 2.0 Is Now Open-Source — and It Was Trained Without a Single NVIDIA GPU

If you followed HDC 2026, watched Richard Yu open-source Pangu, or are weighing openPangu 2.0 against DeepSeek for 512K context and compliance-ready deployment, this article anchors on the June 30 Flash launch: event timeline, seven-component open-source roadmap, mHC/ModAttn architecture, Ascend hardware metrics, competitor comparison matrices, ModelArts/GitCode deployment paths, and a five-step Runbook.

Abstract visualization of neural network nodes and connections, representing MoE mixture-of-experts architecture and open-source ecosystem

Contents

1. Three Selection Pain Points: Open-Source Depth, Hardware Lock-In, and Context Length

  1. "Open source" is not always full-stack open. Most frontier models release weights and inference code only—pre-training, post-training, and custom training operators stay closed. You cannot reproduce the training pipeline or run domain-specific continued pre-training.
  2. Hardware binding and compliance. DeepSeek, Qwen, Kimi, and Llama were all trained on NVIDIA hardware. Under US export controls, teams that need a frontier model trained without any NVIDIA GPU currently have one option: openPangu 2.0.
  3. Context window drives use cases. Full contracts, large codebases, and marathon chat histories often exceed 128K. Both openPangu 2.0 variants ship a unified 512K window—roughly the text of eight full-length novels in one pass.

2. Event Background and Timeline: HDC 2026 to GitCode Launch

DateEvent
2026-06-12Huawei Developer Conference (HDC 2026), Dongguan Songshan Lake—Richard Yu keynote officially launches openPangu 2.0
2026-06-30openPangu-2.0-Flash weights, base inference code, and training/inference operators go open source on GitCode
2026-07 (planned)openPangu-2.0-Pro weights and inference code release
H2 2026 (planned)Pre-training code, post-training code (SFT/RLHF), and additional training operators roll out
At HDC 2026, Richard Yu said: "In the dictionary of my remaining life, there is no second place—only first. We will go from number one in China to number one in the world."

3. Two Versions for Different Scenarios

ProFlash
Total parameters505B92B
Active parameters18B6B
Sparsity ratio~28:1~15:1
Context window512K512K
Release statusJuly (planned)June 30 (live)

Flash: 92B total parameters with only 6B active—near the cost of a 6B dense model while drawing from a 92B knowledge pool. Single Ascend 910B card inference is supported; community estimates suggest ~96GB unified memory systems may also work.

Pro: 505B total, 18B active—built for extreme long-document workloads. The 512K window can ingest full contracts, large repositories, and extended conversation history in one shot.

4. Seven-Component Full-Stack Open Source: Why the Release Matters

Most open LLMs ship weights + inference code only. openPangu 2.0 plans to open seven major components:

  1. Model architecture (structure definition) — ✅ released
  2. Model weights (Flash live June 30; Pro planned July)
  3. Technical report — ✅ released with weights
  4. Inference code + training/inference operators — ✅ released
  5. Pre-training code — 📋 H2 2026
  6. Post-training code (SFT/RLHF) — 📋 H2 2026
  7. Training operators (Ascend high-performance custom kernels) — 📋 H2 2026

The last three are extremely rare at this MoE scale—enabling true full-stack open source. Researchers can reproduce training; enterprises can run vertical continued pre-training.

2026-06-30 ✅ Flash weights + inference code + operators 2026-07 🔜 Pro weights + inference code H2 2026 📋 Pre-training code, post-training code, more operators

5. Architecture Deep Dive

openPangu 2.0 uses a MoE (Mixture of Experts) design. Key techniques include:

Developer ecosystem and software stack

6. The First "No NVIDIA" Frontier Model: Ascend Hardware Adaptation

openPangu 2.0 is the first frontier-scale model fully trained on non-NVIDIA hardware—end to end on Huawei Ascend 910B NPUs, with no A100/H100 in the loop.

MetricData
Single-card throughput (Ascend) mainstream open-source models
Super-node training efficiency+30%
512K long-sequence training throughput+50%
Train/inference consistency>99% (a long-standing MoE pain point)
Inference latency1.2× better than comparable industry models
On-device 30B embedded model50% faster inference, 20% less memory; runs offline on Kirin chips
Flash-Int8 quantizationW4A8, 40% memory reduction, <10% accuracy loss

7. Competitor Comparison and Selection Matrix

Head-to-head parameters

ModelTotal paramsActive paramsContextTraining hardwareOpenness
openPangu 2.0 Pro505B18B512KAscend NPUFull stack (7 components)
openPangu 2.0 Flash92B6B512KAscend NPUFull stack (7 components)
DeepSeek V4 Pro1.6T~200B128KNVIDIAWeights + inference
Qwen 3.7 Max~400B+varies128KNVIDIAWeights + inference + partial training
Kimi K2.71T32B256KNVIDIAWeights + inference
Llama 4 405B405B128KNVIDIAWeights + inference

Capability matrix by scenario

ScenarioRecommendationWhy
Code generation / complex reasoningDeepSeek V4 Pro~200B active parameters, current performance leader
Agent / multi-tool orchestrationKimi K2.7Mature MCP ecosystem
Ultra-long documents (>256K tokens)openPangu 2.0 Pro512K context is the clear choice
Domestic compliance / sovereign AIopenPangu 2.0Only frontier model trained on purely domestic hardware
Ascend / Huawei Cloud deploymentopenPangu 2.0Native optimization, 2× throughput
On-device / mobile deploymentEmbedded 30BLocal inference on Kirin chips
Low-cost local inferenceFlash6B active, runnable on ~96GB VRAM

Note: Independent third-party benchmarks are still in progress; capability assessments below partly reflect architectural inference and will be updated when results publish.

8. Access and Deployment: ModelArts API and GitCode Self-Hosting

Option 1: Huawei Cloud ModelArts API (simplest)

  1. Create a Huawei Cloud account
  2. Open ModelArts → AI Gallery → search "openPangu 2.0"
  3. Subscribe to Flash or Pro and obtain the API endpoint
curl -X POST "https://modelarts.${REGION}.myhuaweicloud.com/v1/infers/openpangu-2-flash/chat/completions" \ -H "Content-Type: application/json" \ -H "X-Auth-Token: ${TOKEN}" \ -d '{ "model": "openpangu-2.0-flash", "messages": [{"role": "user", "content": "Hello, introduce yourself"}], "max_tokens": 1024, "temperature": 0.7 }'

Option 2: GitCode self-deployment

Repository hub: gitcode.com/org/ascend-tribe

# Flash single-card inference (Ascend 910B) python inference.py \ --model_path ./openPangu-Flash \ --device npu:0 \ --context_length 512000 \ --precision bf16 # Pro multi-card distributed inference python distributed_inference.py \ --model_path ./openPangu-Pro \ --num_devices 8 \ --context_length 512000 # LoRA domain fine-tuning python finetune.py \ --model_path ./openPangu-Pro \ --data_path ./domain_data \ --output_dir ./fine_tuned_model \ --method lora \ --lora_rank 16

Hardware requirements (reference)

VersionRecommended hardwareMinimum config
Flash (6B active)Single Ascend 910B~96GB unified memory
Flash-Int8Single Atlas A2~48GB VRAM
Pro (18B active)4+ Ascend 910B cardsMulti-card cluster (validate after July weight release)

9. Strategic Significance, HarmonyOS Agent, and openPangu License

10. Five-Step Getting Started Runbook

Step 1 — Define scenario and version

Ultra-long documents → Pro; low-cost API → Flash; compliance → either version; on-device → Embedded 30B.

Step 2 — Choose access path

No hardware: Huawei Cloud ModelArts API. Have Ascend: download weights from GitCode and self-host.

Step 3 — Configure the Ascend software stack

pip install torch_npu # Standard PyTorch code import torch import torch_npu model = model.to("npu:0")

Step 4 — Run inference or call the API

Flash single-card inference.py; quantized path → Flash-Int8; Pro multi-card distributed_inference.py.

Step 5 — Track the open-source roadmap and benchmark updates

Watch GitCode Ascend Tribe; update deployment notes when Pro lands in July; replace architectural inference once third-party scores publish.

11. Citable Technical Facts

12. Conclusion: Not an All-Round Champion, but Irreplaceable on Key Axes

DeepSeek V4 Pro still leads on code generation and hard reasoning, but openPangu 2.0 is nearly unmatched on 512K ultra-long context, sovereign domestic training, 2× Ascend-native throughput, full-stack open source, and HarmonyOS on-device integration. Flash weights went live June 30—right in the news cycle.

If you wire openPangu APIs from a laptop or generic Linux VPS, orchestrate HarmonyOS Agents, or run a multi-model gateway, long-running production setups often hit lid-close disconnects, missing Apple toolchains, and ops overhead. For 7×24 stable Agent workloads, OpenClaw gateways, and native iOS/macOS toolchains, renting a VPSMAC M4 Mac cloud node is the lower-friction path—swap models as the open ecosystem evolves while keeping a native macOS runtime stable.

Some benchmark figures in this article are architectural estimates; we will update when independent third-party results publish. Published: July 1, 2026.