pccx v002 Architecture

Parallel Compute Core eXecutor — Second Generation

pccx v002 is the second generation of a heterogeneous NPU architecture designed to accelerate autoregressive decoding of Transformer-based large language models (LLMs) on edge devices. It revisits the GEMM-centric v001 design and restructures the data path around a single shared activation bus: the GEMV, CVO, and GEMM paths now all pull activations from the same L2 cache, which sits at the geometric center of the floorplan. The result is shorter, more regular data movement across every compute path.

Overview

Hardware Architecture

Instruction Set Architecture (ISA)

Software Stack

- C API Overview

Target Models

RTL Source

RTL Source Reference (v002)

Verification
