pccx

Archive

  • Archive: v001 Experimental Architecture
    • Project Overview
    • Quick Menu
    • Quantization Strategy: W4A16 with BF16 Activations
    • Compute Engines
Copyright © 2026, hwkim
Last updated on 2026-04-19