Target Hardware: Xilinx Kria KV260
KV260 is the primary target hardware platform for pccx.
Key specifications
Device: Zynq UltraScale+ MPSoC (XCK26 on the Kria K26 SOM; resources match the ZU5EV)
DSP slices: 1,248 DSP48E2
BRAM: 144 block RAMs (36 Kb each)
URAM: 64 UltraRAM blocks (288 Kb each)
Operating frequency: 400 MHz (target)
AXI interfaces: AXI-Lite (HPM), AXI HP ports 0–3, AXI ACP
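As a quick sanity check, the specification figures above can be turned into aggregate on-chip memory and a peak arithmetic rate. This is a back-of-the-envelope sketch, not a measured result: the peak figure assumes every DSP48E2 completes one multiply-accumulate per cycle at the 400 MHz target, which is an upper bound rather than achieved throughput.

```python
# Figures from the key-specifications list above.
DSP_SLICES = 1_248
BRAM_BLOCKS, BRAM_KBIT = 144, 36     # 36 Kb per block RAM
URAM_BLOCKS, URAM_KBIT = 64, 288     # 288 Kb per UltraRAM block
F_CLK_HZ = 400e6                     # target clock, not a timing-closed result

bram_kbit = BRAM_BLOCKS * BRAM_KBIT
uram_kbit = URAM_BLOCKS * URAM_KBIT
print(f"BRAM: {bram_kbit:,} Kb")     # BRAM: 5,184 Kb
print(f"URAM: {uram_kbit:,} Kb")     # URAM: 18,432 Kb

# Upper bound (assumption): every DSP48E2 retires one multiply-accumulate per cycle.
peak_gmac_s = DSP_SLICES * F_CLK_HZ / 1e9
print(f"Peak: {peak_gmac_s:.1f} GMAC/s")  # Peak: 499.2 GMAC/s
```

UltraRAM therefore provides roughly 3.5 times as much on-chip storage as block RAM, which is why the large L2 cache lives in URAM.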
Memory architecture
On the KV260, pccx uses the following memory hierarchy and AXI port assignment:
L2 URAM cache: 114,688 × 128-bit words (1.75 MiB) for feature maps and intermediate results
HP ports 0/1: Matrix core weight streaming (128-bit/clk)
HP ports 2/3: Vector core weight streaming (32 INT4 values/clk per port, i.e. 128 bits)
ACP port: DMA transfers between host DDR4 and the L2 URAM cache
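The memory figures above can be cross-checked with a few lines of arithmetic. This sketch assumes the URAM288 primitive's fixed 4,096 × 72-bit organization and that each HP stream moves one 128-bit beat per cycle at the 400 MHz fabric target; this section specifies neither the port clock nor the actual URAM mapping, so treat the results as estimates.

```python
import math

# Figures from the memory-architecture list and the 400 MHz clock target.
L2_DEPTH, L2_WIDTH = 114_688, 128    # words, bits per word
F_CLK_HZ = 400e6

# URAM288 primitive: fixed 4,096-deep x 72-bit organization (288 Kb).
URAM_DEPTH, URAM_WIDTH = 4_096, 72

l2_bits = L2_DEPTH * L2_WIDTH
print(f"L2 capacity: {l2_bits / 8 / 2**20:.2f} MiB")       # L2 capacity: 1.75 MiB

# Bits-only lower bound vs. a mapping that keeps each word in one address row.
by_capacity = l2_bits / (URAM_DEPTH * URAM_WIDTH)
width_aligned = math.ceil(L2_DEPTH / URAM_DEPTH) * math.ceil(L2_WIDTH / URAM_WIDTH)
print(f"URAM blocks: {by_capacity:.1f} by capacity, {width_aligned} width-aligned")

# 32 INT4 values x 4 bits = 128 bits, the same beat width as HP ports 0/1.
int4_beat_bits = 32 * 4
port_gb_s = int4_beat_bits * F_CLK_HZ / 8 / 1e9
print(f"Per-port stream: {int4_beat_bits} bits/clk, {port_gb_s:.1f} GB/s")
```

The capacity-based count (about 49.8 blocks) lines up with the ~50 URAM figure in the resource table. A mapping that splits each 128-bit word across two 72-bit URAM columns would instead need 28 × 2 = 56 of the 64 available blocks, which still fits.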
Resource utilization
| Resource | Used (est.) | Budget |
|---|---|---|
| DSP48E2 | ~1,088 | 1,248 |
| BRAM (36 Kb) | ~140 | 144 |
| URAM (288 Kb) | ~50 | 64 |
| CLB flip-flops | ~200K | 234K |
Note
The DSP48E2 estimate is the sum of the 1,024-slice GEMM systolic array and the stage-1 GEMV reduction allocation (16 DSPs × 4 cores = 64), giving 1,024 + 64 = 1,088. Including the SFU / CVO BF16 multipliers, post-synthesis utilization is expected to land in the ~1,150–1,200 range; these figures will be revised once the implementation flow completes.
All numbers above scale with configuration parameters (systolic array dimensions, number of GEMV / SFU cores, and so on).
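Because the DSP estimate is a simple sum of per-block allocations, the budget check can be parameterized. The sketch below is illustrative only: `PccxConfig`, `report`, and the 32 × 32 array shape are hypothetical (the section gives only the 1,024-slice total), and it assumes one DSP48E2 per systolic-array cell. The SFU / CVO multiplier count defaults to zero because no per-block figure is given here.

```python
from dataclasses import dataclass

# DSP / BRAM / URAM budgets from the resource-utilization table (KV260).
BUDGET = {"DSP48E2": 1_248, "BRAM36": 144, "URAM288": 64}


@dataclass
class PccxConfig:
    """Hypothetical configuration knobs; names and defaults are illustrative."""
    sa_rows: int = 32             # assumed shape; only the 1,024 total is documented
    sa_cols: int = 32
    gemv_cores: int = 4
    gemv_dsp_per_core: int = 16   # stage-1 reduction DSPs per GEMV core
    sfu_cvo_dsp: int = 0          # SFU / CVO BF16 multipliers: not yet counted

    def dsp_estimate(self) -> int:
        # Assumes one DSP48E2 per systolic-array cell.
        return (self.sa_rows * self.sa_cols
                + self.gemv_cores * self.gemv_dsp_per_core
                + self.sfu_cvo_dsp)


def report(used: dict[str, int]) -> dict[str, float]:
    """Return utilization in percent; raise if any resource exceeds its budget."""
    pct = {}
    for name, n in used.items():
        if n > BUDGET[name]:
            raise ValueError(f"{name}: {n} exceeds budget {BUDGET[name]}")
        pct[name] = 100.0 * n / BUDGET[name]
    return pct


base = PccxConfig()
print(base.dsp_estimate())                         # 1088
print(report({"DSP48E2": base.dsp_estimate(), "BRAM36": 140, "URAM288": 50}))
print(BUDGET["DSP48E2"] - base.dsp_estimate())     # 160 DSPs left for SFU / CVO

# Scaling the array up: 36 x 36 cells plus 64 GEMV DSPs no longer fits.
try:
    report({"DSP48E2": PccxConfig(sa_rows=36, sa_cols=36).dsp_estimate()})
except ValueError as err:
    print(err)                                     # DSP48E2: 1360 exceeds budget 1248
```

Keeping the allocation as explicit per-block terms makes it easy to see which parameter exhausts the 1,248-slice budget first. Here, the 160 spare slices in the base configuration bound how many SFU / CVO multipliers can be added.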