Instruction Encoding
1. Format Overview
Every pccx v002 instruction is a fixed-width 64-bit word with the following top-level layout.
| Bits | Field | Description |
|---|---|---|
| [63:60] | opcode | 4 bits. Up to 16 opcodes; 5 are in use today. |
| [59:0] | instruction body | 60 bits. Per-opcode field layouts are detailed in §4. |
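A minimal Python sketch of the top-level split, assuming the instruction is handled as an unsigned 64-bit integer with the opcode in bits [63:60] and the body in bits [59:0]. The helper names `pack_instruction` / `unpack_instruction` are hypothetical, and the opcode value `0x3` in the usage lines is arbitrary, not a specific mnemonic.

```python
# Top-level pccx v002 word layout: opcode in [63:60], body in [59:0].
OPCODE_SHIFT = 60
OPCODE_MASK = 0xF            # 4-bit opcode field
BODY_MASK = (1 << 60) - 1    # 60-bit instruction body


def pack_instruction(opcode: int, body: int) -> int:
    """Combine a 4-bit opcode and a 60-bit body into one 64-bit word."""
    if not 0 <= opcode <= OPCODE_MASK:
        raise ValueError(f"opcode {opcode} does not fit in 4 bits")
    if not 0 <= body <= BODY_MASK:
        raise ValueError("instruction body does not fit in 60 bits")
    return (opcode << OPCODE_SHIFT) | body


def unpack_instruction(word: int) -> tuple[int, int]:
    """Split a 64-bit word into (opcode, body)."""
    if not 0 <= word < (1 << 64):
        raise ValueError("instruction word must fit in 64 bits")
    return (word >> OPCODE_SHIFT) & OPCODE_MASK, word & BODY_MASK


word = pack_instruction(0x3, 0xABCDEF)
print(hex(word))                 # 0x3000000000abcdef
print(unpack_instruction(word))  # (3, 11259375)
```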
2. Opcode Table
| Opcode | Mnemonic | Function | Primary fields |
|---|---|---|---|
| | GEMV | Matrix × vector | dest_reg, src_addr, flags, shape_ptr, size_ptr, parallel_lane |
| | GEMM | Matrix × matrix | Identical layout to GEMV |
| | MEMCPY | Device-to-device data movement | from_device, to_device, dest_addr, src_addr, aux_addr, shape_ptr, async |
| | MEMSET | Constant Cache write | dest_cache, dest_addr, a/b/c_value |
| | CVO | Complex Vector Op (SFU) | cvo_func, src_addr, dst_addr, length, flags, async |
| | reserved | — | Reserved for future extensions |
3. Decode Flow
The host writes instructions into the AXI-Lite CMD_IN FIFO. The decode path is:
```mermaid
flowchart TB
  CMD[[AXI-Lite<br/>CMD_IN FIFO]] --> DEC["Decoder<br/>(ctrl_npu_decode)<br/>opcode[63:60] branch"]
  DEC -->|control μop<br/>memory μop<br/>CVO μop| DISP["Dispatcher<br/>(ctrl_dispatcher)"]
  GS[[Global Scheduler<br/>dependency / hazard check]] -.-> DISP
  DISP --> GEMM[GEMM ctrl]
  DISP --> GEMV[GEMV ctrl]
  DISP --> MC[MEMCTRL<br/>ACP / NPU]
  DISP --> MS[MEMSET<br/>Constant]
  DISP --> CVO[CVO ctrl]
```
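The decode branch in the flowchart can be sketched in Python as a lookup on opcode[63:60]. The opcode table does not list numeric encodings, so the `Opcode` values 0 to 4 below are hypothetical placeholders, and `decode` / `dispatch` are illustrative names rather than the RTL's interfaces. The Global Scheduler's dependency and hazard check is deliberately not modeled.

```python
from enum import IntEnum


# HYPOTHETICAL numbering: the real opcode values are not given in the table.
class Opcode(IntEnum):
    GEMV = 0
    GEMM = 1
    MEMCPY = 2
    MEMSET = 3
    CVO = 4


# Downstream unit for each opcode, following the flowchart.
DISPATCH_TARGET = {
    Opcode.GEMV: "GEMV ctrl",
    Opcode.GEMM: "GEMM ctrl",
    Opcode.MEMCPY: "MEMCTRL",
    Opcode.MEMSET: "MEMSET",
    Opcode.CVO: "CVO ctrl",
}

BODY_MASK = (1 << 60) - 1


def decode(word: int) -> tuple[Opcode, int]:
    """Branch on opcode[63:60]; unused (reserved) encodings are rejected."""
    raw = (word >> 60) & 0xF
    try:
        op = Opcode(raw)
    except ValueError:
        raise ValueError(f"reserved opcode {raw:#x}") from None
    return op, word & BODY_MASK


def dispatch(word: int) -> str:
    """Return the unit that would receive this instruction's μops."""
    op, _body = decode(word)
    return DISPATCH_TARGET[op]


print(dispatch(2 << 60))  # MEMCTRL (under the placeholder numbering)
```

Rejecting reserved opcodes at decode time mirrors the table's note that the remaining encodings are held for future extensions.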
4. μop Decomposition
Inside the dispatcher, each opcode decomposes into the following μops.
| μop type | Fields |
|---|---|
| GEMV | flags (6 bit) + size_ptr_addr (6 bit) + parallel_lane (5 bit) |
| GEMM | Same layout as GEMV |
| MEMCPY | data_route (8 bit) + dest_addr + src_addr + shape_ptr + async |
| MEMSET | dest_cache (2 bit) + dest_addr + a/b/c_value (16 bit × 3) |
| CVO | cvo_func (4 bit) + src_addr + dst_addr + length + flags + async |
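To make the widths above concrete, here is a small Python bit-field packer. It assumes fields are concatenated in the order listed, with the first field in the most significant bits; the actual RTL bit order is not stated here and may differ. Only fields with a stated width are included, and `pack_fields`, `unpack_fields`, and the layout names are hypothetical.

```python
# Generic MSB-first bit-field packer for μop layouts.
# ASSUMPTION: fields appear in table order, first field in the highest bits.


def pack_fields(layout: list[tuple[str, int]], values: dict[str, int]) -> int:
    """Pack values into one integer according to a (name, width) layout."""
    packed = 0
    for name, width in layout:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        packed = (packed << width) | value
    return packed


def unpack_fields(packed: int, layout: list[tuple[str, int]]) -> dict[str, int]:
    """Inverse of pack_fields for the same layout."""
    shift = sum(width for _, width in layout)
    out = {}
    for name, width in layout:
        shift -= width
        out[name] = (packed >> shift) & ((1 << width) - 1)
    return out


# Widths from the table; fields without a stated width are omitted.
GEMV_CTRL = [("flags", 6), ("size_ptr_addr", 6), ("parallel_lane", 5)]
MEMSET_VALUES = [("dest_cache", 2), ("a_value", 16), ("b_value", 16), ("c_value", 16)]

ctrl = {"flags": 0b000101, "size_ptr_addr": 12, "parallel_lane": 31}
packed = pack_fields(GEMV_CTRL, ctrl)
assert unpack_fields(packed, GEMV_CTRL) == ctrl
print(sum(w for _, w in GEMV_CTRL), sum(w for _, w in MEMSET_VALUES))  # 17 50
```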
5. Memory Routing Encoding
The from_device + to_device pair in MEMCPY is expanded inside the
Control Unit into an 8-bit route enum (data_route_e).
| Route | Encoding | Path |
|---|---|---|
| | | Host DDR4 → L2 cache |
| | | L2 cache → host DDR4 |
| | | L2 → GEMM L1 |
| | | L2 → GEMV L1 |
| | | L2 → SFU |
| | | GEMM result → L2 |
| | | GEMV result → L2 |
| | | SFU result → L2 |
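The expansion of a MEMCPY from_device/to_device pair into `data_route_e` can be sketched in Python as a table lookup. The `Device` identifiers and the `DataRoute` numeric values are hypothetical placeholders, since the real encodings are not given in the table; only the eight paths come from the table. `expand_route` is an illustrative name.

```python
from enum import Enum, IntEnum


class Device(Enum):
    """HYPOTHETICAL from_device / to_device identifiers."""
    HOST_DDR4 = "host DDR4"
    L2 = "L2 cache"
    GEMM = "GEMM"  # GEMM L1 as a destination, GEMM result as a source
    GEMV = "GEMV"  # GEMV L1 as a destination, GEMV result as a source
    SFU = "SFU"


class DataRoute(IntEnum):
    """Stand-in for data_route_e; the numeric values are PLACEHOLDERS."""
    HOST_TO_L2 = 0x00
    L2_TO_HOST = 0x01
    L2_TO_GEMM = 0x02
    L2_TO_GEMV = 0x03
    L2_TO_SFU = 0x04
    GEMM_TO_L2 = 0x05
    GEMV_TO_L2 = 0x06
    SFU_TO_L2 = 0x07


# One entry per row of the routing table; every path has L2 at one end.
ROUTES = {
    (Device.HOST_DDR4, Device.L2): DataRoute.HOST_TO_L2,
    (Device.L2, Device.HOST_DDR4): DataRoute.L2_TO_HOST,
    (Device.L2, Device.GEMM): DataRoute.L2_TO_GEMM,
    (Device.L2, Device.GEMV): DataRoute.L2_TO_GEMV,
    (Device.L2, Device.SFU): DataRoute.L2_TO_SFU,
    (Device.GEMM, Device.L2): DataRoute.GEMM_TO_L2,
    (Device.GEMV, Device.L2): DataRoute.GEMV_TO_L2,
    (Device.SFU, Device.L2): DataRoute.SFU_TO_L2,
}


def expand_route(from_device: Device, to_device: Device) -> DataRoute:
    """Map a MEMCPY from_device/to_device pair to an 8-bit route code."""
    try:
        return ROUTES[(from_device, to_device)]
    except KeyError:
        raise ValueError(
            f"no route from {from_device.value} to {to_device.value}"
        ) from None


print(expand_route(Device.L2, Device.SFU).name)  # L2_TO_SFU
```

Pairs outside the table, such as host DDR4 directly to a compute unit, are rejected, which reflects that every listed path goes through L2.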
6. Pointer / Parameter Registers
The ISA uses 6-bit pointers to index entries in the Constant Cache; 6 bits address up to 2^6 = 64 entries.
| Pointer | Width | Content |
|---|---|---|
| shape_ptr | 6 bit | Tensor shape metadata (M, N, K). |
| size_ptr | 6 bit | Tile sizes, loop bounds, etc. |
| | 6 bit | Index into the 64-entry Constant Cache. |
Pointer entries are preloaded by MEMSET (see the MEMSET section of Per-Instruction Dataflow).
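A toy Python model of the preload-then-dereference pattern follows. It assumes each Constant Cache entry holds the three 16-bit a/b/c values written by a single MEMSET (for example M, N, K behind a shape_ptr); the real entry format and the dest_cache selection are not modeled, and `ConstantCache`, `memset`, and `read` are hypothetical names.

```python
POINTER_BITS = 6
NUM_ENTRIES = 1 << POINTER_BITS  # 64 entries addressable by a 6-bit pointer
VALUE_BITS = 16                  # width of each a/b/c_value


class ConstantCache:
    """Toy model: each entry is ASSUMED to hold one MEMSET's (a, b, c)."""

    def __init__(self) -> None:
        self.entries = [(0, 0, 0)] * NUM_ENTRIES

    @staticmethod
    def _check_ptr(ptr: int) -> None:
        if not 0 <= ptr < NUM_ENTRIES:
            raise ValueError(f"pointer {ptr} does not fit in {POINTER_BITS} bits")

    def memset(self, dest_addr: int, a: int, b: int, c: int) -> None:
        """Preload one entry (the role MEMSET plays in the text)."""
        self._check_ptr(dest_addr)
        for v in (a, b, c):
            if not 0 <= v < (1 << VALUE_BITS):
                raise ValueError(f"value {v} does not fit in {VALUE_BITS} bits")
        self.entries[dest_addr] = (a, b, c)

    def read(self, ptr: int) -> tuple[int, int, int]:
        """Resolve a 6-bit pointer such as shape_ptr to its entry."""
        self._check_ptr(ptr)
        return self.entries[ptr]


cache = ConstantCache()
cache.memset(dest_addr=5, a=1024, b=2048, c=64)  # hypothetical shape M, N, K
m, n, k = cache.read(5)
print(m, n, k)  # 1024 2048 64
```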
7. Address Space
| Field | Width | Address space |
|---|---|---|
| | 17 bit | 128 K entries (indexed by L2 cache block). |
| aux_addr | 17 bit | MEMCPY auxiliary address (e.g., host DDR offset). |
Entry size is defined at the device layer (128 bits per word on KV260), so 2^17 entries × 16 bytes yield a 2 MiB (2,097,152-byte) linear L2 address space.
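The size arithmetic can be checked with a few lines of Python. This sketch assumes linear word addressing in which byte offset = word address × 16 with no base offset; `word_to_byte_offset` is a hypothetical helper, not part of the pccx tooling.

```python
ADDR_BITS = 17   # width of an L2 address field
WORD_BYTES = 16  # 128-bit word on KV260

NUM_WORDS = 1 << ADDR_BITS         # 131072 words ("128 K entries")
L2_BYTES = NUM_WORDS * WORD_BYTES  # 2097152 bytes


def word_to_byte_offset(addr: int) -> int:
    """Convert a 17-bit word address into a byte offset within L2."""
    if not 0 <= addr < NUM_WORDS:
        raise ValueError(f"address {addr} does not fit in {ADDR_BITS} bits")
    return addr * WORD_BYTES


print(NUM_WORDS, L2_BYTES, L2_BYTES >> 20)      # 131072 2097152 2
print(hex(word_to_byte_offset(NUM_WORDS - 1)))  # 0x1ffff0
```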