Instruction word width: 64 bits

Memory and compute units
- BRAM: Scale, weight
- REGISTER [B]: REG [B] [1], REG [B] [1]
- L1 cache [A]: weight systolic top [A] [1]; VdotM top [A] [1], VdotM top [A] [2], VdotM top [A] [3], VdotM top [A] [4]
- L2 cache: REGISTER [C]; address length 17 bit; capacity 14,680,064 bit (about 14.68 Mbit)

Flags
- Find Emax and Align
- Calculation result is both the result and the accumulated result
- result * (weight's scale)
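The L2 figures can be cross-checked. A 17-bit address names 2^17 locations, so dividing the stated capacity by that count gives the implied width of one addressable location. The sketch below assumes the L2 cache is word-addressed with a single fixed word width; the table itself does not state the word size.

```python
# Arithmetic check on the L2 cache figures listed above.
ADDR_BITS = 17                      # L2 address length from the table
L2_CAPACITY_BITS = 14_680_064       # L2 capacity from the table

num_addresses = 2 ** ADDR_BITS      # distinct locations a 17-bit address can name
# Assumption: the L2 cache is word-addressed with one fixed word width.
assert L2_CAPACITY_BITS % num_addresses == 0  # capacity divides evenly
bits_per_address = L2_CAPACITY_BITS // num_addresses

print(num_addresses)                # 131072
print(bits_per_address)             # 112
print(L2_CAPACITY_BITS / 1e6)       # 14.680064 (Mbit, decimal prefix)
print(L2_CAPACITY_BITS / 2 ** 20)   # 14.0 (Mibit, binary prefix)
```

So the 14.68 Mbit figure is exactly 14 × 2^20 bit, and it would correspond to 112 bits per address if the assumption holds.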
GEMV instruction format (64 bits)
| Field | Width (bits) | Contents |
|---|---|---|
| opcode | 4 | GEMV |
| Dest Address | 17 | L2 cache address |
| src Address | 17 | L2 cache address |
| Flags | 6 | Find Emax, Align, ACCM, w * scale, fmap size update |
| size (pointer) address | 6 | size of weight RAM's address (pointer): 0 ~ 512, 0x0040 |
| constant (pointer) address | 6 | shape RAM's address (pointer): 0 ~ 64, 0x0040 |
| parallel Lane | 5 | v dot m{0, 1, 2, 3} |
| reserved | 3 | |
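The GEMV widths (4 + 17 + 17 + 6 + 6 + 6 + 5 + 3) add up to exactly 64, so an instruction word can be built by shifting each field in turn. This is a minimal sketch that assumes the fields are packed MSB-first in the order shown. The opcode value and the bit position of each flag inside the 6-bit Flags field are not given in the table, so `GEMV_OPCODE` and the `GemvFlags` bit assignments are hypothetical placeholders.

```python
from enum import IntFlag

# GEMV field layout from the table above, most significant field first.
# The trailing 3-bit field is treated as reserved and always encoded as 0.
GEMV_FIELDS = [
    ("opcode", 4),
    ("dest_addr", 17),   # L2 cache address
    ("src_addr", 17),    # L2 cache address
    ("flags", 6),
    ("size_ptr", 6),     # size (pointer) address
    ("const_ptr", 6),    # constant (pointer) address
    ("lane", 5),         # parallel Lane
    ("reserved", 3),
]


class GemvFlags(IntFlag):
    # Hypothetical bit positions: the table names the flags but not their bits.
    FIND_EMAX = 1 << 0
    ALIGN = 1 << 1
    ACCM = 1 << 2
    W_SCALE = 1 << 3
    FMAP_SIZE_UPDATE = 1 << 4


def pack(fields, values):
    """Concatenate field values MSB-first into one integer, checking each width."""
    word = 0
    for name, width in fields:
        value = int(values.get(name, 0))  # plain int, so IntFlag members combine cleanly
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


GEMV_OPCODE = 0x1  # hypothetical: the table does not give the opcode value

word = pack(GEMV_FIELDS, {
    "opcode": GEMV_OPCODE,
    "dest_addr": 0x1FFFF,
    "src_addr": 0x00100,
    "flags": GemvFlags.FIND_EMAX | GemvFlags.ALIGN | GemvFlags.ACCM,
    "size_ptr": 0x3F,
    "const_ptr": 0x00,
    "lane": 3,
})
print(f"0x{word:016X}")
```

Checking every width in `pack` means an out-of-range address or lane fails loudly on the host instead of silently spilling into the neighbouring field.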
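GEMV API
| API | opcode | Dest Address | src Address | Flags | size (pointer) address | constant (pointer) address | parallel Lane |
|---|---|---|---|---|---|---|---|
| GEMV | GEMV | 0 ~ 0x1FFFF | 0 ~ 0x1FFFF | Find Emax, Align, ACCM, w * scale, fmap size update | 0 ~ 0x0040 | 0 ~ 0x0040 | 0, 1, 2, 3 |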
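A host-side wrapper for the GEMV API would typically range-check its arguments before encoding. The sketch below is a hypothetical argument record (`GemvCall` and `is_valid` are illustrative names, not part of the documented API). Note that the API table lists 0 ~ 0x0040 for the two pointer arguments, while a 6-bit field can hold at most 0x3F, so this sketch checks pointers against the field width.

```python
from dataclasses import dataclass

ADDR_MAX = 0x1FFFF        # 17-bit L2 cache address range from the API table
PTR_MAX = (1 << 6) - 1    # a 6-bit pointer field holds 0..0x3F
LANES = (0, 1, 2, 3)      # parallel Lane values listed in the API table


@dataclass(frozen=True)
class GemvCall:
    """Hypothetical host-side argument record for the GEMV API."""
    dest_addr: int
    src_addr: int
    size_ptr: int
    const_ptr: int
    lane: int = 0

    def __post_init__(self):
        for name in ("dest_addr", "src_addr"):
            value = getattr(self, name)
            if not 0 <= value <= ADDR_MAX:
                raise ValueError(f"{name}={value:#x} outside 0..{ADDR_MAX:#x}")
        for name in ("size_ptr", "const_ptr"):
            value = getattr(self, name)
            if not 0 <= value <= PTR_MAX:
                raise ValueError(f"{name}={value:#x} outside 0..{PTR_MAX:#x}")
        if self.lane not in LANES:
            raise ValueError(f"lane={self.lane} not in {LANES}")


def is_valid(**kwargs) -> bool:
    """Return True if the arguments pass GemvCall's range checks."""
    try:
        GemvCall(**kwargs)
    except ValueError:
        return False
    return True


print(is_valid(dest_addr=0x200, src_addr=0x1FFFF, size_ptr=0x3F, const_ptr=0))  # True
print(is_valid(dest_addr=0x200, src_addr=0x1FFFF, size_ptr=0x40, const_ptr=0))  # False
```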
GEMM instruction format (64 bits)
| Field | Width (bits) | Contents |
|---|---|---|
| opcode | 4 | GEMM |
| Dest Address | 17 | L2 cache address |
| src Address | 17 | L2 cache address |
| Flags | 6 | Find Emax, Align, ACCM, w * scale, fmap size update |
| size (pointer) address | 6 | size of weight RAM's address (pointer): 0 ~ 512, 0x0040 |
| shape (pointer) address | 6 | shape RAM's address (pointer): 0 ~ 64, 0x0040 |
| parallel Lane | 5 | v dot m{0, 1, 2, 3} |
| reserved | 3 | |
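The BIT row reads left to right and ends at bit 0, which suggests the fields are laid out MSB-first. Under that assumption, each field's bit range follows from a running sum of the widths. The sketch derives the ranges for GEMM; the helper name `bit_ranges` is hypothetical.

```python
# GEMM field widths from the table above, most significant field first.
GEMM_FIELDS = [
    ("opcode", 4), ("dest_addr", 17), ("src_addr", 17), ("flags", 6),
    ("size_ptr", 6), ("shape_ptr", 6), ("lane", 5), ("reserved", 3),
]


def bit_ranges(fields, word_bits=64):
    """Return {name: (msb, lsb)} for fields laid out from the top bit down."""
    ranges = {}
    top = word_bits
    for name, width in fields:
        ranges[name] = (top - 1, top - width)
        top -= width
    if top != 0:
        raise ValueError(f"fields cover {word_bits - top} of {word_bits} bits")
    return ranges


for name, (msb, lsb) in bit_ranges(GEMM_FIELDS).items():
    print(f"{name:10s} [{msb:2d}:{lsb:2d}]")
```

With the GEMM widths this gives opcode [63:60], Dest Address [59:43], src Address [42:26], Flags [25:20], size pointer [19:14], shape pointer [13:8], parallel Lane [7:3] and reserved [2:0]. The same ranges apply to GEMV, whose widths are identical.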
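GEMM API
| API | opcode | Dest Address | src Address | Flags | size (pointer) address | shape (pointer) address | parallel Lane |
|---|---|---|---|---|---|---|---|
| GEMM | GEMM | 0 ~ 0x1FFFF | 0 ~ 0x1FFFF | Find Emax, Align, ACCM, w * scale, fmap size update | 0 ~ 0x0040 | 0 ~ 0x0040 | 0, 1, 2, 3 |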
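When inspecting instruction traces, the reverse operation is handy: splitting a 64-bit word back into its GEMM fields. This sketch again assumes MSB-first packing; the example word is assembled by hand, and its opcode value 0x2 is a hypothetical placeholder.

```python
# GEMM field widths from the table above, most significant field first.
GEMM_FIELDS = [
    ("opcode", 4), ("dest_addr", 17), ("src_addr", 17), ("flags", 6),
    ("size_ptr", 6), ("shape_ptr", 6), ("lane", 5), ("reserved", 3),
]


def unpack(fields, word):
    """Split an instruction word into named fields, reading MSB-first."""
    total = sum(width for _, width in fields)
    if not 0 <= word < (1 << total):
        raise ValueError(f"word does not fit in {total} bits")
    values = {}
    shift = total
    for name, width in fields:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    return values


# Example word built by hand; opcode 0x2 is a hypothetical GEMM opcode.
example = (
    (0x2 << 60) | (0x1FFFF << 43) | (0x040 << 26) | (0b000011 << 20)
    | (0x10 << 14) | (0x20 << 8) | (1 << 3)
)

for name, value in unpack(GEMM_FIELDS, example).items():
    print(f"{name:10s} = {value:#x}")
```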
memcpy instruction format (64 bits)
| Field | Width (bits) | Contents |
|---|---|---|
| Opcode | 4 | memcpy |
| From device | 1 | From NPU L2 |
| To device | 1 | To NPU (L2) or To CPU |
| Dest Address | 17 | L2 cache address |
| src Address | 17 | L2 cache address |
| Address | 17 | L2 cache address |
| shape (pointer) address | 6 | shape RAM's address (pointer): 0 ~ 64, 0x0040 |
| async | 1 | 1/0 |
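The memcpy widths (4 + 1 + 1 + 17 + 17 + 17 + 6 + 1) also sum to 64. The sketch below encodes one memcpy word MSB-first. The opcode value and the meaning of 0 and 1 in the From device and To device bits are not given in the table, so `MEMCPY_OPCODE`, `NPU_L2` and `CPU` are hypothetical.

```python
MEMCPY_FIELDS = [
    ("opcode", 4),
    ("from_dev", 1),     # From device
    ("to_dev", 1),       # To device
    ("dest_addr", 17),   # L2 cache address
    ("src_addr", 17),    # L2 cache address
    ("addr", 17),        # third L2 cache address field
    ("shape_ptr", 6),    # shape (pointer) address
    ("is_async", 1),     # async
]

# Hypothetical encodings: neither the opcode value nor the meaning of 0/1
# in the direction bits is given in the table.
MEMCPY_OPCODE = 0x3
NPU_L2, CPU = 1, 0


def pack(fields, values):
    """Concatenate field values MSB-first into one integer, checking each width."""
    word = 0
    for name, width in fields:
        value = int(values.get(name, 0))
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


assert sum(width for _, width in MEMCPY_FIELDS) == 64

word = pack(MEMCPY_FIELDS, {
    "opcode": MEMCPY_OPCODE,
    "from_dev": NPU_L2,
    "to_dev": CPU,
    "dest_addr": 0x00800,
    "src_addr": 0x01000,
    "addr": 0x00000,
    "shape_ptr": 0x05,
    "is_async": 1,
})
print(f"0x{word:016X}")
```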
memcpy API
| API | Opcode | From device | To device | Dest Address | src Address | Address | shape (pointer) address | async |
|---|---|---|---|---|---|---|---|---|
| memcpy_div_2_div | memcpy | 1 | 1/0 | 0 ~ 0x1FFFF | 0 ~ 0x1FFFF | 0 ~ 0x1FFFF | 0 ~ 0x0040 | 1 |

Other memcpy API functions:
- memcpy_div_2_host
- memcpy_host_2_div
- memcpy_div_2_div_async
- memcpy_div_2_host_async
- memcpy_host_2_div_async
- memcpy_2D
- memcpy_2D_async
- memcpy_3D
- memcpy_3D_async
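The API list includes memcpy_2D and memcpy_3D but does not spell out their parameters. Assuming they follow the common pitched-copy convention (a block of `height` rows, each `width` locations long, with separate source and destination pitches), a host library could lower memcpy_2D into one 1D copy per row. The function name `lower_memcpy_2d`, its parameters and the per-row lowering are all hypothetical; only the 17-bit address limit comes from the table.

```python
from typing import List, Tuple

ADDR_MAX = 0x1FFFF  # 17-bit L2 cache address range from the memcpy table


def lower_memcpy_2d(dst: int, src: int, width: int, height: int,
                    dst_pitch: int, src_pitch: int) -> List[Tuple[int, int, int]]:
    """Hypothetical lowering of a pitched 2D copy into one 1D copy per row.

    Returns (dest_addr, src_addr, length) tuples, one per row.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if dst_pitch < width or src_pitch < width:
        raise ValueError("pitch must be at least the row width")
    rows = []
    for r in range(height):
        d = dst + r * dst_pitch
        s = src + r * src_pitch
        # Every address this row touches must be representable in 17 bits.
        if d < 0 or s < 0 or d + width - 1 > ADDR_MAX or s + width - 1 > ADDR_MAX:
            raise ValueError(f"row {r} leaves the 17-bit address range")
        rows.append((d, s, width))
    return rows


print(lower_memcpy_2d(dst=0x100, src=0x1000, width=16, height=3,
                      dst_pitch=16, src_pitch=64))
# [(256, 4096, 16), (272, 4160, 16), (288, 4224, 16)]
```

Here a strided 3 × 16 region is gathered into a contiguous destination. A 3D variant would add an outer loop over planes with a third pitch.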
memset instruction format (64 bits)
| Field | Width (bits) | select cache = shape cache | select cache = weight cache |
|---|---|---|---|
| Opcode | 4 | memset | memset |
| select cache | 2 | shape cache | weight cache |
| A address | 6 | size of weight cache's address (pointer): 0 ~ 64, 0x0040 | shape cache's address (pointer): 0 ~ 64, 0x0040 |
| A value | 17 | size of weight RAM's | size of weight RAM's |
| A value | 17 | size of weight RAM's | size of weight RAM's |
| A value | 17 | size of weight RAM's | size of weight RAM's |
| reserved | 1 | | |
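The memset word (4 + 2 + 6 + 17 + 17 + 17 + 1 = 64 bits) carries one cache selector, one 6-bit address and three 17-bit values. This sketch encodes it MSB-first. The opcode value and the 2-bit codes for shape cache and weight cache are not given in the table, so `MEMSET_OPCODE`, `SHAPE_CACHE` and `WEIGHT_CACHE` are hypothetical, as is the helper `encode_memset`.

```python
MEMSET_FIELDS = [
    ("opcode", 4),
    ("select_cache", 2),
    ("a_addr", 6),       # A address (cache pointer)
    ("a_value0", 17),
    ("a_value1", 17),
    ("a_value2", 17),
    ("reserved", 1),
]

MEMSET_OPCODE = 0x4               # hypothetical opcode value
SHAPE_CACHE, WEIGHT_CACHE = 0, 1  # hypothetical select cache codes


def pack(fields, values):
    """Concatenate field values MSB-first into one integer, checking each width."""
    word = 0
    for name, width in fields:
        value = int(values.get(name, 0))
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def encode_memset(cache, a_addr, values):
    """Encode one memset word; `values` must hold exactly three A values."""
    if cache not in (SHAPE_CACHE, WEIGHT_CACHE):
        raise ValueError(f"unknown cache selector {cache}")
    if len(values) != 3:
        raise ValueError("memset carries exactly three A values")
    return pack(MEMSET_FIELDS, {
        "opcode": MEMSET_OPCODE,
        "select_cache": cache,
        "a_addr": a_addr,
        "a_value0": values[0],
        "a_value1": values[1],
        "a_value2": values[2],
    })


word = encode_memset(WEIGHT_CACHE, 0x10, (0x1FFFF, 0, 0x40))
print(f"0x{word:016X}")
```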
memset API
| API | Target | A address | A value | A value | A value |
|---|---|---|---|---|---|
| mem_all_clear | shape/weight RAM | - | - | - | - |
| memset_weight | weight RAM | 0 ~ 0x0040 | 0 ~ 0x0040 | 0 ~ 0x0040 | 0 ~ 0x0040 |
| memset_shape | weight RAM | | | | |
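For host-side testing it can help to keep a software reference model of the RAMs that these API calls modify. This is a minimal sketch built on two assumptions that the table does not confirm: each RAM has 0x40 entries (taken from the 0 ~ 0x0040 pointer range), and memset_weight writes its three A values to consecutive entries starting at A address. The class `ShapeWeightRams` and its method names mirroring the API are hypothetical.

```python
RAM_DEPTH = 0x40     # assumed entries per RAM, from the 0 ~ 0x0040 pointer range
VALUE_MAX = 0x1FFFF  # A value fields are 17 bits wide


class ShapeWeightRams:
    """Hypothetical host-side reference model of the shape and weight RAMs."""

    def __init__(self):
        self.ram = {"shape": [0] * RAM_DEPTH, "weight": [0] * RAM_DEPTH}

    def memset_weight(self, a_addr, *values):
        # Assumption: the three A values land in consecutive entries from A address.
        if len(values) != 3:
            raise ValueError("memset carries exactly three A values")
        if not 0 <= a_addr <= RAM_DEPTH - len(values):
            raise ValueError(f"A address {a_addr} out of range")
        for i, value in enumerate(values):
            if not 0 <= value <= VALUE_MAX:
                raise ValueError(f"A value {value:#x} exceeds 17 bits")
            self.ram["weight"][a_addr + i] = value

    def mem_all_clear(self):
        # Clears both the shape RAM and the weight RAM, as listed in the API table.
        for which in self.ram:
            self.ram[which] = [0] * RAM_DEPTH


rams = ShapeWeightRams()
rams.memset_weight(4, 1, 2, 3)
print(rams.ram["weight"][3:8])   # [0, 1, 2, 3, 0]
rams.mem_all_clear()
print(any(rams.ram["weight"]))   # False
```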