# AYAKA_Transformer

Check: `hdpe_unit_new.v`

- Input Matrix A: 5 rows x 4 columns
- Weight Matrix B: 4 rows x 3 columns
- Output Matrix C: 5 rows x 3 columns

## Output Stationary

In Output Stationary, each output value stays in place in its processing element (PE) and accumulates there during computation. Inputs and weights move through the compute array.

Matrix shift-in pattern:

<pre>
                      (input column wise)
                      0    b14  b23  b32  b41             N(i/p)
                      b04  b13  b22  b31  b40             |
                      b03  b12  b21  b30  0       (i/p)W--0--E (o/p)
                      b02  b11  b20  0    0               |
                      b01  b10  0    0    0               S(o/p)
                      b00  0    0    0    0
0   a03 a02 a01 a00   pe00 pe01 pe02 pe03 pe04 -- c00 c01 c02 c03
a13 a12 a11 a10 0     pe10 pe11 pe12 pe13 pe14 -- 0   c10 c11 c12
a22 a21 a20 0   0     pe20 pe21 pe22 pe23 pe24 -- 0   0   c20 c21
a31 a30 0   0   0     pe30 pe31 pe32 pe33 pe34 -- 0   0   0   c30   (output column wise)
(input column wise)    |    |    |    |    |
                      c00  0    0    0    0
                      c10  c01  0    0    0
                      c20  c11  c02  0    0
                      c30  c21  c12  c03  0
                      0    c31  c22  c13  c04   (output row wise)
</pre>
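
The output-stationary dataflow can be sketched as a small behavioral model in Python. This is only an illustration, not the RTL in `hdpe_unit_new.v`: the function name `output_stationary_matmul`, the one-PE-per-output array size, and the one-cycle-per-hop skew are all assumptions made for the example. Each PE keeps its own accumulator, takes one value from the west and one from the north, and forwards both unchanged.

```python
def output_stationary_matmul(A, B):
    """Behavioral model of an output-stationary array computing C = A x B.

    Assumption: one PE per output element, PE(i, j) accumulates C[i][j].
    """
    M, K, N = len(A), len(B), len(B[0])
    acc = [[0] * N for _ in range(M)]    # stationary accumulators, one per PE
    a_reg = [[0] * N for _ in range(M)]  # value each PE forwards east
    b_reg = [[0] * N for _ in range(M)]  # value each PE forwards south

    for t in range(M + N + K - 2):
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:
                    # West edge: row i of A enters delayed by i cycles (skew).
                    k = t - i
                    a_in = A[i][k] if 0 <= k < K else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    # North edge: column j of B enters delayed by j cycles.
                    k = t - j
                    b_in = B[k][j] if 0 <= k < K else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # the output never leaves the PE
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return acc


if __name__ == "__main__":
    # Shapes from the top of this README: A is 5x4, B is 4x3, C is 5x3.
    A = [[r * 4 + c + 1 for c in range(4)] for r in range(5)]
    B = [[r * 3 + c + 1 for c in range(3)] for r in range(4)]
    reference = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
                 for row in A]
    print(output_stationary_matmul(A, B) == reference)
```

In this model the zero padding in the shift-in pattern corresponds to the out-of-range checks at the array edges, and the skew means the last multiply-accumulate happens after `M + N + K - 2` cycles.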
## Input Stationary

In Input Stationary, the input matrix values are preloaded and remain fixed in the PEs' local memory. Weights stream through the array, and partial results (`pp`) pass from PE to PE along each row until they leave the array as outputs.

Matrix shift-in pattern:

<pre>
                  (input column wise)
                  0       0       0       b32
                  0       0       b22     b31
                  0       b12     b21     b30
                  b02     b11     b20     0
                  b01     b10     0       0
                  b00     0       0       0
a00 a01 a02 a03:  a00->pp a01->pp a02->pp a03-->c00 c01 c02   (output column wise)
a10 a11 a12 a13:  a10     a11     a12     a13-->c10 c11 c12
a20 a21 a22 a23:  a20     a21     a22     a23-->c20 c21 c22
a30 a31 a32 a33:  a30     a31     a32     a33-->c30 c31 c32
a40 a41 a42 a43:  a40     a41     a42     a43-->c40 c41 c42
(input column wise)
[pre load]
</pre>
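
A minimal Python sketch of the input-stationary pattern above, under stated assumptions: the function name `input_stationary_matmul` is hypothetical, PE(i, k) is assumed to hold `A[i][k]`, weights are assumed to enter from the north with a one-cycle skew per column, and partial results are assumed to move east by one PE per cycle. It models the dataflow only and is not taken from the RTL.

```python
def input_stationary_matmul(A, B):
    """Behavioral model of an input-stationary array computing C = A x B.

    Assumption: PE(i, k) holds A[i][k]; row k of B streams down PE column k.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    b_reg = [[0] * K for _ in range(M)]  # weight each PE forwards south
    p_reg = [[0] * K for _ in range(M)]  # partial result each PE forwards east

    for t in range(M + N + K - 2):
        new_b = [[0] * K for _ in range(M)]
        new_p = [[0] * K for _ in range(M)]
        for i in range(M):
            for k in range(K):
                if i == 0:
                    # North edge: b[k][0], b[k][1], ... delayed by k cycles.
                    j = t - k
                    b_in = B[k][j] if 0 <= j < N else 0
                else:
                    b_in = b_reg[i - 1][k]
                p_in = p_reg[i][k - 1] if k > 0 else 0
                new_b[i][k] = b_in
                new_p[i][k] = p_in + A[i][k] * b_in  # A[i][k] stays in the PE
            # The east edge of row i emits C[i][j] once all K terms are summed.
            j = t - (K - 1) - i
            if 0 <= j < N:
                C[i][j] = new_p[i][K - 1]
        b_reg, p_reg = new_b, new_p
    return C


if __name__ == "__main__":
    A = [[r * 4 + c + 1 for c in range(4)] for r in range(5)]
    B = [[r * 3 + c + 1 for c in range(3)] for r in range(4)]
    reference = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
                 for row in A]
    print(input_stationary_matmul(A, B) == reference)
```

With the 5x4 input matrix this model uses a 5x4 grid of PEs, and each row emits its three outputs one after another on its east side, matching the `c00 c01 c02` ordering in the diagram.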
## Weight Stationary

In Weight Stationary, weights are preloaded and stay fixed in the PEs. Inputs stream in, and partial sums (`pp`) propagate through the array to form the output.

Matrix shift-in pattern:

<pre>
                                  [pre load]
                                  (input row wise)
                                  b00 b01 b02
                                  b10 b11 b12
                                  b20 b21 b22
                                  b30 b31 b32
(input row wise)                  ... ... ...
0   0   0   a40 a30 a20 a10 a00   b00 b01 b02
                                  pp
0   0   a41 a31 a21 a11 a01 0     b10 b11 b12
                                  pp
0   a42 a32 a22 a12 a02 0   0     b20 b21 b22
                                  pp
a43 a33 a23 a13 a03 0   0   0     b30 b31 b32
                                   |   |   |
                                   V   V   V
                                  c00 0   0
                                  c10 c01 0
                                  c20 c11 c02
                                  c30 c21 c12
                                  c40 c31 c22
                                      c41 c32
                                          c42
                                  (output row wise)
</pre>
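
The weight-stationary pattern can be modeled the same way. The sketch below is an assumption-laden illustration rather than the actual design: `weight_stationary_matmul` is a hypothetical name, PE(k, j) is assumed to hold `B[k][j]`, column k of A is assumed to enter PE row k from the west with a k-cycle skew, and partial sums are assumed to move south by one PE per cycle.

```python
def weight_stationary_matmul(A, B):
    """Behavioral model of a weight-stationary array computing C = A x B.

    Assumption: PE(k, j) holds B[k][j]; column k of A streams along PE row k.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    a_reg = [[0] * N for _ in range(K)]  # input each PE forwards east
    p_reg = [[0] * N for _ in range(K)]  # partial sum each PE forwards south

    for t in range(M + N + K - 2):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:
                    # West edge: a[0][k], a[1][k], ... delayed by k cycles.
                    i = t - k
                    a_in = A[i][k] if 0 <= i < M else 0
                else:
                    a_in = a_reg[k][j - 1]
                p_in = p_reg[k - 1][j] if k > 0 else 0
                new_a[k][j] = a_in
                new_p[k][j] = p_in + a_in * B[k][j]  # B[k][j] stays in the PE
        for j in range(N):
            # The bottom of column j emits C[i][j], skewed by j cycles.
            i = t - (K - 1) - j
            if 0 <= i < M:
                C[i][j] = new_p[K - 1][j]
        a_reg, p_reg = new_a, new_p
    return C


if __name__ == "__main__":
    A = [[r * 4 + c + 1 for c in range(4)] for r in range(5)]
    B = [[r * 3 + c + 1 for c in range(3)] for r in range(4)]
    reference = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
                 for row in A]
    print(weight_stationary_matmul(A, B) == reference)
```

In this model column `j` produces its first result one cycle after column `j - 1`, which gives the staggered `c00 / c10 c01 / c20 c11 c02` output order shown in the diagram.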
## Memory 3 Layout
```plaintext
Memory 3 [20 X 100] (location: file_dump/mem3.hex)
===================

    | 20X4  | 20X4  | 20X4  |     20X10     |     20X10     |     20X10     |     20X10     |            20X20             |            20X20             |20X2 |20X1|20X1|20X2 |20X1|20X1|
 0--+-------+-------+-------+---------------+---------------+---------------+---------------+------------------------------+------------------------------+-----+----+----+-----+----+----+
    |       |       |       | h1  | h2  | X | h1  | h2  | X | h1  | h2  | X | h1  | h2  | X |                              |                              |     |    |    |     |    |    |
    |       |       |       |20X4 |20X4 | X |20X4 |20X4 | X |20X4 |20X4 | X |20X4 |20X4 | X |                              |                              |     |    |    |     |    |    |
    |       |       |       |     |     | X |     |     | X |     |     | X |     |     | X |                              |                              |     |    |    |     |    |    |
    |       |       |       |     |     | X |     |     | X |     |     | X |     |     | X |                              |                              |MASK |MASK|MASK|MASK |MASK|MASK|
20  | T^    | Wq^   | Wkv^  |    Q^     | X |    W^     | X |  Q^_rpas  | X |  W^_rpas  | X |   A^=Q^_rpas X W^_rpas(h1)   |   A^=Q^_rpas X W^_rpas(h2)   |  A  | Q  | KV |  A  | Q  | KV |
    | rpas  | rpas  | rpas  |     |     | X |     |     | X |     |     | X |     |     | X |                              |                              |(h1) |(h1)|(h1)|(h2) |(h2)|(h2)|
    |       |       |       |     |     | X |     |     | X |     |     | X |     |     | X |                              |                              |     |    |    |     |    |    |
    |       |       |       |     |     | X |     |     | X |     |     | X |     |     | X |                              |                              |     |    |    |     |    |    |
    |       |       |       |12 15|16 19| X |22 25|26 29| X |32 35|36 39| X |42 45|46 49| X |                              |                              |     |    |    |     |    |    |
19--+-------+-------+-------+---------------+---------------+---------------+---------------+------------------------------+------------------------------+-----+----+----+-----+----+----+
    |0     3|4     7|8    11|12           21|22           31|32           41|42           51|52                          71|72                          91|92 93| 94 | 95 |96 97| 98 | 99 |