AYAKA_Transformer
check: hdpe_unit_new.v
Input Matrix A: 5 rows x 4 columns
Weight Matrix B: 4 rows x 3 columns
Output Matrix C: 5 rows x 3 columns
Output Stationary
In Output Stationary, the output values remain in place during computation. Inputs and weights move through the compute array. Matrix shift-in pattern:
    (input column wise)
                         0    b14  b23  b32  b41          N(i/p)
                         b04  b13  b22  b31  b40            |
                         b03  b12  b21  b30  0      (i/p)W--0--E (o/p)
                         b02  b11  b20  0    0              |
                         b01  b10  0    0    0            S(o/p)
                         b00  0    0    0    0
    0   a03 a02 a01 a00  pe00 pe01 pe02 pe03 pe04 -- c00 c01 c02 c03
    a13 a12 a11 a10 0    pe10 pe11 pe12 pe13 pe14 -- 0   c10 c11 c12
    a22 a21 a20 0   0    pe20 pe21 pe22 pe23 pe24 -- 0   0   c20 c21
    a31 a30 0   0   0    pe30 pe31 pe32 pe33 pe34 -- 0   0   0   c30
                                                     (output column wise)
    (input column wise)   |    |    |    |    |
                         c00  0    0    0    0
                         c10  c01  0    0    0
                         c20  c11  c02  0    0
                         c30  c21  c12  c03  0
                         0    c31  c22  c13  c04
                         (output row wise)
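The output-stationary dataflow described above can be sketched as a small cycle-level model in Python. This is an illustrative sketch, not the RTL in this repository: the function name `output_stationary_matmul` and the register names are hypothetical, the array is assumed to have one PE per element of C, and the results are read directly from the accumulators instead of being shifted out as in the diagram.

```python
def output_stationary_matmul(A, B):
    """Cycle-level sketch: each PE(i, j) keeps C[i][j] in place."""
    M, K, N = len(A), len(B), len(B[0])
    acc = [[0] * N for _ in range(M)]    # stationary outputs, one per PE
    a_reg = [[0] * N for _ in range(M)]  # input value each PE forwards east
    b_reg = [[0] * N for _ in range(M)]  # weight value each PE forwards south
    for t in range(M + N + K - 2):       # last MAC happens at t = M+N+K-3
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:               # west edge: row i of A, delayed by i
                    k = t - i
                    a_in = A[i][k] if 0 <= k < K else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:               # north edge: column j of B, delayed by j
                    k = t - j
                    b_in = B[k][j] if 0 <= k < K else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in # multiply-accumulate in place
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return acc
```

With the 5x4 and 4x3 shapes used in this README, the sketch needs 10 cycles before every accumulator holds its final value.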
Input Stationary
In Input Stationary, input matrix values remain fixed in local memory. Weights slide over, and outputs accumulate dynamically in PEs. Matrix shift-in pattern:
    (input column wise)
                       0       0       0       b32
                       0       0       b22     b31
                       0       b12     b21     b30
                       b02     b11     b20     0
                       b01     b10     0       0
                       b00     0       0       0
    a00 a01 a02 a03:   a00->pp a01->pp a02->pp a03-->c00 c01 c02   (output column wise)
    a10 a11 a12 a13:   a10     a11     a12     a13-->c10 c11 c12
    a20 a21 a22 a23:   a20     a21     a22     a23-->c20 c21 c22
    a30 a31 a32 a33:   a30     a31     a32     a33-->c30 c31 c32
    a40 a41 a42 a43:   a40     a41     a42     a43-->c40 c41 c42
    (input column wise) [pre load]
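As a rough illustration of the input-stationary pattern above, the following Python sketch preloads A into a 5x4 grid of PEs, streams the rows of B in from the north with a one-cycle skew per column, and passes partial sums (`pp`) eastward. The function and register names are hypothetical and the timing is an assumption read off the diagram, not taken from the RTL.

```python
def input_stationary_matmul(A, B):
    """Cycle-level sketch: PE(i, k) holds A[i][k]; weights move south, sums move east."""
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]
    b_reg = [[0] * K for _ in range(M)]  # weight each PE forwards south
    p_reg = [[0] * K for _ in range(M)]  # partial sum each PE forwards east
    for t in range(M + K + N - 2):
        new_b = [[0] * K for _ in range(M)]
        new_p = [[0] * K for _ in range(M)]
        for i in range(M):
            for k in range(K):
                if i == 0:               # north edge: row k of B, delayed by k
                    j = t - k
                    b_in = B[k][j] if 0 <= j < N else 0
                else:
                    b_in = b_reg[i - 1][k]
                p_in = p_reg[i][k - 1] if k > 0 else 0
                new_b[i][k] = b_in
                new_p[i][k] = p_in + A[i][k] * b_in  # A[i][k] never moves
        b_reg, p_reg = new_b, new_p
        for i in range(M):               # collect sums leaving the east edge
            j = t - (K - 1) - i
            if 0 <= j < N:
                C[i][j] = p_reg[i][K - 1]
    return C
```

Each row of PEs emits one finished element of C per cycle once the first weight has crossed all four columns, which matches the `a03-->c00 c01 c02` output order in the diagram.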
Weight Stationary
In Weight Stationary, weights are fixed in the compute elements. Inputs stream in, and partial sums propagate through the array to form the output. Matrix shift-in pattern:
    [pre load] (input row wise)
    b00 b01 b02
    b10 b11 b12
    b20 b21 b22
    b30 b31 b32

    (input row wise)                   ... ... ...
    0   0   0   a40 a30 a20 a10 a00    b00 b01 b02
                                           pp
    0   0   a41 a31 a21 a11 a01 0      b10 b11 b12
                                           pp
    0   a42 a32 a22 a12 a02 0   0      b20 b21 b22
                                           pp
    a43 a33 a23 a13 a03 0   0   0      b30 b31 b32
                                        |   |   |
                                        V   V   V
                                       c00 0   0
                                       c10 c01 0
                                       c20 c11 c02
                                       c30 c21 c12
                                       c40 c31 c22
                                           c41 c32
                                               c42
                                       (output row wise)
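The weight-stationary flow above can be modelled in the same way. In this Python sketch, B is preloaded into a 4x3 grid of PEs, the columns of A enter from the west with a one-cycle skew per row, and partial sums (`pp`) move south until they leave the bottom row as elements of C. The names are hypothetical and the cycle timing is an assumption inferred from the diagram.

```python
def weight_stationary_matmul(A, B):
    """Cycle-level sketch: PE(k, j) holds B[k][j]; inputs move east, sums move south."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    a_reg = [[0] * N for _ in range(K)]  # input each PE forwards east
    p_reg = [[0] * N for _ in range(K)]  # partial sum each PE forwards south
    for t in range(M + K + N - 2):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:               # west edge: column k of A, delayed by k
                    i = t - k
                    a_in = A[i][k] if 0 <= i < M else 0
                else:
                    a_in = a_reg[k][j - 1]
                p_in = p_reg[k - 1][j] if k > 0 else 0
                new_a[k][j] = a_in
                new_p[k][j] = p_in + a_in * B[k][j]  # B[k][j] never moves
        a_reg, p_reg = new_a, new_p
        for j in range(N):               # collect sums leaving the south edge
            i = t - (K - 1) - j
            if 0 <= i < M:
                C[i][j] = p_reg[K - 1][j]
    return C
```

Column j of the array produces its results one cycle later than column j-1, which gives the skewed output rows (`c00 0 0`, `c10 c01 0`, ...) shown in the diagram.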
Memory 3 Layout
Memory 3 [20 X 100] (location: file_dump/mem3.hex)
===================
Every region spans rows 0 to 19.

    Columns   Size    Contents
    -------   -----   --------
    0-3       20X4    T^ rpas
    4-7       20X4    Wq^ rpas
    8-11      20X4    Wkv^ rpas
    12-21     20X10   Q^:      h1 20X4 (12-15) | h2 20X4 (16-19) | X (20-21)
    22-31     20X10   W^:      h1 20X4 (22-25) | h2 20X4 (26-29) | X (30-31)
    32-41     20X10   Q^_rpas: h1 20X4 (32-35) | h2 20X4 (36-39) | X (40-41)
    42-51     20X10   W^_rpas: h1 20X4 (42-45) | h2 20X4 (46-49) | X (50-51)
    52-71     20X20   A^ = Q^_rpas X W^_rpas (h1)
    72-91     20X20   A^ = Q^_rpas X W^_rpas (h2)
    92-93     20X2    MASK A (h1)
    94        20X1    MASK Q (h1)
    95        20X1    MASK KV (h1)
    96-97     20X2    MASK A (h2)
    98        20X1    MASK Q (h2)
    99        20X1    MASK KV (h2)
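The column ranges in the Memory 3 layout can be captured in a small lookup table for scripts that need to pick a region out of a 20 x 100 array. The sketch below is only an illustration: the region keys and the `mem3_region` helper are hypothetical names, and it assumes the memory has already been loaded into a list of 20 rows with 100 entries each (it does not parse `mem3.hex`, whose on-disk format is not described here).

```python
MEM3_ROWS, MEM3_COLS = 20, 100

# Inclusive (first_column, last_column) per region, taken from the layout above.
MEM3_REGIONS = {
    "T_rpas": (0, 3),
    "Wq_rpas": (4, 7),
    "Wkv_rpas": (8, 11),
    "Q": (12, 21),
    "W": (22, 31),
    "Q_rpas": (32, 41),
    "W_rpas": (42, 51),
    "A_h1": (52, 71),
    "A_h2": (72, 91),
    "mask_A_h1": (92, 93),
    "mask_Q_h1": (94, 94),
    "mask_KV_h1": (95, 95),
    "mask_A_h2": (96, 97),
    "mask_Q_h2": (98, 98),
    "mask_KV_h2": (99, 99),
}


def mem3_region(mem, name):
    """Return the 20-row slice of `mem` that belongs to the named region."""
    first, last = MEM3_REGIONS[name]
    return [row[first:last + 1] for row in mem]
```

For example, `mem3_region(mem, "A_h1")` returns a 20 x 20 block, and the regions together cover columns 0 to 99 with no gaps or overlaps.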