AYAKA_Transformer/README.md
2025-07-06 13:00:59 +02:00

# AYAKA_Transformer
Check: `hdpe_unit_new.v`

- Input Matrix A: 5 rows x 4 columns
- Weight Matrix B: 4 rows x 3 columns
- Output Matrix C: 5 rows x 3 columns

## Output Stationary
In Output Stationary, each output value is accumulated in place in a PE for the whole computation, while inputs and weights move through the compute array.
Matrix shift-in pattern:
<pre>
(input column wise)
0 b14 b23 b32 b41 N(i/p)
b04 b13 b22 b31 b40 |
b03 b12 b21 b30 0 (i/p)W--0--E (o/p)
b02 b11 b20 0 0 |
b01 b10 0 0 0 S(o/p)
b00 0 0 0 0
0 a03 a02 a01 a00 pe00 pe01 pe02 pe03 pe04 -- c00 c01 c02 c03
a13 a12 a11 a10 0 pe10 pe11 pe12 pe13 pe14 -- 0 c10 c11 c12
a22 a21 a20 0 0 pe20 pe21 pe22 pe23 pe24 -- 0 0 c20 c21
a31 a30 0 0 0 pe30 pe31 pe32 pe33 pe34 -- 0 0 0 c30 (output column wise)
(input column wise) | | | | |
c00 0 0 0 0
c10 c01 0 0 0
c20 c11 c02 0 0
c30 c21 c12 c03 0
0 c31 c22 c13 c04 (output row wise)
</pre>
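
The output-stationary dataflow described above can be modelled in a few lines of Python. This is a minimal sketch under assumptions that are not taken from `hdpe_unit_new.v`: the function name `output_stationary_matmul` is hypothetical, PE(i, j) is assumed to accumulate C[i][j], and each row of A and each column of B is assumed to be delayed by its own index, as the zero padding in the shift-in pattern suggests.

```python
def output_stationary_matmul(A, B):
    """Cycle-level model of an output-stationary array (illustrative only).

    Assumption: PE(i, j) owns C[i][j]; A enters from the west edge and
    B from the north edge, each row/column delayed by its own index.
    """
    M, K, N = len(A), len(B), len(B[0])
    a_reg = [[0] * N for _ in range(M)]  # A operand currently in each PE
    b_reg = [[0] * N for _ in range(M)]  # B operand currently in each PE
    acc = [[0] * N for _ in range(M)]    # stationary output accumulators

    for t in range(M + N + K - 2):
        for i in range(M):
            # Shift row i one PE to the east, then feed the west edge.
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i  # row i is skewed by i cycles
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0
        for j in range(N):
            # Shift column j one PE to the south, then feed the north edge.
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j  # column j is skewed by j cycles
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0
        # Every PE multiplies the operands it holds and accumulates in place.
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


# Shapes from this README: A is 5x4, B is 4x3, C is 5x3.
A = [[4 * i + k + 1 for k in range(4)] for i in range(5)]
B = [[3 * k + j + 1 for j in range(3)] for k in range(4)]
reference = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(3)]
             for i in range(5)]
assert output_stationary_matmul(A, B) == reference
```

The model only checks the arithmetic and the skew; how the accumulated values are shifted out of the array afterwards is left out.
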
## Input Stationary
In Input Stationary, the input matrix values are preloaded and remain fixed in the PEs' local memory. Weights stream through the array, and partial sums (`pp` in the diagram) pass from PE to PE and accumulate into the outputs.
Matrix shift-in pattern:
<pre>
(input column wise)
0 0 0 b32
0 0 b22 b31
0 b12 b21 b30
b02 b11 b20 0
b01 b10 0 0
b00 0 0 0
a00 a01 a02 a03: a00->pp a01->pp a02->pp a03-->c00 c01 c02 (output column wise)
a10 a11 a12 a13: a10 a11 a12 a13-->c10 c11 c12
a20 a21 a22 a23: a20 a21 a22 a23-->c20 c21 c22
a30 a31 a32 a33: a30 a31 a32 a33-->c30 c31 c32
a40 a41 a42 a43: a40 a41 a42 a43-->c40 c41 c42
(input column wise)
[pre load]
</pre>
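
As a rough illustration of the input-stationary pattern above, the following Python sketch preloads A into the PEs, streams B in from the top, and passes partial sums from west to east. The function name `input_stationary_matmul`, the one-cycle register between neighbouring PEs, and the exact skew are assumptions made for this example and are not derived from the RTL.

```python
def input_stationary_matmul(A, B):
    """Cycle-level model of an input-stationary array (illustrative only).

    Assumption: PE(i, k) holds A[i][k]; column k receives B[k][0], B[k][1], ...
    from the north, delayed by k cycles; partial sums move west to east.
    """
    M, K, N = len(A), len(B), len(B[0])
    b_reg = [[0] * K for _ in range(M)]  # weight currently passing through each PE
    p_reg = [[0] * K for _ in range(M)]  # partial sum leaving each PE
    C = [[0] * N for _ in range(M)]

    for t in range(M + N + K - 2):
        new_b = [[0] * K for _ in range(M)]
        new_p = [[0] * K for _ in range(M)]
        for i in range(M):
            for k in range(K):
                if i == 0:
                    j = t - k  # column k is skewed by k cycles
                    b_in = B[k][j] if 0 <= j < N else 0
                else:
                    b_in = b_reg[i - 1][k]  # weight handed down from the PE above
                p_in = p_reg[i][k - 1] if k > 0 else 0  # partial sum from the west
                new_b[i][k] = b_in
                new_p[i][k] = p_in + A[i][k] * b_in  # A[i][k] never moves
        b_reg, p_reg = new_b, new_p
        # The east edge of row i emits C[i][j] once all K products are summed.
        for i in range(M):
            j = t - (K - 1) - i
            if 0 <= j < N:
                C[i][j] = p_reg[i][K - 1]
    return C


# Shapes from this README: A is 5x4, B is 4x3, C is 5x3.
A = [[4 * i + k + 1 for k in range(4)] for i in range(5)]
B = [[3 * k + j + 1 for j in range(3)] for k in range(4)]
reference = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(3)]
             for i in range(5)]
assert input_stationary_matmul(A, B) == reference
```

In this model row i of C leaves the east edge one element per cycle, with each row starting one cycle after the row above it.
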
## Weight Stationary
In Weight Stationary, weights are fixed in the compute elements. Inputs stream in, and partial sums propagate through the array to form the output.
Matrix shift-in pattern:
<pre>
[pre load]
(input row wise)
b00 b01 b02
b10 b11 b12
b20 b21 b22
b30 b31 b32
(input row wise) ... ... ...
0 0 0 a40 a30 a20 a10 a00 b00 b01 b02
pp
0 0 a41 a31 a21 a11 a01 0 b10 b11 b12
pp
0 a42 a32 a22 a12 a02 0 0 b20 b21 b22
pp
a43 a33 a23 a13 a03 0 0 0 b30 b31 b32
| | |
V V V
c00 0 0
c10 c01 0
c20 c11 c02
c30 c21 c12
c40 c31 c22
c41 c32
c42
(output row wise)
</pre>
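
The weight-stationary pattern above can be sketched the same way. In this hypothetical model (the name `weight_stationary_matmul`, the register timing, and the skew are assumptions, not a description of the actual hardware), PE(k, j) holds B[k][j], the columns of A enter from the west with a per-row delay, and partial sums travel south until they leave the bottom of the array.

```python
def weight_stationary_matmul(A, B):
    """Cycle-level model of a weight-stationary array (illustrative only).

    Assumption: PE(k, j) holds B[k][j]; array row k receives A[0][k], A[1][k], ...
    from the west, delayed by k cycles; partial sums move north to south.
    """
    M, K, N = len(A), len(B), len(B[0])
    a_reg = [[0] * N for _ in range(K)]  # input currently passing through each PE
    p_reg = [[0] * N for _ in range(K)]  # partial sum leaving each PE
    C = [[0] * N for _ in range(M)]

    for t in range(M + N + K - 2):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:
                    i = t - k  # array row k is skewed by k cycles
                    a_in = A[i][k] if 0 <= i < M else 0
                else:
                    a_in = a_reg[k][j - 1]  # input handed over from the west PE
                p_in = p_reg[k - 1][j] if k > 0 else 0  # partial sum from above
                new_a[k][j] = a_in
                new_p[k][j] = p_in + a_in * B[k][j]  # B[k][j] never moves
        a_reg, p_reg = new_a, new_p
        # The bottom of column j emits C[i][j] once all K products are summed.
        for j in range(N):
            i = t - (K - 1) - j
            if 0 <= i < M:
                C[i][j] = p_reg[K - 1][j]
    return C


# Shapes from this README: A is 5x4, B is 4x3, C is 5x3.
A = [[4 * i + k + 1 for k in range(4)] for i in range(5)]
B = [[3 * k + j + 1 for j in range(3)] for k in range(4)]
reference = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(3)]
             for i in range(5)]
assert weight_stationary_matmul(A, B) == reference
```

Column j of the model emits c0j, c1j, c2j, ... on consecutive cycles, starting one cycle later than the column to its left, which matches the staggered output rows drawn in the diagram.
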
## Memory 3 Layout
```plaintext
Memory 3 [20 X 100] (location: file_dump/mem3.hex)
===================
| 20X4 | 20X4 | 20X4 | 20X10 | 20X10 | 20X10 | 20X10 | 20X20 | 20X20 |20X2 |20X1|20X1|20X2 |20x1|20X1|
0--+-------+-------+-------+---------------+---------------+---------------+----------------+------------------------------+------------------------------+-----+----+----+-----+----+----+
| | | | h1 | h2 |X| h1 | h2 |X| h1 | h2 |X| h1 | h2 |X| | | | | | | | |
| | | | 20X4 | 20X4 |X| 20X4 | 20X4 |X| 20X4 | 20X4 |X| 20X4 | 20X4 |X| | | | | | | | |
| | | | | X| | X| | X| | |X| | | | | | | | |
| | | | | |X| | |X| | |X| | |X| | |MASK |MASK|MASK|MASK |MASK|MASK|
20 | T^ | Wq^ | Wkv^ | Q^ |X| W^ |X| Q^_rpas |X| W^_rpas |X| A^=Q^_rpas X W^_rpas(h1) | A^=Q^_rpas X W^_rpas(h2) | A | Q | KV | A | Q | KV |
| rpas | rpas | rpas | | X| | X| | X| | |X| | |(h1) |(h1)|(h1)|(h2) |(h2)|(h2)|
| | | | | |X| | |X| | |X| | |X| | | | | | | | |
| | | | | |X| | |X| | |X| | |X| | | | | | | | |
| | | |12 15|16 19|X|22 25|26 29|X|32 35|36 39|X|42 45|46 49|X| | | | | | | | |
19--+-------+-------+-------+---------------+---------------+---------------+----------------+------------------------------+------------------------------+-----+----+----+-----+----+----+
|0 3|4 7|8 11|12 21|22 31|32 41|42 51|52 71|72 91|92 93| 94 | 95 |96 97| 98 | 99 |