
AYAKA_Transformer

check: hdpe_unit_new.v

- Input Matrix A: 5 rows x 4 columns
- Weight Matrix B: 4 rows x 3 columns
- Output Matrix C: 5 rows x 3 columns

Output Stationary

In Output Stationary, each output value stays in its processing element (PE) for the whole computation, while inputs and weights move through the compute array. Matrix shift-in pattern:

                        
                        0     b14   b23   b32   b41                                 N(i/p)
                        b04   b13   b22   b31   b40                                 |
                        b03   b12   b21   b30    0                          (i/p)W--0--E (o/p)
                        b02   b11   b20    0     0                                  |
                        b01   b10    0     0     0                                  S(o/p)
                        b00    0     0     0     0
0    a03  a02  a01  a00  pe00  pe01  pe02  pe03  pe04 -- c00  c01  c02   c03
a13  a12  a11  a10  0    pe10  pe11  pe12  pe13  pe14 -- 0    c10  c11   c12
a22  a21  a20  0    0    pe20  pe21  pe22  pe23  pe24 -- 0    0    c20   c21
a31  a30  0    0    0    pe30  pe31  pe32  pe33  pe34 -- 0    0    0     c30 
                          |     |      |     |     |
                          c00   0      0     0     0
                          c10   c01    0     0     0
                          c20   c11    c02   0     0
                          c30   c21    c12   c03   0
                          0     c31    c22   c13   c04  
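The skewed shift-in pattern above can be sketched in software. The following is a minimal cycle-level Python model, not the RTL in `hdpe_unit_new.v`: the function name `output_stationary_matmul` and the register names are hypothetical, and it assumes that row `i` of A enters from the west delayed by `i` cycles, column `j` of B enters from the north delayed by `j` cycles, and each PE multiplies the two values it currently holds and adds the product to its own accumulator.

```python
def output_stationary_matmul(A, B):
    """Cycle-level sketch of an output-stationary systolic array (assumed model)."""
    M, K, N = len(A), len(B), len(B[0])
    acc = [[0] * N for _ in range(M)]    # c[i][j] stays inside PE(i, j)
    a_reg = [[0] * N for _ in range(M)]  # input value held by PE(i, j), moving east
    b_reg = [[0] * N for _ in range(M)]  # weight value held by PE(i, j), moving south

    # The last product reaches PE(M-1, N-1) at cycle K + M + N - 3.
    for t in range(K + M + N - 2):
        # Shift inputs one PE to the east and feed the skewed west edge.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i  # row i is delayed by i cycles
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0
        # Shift weights one PE to the south and feed the skewed north edge.
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j  # column j is delayed by j cycles
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0
        # Every PE accumulates in place; the zero padding contributes nothing.
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

With the sizes stated in this README (A is 5 x 4, B is 4 x 3), the model runs for 4 + 5 + 3 - 2 = 10 cycles and returns the 5 x 3 product.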

Input Stationary

In Input Stationary, the input matrix values are preloaded and remain fixed in local memory. Weights stream through the array, and partial results (marked pp in the diagram) are passed from PE to PE to form the outputs. Matrix shift-in pattern:

                    
    					 0	    b14	    b23	    b32
	    				b04	    b13	    b22	    b31
		    			b03	    b12	    b21	    b30
			    		b02	    b11	    b20	     0
				    	b01	    b10      0	     0
					    b00 	 0	     0	     0
	a00	a01	a02	a03:	a00->pp	a01->pp	a02->pp	a03-->c00 c01  c02  c03  c04
	a10	a11	a12	a13:	a10	    a11	    a12	    a13--> 0  c10  c11  c12  c13
	a20	a21	a22	a23:	a20	    a21	    a22	    a23--> 0   0   c20  c21  c22
	a30	a31	a32	a33:	a30	    a31	    a32	    a33--> 0   0    0   c30  c31
[pre load]
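As an illustration of the diagram above, here is a minimal cycle-level Python sketch of an input-stationary array. It is an assumed model rather than the repository's RTL: `input_stationary_matmul` and the register names are hypothetical, A is preloaded so that PE(i, k) holds `a[i][k]`, row `k` of B enters the top of array column `k` delayed by `k` cycles and moves south, and partial results move east until they leave the last column as `c[i][j]`.

```python
def input_stationary_matmul(A, B):
    """Cycle-level sketch of an input-stationary systolic array (assumed model)."""
    M, K, N = len(A), len(B), len(B[0])
    b_reg = [[0] * K for _ in range(M)]  # weight held by PE(i, k), moving south
    p_reg = [[0] * K for _ in range(M)]  # partial result leaving PE(i, k), moving east
    C = [[0] * N for _ in range(M)]

    for t in range(M + K + N - 2):
        new_b = [[0] * K for _ in range(M)]
        new_p = [[0] * K for _ in range(M)]
        for i in range(M):
            for k in range(K):
                if i == 0:
                    j = t - k  # array column k is delayed by k cycles
                    b_in = B[k][j] if 0 <= j < N else 0
                else:
                    b_in = b_reg[i - 1][k]  # weight from the PE above
                p_in = p_reg[i][k - 1] if k > 0 else 0  # partial result from the west
                new_b[i][k] = b_in
                new_p[i][k] = p_in + A[i][k] * b_in  # A[i][k] is the stationary operand
        b_reg, p_reg = new_b, new_p
        # Row i emits c[i][j] from its last PE at cycle j + (K - 1) + i.
        for i in range(M):
            j = t - (K - 1) - i
            if 0 <= j < N:
                C[i][j] = p_reg[i][K - 1]
    return C
```

In this model row 0 emits c00, c01, c02, ... on consecutive cycles and each following row lags by one cycle, which matches the staggered output columns drawn to the right of the array.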

Weight Stationary

In Weight Stationary, weights are fixed in the compute elements. Inputs stream in, and partial sums propagate through the array to form the output. Matrix shift-in pattern:

                                        [pre load]
                                     
                            b00     b01     b02     b03     b04
                            b10     b11     b12     b13     b14
                            b20     b21     b22     b23     b24
                            b30     b31     b32     b33     b34
                            ...     ...     ...     ...     ...
0   0   0   a30 a20	a10 a00 b00	    b01	    b02	    b03	    b04
						    pp
0	0	a31	a21	a11	a01	0	b10	    b11	    b12	    b13	    b14
						    pp
0	a32	a22	a12	a02	0	0   b20	    b21	    b22	    b23	    b24
						    pp
a33	a23	a13	a03	0	0	0   b30	    b31	    b32	    b33	    b34
						     |	     |	     |	     |	     |
						     V	     V	     V	     V	     V
						    c00	     0	     0	     0	     0
						    c10	    c01	     0	     0	     0
						    c20	    c11	    c02	     0	     0
						    c30	    c21	    c12	    c03	     0
						     0	    c31	    c22	    c13     c04                                       
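The weight-stationary flow drawn above can be modelled in a few lines of Python. This is a hedged sketch under stated assumptions, not the repository's implementation: `weight_stationary_matmul` and the register names are hypothetical, B is preloaded so that PE(k, j) holds `b[k][j]`, column `k` of A enters array row `k` from the west delayed by `k` cycles, and partial sums move south until they leave the bottom row as `c[i][j]`.

```python
def weight_stationary_matmul(A, B):
    """Cycle-level sketch of a weight-stationary systolic array (assumed model)."""
    M, K, N = len(A), len(B), len(B[0])
    a_reg = [[0] * N for _ in range(K)]  # input held by PE(k, j), moving east
    p_reg = [[0] * N for _ in range(K)]  # partial sum leaving PE(k, j), moving south
    C = [[0] * N for _ in range(M)]

    for t in range(M + K + N - 2):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:
                    i = t - k  # array row k is delayed by k cycles
                    a_in = A[i][k] if 0 <= i < M else 0
                else:
                    a_in = a_reg[k][j - 1]  # input from the PE to the west
                p_in = p_reg[k - 1][j] if k > 0 else 0  # partial sum from above
                new_a[k][j] = a_in
                new_p[k][j] = p_in + a_in * B[k][j]  # B[k][j] is the stationary operand
        a_reg, p_reg = new_a, new_p
        # Column j emits c[i][j] from the bottom PE at cycle i + (K - 1) + j.
        for j in range(N):
            i = t - (K - 1) - j
            if 0 <= i < M:
                C[i][j] = p_reg[K - 1][j]
    return C
```

In this model column 0 emits c00, c10, c20, ... on consecutive cycles and each following column lags by one cycle, the same diagonal pattern shown below the array in the diagram.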

Memory 3 Layout

Memory 3 [20 X 100] (location: file_dump/mem3.hex)
===================

    |  20X4 |  20X4 |  20X4 |      20X10    |      20X10    |      20X10    |      20X10     |             20X20            |             20X20            |20X2 |20X1|20X1|20X2 |20X1|20X1|
 0--+-------+-------+-------+---------------+---------------+---------------+----------------+------------------------------+------------------------------+-----+----+----+-----+----+----+
    |       |       |       | h1   |  h2  |X|  h1  | h2   |X|   h1  |  h2  |X|  h1  |  h2  |X|                              |                              |     |    |    |     |    |    |
    |       |       |       | 20X4 | 20X4 |X| 20X4 | 20X4 |X|  20X4 | 20X4 |X| 20X4 | 20X4 |X|                              |                              |     |    |    |     |    |    |
    |       |       |       |      |       X|      |       X|       |       X|      |      |X|                              |                              |     |    |    |     |    |    |
    |       |       |       |      |      |X|      |      |X|       |      |X|      |      |X|                              |                              |MASK |MASK|MASK|MASK |MASK|MASK|
 20 |   T^  |   Wq^ |  Wkv^ |      Q^     |X|      W^     |X|    Q^_rpas   |X|    W^_rpas  |X|    A^=Q^_rpas X W^_rpas(h1)  |     A^=Q^_rpas X W^_rpas(h2) |  A  | Q  | KV |  A  | Q  | KV |
    |  rpas |  rpas |  rpas |      |       X|      |       X|       |       X|      |      |X|                              |                              |(h1) |(h1)|(h1)|(h2) |(h2)|(h2)|
    |       |       |       |      |      |X|      |      |X|       |      |X|      |      |X|                              |                              |     |    |    |     |    |    |
    |       |       |       |      |      |X|      |      |X|       |      |X|      |      |X|                              |                              |     |    |    |     |    |    |
    |       |       |       |12  15|16  19|X|22  25|26  29|X|32   35|36  39|X|42  45|46  49|X|                              |                              |     |    |    |     |    |    |
19--+-------+-------+-------+---------------+---------------+---------------+----------------+------------------------------+------------------------------+-----+----+----+-----+----+----+
    |0     3|4     7|8    11|12           21|22           31|32           41|42            51|52                          71|72                          91|92 93| 94 | 95 |96 97| 98 | 99 |
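For scripted analysis, the column ranges in the table above can be written down as a small Python map. This is only a sketch: the region names, `MEM3_REGIONS`, `extract_region`, and `regions_tile_memory` are hypothetical, the memory is assumed to be already loaded as a list of 20 rows with 100 entries each, and the on-disk format of `mem3.hex` is not described here, so no parser is shown. The `X` columns (for example 20-21) are kept inside their 20X10 region because their purpose is not stated.

```python
MEM3_ROWS, MEM3_COLS = 20, 100

# Inclusive (first_column, last_column) ranges copied from the layout table.
MEM3_REGIONS = {
    "T_rpas": (0, 3),
    "Wq_rpas": (4, 7),
    "Wkv_rpas": (8, 11),
    "Q": (12, 21),        # h1 = 12..15, h2 = 16..19, X = 20..21
    "W": (22, 31),        # h1 = 22..25, h2 = 26..29, X = 30..31
    "Q_rpas": (32, 41),   # h1 = 32..35, h2 = 36..39, X = 40..41
    "W_rpas": (42, 51),   # h1 = 42..45, h2 = 46..49, X = 50..51
    "A_h1": (52, 71),     # A^ = Q^_rpas x W^_rpas, head 1
    "A_h2": (72, 91),     # A^ = Q^_rpas x W^_rpas, head 2
    "mask_A_h1": (92, 93),
    "mask_Q_h1": (94, 94),
    "mask_KV_h1": (95, 95),
    "mask_A_h2": (96, 97),
    "mask_Q_h2": (98, 98),
    "mask_KV_h2": (99, 99),
}


def extract_region(mem, name):
    """Return the columns of one named region for every row of the memory."""
    first, last = MEM3_REGIONS[name]
    return [row[first:last + 1] for row in mem]


def regions_tile_memory():
    """Check that the regions cover columns 0..99 with no gaps or overlaps."""
    expected = 0
    for first, last in sorted(MEM3_REGIONS.values()):
        if first != expected:
            return False
        expected = last + 1
    return expected == MEM3_COLS
```

For example, `extract_region(mem, "A_h1")` returns the 20 x 20 block stored in columns 52 to 71.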

Reference

Y. Qin et al., "Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow," IEEE Journal of Solid-State Circuits, vol. 59, no. 10, pp. 3342-3356, Oct. 2024. doi: 10.1109/JSSC.2024.3397189.