Lightening_Transformer - Photonic AI Accelerator
Implementation of the Lightening Transformer photonic AI accelerator, which accelerates general matrix multiplication for fully dynamic, full-range matrix inputs.
Prerequisites
- VHDL
- Visual Studio Code
- ModelSim
Setup
- Install ModelSim on your system.
- Any code editor or text editor is sufficient for editing the VHDL code.
Compilation
Use ModelSim to compile the VHDL files.
Output simple test
The testbenches are also written in a text editor.
Output log files
Run the simulation in the graphical user interface provided by ModelSim; the output graphs are displayed there.
Project directory structure
Lightening_Transformer
- ddot_unit.vhd
- ddot_unit_testBench.vhd
- dptc.vhd
- basic_dtpc_tb.vhd
- README.md
The dynamic full-range matrix computation in Lightening Transformer (LT)
- The basic computation unit in LT is the DDOT (Dynamically-Operated Full-Range Dot-Product Unit)
- A DDOT computes the dot product between two vectors
- When two dynamic full-range matrices A and B are multiplied, the two vector operands of a DDOT are a row vector of A and a column vector of B
- Multiple DDOT units are arranged in a cross-connect array to form the DPTC
- The DPTC (Dynamically-Operated Photonic Tensor Core) is the core unit of the photonic accelerator
- A DPTC can compute a small full matrix multiplication at once with its internal DDOT units
- Several DPTC units are clustered onto a single tile
- The DPTC units on a tile operate in parallel, and their results (partial sums) are aggregated locally in the analog domain
- The photonic accelerator has Nt tile units, each with Nd DPTC cores
- Each tile unit operates on a different slice of matrix A
- Every tile unit operates on the same slice of matrix B
- The results of the Nt tile units are aggregated to get the final output
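The DDOT and DPTC roles described above can be sketched functionally in Python. This is only a behavioural model and not the repository's VHDL: the function names `ddot` and `dptc` are hypothetical, and the optics are replaced by plain arithmetic.

```python
def ddot(x, y):
    """One DDOT unit: dot product of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("operands must have the same length")
    return sum(a * b for a, b in zip(x, y))


def dptc(a_rows, b_cols):
    """One DPTC: a cross-connect array of DDOT units.

    a_rows: row vectors taken from A
    b_cols: column vectors taken from B
    Entry [i][j] of the result is the DDOT of row i and column j,
    so the whole small matrix product is formed in one pass.
    """
    return [[ddot(row, col) for col in b_cols] for row in a_rows]


# Full-range (signed) operands: A is 2 x 3, B is 3 x 2.
A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]
B_columns = [[2.0, 1.0, -4.0],    # first column of B
             [-1.0, 0.5, 2.0]]    # second column of B

C = dptc(A, B_columns)
print(C)  # [[-2.0, -1.0], [7.0, -0.5]]
```

Each entry of `C` corresponds to one DDOT unit in the array, which is why a DPTC with enough DDOT units can produce the whole block at once.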
Architectural level optimization
- Broadcasting operands by optical interconnect
  - The matrix B (M2) is shared across the tile units
  - Slices of M2 are modulated globally and broadcast to every DPTC core on every tile in each cycle
- Temporal partial sum aggregation in the analog domain
  - The results of the Nd DPTC cores on each tile unit are first aggregated in the analog domain by time integration and then converted into the digital domain
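As a rough illustration of the aggregation order, the sketch below integrates the Nd partial sums first and digitizes only the integrated value. The `ADC` class, its uniform rounding, and the step size are assumptions made for this example; they are not taken from the repository.

```python
class ADC:
    """Hypothetical uniform quantizer that counts how often it is used."""

    def __init__(self, step):
        self.step = step
        self.conversions = 0

    def convert(self, analog_value):
        self.conversions += 1
        return round(analog_value / self.step) * self.step


def tile_output(partial_sums, adc):
    """Integrate the analog partial sums over time, then digitize once."""
    integrated = 0.0
    for p in partial_sums:          # one partial sum per DPTC core
        integrated += p
    return adc.convert(integrated)


adc = ADC(step=0.25)
partials = [0.5, -0.25, 1.0, 0.0625]   # Nd = 4 partial sums
result = tile_output(partials, adc)
print(result, adc.conversions)  # 1.25 1
```

In this model a tile needs a single conversion for Nd partial sums, whereas converting every partial sum separately would need Nd conversions.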
Tiling and spatial/temporal mapping algorithm
- Transpose B to obtain BT[Nq][Nm] (so that the columns of B are treated as row vectors for the dot products)
- Partition A into Nt row-blocks: each Tile_t gets Np_per_tile = Np / Nt rows of A
- For each cycle (processing Nh columns of B at a time):
  a. Broadcast BT_slice of size Nh x Nm (Nh rows of BT = Nh columns of B)
- On each tile:
  a. Split A_slice into Nd column-blocks per row: each DPTC_d gets Nm_per_core = Nm / Nd columns of A_slice
  b. Split BT_slice across the DPTCs (each gets the corresponding Nm_per_core columns): each DPTC_d receives a submatrix of size Nh x Nm_per_core
  c. In each DPTC, for each row i in A_slice and each row j in BT_slice:
     - Compute DDOT(A_row_chunk[i], BT_row_chunk[j]) using vector width N
     - Accumulate the result in C[i_global][j_global]
- Repeat for all Nh-wide slices of B
- Output the result matrix C[Np][Nq]
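The mapping steps above can be sketched as a plain Python loop nest and checked against a direct matrix product. This is a software model only: `tiled_matmul` and `reference_matmul` are hypothetical names, the DDOT vector width N is assumed to equal Nm_per_core, and all dimensions are assumed to divide evenly.

```python
def ddot(x, y):
    """Dot product of two equal-length vectors (stands in for one DDOT)."""
    return sum(a * b for a, b in zip(x, y))


def tiled_matmul(A, B, Nt, Nd, Nh):
    Np, Nm, Nq = len(A), len(A[0]), len(B[0])
    if Np % Nt or Nm % Nd or Nq % Nh:
        raise ValueError("Np, Nm, Nq must be divisible by Nt, Nd, Nh")
    # Transpose B so that its columns become rows of BT[Nq][Nm].
    BT = [[B[k][j] for k in range(Nm)] for j in range(Nq)]
    Np_per_tile = Np // Nt
    Nm_per_core = Nm // Nd
    C = [[0] * Nq for _ in range(Np)]
    for j0 in range(0, Nq, Nh):                 # one cycle per Nh-wide slice
        BT_slice = BT[j0:j0 + Nh]               # broadcast to every tile
        for t in range(Nt):                     # each tile owns a row-block of A
            i0 = t * Np_per_tile
            A_slice = A[i0:i0 + Np_per_tile]
            for d in range(Nd):                 # each DPTC owns a column-block
                k0 = d * Nm_per_core
                for i, a_row in enumerate(A_slice):
                    for j, bt_row in enumerate(BT_slice):
                        partial = ddot(a_row[k0:k0 + Nm_per_core],
                                       bt_row[k0:k0 + Nm_per_core])
                        C[i0 + i][j0 + j] += partial   # partial-sum accumulation
    return C


def reference_matmul(A, B):
    """Direct row-by-column product used only for checking."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 2, 3, 4], [0, -1, 2, 1], [5, 0, -2, 3], [2, 2, 2, 2]]   # Np=4, Nm=4
B = [[1, 0], [0, 1], [-1, 2], [3, -1]]                            # Nm=4, Nq=2
C = tiled_matmul(A, B, Nt=2, Nd=2, Nh=1)
expected = reference_matmul(A, B)
print(C == expected)  # True
```

The loops over `t` and `d` are sequential here only because this is software; in the architecture described above, the tiles and the DPTC cores within a tile operate in parallel.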
Lightening Transformer Hardware implementation
- Modulation and WDM
  - The input operands are modulated into full-range values with MZMs (Mach-Zehnder modulators)
  - The encoding Eout = Ein cos φ ensures full-range encoding: because cos φ spans [-1, 1], the phase φ can represent both positive and negative values
  - The operands are modulated onto different wavelengths, which are then multiplexed (WDM, wavelength-division multiplexing)
- The DDOT unit is the basic computing unit
  - It uses a directional coupler, a phase shifter, and optical detectors to perform the operations in the analog optical domain
- The DPTC is the photonic core
  - The photonic core is formed by an optical crossbar-style array of DDOT units
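The sketch below shows one way the listed components could combine to produce a signed dot product. It relies on assumptions that this README does not state: an ideal 50:50 directional coupler, a -90 degree phase shifter on one input, and balanced (differential) photodetection summed over wavelengths. The function names `mzm_encode` and `ddot_optical` are hypothetical, and this is not a model of the repository's VHDL.

```python
import cmath
import math


def mzm_encode(value, e_in=1.0):
    """Encode a value in [-1, 1] as a field amplitude Eout = Ein * cos(phi)."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    phi = math.acos(value)          # phase that yields the requested value
    return e_in * math.cos(phi)


def ddot_optical(x, y):
    """Dot product via interference and balanced detection (assumed scheme)."""
    total = 0.0
    for xv, yv in zip(x, y):        # one wavelength per element pair
        ex = mzm_encode(xv)
        # Assumed -90 degree phase shifter on the second operand.
        ey = mzm_encode(yv) * cmath.exp(-1j * math.pi / 2)
        # Ideal 50:50 directional coupler.
        out1 = (ex + 1j * ey) / math.sqrt(2)    # proportional to x + y
        out2 = (1j * ex + ey) / math.sqrt(2)    # proportional to x - y
        # Balanced detection: intensity difference equals 2 * x * y.
        total += abs(out1) ** 2 - abs(out2) ** 2
    return total / 2


x = [0.5, -1.0, 0.25]
y = [-0.5, 0.5, 1.0]
result = ddot_optical(x, y)
print(round(result, 9))  # -0.5
```

The sign of each product survives because the two detector intensities are subtracted, which is what makes full-range (positive and negative) operands possible in this model.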
Lightening Transformer architecture and its operation
- Lightening Transformer overview:
  - General matrix multiplication between dynamic, full-range matrices
  - Exploits the broadcast opportunity in the optical domain and analog temporal accumulation
- Key components:
  - DDOT: Dynamically-Operated Full-Range Dot-Product Unit
  - DPTC: Dynamically-Operated Photonic Tensor Core
  - Photonic accelerator
- Computation mechanism:
  - Operates in the optical domain, leveraging operand sharing to accelerate the computation