# Lightening_Transformer - Photonic AI Accelerator

A VHDL implementation of the Lightening Transformer photonic AI accelerator, which accelerates general matrix multiplication (GEMM) on fully dynamic, full-range matrix inputs.

## Prerequisites

- VHDL
- Visual Studio Code
- ModelSim

## Setup

1. Install ModelSim on your system.
2. Use any code editor or text editor to edit the VHDL sources.

## Compilation

Compile the `.vhd` files with ModelSim.

## Simple test

Testbenches are also written in VHDL, using the same editor.

## Viewing the output

Run the simulation in the ModelSim graphical user interface; the output waveforms are displayed there.

## Project directory structure

- `Lightening_Transformer/`
  - `ddot_unit.vhd`
  - `dtpc.vhd`
  - `basic_dtpc_tb.vhd`
  - `README.md`

## Dynamic full-range matrix computation in Lightening Transformer (LT)

- The basic computation unit in LT is the DDOT (Dynamically-Operated Full-Range Dot-Product) unit.
- A DDOT computes the dot product of two vectors.
- When multiplying two dynamic full-range matrices A and B, the two DDOT operands are a row vector of A and a column vector of B.
- Multiple DDOT units arranged in a cross-connect array form a DPTC (Dynamically-Operated Photonic Tensor Core), the core compute unit of the photonic accelerator.
- Using its internal DDOT units, a DPTC computes a small matrix multiplication all at once.
- Several DPTC units are clustered onto a single tile.
- The DPTC units on a tile operate in parallel, and their results (partial sums) are aggregated locally in the analog domain.
- The photonic accelerator has Nt tiles, each with Nd DPTC cores.
- Each tile operates on a different slice of matrix A.
- All tiles operate on the same slice of matrix B.
- The results of the Nt tiles are combined to produce the final output.

## Architectural-level optimizations
1. Broadcasting operands over the optical interconnect
   - Matrix B (M2) is shared across all tiles.
   - In each cycle, slices of M2 are modulated once, globally, and broadcast to every DPTC core on every tile.
2. Temporal partial-sum aggregation in the analog domain
   - The results of the Nd DPTC cores on each tile are first aggregated in the analog domain by time integration and only then converted to the digital domain.

## Tiling and spatial/temporal mapping algorithm

A is an Np × Nm matrix, B is Nm × Nq, and the result C is Np × Nq.

1. Transpose B to obtain `BT[Nq][Nm]`, so that the columns of B can be treated as row vectors for the dot products.
2. Partition A into Nt row blocks: each tile `Tile_t` gets `Np_per_tile = Np / Nt` rows of A (`A_slice`).
3. For each cycle, process Nh columns of B:
   - Broadcast `BT_slice`, of size Nh × Nm (Nh rows of BT = Nh columns of B), to all tiles.
4. Within that cycle, on each tile:
   - Split `A_slice` into Nd column blocks per row: each `DPTC_d` gets `Nm_per_core = Nm / Nd` columns of `A_slice`.
   - Split `BT_slice` across the DPTCs in the same way: each `DPTC_d` receives the matching `Nm_per_core` columns, a submatrix of size Nh × `Nm_per_core`.
   - In each DPTC, for each row i of `A_slice` and each row j of `BT_slice`, compute `DDOT(A_row_chunk[i], BT_row_chunk[j])` using vector width N and accumulate the result into `C[i_global][j_global]`.
5. Repeat steps 3 and 4 for all Nh-wide slices of B.
6. Output the result matrix `C[Np][Nq]`.

## Lightening Transformer hardware implementation

1. Modulation and WDM
   - The input operands are encoded as full-range values with Mach-Zehnder modulators (MZMs).
   - Phase encoding with $E_{\text{out}} = E_{\text{in}} \cos\varphi$ provides full-range encoding: since $\cos\varphi$ spans $[-1, 1]$, both positive and negative values can be represented.
   - The operands are modulated onto different wavelengths, which are then multiplexed together (wavelength-division multiplexing, WDM).
2. The DDOT unit is the basic computing unit.
   - It uses a directional coupler, a phase shifter, and photodetectors to perform the computation in the analog optical domain.
3. The DPTC is the photonic core.
   - It is formed by an optical crossbar-style array of DDOT units.

## Lightening Transformer architecture and operation
1. Lightening Transformer overview
   - Performs general matrix multiplication between dynamic, full-range matrices.
   - Exploits broadcast opportunities in the optical domain and analog temporal accumulation.
2. Key components
   - DDOT: Dynamically-Operated Full-Range Dot-Product unit.
   - DPTC: Dynamically-Operated Photonic Tensor Core.
   - Photonic accelerator: multiple tiles, each built from DPTC cores.
3. Computation mechanism
   - Operates in the optical domain, leveraging operand sharing to accelerate the computation.
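To make the DDOT and DPTC behavior concrete, here is a minimal numerical sketch in Python. It is not part of the VHDL design, and it rests on several assumptions that may differ from the real hardware: an ideal lossless 50:50 directional coupler, a -π/2 phase shifter on the second input, real-valued operands normalized to [-1, 1], one wavelength per vector element, photodetectors that simply add optical power across wavelengths, and balanced (differential) detection of the two coupler outputs. The MZM step follows $E_{\text{out}} = E_{\text{in}} \cos\varphi$. All function names (`mzm_encode`, `coupler`, `ddot`, `dptc`) are hypothetical.

```python
import math


def mzm_encode(value: float, e_in: float = 1.0) -> complex:
    """Hypothetical MZM model: E_out = E_in * cos(phi), with phi chosen per value."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("operands must be normalized to [-1, 1]")
    phi = math.acos(value)  # cos(phi) == value, so negative values are reachable
    return complex(e_in * math.cos(phi))


def coupler(a: complex, b: complex) -> tuple[complex, complex]:
    """Ideal lossless 50:50 directional coupler (assumed transfer matrix)."""
    s = 1 / math.sqrt(2)
    return s * (a + 1j * b), s * (1j * a + b)


def ddot(x: list[float], y: list[float]) -> float:
    """Idealized DDOT: one wavelength per element, balanced photodetection."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    p_top = p_bottom = 0.0
    for xi, yi in zip(x, y):
        a = mzm_encode(xi)
        b = mzm_encode(yi) * -1j  # assumed -pi/2 phase shifter on the second input
        # Coupler outputs are (x + y)/sqrt2 and j(x - y)/sqrt2 for the encoded x, y.
        out_top, out_bottom = coupler(a, b)
        # Photodetectors add optical power over all wavelengths (WDM channels).
        p_top += abs(out_top) ** 2
        p_bottom += abs(out_bottom) ** 2
    # Per channel: (x + y)^2/2 - (x - y)^2/2 = 2xy, so halve the difference.
    return (p_top - p_bottom) / 2


def dptc(a_block: list[list[float]], bt_block: list[list[float]]) -> list[list[float]]:
    """Hypothetical DPTC: a crossbar of DDOTs, one per (row of A, row of BT) pair."""
    return [[ddot(a_row, bt_row) for bt_row in bt_block] for a_row in a_block]
```

Comparing `dptc(A, BT)` with a direct matrix product on small inputs gives matching results up to floating-point rounding. The balanced-detection readout is chosen here because it recovers a signed product from two intensity measurements, which is what full-range operands require.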
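The tiling and spatial/temporal mapping algorithm can also be checked in software before simulating the hardware. The Python function that follows is a hypothetical reference model, not taken from the VHDL sources: it substitutes an exact digital dot product for the optical DDOT, runs the tiles and DPTC cores sequentially (in hardware they run in parallel), and assumes Np is divisible by Nt, Nm by Nd, and Nq by Nh. Parameter names mirror the README symbols rather than Python naming conventions.

```python
def tiled_gemm(A, B, Nt, Nd, Nh, N):
    """Hypothetical reference model of the LT tiling/mapping algorithm.

    A is Np x Nm, B is Nm x Nq; returns C = A B as an Np x Nq list of lists.
    """
    Np, Nm, Nq = len(A), len(B), len(B[0])
    if any(len(row) != Nm for row in A):
        raise ValueError("A must have Nm columns")
    if Np % Nt or Nm % Nd or Nq % Nh:
        raise ValueError("assumed: Nt divides Np, Nd divides Nm, Nh divides Nq")
    BT = [list(col) for col in zip(*B)]  # step 1: BT[Nq][Nm]
    np_per_tile, nm_per_core = Np // Nt, Nm // Nd
    C = [[0.0] * Nq for _ in range(Np)]

    for j0 in range(0, Nq, Nh):  # steps 3 and 5: one cycle per Nh columns of B
        bt_slice = BT[j0:j0 + Nh]  # the same slice is broadcast to every tile
        for t in range(Nt):  # step 2: tile t owns rows [r0, r0 + np_per_tile) of A
            r0 = t * np_per_tile
            a_slice = A[r0:r0 + np_per_tile]
            for d in range(Nd):  # step 4: DPTC_d owns columns [c0, c0 + nm_per_core)
                c0 = d * nm_per_core
                for i, a_row in enumerate(a_slice):
                    a_chunk = a_row[c0:c0 + nm_per_core]
                    for j, bt_row in enumerate(bt_slice):
                        bt_chunk = bt_row[c0:c0 + nm_per_core]
                        # DDOT with vector width N: consume the chunk N elements at a time.
                        partial = 0.0
                        for k in range(0, nm_per_core, N):
                            partial += sum(
                                x * y for x, y in zip(a_chunk[k:k + N], bt_chunk[k:k + N])
                            )
                        # Partial sums from the Nd cores on a tile add into the same entry.
                        C[r0 + i][j0 + j] += partial
    return C  # step 6: C[Np][Nq]
```

For example, `tiled_gemm(A, B, Nt=2, Nd=2, Nh=2, N=1)` on two 4 × 4 matrices should equal the ordinary product `A B`. Because every tile reads the same `bt_slice` within a cycle, the model mirrors the broadcast of B, and the accumulation into `C` stands in for the analog partial-sum aggregation.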