Lightening_Transformer - Photonic AI Accelerator

Implementation of the photonic AI accelerator Lightening Transformer, which accelerates general matrix multiplication for fully dynamic, full-range matrix inputs.

Prerequisites

  • VHDL
  • Visual Studio Code
  • ModelSim

Setup

  1. Install ModelSim on your system.
  2. Use any code editor or text editor to edit the VHDL code.

Compilation

Use ModelSim to compile the files

Output simple test

Testbenches are also written using the text editor

Output log files

Run the simulation in the graphical user interface offered by ModelSim; the output waveforms are displayed there.

Project directory structure

Lightening_Transformer

  • ddot_unit.vhd
  • ddot_unit_testBench.vhd
  • dptc.vhd
  • basic_dtpc_tb.vhd
  • README.md

The dynamic full range matrix computation in Lightening Transformer (LT)

  • The basic computation unit in LT is the DDOT - Dynamically-Operated Full-range Dot-Product Unit
  • A DDOT computes the dot product between two vectors
  • When two dynamic full-range matrices A and B are multiplied, the two vector operands of a DDOT are a row vector of A and a column vector of B
  • Multiple DDOT units are arranged in a cross-connect array to form the DPTC
  • DPTC - Dynamically-Operated Photonic Tensor Core - is the core unit of the photonic accelerator
  • The DPTC can compute a small full matrix multiplication at once with the internal DDOT units
  • Several DPTC units are clustered onto a single tile
  • The DPTC units on a tile operate in parallel, and their results (partial sums) are aggregated locally in the analog domain
  • The photonic accelerator has Nt tile units, each with Nd DPTC cores
  • Each tile unit operates on a different slice of matrix A
  • Every tile unit operates on the same slice of matrix B
  • The results of the Nt tile units are aggregated to get the final output
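
The relationship between a DDOT and a DPTC described above can be sketched in plain Python. This is only a numerical model under simple assumptions, not the VHDL in this repository: the function names `ddot` and `dptc` are chosen here for illustration, and the optics are ignored entirely.

```python
def ddot(x, y):
    """Dot product of two equal-length vectors (models one DDOT unit)."""
    if len(x) != len(y):
        raise ValueError("operand vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def dptc(a_rows, b_cols):
    """Models a crossbar of DDOT units.

    Entry [i][j] pairs row i of A with column j of B, so the whole
    array yields one small matrix product.
    """
    return [[ddot(row, col) for col in b_cols] for row in a_rows]


# Example: a 2x3 block of A times a 3x2 block of B.
a_rows = [[1, -2, 3], [0, 4, -1]]     # rows of A
b_cols = [[2, 1, 0], [-1, 0, 5]]      # columns of B
result = dptc(a_rows, b_cols)         # 2x2 result block
```

Each output element is produced by its own DDOT, which is why the whole small product can be formed at once.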

Architectural level optimization

  1. Broadcasting operand by optical interconnect

    • The matrix B (M2) is shared across tile units
    • Slices of M2 are modulated globally and broadcast to every DPTC core on every tile in each cycle
  2. Temporal partial-sum aggregation in the analog domain

    • The results of the Nd DPTC cores on each tile unit are first aggregated in the analog domain by time integration and only then converted into the digital domain
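
The order "accumulate in analog, then convert" can be illustrated with a toy model. Everything here is an assumption made for illustration: the `adc` function is a hypothetical uniform quantiser with an arbitrary step size, and the time integral is modelled as a plain sum. The sketch only shows that a single conversion per tile replaces Nd conversions, and that the two orders can give different digital results.

```python
def adc(value, step=0.25):
    """Hypothetical ADC model: round to the nearest multiple of `step`."""
    return round(value / step) * step


def tile_output_analog(partial_sums, step=0.25):
    """Accumulate the Nd partial sums first, then digitise once."""
    return adc(sum(partial_sums), step)


def tile_output_digital(partial_sums, step=0.25):
    """For comparison: digitise every partial sum, then add digitally."""
    return sum(adc(p, step) for p in partial_sums)


partial_sums = [0.1, 0.1, 0.1, 0.1]                  # Nd = 4 DPTC outputs
analog_first = tile_output_analog(partial_sums)      # one conversion
digital_first = tile_output_digital(partial_sums)    # Nd conversions
```

In this toy case the small partial sums survive when they are accumulated before conversion, whereas each of them rounds to zero when converted individually.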

Tiling and spatial/temporal mapping algorithm

  1. Transpose B to obtain BT[Nq][Nm] (so we treat B columns as row vectors for dot products)

  2. Partition A into Nt row-blocks: Each Tile_t gets Np_per_tile = Np / Nt rows of A

  3. For each cycle (processing Nh columns of B at a time):

    a. Broadcast BT_slice of size Nh x Nm (Nh rows of BT = Nh columns of B)

  4. On each Tile:

    a. Split A_slice into Nd column-blocks per row: each DPTC_d gets Nm_per_core = Nm / Nd columns from A_slice

    b. Split BT_slice across DPTCs (each gets the corresponding Nm_per_core columns): each DPTC_d receives a submatrix of size Nh x Nm_per_core

    c. In each DPTC: for each row i in A_slice and each row j in BT_slice, compute DDOT(A_row_chunk[i], BT_row_chunk[j]) using vector width N, and accumulate the result in C[i_global][j_global]

  5. Repeat for all Nh-wide slices of B

  6. Output the result matrix C[Np][Nq]
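
The mapping steps above can be written as a short reference model in plain Python. It is a sketch under stated assumptions rather than a transcription of the VHDL or of tile_array_verify.py: the function name `lt_matmul` is hypothetical, Np, Nm and Nq are assumed to divide evenly by Nt, Nd and Nh, and the DDOT vector width N is not modelled (each chunk is treated as a single dot product).

```python
def ddot(x, y):
    """Dot product of two equal-length vectors (models one DDOT)."""
    return sum(a * b for a, b in zip(x, y))


def lt_matmul(A, B, Nt, Nd, Nh):
    """Compute C = A x B following the tile / DPTC / slice mapping."""
    Np, Nm, Nq = len(A), len(B), len(B[0])
    if Np % Nt or Nm % Nd or Nq % Nh:
        raise ValueError("Np, Nm, Nq must be divisible by Nt, Nd, Nh")
    # Step 1: transpose B so that its columns become rows of BT.
    BT = [[B[k][j] for k in range(Nm)] for j in range(Nq)]
    rows_per_tile = Np // Nt
    cols_per_core = Nm // Nd
    C = [[0] * Nq for _ in range(Np)]
    # Steps 3 and 5: one cycle per Nh-wide slice of B.
    for j0 in range(0, Nq, Nh):
        BT_slice = BT[j0:j0 + Nh]            # broadcast to every tile
        # Step 2: each tile owns one block of rows of A.
        for t in range(Nt):
            i0 = t * rows_per_tile
            A_slice = A[i0:i0 + rows_per_tile]
            # Step 4: each DPTC handles one chunk of the inner dimension.
            for d in range(Nd):
                k0 = d * cols_per_core
                k1 = k0 + cols_per_core
                for i, a_row in enumerate(A_slice):
                    for j, bt_row in enumerate(BT_slice):
                        # Partial sums from the Nd cores accumulate in C.
                        C[i0 + i][j0 + j] += ddot(a_row[k0:k1], bt_row[k0:k1])
    return C  # Step 6: result matrix C[Np][Nq]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [-1, 0, 1, 0], [2, -2, 2, -2]]
B = [[1, 0], [0, 1], [1, 1], [-1, 2]]
C = lt_matmul(A, B, Nt=2, Nd=2, Nh=1)
reference = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(2)]
             for i in range(4)]
```

Comparing `C` against `reference` is a quick way to check that a given choice of Nt, Nd and Nh still covers every element of the product exactly once.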

Lightening Transformer Hardware implementation

  1. Modulation & WDM

    • The input operands are modulated into full-range values with an MZM (Mach-Zehnder Modulator)
    • Phase encoding with Eout = Ein cos φ gives full-range encoding, since cos φ covers both positive and negative values
    • The operands are modulated onto different wavelengths, which are then multiplexed (WDM)
  2. The DDOT unit is the basic computing unit

    • It uses a directional coupler, a phase shifter, and optical detectors to perform the operations in the analog optical domain
  3. The DPTC is the photonic Core

    • The photonic core is formed by an optical crossbar-style array of DDOT units
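
One way these components could combine into a dot product is sketched below as an idealised numerical model. The text above names the components but not how they are wired, so the details are assumptions: a lossless 50:50 directional coupler, a -90 degree phase shift on one operand, one wavelength per element pair, and a pair of detectors whose intensities are subtracted. The function names `mzm_encode` and `ddot_optical` are hypothetical.

```python
import cmath
import math


def mzm_encode(value, e_in=1.0):
    """Encode a value in [-1, 1] as a field amplitude E_out = E_in * cos(phi)."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must be normalised to [-1, 1]")
    phi = math.acos(value)            # phase the modulator has to apply
    return e_in * math.cos(phi)


def ddot_optical(x, y):
    """Idealised DDOT: phase shifter, 50:50 coupler, differential detection."""
    total = 0.0
    for xv, yv in zip(x, y):          # one wavelength per element pair
        ex = complex(mzm_encode(xv))
        ey = mzm_encode(yv) * cmath.exp(-1j * math.pi / 2)   # -90 degree shift
        # Transfer matrix of a lossless 50:50 directional coupler.
        out1 = (ex + 1j * ey) / math.sqrt(2)
        out2 = (1j * ex + ey) / math.sqrt(2)
        # Detectors measure intensity; the two readings are subtracted.
        total += abs(out1) ** 2 - abs(out2) ** 2
    return total / 2                  # the difference equals 2 * (x . y)


x = [0.5, -0.25, 1.0]
y = [-1.0, 0.5, 0.25]
optical = ddot_optical(x, y)
exact = sum(a * b for a, b in zip(x, y))
```

In this model the two coupler outputs carry (x + y) and (x - y) up to a phase factor, so the intensity difference is proportional to the product xy, and summing over wavelengths gives the signed dot product.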

Lightening Transformer architecture and its operation

  1. Lightening Transformer overview

    • General matrix multiplication between dynamic, full-range matrices
    • Exploits the broadcast opportunity in the optical domain and analog temporal accumulation
  2. Key components

    • DDOT - Dynamically-Operated Full-range Dot-Product Unit
    • DPTC - Dynamically-Operated Photonic Tensor Core
    • Photonic accelerator
  3. Computation mechanism

    • Operates in the optical domain, leveraging operand sharing to accelerate the computation