Scalable Matrix Multiplication with Hybrid CMOS-RSFQ Digital Signal Processor
Paper in proceedings, 2007
We report an RSFQ Digital Signal Processor design based on hybrid RSFQ-CMOS memory, suitable for a general matrix-on-matrix multiplication algorithm. The design consists of an RSFQ Multiply-Accumulate (MAC) Unit, memory caches, and a synchronization block, partitioned into multiple chips, and a large CMOS memory. The RSFQ DSP implements 10x10 multiplication with rounding to 14 bits, an 18-bit accumulator, and a 4.4 Kb memory cache. The maximum simulated clock frequency is 24 GHz for the HYPRES 4.5 kA/cm² process, and the optimum communication bandwidth with the CMOS memory is 2 Gbps. A simplified version of the RSFQ DSP, consisting of a 4x4 MAC with rounding to 5 bits and 17x6 memory caches, has been designed for the HYPRES 4.5 kA/cm² process and fabricated.
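
The arithmetic described above can be illustrated with a minimal software sketch of a multiply-accumulate step driving a matrix product. It is not the authors' circuit or algorithm. It assumes that "10x10" means 10-bit by 10-bit unsigned operands, that rounding keeps the 14 most significant bits of the 20-bit product using round-half-up, and that the 18-bit accumulator wraps on overflow. The names `mac_step` and `matmul` are hypothetical.

```python
def mac_step(acc, a, b, in_bits=10, round_bits=14, acc_bits=18):
    """One multiply-accumulate step with product rounding (illustrative only)."""
    # Assumption: operands are unsigned integers of in_bits bits each.
    assert 0 <= a < (1 << in_bits) and 0 <= b < (1 << in_bits)
    product = a * b                      # full product, up to 2 * in_bits bits
    drop = 2 * in_bits - round_bits      # low-order bits removed by rounding
    # Assumption: round half up by adding half an LSB before shifting.
    rounded = (product + (1 << (drop - 1))) >> drop
    # Assumption: the accumulator wraps modulo 2**acc_bits on overflow.
    return (acc + rounded) & ((1 << acc_bits) - 1)


def matmul(A, B, **widths):
    """Matrix product C = A * B computed with repeated mac_step calls."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0                      # accumulator cleared per output element
            for t in range(inner):
                acc = mac_step(acc, A[i][t], B[t][j], **widths)
            C[i][j] = acc
    return C


if __name__ == "__main__":
    A = [[64, 0], [0, 64]]
    B = [[64, 128], [192, 256]]
    print(matmul(A, B))
    # Simplified variant from the text: 4x4 MAC with rounding to 5 bits.
    # The accumulator width for that variant is not given; 18 is kept here.
    print(mac_step(0, 15, 15, in_bits=4, round_bits=5))
```

Under these assumptions, each rounded product fits in 14 bits, so the 18-bit accumulator leaves 4 bits of headroom and can sum up to 16 maximum-valued terms before wrapping.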