Multi-LSTM Acceleration and CNN Fault Tolerance
Licentiate thesis, 2021

This thesis addresses two problems in the field of machine learning: the acceleration of multiple Long Short-Term Memory (LSTM) models on FPGAs and the fault tolerance of compressed Convolutional Neural Networks (CNNs). LSTMs are an effective solution for capturing long-term dependencies in sequential data, such as sentences in natural language processing, video frames in scene labeling, or temporal series in time series forecasting. To further boost their efficacy, especially in the presence of long sequences, multiple LSTM models are combined in hierarchical and stacked configurations. However, because LSTMs are memory-bound, efficiently mapping multiple models onto a computing device is challenging. The first part of this thesis addresses the problem of mapping multiple LSTM models to an FPGA device by introducing a framework that adapts their memory requirements to the target architecture. For a similar accuracy loss, the proposed framework maps multiple LSTMs with a performance improvement of 3x to 5x over state-of-the-art approaches. In the second part of this thesis, we investigate the fault tolerance of CNNs, another effective deep learning architecture. CNNs are a dominant solution in image classification tasks, but they incur a high performance cost: because of their large parameter space, fetching their data from main memory typically becomes a performance bottleneck. To tackle this problem, various parameter-compression techniques have been developed, such as weight pruning, weight clustering, and weight quantization. However, reducing the memory footprint of an application can make its data more sensitive to faults. In this thesis, we conduct an analysis to verify the conditions for applying OddECC, a mechanism that supports ECCs of variable strength and size for different memory regions. Our experiments reveal that compressed CNNs, whose memory footprint is reduced by up to 86.3x by the aforementioned compression schemes, exhibit accuracy drops of up to 13.56% in the presence of random single-bit faults.
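The memory-reduction step of the LSTM mapping framework connects to the SVD keyword listed below. As an illustration of the general idea, not the thesis's exact algorithm, the following NumPy sketch shows how a low-rank (SVD) approximation of a weight matrix trades accuracy for memory footprint; the helper `svd_compress` and all sizes are illustrative assumptions.

```python
import numpy as np

def svd_compress(W, rank):
    """Low-rank approximation W (m x n) ~= U_r @ V_r, keeping `rank` singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)  # stand-in for one LSTM gate matrix
U_r, V_r = svd_compress(W, rank=64)
ratio = W.size / (U_r.size + V_r.size)                  # m*n values vs rank*(m+n) values
rel_err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
# Note: random matrices compress poorly; trained weight matrices are typically
# far more compressible at the same rank.
print(f"compression ratio: {ratio:.1f}x, relative approximation error: {rel_err:.3f}")
```

Storing the two factors requires rank x (m + n) values instead of m x n, and the rank gives a tunable knob for fitting multiple LSTMs into a target FPGA's memory budget.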
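The reliability analysis in the second part injects random single-bit faults into the parameters of compressed CNNs. The sketch below, again an assumption-laden illustration rather than the thesis's tooling, pairs uniform 8-bit weight quantization (one of the compression schemes named above) with a single random bit flip and reports the resulting weight perturbation; `quantize_int8` and `inject_bit_fault` are hypothetical helpers.

```python
import numpy as np

def quantize_int8(w):
    """Uniform symmetric 8-bit quantization of a float tensor (illustrative)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def inject_bit_fault(q, rng):
    """Flip one random bit of one random weight, modeling a single-bit fault."""
    faulty = q.copy()
    raw = faulty.view(np.uint8).reshape(-1)               # raw byte view of the weights
    raw[rng.integers(raw.size)] ^= np.uint8(1 << rng.integers(8))
    return faulty

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)     # stand-in for a conv layer's weights
q, scale = quantize_int8(w)
q_faulty = inject_bit_fault(q, rng)
delta = np.abs(q_faulty.astype(np.int32) - q.astype(np.int32)).max() * scale
print(f"max weight perturbation from one bit flip: {delta:.4f}")
```

Repeating such injections over pruned, clustered, and quantized networks and measuring classification accuracy is the kind of fault-injection campaign that underlies the accuracy drops reported above.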

Keywords

Compression

SVD

LSTMs

CNNs

Fault Tolerance

Machine Learning

FPGA

Roofline Model

HLS

Caffe

Author

Stefano Ribes

Chalmers, Computer Science and Engineering, Computer Engineering

Ribes S., Trancoso P., Sourdis I., and Bouganis C.-S., Mapping Multiple LSTM Models on FPGAs, Int'l Conf. on Field-Programmable Technology (FPT), December 2020.

Ribes S., Malek A., Trancoso P., and Sourdis I., Reliability Analysis of Compressed CNNs.

Energy-efficient Heterogeneous COmputing at exaSCALE (ECOSCALE)

European Commission (EC) (EC/H2020/671632), 2015-10-01 -- 2018-12-31.

Meeting Challenges in Computer Architecture (MECCA)

European Commission (EC) (EC/FP7/340328), 2014-02-01 -- 2019-01-31.

Secure Hardware-Software Architectures for Robust Computing Systems (SHARCS)

European Commission (EC) (EC/H2020/644571), 2015-01-01 -- 2018-12-31.

Subject Categories

Computer Engineering

Embedded Systems

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

Publisher

Chalmers

EC, EDIT-Building

Online

Opponent: Theocharis Theocharides, University of Cyprus, Cyprus
