Multi-LSTM Acceleration and CNN Fault Tolerance
Licentiate thesis, 2021

This thesis addresses two problems in the field of machine learning: the acceleration of multiple Long Short-Term Memory (LSTM) models on FPGAs and the fault tolerance of compressed Convolutional Neural Networks (CNNs). LSTMs are an effective solution for capturing long-term dependencies in sequential data, such as sentences in natural language processing, video frames in scene labeling, or temporal series in time-series forecasting. To further boost their efficacy, especially in the presence of long sequences, multiple LSTM models are combined in hierarchical and stacked configurations. However, because of their memory-bound nature, efficiently mapping multiple LSTMs onto a computing device becomes even more challenging. The first part of this thesis addresses the problem of mapping multiple LSTM models onto an FPGA by introducing a framework that adapts their memory requirements to the target architecture. For a similar accuracy loss, the proposed framework maps multiple LSTMs with a performance improvement of 3x to 5x over state-of-the-art approaches.

In the second part of this thesis, we investigate the fault tolerance of CNNs, another effective deep learning architecture. CNNs are a dominant solution for image classification tasks, but suffer from a high performance cost due to their computational structure. In particular, because of their large parameter space, fetching their data from main memory typically becomes a performance bottleneck. To tackle this problem, various parameter-compression techniques have been developed, such as weight pruning, weight clustering, and weight quantization. However, reducing the memory footprint of an application can make its data more sensitive to faults. In this thesis work, we have conducted an analysis to verify the conditions for applying OddECC, a mechanism that supports ECCs of variable strength and size for different memory regions.
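The weight-clustering compression mentioned above can be illustrated with a minimal sketch: weights are replaced by small indices into a shared codebook of representative values, so that e.g. 16 clusters need only 4-bit indices instead of 32-bit floats. This is a generic illustration of the technique, not the thesis's implementation; the function name and the simple 1-D k-means loop are assumptions.

```python
import numpy as np

def cluster_weights(weights, n_clusters=16, n_iters=20):
    """Compress a weight tensor via weight clustering (1-D k-means sketch).

    Returns (indices, codebook): each weight is stored as a small index
    into a shared codebook of n_clusters float values.
    """
    flat = weights.ravel()
    # Initialize centroids evenly across the observed weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iters):
        # Assign every weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.reshape(weights.shape).astype(np.uint8), centroids
```

Reconstructing the layer is then a single codebook lookup, `codebook[indices]`; the footprint saving comes from storing narrow indices plus one small codebook per layer.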
Our experiments reveal that compressed CNNs, whose memory footprint is reduced by up to 86.3x using the aforementioned compression schemes, exhibit accuracy drops of up to 13.56% in the presence of random single-bit faults.
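The random single-bit fault model referred to above can be sketched as flipping one randomly chosen bit in a stored (e.g. 8-bit quantized) weight array. This is an illustrative sketch of the fault model only, not the thesis's actual fault-injection setup; the function name and the 8-bit storage assumption are hypothetical.

```python
import numpy as np

def inject_single_bit_fault(weights_q, rng):
    """Flip one random bit in an 8-bit quantized weight array.

    Models a random single-bit fault in memory: pick one stored
    weight and one bit position, and XOR that bit.
    """
    faulty = weights_q.copy()
    flat = faulty.ravel()            # view into the copy
    pos = int(rng.integers(flat.size))  # which weight is hit
    bit = int(rng.integers(8))          # which bit of the 8-bit value
    flat[pos] ^= np.uint8(1 << bit)
    return faulty
```

Repeating this injection over many trials and re-running inference is the usual way to estimate how much accuracy degrades under such faults, which is where compressed networks prove more sensitive than uncompressed ones.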

Machine Learning

Fault Tolerance

LSTMs

Caffe

CNNs

SVD

Roofline Model

Compression

HLS

FPGA

EC, EDIT-Building
Opponent: Theocharis Theocharides, University of Cyprus, Cyprus

Author

Stefano Ribes

Chalmers, Data- och informationsteknik, Datorteknik, Computer Systems

Ribes S., Trancoso P., Sourdis I. and Bouganis C.-S., Mapping Multiple LSTM models on FPGAs, Int’l Conf. on Field-Programmable Technology (FPT), December, 2020

Ribes S., Malek A., Trancoso P. and Sourdis I., Reliability Analysis of Compressed CNNs

Secure Hardware-Software Architectures for Robust Computing Systems (SHARCS)

European Commission (EU), 2015-01-01 -- 2018-12-31.

Energy-efficient Heterogeneous COmputing at exaSCALE (ECOSCALE)

European Commission (EU), 2015-10-01 -- 2018-12-31.

Meeting Challenges in Computer Architecture (MECCA)

European Commission (EU), 2014-02-01 -- 2019-01-31.

Subject Categories

Computer Engineering

Embedded Systems

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

Publisher

Chalmers University of Technology

Online

More information

Last updated

2021-04-09