Models and Methods for Development of DSP Applications on Manycore Processors
Doctoral thesis, 2009
Advanced digital signal processing systems require specialized
high-performance embedded computer architectures. The term high-performance
translates to large amounts of data and computations per time unit. The term
embedded further implies requirements on physical size and power efficiency.
Thus the requirements are of both functional and non-functional nature. This
thesis addresses the development of high-performance digital signal
processing systems relying on manycore technology. We propose building
two-level hierarchical computer architectures for this domain of
applications. Further, we outline a tool flow based on methods and analysis
techniques for automated, multi-objective mapping of such applications onto
distributed memory manycore processors. In particular, we focus on providing
tunable strategies for mapping task graphs onto array-structured distributed
memory manycores with respect to given application constraints. We argue for
code mapping strategies based on predicted execution performance, which can
be used in an auto-tuning feedback loop or to guide manual tuning by the
programmer.
Automated parallelisation, optimisation and mapping to a manycore processor
benefit from using a concurrent programming model as the starting
point. Such a model allows the programmer to express different types and
granularities of parallelism as well as computation characteristics of
importance in the addressed class of applications. The programming model
should also abstract away machine-dependent hardware details. The analytical
study of WCDMA baseband processing in radio base stations, presented in this
thesis, suggests that dataflow models are a good match for the characteristics
of the application and are suitable as an execution model that abstracts
computations on a manycore.
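As an illustration of why dataflow models lend themselves to automated analysis, the following minimal sketch solves the balance equations of a small graph. It assumes synchronous dataflow (fixed token rates per firing) as one representative dataflow variant; the abstract does not state which variant the thesis uses. The actor names, the rates, and the function `repetition_vector` are hypothetical.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm requires Python 3.9+


def repetition_vector(actors, edges):
    """Return the smallest integer firing counts that balance every edge.

    actors: list of actor names; the graph is assumed to be connected.
    edges:  list of (src, dst, produced, consumed) token rates per firing.
    """
    # Fix the first actor's relative rate and propagate along the edges.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True

    # Every edge must satisfy: firings(src) * produced == firings(dst) * consumed.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent token rates: no bounded schedule")

    # Scale the rational rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(r * scale) for actor, r in rate.items()}


# Hypothetical three-stage chain with rate changes between the stages.
example = repetition_vector(
    ["src", "fir", "sink"],
    [("src", "fir", 2, 3), ("fir", "sink", 1, 2)],
)
```

For this chain, `src` fires three times, `fir` twice, and `sink` once per schedule iteration, so the number of tokens produced on each edge equals the number consumed. A mapping tool could use such firing counts as static workload information before any code is placed on cores.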
Construction of portable tools further requires a manycore machine model and
an intermediate representation. These models are needed to decouple the
algorithms used to transform and map application software from the hardware.
We propose a manycore machine model that captures common hardware resources,
as well as resource-dependent performance metrics for parallel computation
and communication. Further, we have developed a multifunctional intermediate
representation, which can be used both as a source for code generation and
for dynamic execution analysis. Finally, we demonstrate how we can dynamically
analyse execution using abstract interpretation on the intermediate
representation. It is shown that the performance predictions can be used to
accurately rank different mappings by best throughput or shortest end-to-end
computation latency.
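The idea of ranking mappings by predicted performance can be sketched with a deliberately crude closed-form model. This is not the abstract-interpretation analysis described above. It assumes a three-task pipeline on a 2x2 core array, a steady-state period equal to the heaviest per-core compute load (communication is ignored for the period), and a latency equal to total compute time plus a per-word, per-hop transfer cost. All task names, cycle counts, and constants are hypothetical.

```python
from itertools import product

# Hypothetical pipeline: (task name, compute cycles per input block).
TASKS = [("filter", 40), ("fft", 90), ("demod", 30)]
# Hypothetical number of words sent from task i to task i + 1.
WORDS = [16, 16]
# Assumed cost in cycles of moving one word across one mesh hop.
CYCLES_PER_WORD_HOP = 2
# Core coordinates of an assumed 2x2 array.
CORES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def hops(a, b):
    """Manhattan distance between two cores on the mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def predict(mapping):
    """Return (period, latency) in cycles; mapping[i] is the core of task i."""
    load = {}
    latency = 0
    for (_, cycles), core in zip(TASKS, mapping):
        load[core] = load.get(core, 0) + cycles
        latency += cycles
    for i, words in enumerate(WORDS):
        latency += words * CYCLES_PER_WORD_HOP * hops(mapping[i], mapping[i + 1])
    # The most heavily loaded core bounds the steady-state throughput.
    period = max(load.values())
    return period, latency


# Enumerate every assignment of tasks to cores and rank under each objective.
CANDIDATES = list(product(CORES, repeat=len(TASKS)))
best_throughput = min(CANDIDATES, key=predict)  # shortest period, then latency
best_latency = min(CANDIDATES, key=lambda m: predict(m)[::-1])  # latency first
```

Under these assumptions the two objectives disagree. Placing all tasks on one core gives the shortest latency but the longest period, while isolating the heaviest task on its own core gives the shortest period at the cost of extra transfers. This is the kind of trade-off that a multi-objective mapping strategy has to expose.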
parallel code mapping
parallel processing
concurrent models of computation
dataflow
manycore processors
parallel machine model
dynamic performance analysis
high-performance digital signal processing
Wigforssalen, house Visionen, Halmstad University
Opponent: Professor Shuvra S. Bhattacharyya, Dept. of Electrical and Computer Engineering, University of Maryland, USA