Separable 2D Convolution with Polymorphic Register Files
Paper in proceedings, 2013

This paper studies the performance of separable 2D convolution on multi-lane Polymorphic Register Files (PRFs). We present a matrix transposition algorithm optimized for PRFs, and a 2D vectorized convolution algorithm which avoids strided memory accesses. We compare the throughput of our PRF to the NVIDIA Tesla C2050 GPU. The results show that even in bandwidth constrained systems, multi-lane PRFs can outperform the GPU for 9 × 9 or larger mask sizes.


Catalin Ciobanu

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Georgi Gaydadjiev

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

03029743 (ISSN) 16113349 (eISSN)

Vol. 7767 317-328
978-3-642-36423-5 (ISBN)

Subject Categories

Computer Engineering

Embedded Systems

Computer Systems

Areas of Advance

Information and Communication Technology





More information