Separable 2D Convolution with Polymorphic Register Files
Paper in proceedings, 2013

This paper studies the performance of separable 2D convolution on multi-lane Polymorphic Register Files (PRFs). We present a matrix transposition algorithm optimized for PRFs, and a vectorized 2D convolution algorithm that avoids strided memory accesses. We compare the throughput of our PRF to that of the NVIDIA Tesla C2050 GPU. The results show that even in bandwidth-constrained systems, multi-lane PRFs can outperform the GPU for mask sizes of 9 × 9 or larger.
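As a rough illustration of the general idea behind the abstract, the sketch below performs a separable 2D convolution as two 1D row passes with a transposition in between, so that the vertical pass also reads consecutive elements instead of striding down columns. It is a plain Python sketch under stated assumptions, not the PRF algorithm from the paper: the function names (`convolve_rows`, `transpose`, `separable_convolve_2d`) and the zero-padded borders are choices made here for illustration.

```python
def convolve_rows(image, kernel):
    """Convolve each row with a 1D kernel (zero padding, same-size output)."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                src = x + radius - k  # true convolution: kernel is flipped
                if 0 <= src < width:  # samples outside the row count as zero
                    acc += weight * row[src]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def separable_convolve_2d(image, row_kernel, col_kernel):
    """Apply a separable 2D mask as two row passes around a transposition."""
    tmp = convolve_rows(image, row_kernel)  # horizontal pass
    tmp = transpose(tmp)                    # columns become rows
    tmp = convolve_rows(tmp, col_kernel)    # vertical pass, read row by row
    return transpose(tmp)                   # restore the original orientation
```

With an N × N separable mask, each output pixel costs roughly 2N multiply-accumulates in this scheme instead of N² for the direct 2D form, which is why the cost of the two transpositions can be worth paying.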

Authors

Catalin Ciobanu

Chalmers, Computer Science and Engineering, Computer Engineering

Georgi Gaydadjiev

Chalmers, Computer Science and Engineering, Computer Engineering

Lecture Notes in Computer Science

0302-9743 (ISSN)

Vol. 7767, pp. 317-328

Subject categories

Computer Engineering

Embedded Systems

Computer Systems

Areas of Advance

Information and Communication Technology

DOI

10.1007/978-3-642-36424-2_27

ISBN

978-3-642-36423-5

More information

Created

2017-10-08