Co-Design of Convolutional Algorithms and Long Vector RISC-V Processors for Efficient CNN Model Serving

Sonia Rani Gupta; Nikela Papadopoulou; Jing Chen; Miquel Pericas

doi:10.1145/3673038.3673121

Paper in proceeding, 2024

The performance of a convolutional algorithm depends on the size, stride, and input/output channels of the convolutional kernel. Moreover, the varying computational demands of convolutional layers influence the requirement for SIMD support on multicore processors. Finally, sharing cache resources in scenarios such as inference serving also impacts the runtime choice of the best algorithm. To identify the best settings, we perform a co-design exploration, focusing on the software parameters of the convolutional layers of convolutional neural networks (CNNs) and three distinct algorithmic implementations (Direct, im2col+GEMM, and Winograd), jointly with hardware parameters for vector architectures. Our simulation-based study identifies that Winograd is suitable for convolutional layers with a 3 × 3 kernel size and stride 1, specifically for shorter vector lengths and L2 cache sizes. For layers with more input/output channels, im2col+GEMM performs better. Looking at VGG-16, our study shows that not all the layers benefit from our biggest simulated cache memory when using the Direct and Winograd implementations, while the im2col+GEMM implementation scales to an L2 cache memory of 64MB with all layers. In contrast, all the simulated layers of YOLOv3 benefit from an L2 cache memory of 64MB, for all convolutional algorithms. To select the best implementation at runtime, we develop a random forest predictor that selects the best algorithm in over 90% of the cases, with limited degradation when a sub-optimal configuration is selected. We conclude with a Pareto analysis of the area-performance trade-off in an inference serving scenario, on a 7nm RISC-V multicore model with a vector unit supporting vector lengths from 512 up to 4096 bits.
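To illustrate two of the three algorithms named in the abstract, the following is a minimal, unoptimized sketch of Direct convolution and im2col+GEMM in plain Python. It is not the authors' vectorized implementation: the function names, the `[channels][height][width]` layout, and the absence of padding are assumptions made for illustration, and Winograd is omitted. The sketch only shows that both formulations compute the same output while organizing the work differently (nested loops over the kernel versus one large matrix multiply on an expanded input).

```python
def direct_conv(x, w, stride=1):
    """Direct convolution, no padding. x: [C][H][W], w: [K][C][R][S]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P = (H - R) // stride + 1  # output height
    Q = (W - S) // stride + 1  # output width
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):
        for p in range(P):
            for q in range(Q):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][p * stride + r][q * stride + s] * w[k][c][r][s]
                out[k][p][q] = acc
    return out


def im2col(x, R, S, stride=1):
    """Expand x into a matrix with C*R*S rows and P*Q columns."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    P = (H - R) // stride + 1
    Q = (W - S) // stride + 1
    cols = []
    for c in range(C):
        for r in range(R):
            for s in range(S):
                cols.append([x[c][p * stride + r][q * stride + s]
                             for p in range(P) for q in range(Q)])
    return cols, P, Q


def gemm(a, b):
    """Naive matrix multiply: (n x m) times (m x l)."""
    m, l = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(m)) for j in range(l)]
            for row in a]


def im2col_conv(x, w, stride=1):
    """Convolution as one GEMM: flattened filters times the im2col matrix."""
    K, C, R, S = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    cols, P, Q = im2col(x, R, S, stride)
    # Flatten each filter in the same (c, r, s) order used by im2col.
    wmat = [[w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
            for k in range(K)]
    flat = gemm(wmat, cols)
    # Reshape each output row of length P*Q back to [P][Q].
    return [[flat[k][p * Q:(p + 1) * Q] for p in range(P)] for k in range(K)]
```

The im2col matrix replicates each input element up to R × S times, so in this sketch im2col+GEMM trades extra memory for a single regular matrix multiply.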

Author

Sonia Rani Gupta

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Nikela Papadopoulou

University of Glasgow

Jing Chen

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Miquel Pericas

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

ACM International Conference Proceeding Series

73-83
9798400708428 (ISBN)

53rd International Conference on Parallel Processing, ICPP 2024
Gotland, Sweden

Subject Categories (SSIF 2011)

Computer Science

Computer Systems

DOI

10.1145/3673038.3673121

More information

Latest update

3/7/2025
