CNN and RVV Co-design for Efficient Model Serving
Paper in proceedings, 2025

The performance of a convolution algorithm depends on layer dimensions, and SIMD demands and cache sharing influence which algorithm should be selected at runtime. To identify the best settings, we perform a co-design exploration of convolutional layer parameters and three algorithms (Direct, im2col+GEMM, and Winograd) jointly with hardware parameters for RISC-V vector architectures. Our results show that considering hardware parameters together with layer dimensions improves execution time and efficiency, underscoring the need for co-design.
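
For readers unfamiliar with the algorithms named in the abstract, the following is a minimal illustrative sketch of two of them, Direct and im2col+GEMM, in plain Python. It is not the authors' implementation: it assumes a single-channel input, a single square filter, stride 1, and no padding, and the function names (`conv2d_direct`, `im2col`, `conv2d_im2col_gemm`) are hypothetical. With only one filter, the GEMM step degenerates to a matrix-vector product. Like most CNN frameworks, the sketch computes cross-correlation (the kernel is not flipped).

```python
def conv2d_direct(x, w):
    """Direct algorithm: slide the KxK kernel over the input (stride 1, no padding)."""
    H, W = len(x), len(x[0])
    K = len(w)  # assumption: square kernel
    out_h, out_w = H - K + 1, W - K + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0
            for di in range(K):
                for dj in range(K):
                    acc += x[i + di][j + dj] * w[di][dj]
            out[i][j] = acc
    return out


def im2col(x, K):
    """Unfold every KxK input patch into one row of a 2-D matrix."""
    H, W = len(x), len(x[0])
    return [
        [x[i + di][j + dj] for di in range(K) for dj in range(K)]
        for i in range(H - K + 1)
        for j in range(W - K + 1)
    ]


def conv2d_im2col_gemm(x, w):
    """im2col+GEMM: unfold the input, then multiply by the flattened kernel."""
    K = len(w)
    out_w = len(x[0]) - K + 1
    cols = im2col(x, K)
    w_flat = [w[di][dj] for di in range(K) for dj in range(K)]
    # One filter only, so the "GEMM" here is a matrix-vector product.
    flat = [sum(a * b for a, b in zip(row, w_flat)) for row in cols]
    # Fold the flat result back into an out_h x out_w grid.
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]
```

Both functions return the same output for the same input; they differ in memory layout and access pattern (im2col duplicates input elements to expose a dense matrix product), which is the kind of trade-off that makes the best choice depend on layer dimensions and hardware parameters.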

Authors

Sonia Rani Gupta

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Nikela Papadopoulou

University of Glasgow

Jing Chen

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Miquel Pericas

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

DEBS 2025: Proceedings of the 19th ACM International Conference on Distributed and Event-Based Systems

243-244
9798400713323 (ISBN)

19th ACM International Conference on Distributed and Event-Based Systems, DEBS 2025
Gothenburg, Sweden

P4PIM: Principles of power-constrained HPC programming for PIM networks

Swedish Research Council (VR) (2020-04892), 2021-01-01 -- 2024-12-31.

Subject Categories (SSIF 2025)

Computer Sciences

Computer Systems

DOI

10.1145/3701717.3733226

More information

Latest update

8/29/2025