Understanding parsec performance on contemporary CMPS
Paper in proceeding, 2009
PARSEC is a reference application suite used in industry and academia to assess new Chip Multiprocessor (CMP) designs. No investigation to date has profiled PARSEC on real hardware to better understand scaling properties and bottlenecks. This understanding is crucial in guiding future CMP designs for these kinds of emerging workloads. We use hardware performance counters, taking a systems-level approach and varying common architectural parameters: number of out-of-order cores, memory hierarchy configu- rations, number of multiple simultaneous threads, number of memory channels, and processor frequencies. We find these programs to be largely compute-bound, and thus lim- ited by number of cores, micro-architectural resources, and cache-to-cache transfers, rather than by off-chip memory or system bus bandwidth. Half the suite fails to scale lin- early with increasing number of threads, and some applica- tions saturate performance at few threads on all platforms tested. Exploiting thread level parallelism delivers greater payoffs than exploiting instruction level parallelism. To re- duce power and improve performance, we recommend in- creasing the number of arithmetic units per core, increasing support for TLP, and reducing support for ILP.