Understanding Data Analytics Workloads on Intel(R) Xeon Phi(R)
Paper in proceedings, 2016
The Intel® Xeon Phi™ is gaining popularity for high-performance computing (HPC) applications, but the performance of this many-core coprocessor with wide floating-point SIMD units has yet to be explored on data analytics workloads. We construct a benchmark suite to explore the Xeon Phi™'s potential for use in data center servers. The resulting suite, PhiBench, consists of eight representative data analytics workloads covering six application domains. These workloads are optimized for the Xeon Phi™ and implemented with OpenMP and Cilk Plus. We run them on real-world datasets and compare their performance across programming models, input data sizes, and thread counts. Most of the workloads benefit from the Xeon Phi™'s high computational capacity, delivering speedups by factors of four to almost 29. We further analyze their microarchitecture-level performance characteristics, including vectorization intensity and cache behavior, and we investigate the impact of thread affinities and scheduling policies on performance and scalability. Our observations should help other researchers and practitioners understand and optimize the behavior of data analytics workloads on the Xeon Phi™.