Characterizing and subsetting big data workloads


doi:10.1109/IISWC.2014.6983058

Paper in proceedings, 2014

© 2014 IEEE. Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual simulation-based research methods become prohibitively expensive for big data. As big data is an emerging field, more and more software stacks are being proposed to facilitate the development of big data applications, which aggravates these challenges. In this paper, we first use Principal Component Analysis (PCA) to identify the most important characteristics from 45 metrics to characterize big data workloads from BigDataBench, a comprehensive big data benchmark suite. Second, we apply a clustering technique to the principal components obtained from the PCA to investigate the similarity among big data workloads, and we verify the importance of including different software stacks for big data benchmarking. Third, we select seven representative big data workloads by removing redundant ones and release the BigDataBench simulation version, which is publicly available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
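The abstract outlines a three-step subsetting pipeline: PCA over per-workload metrics, clustering in principal-component space, and keeping one representative per cluster. The following is a minimal sketch of that general pipeline in Python with NumPy, not the paper's implementation. Several details are assumptions of this sketch and do not come from the abstract: the 90% explained-variance cutoff, the use of k-means (with deterministic farthest-point seeding) as "the clustering technique", and choosing the workload nearest each cluster centroid as its representative. The function names `pca_scores`, `kmeans`, and `select_representatives` are hypothetical.

```python
import numpy as np


def pca_scores(metrics, var_threshold=0.9):
    """Project workloads (rows) onto the principal components that together
    explain at least `var_threshold` of the variance of the standardized metrics."""
    x = np.asarray(metrics, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant metrics carry no information; avoid divide-by-zero
    # Standardize so metrics in different units (e.g. IPC vs. cache MPKI) weigh equally.
    z = (x - x.mean(axis=0)) / std
    # Rows of vt are the principal directions; s**2 is proportional to their variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = min(int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1, len(s))
    return z @ vt[:k].T


def _assign(points, centers):
    # (n, k) matrix of Euclidean distances; each point joins its nearest center.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point seeding (an assumption of this sketch)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from each point to its closest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = _assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(points, centers), centers


def select_representatives(names, metrics, k, var_threshold=0.9):
    """Cluster workloads in PC space and keep the one nearest each centroid."""
    pcs = pca_scores(metrics, var_threshold)
    labels, centers = kmeans(pcs, k)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        d = np.linalg.norm(pcs[members] - centers[j], axis=1)
        reps.append(names[members[d.argmin()]])
    return reps
```

With a matrix of shape `(n_workloads, 45)`, a call such as `select_representatives(names, metrics, k=7)` would return seven workload names. Standardizing before PCA matters because raw metrics differ by orders of magnitude. Farthest-point seeding is used here only to make the toy example deterministic; a real study would also need a principled way to choose `k`.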


IISWC 2014 - IEEE International Symposium on Workload Characterization

Pages 191-201
9781479964536 (ISBN)

Subject categories (SSIF 2011)

Computer and Information Sciences


Record created

2017-10-07
