Designing OS for HPC applications: Scheduling
Paper in proceedings, 2010

Operating systems have historically been implemented as independent layers between hardware and applications. User programs communicate with the OS through a set of well-defined system calls and do not have direct access to the hardware. The OS, in turn, communicates with the underlying architecture via control registers. Except for these interfaces, the three layers are practically oblivious to each other. While this structure improves portability and transparency, it may not deliver optimal performance. This is especially true for High Performance Computing (HPC) systems, where modern parallel applications and multi-core architectures pose new challenges in terms of performance, power consumption, and system utilization. The hardware, the OS, and the applications can no longer remain isolated; instead, they should cooperate to deliver high performance with minimal power consumption. In this paper, we present our experience with the design and implementation of High Performance Linux (HPL), an operating system designed to optimize the performance of HPC applications running on a state-of-the-art compute cluster. We show how characterizing parallel applications through hardware and software performance counters drives the design of the OS, and how including knowledge about the architecture improves performance and efficiency. We perform experiments on a dual-socket IBM POWER6 machine, showing performance improvements and stability (performance variation of 2.11% on average) for NAS, a widely used parallel benchmark suite.
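The abstract reports stability as an average performance variation but does not define the metric here. The sketch below assumes that variation means the coefficient of variation (sample standard deviation over mean) of repeated execution times, averaged across benchmarks. Both the definition and the function names are hypothetical illustrations, not the authors' method.

```python
import statistics


def performance_variation(times):
    """Run-to-run variation of one benchmark, in percent.

    Assumption: variation = sample standard deviation / mean of the
    execution times. The paper's exact definition may differ.
    """
    if len(times) < 2:
        raise ValueError("need at least two runs to measure variation")
    return 100.0 * statistics.stdev(times) / statistics.mean(times)


def average_variation(runs_per_benchmark):
    """Average the per-benchmark variation across a suite.

    Hypothetical input layout: benchmark name -> list of execution times.
    """
    return statistics.mean(
        performance_variation(times) for times in runs_per_benchmark.values()
    )
```

If you pass in per-benchmark timing lists, for example one list per NAS kernel, you get a single averaged percentage of the same form as the figure quoted in the abstract. Under a different definition, such as (max - min) / min, the numbers would not be directly comparable.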

Authors

Roberto Gioiosa

Centro Nacional de Supercomputación

Sally A McKee

Chalmers, Computer Science and Engineering, Computer Engineering

M. Valero

Universitat Politècnica de Catalunya

Proceedings - IEEE International Conference on Cluster Computing, ICCC

1552-5244 (ISSN)

pp. 78-87

Areas of Advance

Information and Communication Technology

Subject Categories

Computer and Information Science

DOI

10.1109/CLUSTER.2010.16

ISBN

978-0-7695-4220-1

Latest update

2018-03-29