Compiler-enhanced incremental checkpointing for OpenMP applications
Paper in proceedings, 2009

As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures, enabling applications to periodically save their state and restart computation after a failure. Although many automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains more popular due to its superior performance. This paper improves the performance of automated checkpointing via a compiler analysis for incremental checkpointing. This analysis, which works with both sequential and OpenMP applications, reduces checkpoint sizes by as much as 80% and enables asynchronous checkpointing.

Authors

Greg Bronevetsky

Lawrence Livermore National Laboratory

Daniel Marques

Keshav Pingali

The University of Texas at Austin

Sally A McKee

Chalmers, Computer Science and Engineering (Chalmers)

Radu Rugina

23rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2009; Rome; Italy; 23 May 2009 through 29 May 2009

5160999

Subject Categories

Computer and Information Science

DOI

10.1109/IPDPS.2009.5160999

ISBN

978-142443750-4