Bundling runs to improve throughput on Mira
| 1. Poster Title | Bundling runs to improve throughput on Mira |
|---|---|
| 2. Authors | @Qi Tang, Ray Loy, @Peter Caldwell, @Steve Klein, @David C. Bader, @Mark Taylor, @Patrick Worley, @Marcia Branstetter, @Kate Evans, @Yun Qian, and @Hui Wan |
| 3. Group | Atmosphere, SE, and performance |
| 4. Experiment | |
| 5. Poster Category | Problem/Solution |
| 6. Submission Type | poster |
| 7. Poster Link | |
Abstract
One of the most important tasks for the ACME project is to perform large production simulations. To complete these runs within a reasonable time frame, it is important to take full advantage of DOE leadership computing resources. The ACME project has a large allocation on the Mira machine at the Argonne Leadership Computing Facility (ALCF), but even at global 0.25° resolution, ACME simulations are too small to meet the minimum processor count for priority on Mira, resulting in low throughput. We solved this problem by bundling many (4 to 32) small jobs into one large job that exceeds the threshold for the higher-priority queue. This method has been successfully implemented for perturbed parameter simulations using the regionally refined model and for atmosphere-only ("AMIP") climate simulations at 0.25° resolution. It will also be used for upcoming fully coupled ensemble simulations.
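
The arithmetic behind bundling can be illustrated with a minimal Python sketch. It is not the ACME bundling script: the function names (`plan_bundle`, `assign_blocks`) and the node counts in the usage note (512 nodes per run, an 8192-node priority threshold) are hypothetical values chosen for illustration. Only the upper limit of 32 bundled runs is taken from the abstract.

```python
import math


def plan_bundle(nodes_per_run, priority_min_nodes, max_members=32):
    """Return the smallest number of identical runs whose combined
    node count reaches the priority threshold.

    All arguments are illustrative inputs, not real Mira queue settings.
    """
    if nodes_per_run <= 0 or priority_min_nodes <= 0:
        raise ValueError("node counts must be positive")
    members = math.ceil(priority_min_nodes / nodes_per_run)
    if members > max_members:
        raise ValueError(
            f"{members} runs needed, more than the limit of {max_members}"
        )
    return members


def assign_blocks(nodes_per_run, members):
    """Give each bundled run a contiguous, non-overlapping range of
    node indices (inclusive) inside the single large job."""
    return [
        (i * nodes_per_run, (i + 1) * nodes_per_run - 1)
        for i in range(members)
    ]


if __name__ == "__main__":
    # Hypothetical example: each run uses 512 nodes and the priority
    # queue is assumed to start at 8192 nodes.
    members = plan_bundle(nodes_per_run=512, priority_min_nodes=8192)
    for run_id, (first, last) in enumerate(assign_blocks(512, members)):
        print(f"run {run_id:2d}: nodes {first}-{last}")
```

With these assumed numbers, 16 runs are bundled, and each run receives its own block of 512 nodes within one 8192-node job. In practice, the per-run node count and the queue threshold would come from the model configuration and the facility's scheduling policy.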