Overview

HyperQueue is a tool designed to simplify execution of large workflows on HPC clusters. It allows you to execute a large number of tasks in a simple way, without having to manually submit jobs into batch schedulers like PBS or Slurm. You just specify what you want to compute – HyperQueue will automatically ask for computational resources and dynamically load-balance tasks across all allocated nodes and cores.
Features
- Automatic management of batch jobs
  - HQ automatically asks for computing resources
  - Computation is distributed amongst all allocated nodes and cores
- Performance
  - The inner scheduler can scale to hundreds of nodes
  - The overhead for one task is below 0.1 ms
  - HQ can stream outputs from tasks to avoid creating many small files on a distributed filesystem
- Easy deployment
  - HQ is provided as a single, statically linked binary without any dependencies
  - No admin access to a cluster is needed to use HQ
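To illustrate why streaming task outputs helps on a distributed filesystem, the sketch below appends the output of many tasks to one shared stream instead of creating one file per task. The framing used here (a task id and a payload length in front of each chunk) and the names `append_chunk` and `read_chunks` are assumptions made for this example; they are not HyperQueue's actual storage format or API.

```python
import io
import struct

# Hypothetical chunk header: (task_id, payload_length), both unsigned 32-bit
# little-endian integers. This layout is invented for illustration only.
HEADER = struct.Struct("<II")


def append_chunk(stream, task_id, payload):
    """Append one piece of task output to the shared stream."""
    stream.write(HEADER.pack(task_id, len(payload)))
    stream.write(payload)


def read_chunks(stream):
    """Read the whole stream back and regroup the output by task id."""
    outputs = {}
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of stream (or a truncated trailing header)
        task_id, length = HEADER.unpack(header)
        outputs.setdefault(task_id, bytearray()).extend(stream.read(length))
    return {task_id: bytes(buf) for task_id, buf in outputs.items()}


if __name__ == "__main__":
    log = io.BytesIO()  # stands in for a single file on the shared filesystem
    append_chunk(log, 0, b"hello ")
    append_chunk(log, 1, b"other task")
    append_chunk(log, 0, b"world")
    log.seek(0)
    print(read_chunks(log))
```

With this layout, a thousand tasks produce a single file rather than a thousand small ones, and the per-task output can still be recovered afterwards.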
Architecture
HyperQueue has two runtime components:
- Server: a long-lived component that can run, for example, on a login node of a computing cluster. It handles tasks submitted by the user, requests and manages HPC resources (PBS/Slurm jobs), and distributes tasks to available workers.
- Worker: runs on a computing node and actually executes submitted tasks.
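The split between server and workers can be pictured with a small toy model: one function plays the server and hands each task to whichever worker becomes free first. This is only a sketch of dynamic load balancing under simplifying assumptions (task durations known in advance, one task per worker at a time); the function `distribute` and the worker names are hypothetical and do not reflect HyperQueue's real scheduler.

```python
import heapq
from collections import defaultdict


def distribute(task_durations, worker_names):
    """Greedily assign each task to the worker that becomes free first.

    Returns (assignment, makespan), where assignment maps a worker name to
    the list of task indices it ran. Illustrative only.
    """
    # Min-heap of (time_when_free, worker_name); every worker starts idle.
    free_at = [(0.0, name) for name in worker_names]
    heapq.heapify(free_at)
    assignment = defaultdict(list)

    for task_id, duration in enumerate(task_durations):
        ready_time, name = heapq.heappop(free_at)
        assignment[name].append(task_id)
        heapq.heappush(free_at, (ready_time + duration, name))

    # The makespan is the time at which the last worker finishes.
    makespan = max(finish for finish, _ in free_at)
    return dict(assignment), makespan


if __name__ == "__main__":
    plan, makespan = distribute([4.0, 1.0, 1.0, 1.0, 1.0], ["worker-0", "worker-1"])
    print(plan)      # which task indices each worker executed
    print(makespan)  # total time until all tasks are done
```

In this example the long first task occupies one worker while the other worker picks up all four short tasks, so both finish at the same time instead of one sitting idle.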

You can find more information about the architecture of HyperQueue here.
Last update: October 28, 2021
Created: May 1, 2021