HPC
High Performance Computing.
Provisioning
Bare Metal Provisioning system for HPC Linux clusters (Source Code) GPL-3.
xCAT is a toolkit for deployment and administration of clusters of all sizes (Source Code) EPL-1.0.
Warewulf is a stateless and diskless container operating system provisioning system for large clusters of bare metal and/or virtual systems (Source Code) BSD-3.
Cobbler is a Linux installation server that allows for rapid setup of network installation environments (Source Code) GPL-2.0.
BlueBanquise is an open source cluster deployment and management stack built on Python and Ansible (Source Code) MIT.
Workload Managers
A free and open source job scheduler (Source Code) OSS.
A batch scheduler for Kubernetes aimed at high-performance workloads such as AI/ML, big data, and HPC. Apache-2.0.
OpenPBS software optimizes job scheduling and workload management in high-performance computing (HPC) environments (Source Code) other.
Pipelines
Applications
Compilers
MPI
Benchmarking
HPCC Systems (High Performance Computing Cluster) is an open source, massively parallel processing computing platform for big data processing and analytics (other).
A distributed storage benchmark for files, objects, and blocks with support for GPUs (GPL-3).
Miscellaneous
Open OnDemand helps computational researchers and students efficiently utilize remote computing resources by making them easy to access from any device (MIT).
Open XDMoD is an open source tool to facilitate the management of high performance computing resources (LGPL-3).
ColdFront is an open source resource allocation system designed to provide a central portal for administration, reporting, and measuring scientific impact of HPC resources (GPL-3).
Pavilion is a Python 3 (3.6+) based framework for running and analyzing tests targeting HPC systems (other).
A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems (BSD-3).
The OLCF Test Harness (OTH) helps automate the testing of applications, tools, and other system software (other).
Goslmailer is a drop-in notification delivery solution for Slurm that can deliver to Slack, Mattermost, Teams, and more.
Parallel Shells
Containers
Apptainer is an open source container system (BSD).
Charliecloud provides user-defined software stacks (UDSS) for high-performance computing (HPC) centers (Apache-2.0).
A basic user tool to execute simple docker containers in batch or interactive systems without root privileges (Apache-2.0).
Shifter is Linux containers for HPC (other).
HPC Container Maker is an open source tool to make it easier to generate container specification files (Apache-2.0).
An OCI-compatible container engine for HPC (BSD).
Singularity Registry HPC (shpc) allows you to install containers as modules (MPL 2.0).
Environment Management
Lmod is an environment module system based on Lua that reads Tcl modules and supports a software hierarchy (other).
Environment Modules: provides dynamic modification of a user's environment (GPL-2).
Mamba is a reimplementation of the conda package manager in C++ (BSD).
Visualization
Parallel Filesystems
Ceph is a distributed object, block, and file storage platform (other).
Lustre is an open-source, distributed parallel file system software platform designed for scalability, high performance, and high availability (other).
OrangeFS is a next generation parallel file system for Linux clusters (other).
Moose File System is an open-source, POSIX-compliant distributed file system developed by Core Technology (GPL-2.0).
Monitoring
Prometheus Based
Prometheus exporter for performance metrics from Slurm (GPL-3.0).
Slurm exporter for Prometheus using the REST API (GPL-3.0).
The InfiniBand exporter collects counters from InfiniBand switches and HCAs (Apache-2.0).
Produces metrics from cgroups (Apache-2.0).
A Prometheus exporter for cgroup-level metrics (unknown).
The GPFS exporter collects metrics from the GPFS filesystem (Apache-2.0).
Prometheus exporter for use with the Lustre parallel filesystem (GPL-3.0).
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM (Apache-2.0).