MPI calculations with Cerberus
Cerberus can be used to run certain solvers MPI-parallelized across multiple computing nodes.
Required installation
Cerberus uses the mpi4py Python package for MPI functionality. It can be installed with pip.
Note, however, that mpi4py is compiled against whichever MPI libraries are available during the pip installation, so it is best to run the installation in an environment where the correct MPI libraries have been loaded.
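If in doubt, it is possible to check after installation which MPI library mpi4py was actually built against. The snippet below is a minimal sketch using standard mpi4py calls; the file name is only an example.

    # check_mpi4py.py -- quick check that mpi4py was built against the
    # intended MPI library; run it with the same modules loaded as during
    # the pip installation.
    from mpi4py import MPI

    # Version string of the linked MPI library, e.g. an Open MPI or MPICH banner.
    print(MPI.Get_library_version())
    # Version of the MPI standard supported, e.g. (3, 1).
    print(MPI.Get_version())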
Running calculations
The MPI libraries that were used to build both the solver modules and the mpi4py package used by Cerberus need to be loaded before running.
Any Solvers that have -mpi or --mpi among their command line arguments will be spawned with MPI.Comm.Spawn instead of being started as a subprocess.
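As a rough illustration of that behaviour, the sketch below shows how such a dispatch could look with mpi4py; the function name, its parameters and the use of MPI.COMM_SELF are assumptions made for illustration, not the actual Cerberus code.

    # Illustrative sketch only: spawn the solver with MPI when it was given
    # -mpi/--mpi, otherwise start it as an ordinary subprocess.
    import subprocess
    from mpi4py import MPI

    def start_solver(executable, args, nprocs=1):
        if "-mpi" in args or "--mpi" in args:
            # MPI.Comm.Spawn starts nprocs new MPI processes and returns an
            # intercommunicator that connects them to this task.
            return MPI.COMM_SELF.Spawn(executable, args=args, maxprocs=nprocs)
        # Non-MPI solvers run as plain child processes.
        return subprocess.Popen([executable] + list(args))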
Cerberus can then be run with a setup similar to the one shown below:
    #!/bin/bash
    #SBATCH --cpus-per-task=20
    #SBATCH --ntasks=4
    #SBATCH --partition=core40
    #SBATCH -o output.txt

    module load mpi/openmpi-x86_64

    mpirun --report-bindings -np 4 --bind-to none -oversubscribe python input.py > cerberus_output.txt
The idea here is the following:
- Cerberus itself uses practically no resources while a Solver is running, so we can -oversubscribe and let the MPI-parallelized solver also use the resources allocated to Cerberus.
- To successfully use -oversubscribe, the binding of tasks to sockets or cores needs to be disabled with --bind-to none.
- Only a single Cerberus task communicates with a single Solver task.
- All Cerberus tasks but one exit upon loading the Cerberus package at cerberus.__init__.py (a sketch of this rank guard is given after this list).
- It can be somewhat tricky to get the remaining Cerberus task and the communicating Solver task (task 0 with Serpent and SuperFINIX) on the same node.
- --report-bindings with Open MPI helps the user to see which node each task is spawned on.
- The rank of the task that continues at cerberus.__init__.py can be adjusted if needed.
- In the future, a separate --cerberus-hostname <hostname> argument may be added to Solvers to allow them to connect to Cerberus across nodes.
- Socket communication across nodes is significantly slower than on the same node.
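As a rough sketch of the rank guard mentioned above, the snippet below shows how all but one task can exit at import time; the variable name and the exact logic are assumptions for illustration, not the actual contents of cerberus.__init__.py.

    # Illustrative sketch only: let a single chosen rank continue as the
    # Cerberus driver and terminate every other task launched by mpirun.
    import sys
    from mpi4py import MPI

    # Rank that keeps running Cerberus; adjust it if the surviving task needs
    # to end up on the same node as the communicating Solver task.
    CERBERUS_RANK = 0

    comm = MPI.COMM_WORLD
    if comm.Get_size() > 1 and comm.Get_rank() != CERBERUS_RANK:
        # The remaining tasks exit immediately, freeing their resources for
        # the oversubscribed solver processes.
        sys.exit(0)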