MPI calculations with Cerberus
Cerberus can be used to run certain solvers MPI-parallelized across multiple computing nodes.
Required installation
Cerberus uses the mpi4py Python package for its MPI functionality. The package can be installed with pip.
Note, however, that mpi4py is compiled against whichever MPI libraries are available during the pip installation, so it is best to run the installation in an environment where the correct MPI libraries have already been loaded.
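As an optional sanity check (generic mpi4py usage, not part of Cerberus), the build configuration of mpi4py and the MPI library picked up at run time can be inspected from Python:

import mpi4py
from mpi4py import MPI

# Compilers and MPI implementation that mpi4py was built against.
print(mpi4py.get_config())
# MPI library that is actually loaded in the current environment.
print(MPI.Get_library_version())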
Running calculations
The MPI libraries used to build both the solver modules and the mpi4py package for Cerberus need to be loaded.
Any Solver that has -mpi or --mpi among its command-line arguments will be spawned with MPI.Comm.Spawn instead of being started as a subprocess.
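The snippet below is only a simplified sketch of this distinction; the solver executable, its arguments, and the task count are hypothetical and do not reflect the actual Cerberus launch code.

import subprocess
from mpi4py import MPI

solver_cmd = "./solver"                # hypothetical solver executable
solver_args = ["input_file", "--mpi"]  # hypothetical argument list

if "-mpi" in solver_args or "--mpi" in solver_args:
    # Spawn the solver as MPI tasks; the returned intercommunicator
    # connects the Cerberus task to the spawned Solver tasks.
    intercomm = MPI.COMM_SELF.Spawn(solver_cmd, args=solver_args, maxprocs=4)
else:
    # Start the solver as a regular subprocess.
    process = subprocess.Popen([solver_cmd] + solver_args)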
Cerberus can then be run with a setup similar to the one described below:

#!/bin/bash
#SBATCH --cpus-per-task=20
#SBATCH --ntasks=4
#SBATCH --partition=core40
#SBATCH -o output.txt

module load mpi/openmpi-x86_64

mpirun --report-bindings -np 4 --bind-to none -oversubscribe python input.py > cerberus_output.txt
The idea here is the following:
- Cerberus itself is essentially idle while a Solver is running, so -oversubscribe can be used to let the MPI-parallelized Solver also use the resources allocated to Cerberus.
- To use -oversubscribe successfully, the binding of tasks to sockets or cores needs to be disabled with --bind-to none.
- Only one Cerberus task communicates with the Solver, and it communicates with only one Solver task.
  - All but one Cerberus task exit upon loading the Cerberus package in cerberus.__init__.py (a sketch of this is given after the list).
  - It can be somewhat tricky to get the remaining Cerberus task and the communicating Solver task (task 0 with Serpent and SuperFINIX) on the same node.
  - --report-bindings with OpenMPI helps the user to see which node each task is spawned on.
  - The rank of the task that continues past cerberus.__init__.py can be adjusted if needed.
  - In the future, a separate --cerberus-host <hostname> argument may be added to Solvers to allow them to connect to Cerberus across nodes.
    - Socket communication across nodes is significantly slower than communication on the same node.
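A rough sketch of the rank filtering mentioned above is shown below; it is not the actual contents of cerberus.__init__.py, and the continuing rank is an assumption:

import sys
from mpi4py import MPI

CONTINUING_RANK = 0  # assumed default; the rank that keeps running Cerberus

if MPI.COMM_WORLD.Get_rank() != CONTINUING_RANK:
    # The extra Cerberus tasks have nothing to do and exit immediately.
    sys.exit(0)

# Only the continuing rank reaches this point and goes on to handle the
# communication with the Solver.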