MPI calculations with Cerberus
Cerberus can be used to run certain solvers MPI-parallelized across multiple computing nodes.
Required installation
Cerberus uses the mpi4py Python package for MPI functionality. It can be installed with pip.
Note, however, that mpi4py is compiled against whichever MPI libraries are available during the pip installation, which means that it is best to run the pip installation in an environment where the correct MPI libraries have been loaded.
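To verify which MPI implementation an existing mpi4py installation was linked against, a quick check along the following lines can be used (a minimal sketch; it simply prints the version information reported by the linked MPI library):

from mpi4py import MPI

# Version string of the MPI library mpi4py was linked against
# (e.g. an Open MPI or MPICH banner) and the supported MPI standard version.
print(MPI.Get_library_version())
print(MPI.Get_version())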
Running calculations
The MPI libraries used to build both the solver modules and the mpi4py package for Cerberus need to be loaded.
Any Solvers that have -mpi or --mpi among their command line arguments will be spawned with MPI.Comm.Spawn instead of being started as a subprocess.
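For reference, the difference between the two start-up modes can be sketched with mpi4py roughly as follows. This is only an illustration with placeholder names (solver_cmd, solver_args, use_mpi_spawn), not Cerberus's actual code:

import subprocess
from mpi4py import MPI

solver_cmd = "./solver"         # placeholder solver executable
solver_args = ["solver_input"]  # placeholder solver arguments
use_mpi_spawn = True            # corresponds to passing -mpi / --mpi to the Solver

if not use_mpi_spawn:
    # Default mode: the solver is started as an ordinary subprocess.
    proc = subprocess.Popen([solver_cmd] + solver_args)
else:
    # MPI mode: the solver is spawned as its own group of MPI tasks and an
    # intercommunicator to the spawned tasks is returned.
    intercomm = MPI.COMM_SELF.Spawn(solver_cmd, args=solver_args, maxprocs=1)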
Cerberus can then be run with a setup similar to the one shown below:
#!/bin/bash
#SBATCH --cpus-per-task=20
#SBATCH --ntasks=4
#SBATCH --partition=core40
#SBATCH -o output.txt

module load mpi/openmpi-x86_64

mpirun --report-bindings -np 4 --bind-to none -oversubscribe python input.py > cerberus_output.txt
The idea here is the following:
- Cerberus itself does not use any resources while a Solver is running, so we can use -oversubscribe and let the MPI-parallelized Solver also use the resources allocated to Cerberus.
- To successfully use -oversubscribe, the binding of tasks to sockets or cores needs to be disabled with --bind-to none.
- Only one task of Cerberus communicates with only one Solver task.
  - All but one Cerberus task exit upon loading the Cerberus package at cerberus.__init__.py (see the sketch after this list).
  - It can be somewhat tricky to get the remaining Cerberus task and the communicating Solver task (task 0 with Serpent and SuperFINIX) on the same node.
  - --report-bindings with OpenMPI helps the user analyze which node each task is spawned on.
  - The rank of the task that continues at cerberus.__init__.py can be adjusted if needed.
  - In the future, a separate --cerberus-hostname <hostname> argument may be added to Solvers to allow them to connect to Cerberus across nodes.
    - Socket communication across nodes is significantly slower than on the same node.
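The early exit of the extra Cerberus tasks mentioned above corresponds roughly to a rank check of the following form (a simplified sketch, assuming rank 0 is the task that continues; this is not the actual contents of cerberus.__init__.py):

import sys
from mpi4py import MPI

# Assumed default: rank 0 is the Cerberus task that keeps running.
CERBERUS_RANK = 0

if MPI.COMM_WORLD.Get_rank() != CERBERUS_RANK:
    # Only one Cerberus task communicates with the Solver, so the
    # remaining tasks have nothing to do and exit immediately.
    sys.exit(0)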