MPI calculations with Cerberus

Cerberus can be used to run certain solvers with MPI parallelization across multiple computing nodes.

== Required installation ==

Cerberus uses the mpi4py Python package for its MPI functionality. It can be installed with pip. Note that mpi4py 3.1 onwards sometimes fails to install with the error ''No module named 'glob'''; in that case you can try mpi4py 3.0.3 instead.
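
For example, a minimal installation could look like the following sketch (assuming ''pip'' points at the Python environment used for Cerberus):

 <nowiki># Install mpi4py into the current Python environment
pip install mpi4py

# Fallback if mpi4py 3.1 or newer fails to build (see the note above)
pip install mpi4py==3.0.3</nowiki>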

Note, however, that mpi4py gets compiled against whichever MPI libraries are available during the pip installation, which means that the pip installation needs to be run in an environment where the correct MPI libraries have been loaded.
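
On a cluster that uses environment modules, this could look roughly as follows (the module name here is only an example; load whichever MPI module the solver modules were built with):

 <nowiki># Load the MPI implementation the solver modules were built with
module load mpi/openmpi-x86_64

# pip builds mpi4py against the MPI libraries that are now loaded
pip install mpi4py</nowiki>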

If you need to re-install mpi4py for whatever reason, you may want to pass ''--no-cache-dir'' to pip so that you do not simply get the previously built and cached package again.
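
A re-installation that forces a fresh build could then look something like this:

 <nowiki># Remove the old installation
pip uninstall mpi4py

# Re-install without pip's cache so mpi4py is rebuilt
# against the currently loaded MPI libraries
pip install --no-cache-dir mpi4py</nowiki>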

== Running calculations ==

At run time, the same MPI libraries that were used to build both the solver modules and the mpi4py package for Cerberus need to be loaded.
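
A quick way to sanity-check this (not part of Cerberus itself, just an example) is to compare the MPI reported by the environment with the MPI library that mpi4py was built against:

 <nowiki># MPI provided by the loaded environment
mpirun --version

# MPI library that mpi4py was compiled against
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"</nowiki>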

Solvers that are to use MPI parallelization should be initialized with, for example,

 Solver.initialize(use_MPI=True, n_MPI_procs_to_spawn=4)

Cerberus can then be run on SLURM with a batch script similar to the one below:

  <nowiki>#!/bin/bash
#SBATCH --cpus-per-task=20
#SBATCH --ntasks=4
#SBATCH --partition=core40
#SBATCH -o output.txt

# Load the same MPI that the solver modules and mpi4py were built with
module load mpi/openmpi-x86_64

# Start a single Cerberus task; the solver processes it spawns use the
# rest of the allocation (see the notes below)
mpirun --report-bindings -np 1 --bind-to none -oversubscribe python input.py > cerberus_output.txt</nowiki>

The idea here is the following:
*Cerberus itself does not use any resources while a Solver is running, so we can '''<tt>-oversubscribe</tt>''' and let the MPI-parallelized solver also use the resources allocated to Cerberus.
*To successfully use '''<tt>-oversubscribe</tt>''', the binding of the tasks to sockets or cores needs to be disabled with '''<tt>--bind-to none</tt>'''.
*Only one Cerberus task is started ('''<tt>-np 1</tt>''').
*In the future, a separate '''<tt>--host <hostname></tt>''' argument will be added to Solvers to allow them to connect to Cerberus across nodes.
**Socket communication across nodes is significantly slower than on the same node.
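
The batch script above can then be submitted to SLURM in the usual way; as a sketch (assuming the script was saved as ''run_cerberus.sh'', an arbitrary example name):

 <nowiki># Submit the job script (run_cerberus.sh is an example file name)
sbatch run_cerberus.sh

# Follow the main Cerberus output once the job is running
tail -f cerberus_output.txt</nowiki>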