MPI calculations with Cerberus

Cerberus can be used to run certain solvers with MPI parallelization across multiple computing nodes.

Required installation

Cerberus uses the mpi4py Python package for MPI functionality. It can be installed with pip. Note that mpi4py 3.1 onwards sometimes fails to install with the error No module named 'glob'; in that case you can try mpi4py 3.0.3 instead.

Notice, however, that mpi4py is compiled against whichever MPI libraries are available during the pip installation, which means that the pip installation needs to be run in an environment where the correct MPI libraries have been loaded.
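After installing, you can check which MPI implementation mpi4py was actually built against. The following minimal snippet (not part of Cerberus; it only uses standard mpi4py functions) prints the build configuration and the run-time MPI library version string:

import mpi4py
from mpi4py import MPI

# Build-time configuration (e.g. the MPI compiler wrapper used) recorded by mpi4py
print(mpi4py.get_config())

# Version string reported by the MPI library that is loaded at run time
print(MPI.Get_library_version())

If the reported library does not match the MPI modules loaded on the cluster, mpi4py was most likely built in the wrong environment and should be reinstalled.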

Running calculations

The MPI libraries used to build both the solver modules and the mpi4py package for Cerberus need to be loaded.

Solvers that are to use MPI parallelization should be initialized with

Solver.initialize(use_MPI=True)

Cerberus can then be run (on SLURM) with a setup similar to the one described below

#!/bin/bash
#SBATCH --cpus-per-task=20
#SBATCH --ntasks=4
#SBATCH --partition=core40
#SBATCH -o output.txt

module load mpi/openmpi-x86_64

mpirun --report-bindings -np 4 --bind-to none -oversubscribe python input.py > cerberus_output.txt

The idea here is the following:

  • Cerberus itself does not use any resources while a Solver is running, so we can -oversubscribe and let the MPI-parallelized solver also use the resources allocated to Cerberus.
  • To successfully use -oversubscribe, the binding of the tasks to sockets or cores needs to be disabled with --bind-to none.
  • A single Cerberus task communicates with a single Solver task.
    • All but one Cerberus task exit upon loading the Cerberus package at cerberus.__init__.py (see the sketch after this list).
    • By default the remaining Cerberus task is the one with the highest task number.
    • It can be somewhat tricky to get the remaining Cerberus task and the communicating Solver task (task 0 with Serpent and SuperFINIX) on the same node.
    • --report-bindings with OpenMPI helps the user analyze which node each task is spawned on.
    • The rank of the task that continues at cerberus.__init__.py can be adjusted if needed.
    • In the future, a separate --cerberus-hostname <hostname> argument may be added to Solvers to allow them to connect to Cerberus across nodes.
      • Socket communication across nodes is significantly slower than on the same node.
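As a concrete illustration of the points above, the following small mpi4py script (a sketch for testing the setup, not Cerberus source code) makes every task report which node it was spawned on and then lets all but the highest-numbered task exit, mimicking what happens when the Cerberus package is imported:

# layout_check.py -- illustrative sketch only, not part of Cerberus
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Report which node each task runs on (complements --report-bindings)
print(f"task {rank} of {size} on node {MPI.Get_processor_name()}")

# Mimic cerberus.__init__.py: only the highest-numbered task keeps running
if rank != size - 1:
    sys.exit(0)

print(f"task {rank} continues (this is the task that would run Cerberus)")

Running a script like this in place of input.py, with the same mpirun command as above, makes it possible to check, before starting a long calculation, which nodes the surviving Cerberus task and the Solver's task 0 would end up on.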