
Troubleshooting OpenMPI Invocation Problems

OpenMPI works with a large number of transport mechanisms, from shared memory on the local machine to IP over Ethernet and RDMA over InfiniBand. With default settings, when you start your program with mpirun, OpenMPI chooses the best interface available. Unfortunately, the selection logic isn’t foolproof, and sometimes you will hit a snag where your job appears to hang before your code even runs.

The first step in troubleshooting OpenMPI invocation problems is to add the --debug-devel parameter, or -d for short. Unlike the --debug and --debugger arguments, which invoke parallel debuggers on user code, the --debug-devel option increases OpenMPI’s log verbosity.

Adding this to your mpirun command generates much more feedback: it shows which transport medium is in use and the addresses involved, and it can give you hints on where to continue your investigation.
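
As a rough illustration, here is a minimal Python sketch that assembles an mpirun command line with --debug-devel and, optionally, saves the verbose output to a log file with a timeout, since a hung launch is the symptom under investigation. The program name ./my_app, the host file hosts.txt, the process count, and the helper names build_debug_command and run_with_log are all hypothetical; the exact debug output depends on your OpenMPI version.

```python
import shlex
import subprocess


def build_debug_command(program, num_procs, hostfile=None):
    """Return an mpirun argument list with --debug-devel enabled."""
    cmd = ["mpirun", "--debug-devel", "-np", str(num_procs)]
    if hostfile is not None:
        cmd += ["--hostfile", hostfile]
    cmd.append(program)
    return cmd


def run_with_log(cmd, log_path, timeout_s=60):
    """Run cmd, writing stdout and stderr to log_path.

    Returns the exit code, or None if the launch did not finish
    within timeout_s seconds (the partial log is kept either way).
    """
    with open(log_path, "w") as log:
        try:
            done = subprocess.run(
                cmd, stdout=log, stderr=subprocess.STDOUT, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            return None
    return done.returncode


if __name__ == "__main__":
    # Hypothetical program and host file; substitute your own.
    cmd = build_debug_command("./my_app", 4, hostfile="hosts.txt")
    # Only print the command here; call run_with_log(cmd, "mpirun-debug.log")
    # on a machine where mpirun is installed.
    print(shlex.join(cmd))
```

If run_with_log returns None, the launch stalled, and the log file should hold whatever OpenMPI reported up to that point.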

It’s important to remember that whichever transports are used with OpenMPI, IP will always be used for initial job tree setup. Even if you’re using InfiniBand, routing or MTU issues on your cluster’s IP network can prevent the MPI job from starting on each node.
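
A quick sanity check of the IP network can rule out the simplest causes before you dig into the debug output. The sketch below is one possible approach, not part of OpenMPI. It assumes jobs are launched over SSH on port 22 (clusters using a scheduler may differ), that the nodes run Linux so the MTU can be read from sysfs, and that you have collected each node's MTU yourself. The node names and the helper names can_connect, local_mtu, and mtu_outliers are hypothetical. Treating the most common MTU as the "expected" value is only a heuristic, and it will not catch a mismatch on a switch in between.

```python
import socket
from collections import Counter
from pathlib import Path


def can_connect(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refusals, and timeouts.
        return False


def local_mtu(interface):
    """Read this machine's MTU for an interface from Linux sysfs."""
    return int(Path(f"/sys/class/net/{interface}/mtu").read_text().strip())


def mtu_outliers(mtu_by_node):
    """Return the nodes whose MTU differs from the most common value."""
    if not mtu_by_node:
        return {}
    common, _ = Counter(mtu_by_node.values()).most_common(1)[0]
    return {node: mtu for node, mtu in mtu_by_node.items() if mtu != common}


if __name__ == "__main__":
    # Hypothetical values, e.g. gathered by running local_mtu() on each node.
    collected = {"node01": 9000, "node02": 9000, "node03": 1500}
    for node, mtu in mtu_outliers(collected).items():
        print(f"{node}: MTU {mtu} differs from the rest of the cluster")
```

Calling can_connect("node01") from the head node for each compute node gives a first indication of whether routing and name resolution work on the network the job setup will use.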
