Using MPI
MPI (Message Passing Interface) allows processes running on multiple computers (nodes) to communicate by sending blocks of memory to each other as messages. Traditionally, it's used with C and Fortran - here I'll talk about how to get it running in Visual Studio using C/C++ on our clusters.
Using Visual Studio for MPI
Prerequisites
- You'll need Visual Studio obviously.
- You'll also need the Microsoft MPI SDK, which you download from https://www.microsoft.com/en-us/download/details.aspx?id=49926 . It installs some stuff into C:\Program Files (x86)\Microsoft SDKs\MPI.
Set up the Project
- Create a new C/C++ Empty project. Add a new helloworld.cpp file to it.
- Project Properties, C/C++, General, Additional Include Directories. Add C:\Program Files (x86)\Microsoft SDKs\MPI\Include
- Project Properties, Linker, General, Additional Library Directories. Add C:\Program Files (x86)\Microsoft SDKs\MPI\Lib\x64 (for 64-bit - x86 if you really want 32-bit).
- Project Properties, Linker, Input, Additional Dependencies: insert msmpi.lib;
- Project Properties, C/C++, Code Generation, set Runtime Library to a non-DLL version (/MT)
Write MPI Hello World
#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int mpi_size, mpi_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);  /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);  /* this process's index */
    printf("Hello World %d out of %d\n", mpi_rank, mpi_size);
    MPI_Finalize();
    return 0;
}
Launching the Job
Let's assume you've built the above into a test directory in your home directory - \\fi--san02\homes\user\mpitest. We'll write a script called run.bat, which takes two arguments: (1) the number of MPI processes to start, which we'll set to the number of nodes you want to use, and (2) the working directory.
mpiexec -n %1 -wdir %2 helloworld.exe
Then we'll write another batch file, launch.bat, to set the job running on the cluster:
set NODES=2
set WORKDIR=\\fi--san02\homes\user\mpitest
job submit /scheduler:fi--didemrchnb /jobtemplate:GeneralNodes /numnodes:%NODES% /singlenode:false /workdir:%WORKDIR% /stdout:out.txt run.bat %NODES% %WORKDIR%
A bit more MPI
Just a taster - you can easily google for all the MPI examples in the world. But just so you know what you're in for... It's quite low level. You can do things like:
- Scatter data from one node to all the others
- Gather data back to one node from all the others.
- Scatterv and Gatherv allow you to scatter or gather non-equally-sized portions of data, but you have to know how big each bit is in advance. So, commonly, you might do a pair of MPI operations, the first to tell everyone the sizes (one integer per node), and the second to deal with the variable-size data, since you now know how big it is.
- Allgather and Allgatherv cause all of the nodes to end up with all of the data, rather than just one node accumulating it all.
- And the data we have been speaking of is... an array of ints, or floats of various sizes.
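Since Scatterv and Gatherv need the size of each piece up front, a common first step is deciding how many items each process gets. Here is a minimal sketch of one way to do that; block_counts is a hypothetical helper name of my own, not part of MPI, and it assumes you simply want the pieces as even as possible.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of MPI): split total_items as evenly as
// possible across num_ranks processes. The first (total_items % num_ranks)
// ranks each get one extra item.
std::vector<int> block_counts(int total_items, int num_ranks) {
    assert(total_items >= 0 && num_ranks > 0);
    const int base = total_items / num_ranks;   // every rank gets at least this many
    const int extra = total_items % num_ranks;  // leftover items to hand out
    std::vector<int> counts(num_ranks, base);
    for (int r = 0; r < extra; r++) {
        counts[r] += 1;
    }
    return counts;
}
```

For example, block_counts(10, 4) gives 3, 3, 2, 2. The resulting array is the kind of per-process counts array that MPI_Scatterv and MPI_Gatherv expect.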
A very simple example. Arrange it so that all the MPI nodes know the names of all the MPI nodes. First: get the name.
char name[MPI_MAX_PROCESSOR_NAME];
int len;
MPI_Get_processor_name(name, &len);
Now, the names could be different lengths, so first, get all the nodes to tell everyone else how long their name is.
int* sizes = new int[mpi_size];
MPI_Allgather(&len, 1, MPI_INT, sizes, 1, MPI_INT, MPI_COMM_WORLD);
/* MPI_Allgather's arguments are:
   &len           = Address of data to send
   1              = How many items to send
   MPI_INT        = The data type to send.
   sizes          = Address to receive results into
   1              = How many items *per node* to receive
   MPI_INT        = The data type to receive
   MPI_COMM_WORLD = A reference to the universe.
*/
So after this, all the nodes know the length of all the nodes' names. They could differ, so...
int totalsize = 0;
for (int i = 0; i < mpi_size; i++) totalsize += sizes[i];
char* incoming = new char[totalsize];
int* displs = new int[mpi_size];
displs[0] = 0;
for (int i = 1; i < mpi_size; i++) displs[i] = displs[i - 1] + sizes[i - 1];
Here, we've worked out the total incoming buffer size, and made memory space for it. We know the sizes of each one, and I've calculated the displacements for each one - i.e., displs[n] is the place in my receive buffer where the data coming from node n will begin.
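If you'd rather not manage the raw arrays by hand, the displacement calculation can be wrapped in a small function. This is just a sketch using std::vector; compute_displacements is a hypothetical name of my own, not an MPI call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MPI): displs[n] is the sum of
// sizes[0..n-1], i.e. the offset in the receive buffer where the data
// from node n starts.
std::vector<int> compute_displacements(const std::vector<int>& sizes) {
    std::vector<int> displs(sizes.size(), 0);
    for (std::size_t i = 1; i < sizes.size(); i++) {
        assert(sizes[i - 1] >= 0);  // counts passed to MPI must not be negative
        displs[i] = displs[i - 1] + sizes[i - 1];
    }
    return displs;
}
```

For sizes of 5, 7 and 6 this gives displacements 0, 5 and 12. With vectors, you would pass sizes.data() and displs.data() wherever MPI wants the plain int arrays.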
MPI_Allgatherv(name, len, MPI_CHAR, incoming, sizes, displs, MPI_CHAR, MPI_COMM_WORLD);
/* MPI_Allgatherv's arguments are:
   name           = Address of data to send
   len            = How many items to send
   MPI_CHAR       = The data type to send.
   incoming       = Address to receive data into
   sizes          = Array of size mpi_size - size to receive from each node.
   displs         = Array of size mpi_size - starting point of data for each node.
   MPI_CHAR       = The data type to receive.
   MPI_COMM_WORLD = A reference to the universe.
*/
And if you want to get the names out one by one, then, perhaps something like this...
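/* Needs #include <string>. The incoming buffer has no terminating '\0'
   (len does not count it), so give the string its length explicitly. */
std::string allresults(incoming, totalsize);
std::string* array_results = new std::string[mpi_size];
for (int i = 0; i < mpi_size; i++) {
    array_results[i] = allresults.substr(displs[i], sizes[i]);
}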
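The gathered buffer is just the names packed back to back, with no terminating '\0' anywhere, because the length reported by MPI_Get_processor_name does not include one. An alternative sketch that never treats the buffer as a C string is below; split_names is a hypothetical helper of my own, and it assumes the sizes and displacements are the ones used for the gather.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of MPI): cut a packed buffer of names into
// one std::string per node, using each node's size and displacement.
std::vector<std::string> split_names(const char* buffer,
                                     const std::vector<int>& sizes,
                                     const std::vector<int>& displs) {
    assert(sizes.size() == displs.size());
    std::vector<std::string> names;
    names.reserve(sizes.size());
    for (std::size_t i = 0; i < sizes.size(); i++) {
        // The (pointer, length) constructor copies exactly sizes[i] chars,
        // so the buffer does not need to be null-terminated.
        names.emplace_back(buffer + displs[i],
                           static_cast<std::size_t>(sizes[i]));
    }
    return names;
}
```

Returning a std::vector<std::string> also means there is no new[] to remember to delete[] afterwards.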
Don't forget to
MPI_Finalize();
at the end.