Using MPI

MPI (Message-Passing Interface) allows multiple computers (nodes) to cooperate on a job by passing blocks of data between each other. Traditionally it's used with C and Fortran - here I'll talk about how to get it running in Visual Studio using C/C++ on our clusters.

Using Visual Studio for MPI

Prerequisites

  • You'll need Visual Studio, obviously.
  • You'll also need the Microsoft MPI SDK, which you can download from https://www.microsoft.com/en-us/download/details.aspx?id=56727. It installs its headers and libraries into C:\Program Files (x86)\Microsoft SDKs\MPI.

Set up the Project

  • Create a new C/C++ Empty project. Add a new helloworld.cpp file to it.
  • Project Properties, C/C++, General, Additional Include Directories. Add C:\Program Files (x86)\Microsoft SDKs\MPI\Include
  • Project Properties, Linker, General, Additional Library Directories. Add C:\Program Files (x86)\Microsoft SDKs\MPI\Lib\x64 (for 64-bit - x86 if you really want 32-bit).
  • Project Properties, Linker, Input, Additional Dependencies: insert msmpi.lib;
  • Project Properties, C/C++, Code Generation, set Runtime Library to a non-DLL version (/MT)

Write MPI Hello World

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
  int mpi_size, mpi_rank;
  MPI_Init(&argc, &argv);                    // Start up MPI.
  MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);  // How many processes are there in total?
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);  // Which one am I (0-based)?
  printf("Hello World %d out of %d\n", mpi_rank, mpi_size);
  MPI_Finalize();                            // Shut MPI down again.
  return 0;
}
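
You can test this locally before going anywhere near the cluster. This assumes you've also installed the MS-MPI runtime (msmpisetup.exe, available from the same download page as the SDK), which provides mpiexec:

mpiexec -n 4 helloworld.exe

You should get one Hello World line per process, in no particular order.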

Launching the Job

Let's assume you've built the above and want to test it in your home directory - \\fi--san02\homes\user\mpitest - and we'll write a script called run.bat, which will take two arguments: (1) the number of nodes you want to use, and (2) the working directory.

mpiexec -n %1 -wdir %2 helloworld.exe

And then we'll write another batch file, launch.bat, to set the job running on the cluster:

set NODES=2
set WORKDIR=\\qdrive.dide.ic.ac.uk\homes\user\mpitest
job submit /scheduler:fi--didemrchnb /jobtemplate:GeneralNodes /numnodes:%NODES% /singlenode:false /workdir:%WORKDIR% /stdout:out.txt run.bat %NODES% %WORKDIR%
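
Once it's submitted, the scheduler prints a job ID, and you can check on the job with HPC Pack's job command - something like this (see job view /? for the exact syntax):

job view <jobID> /scheduler:fi--didemrchnb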

A bit more MPI

Just a taster - you can easily google for all the MPI examples in the world. But just so you know what you're in for... It's quite low level. You can do things like:

  • Scatter data from one node to all the others (there's a sketch of this after the list).
  • Gather data back to one node from all the others.
  • Scatterv and Gatherv allow you to scatter or gather non-equally-sized portions of data, but you have to know how big each bit is in advance. So, commonly, you might do a pair of MPI operations, the first to tell everyone the sizes (one integer per node), and the second to deal with the variable-size data, since you now know how big it is.
  • Allgather and Allgatherv cause all of the nodes to end up with all of the data, rather than just one node accumulating it all.
  • And the data we have been speaking of is... an array of ints, or floats of various sizes.
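
For instance, the scatter in the first bullet might look like this - a minimal sketch, assuming mpi_size and mpi_rank have been set up as in the Hello World example:

  int* data = NULL;
  if (mpi_rank == 0) {                   // Only the root (rank 0) needs the full array.
    data = new int[mpi_size];
    for (int i = 0; i < mpi_size; i++) data[i] = i * i;
  }
  int mine;                              // Every node receives one int from rank 0.
  MPI_Scatter(data, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf("Node %d got %d\n", mpi_rank, mine);
  if (mpi_rank == 0) delete[] data;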

A very simple example. Arrange it so that all the MPI nodes know the names of all the MPI nodes. First: get the name.

  char name[MPI_MAX_PROCESSOR_NAME];
  int len;
  MPI_Get_processor_name(name, &len);

Now, the names could be different lengths on different nodes, so first, get every node to tell all the others how long its name is.

  int* sizes = new int[mpi_size];
  MPI_Allgather(&len, 1, MPI_INT, sizes, 1, MPI_INT, MPI_COMM_WORLD);

  /*  MPI_Allgather's arguments are:
        &len    = Address of data to send
        1       = How many items to send
        MPI_INT = The data type to send.
        sizes   = Address to receive results into
        1       = How many items *per node* to receive
        MPI_INT = The data type to receive
        MPI_COMM_WORLD = A reference to the universe.     */

So after this, every node knows the lengths of all the nodes' names. Those lengths could differ, so...

  int totalsize = 0;
  for (int i = 0; i < mpi_size; i++) totalsize += sizes[i];
  char* incoming = new char[totalsize];
  int* displs = new int[mpi_size];
  displs[0] = 0;
  for (int i = 1; i < mpi_size; i++) displs[i] = displs[i - 1] + sizes[i - 1];

Here, we've worked out the total incoming buffer size and allocated memory for it. We know the size of each node's contribution, and we've calculated the displacement for each one - i.e., displs[n] is the place in the receive buffer where the data coming from node n will begin.

  MPI_Allgatherv(name, len, MPI_CHAR, incoming, sizes, displs, MPI_CHAR, MPI_COMM_WORLD);

  /*  MPI_Allgatherv's arguments are:
        name     = Address of data to send
        len      = How many items to send
        MPI_CHAR = The data type to send.
        incoming = Address to receive data into
        sizes    = Array of size mpi_size - size to receive from each node.
        displs   = Array of size mpi_size - starting point of data for each node.
        MPI_CHAR = The data type to receive.
        MPI_COMM_WORLD = A reference to the universe.     */

And if you want to get the names out one by one, then, perhaps something like this...

  // (Requires #include <string>)
  std::string allresults(incoming, totalsize);   // incoming isn't null-terminated, so pass the length.
  std::string* array_results = new std::string[mpi_size];
  for (int i = 0; i < mpi_size; i++) {
    array_results[i] = allresults.substr(displs[i], sizes[i]);
  }
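
Everything above was allocated with new[] and never freed, so a little housekeeping before you finish wouldn't hurt:

  delete[] sizes;
  delete[] incoming;
  delete[] displs;
  delete[] array_results;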

Don't forget to

  MPI_Finalize();

at the end.