
13.5 Broadcast Communications

In this subsection, we will further improve the efficiency of our code by introducing two new MPI functions. In the process, we'll reduce the amount of code we have to work with.

13.5.1 Broadcast Functions

If you look back at the last solution, you'll notice that the parameters are sent individually to each process, one at a time, even though every process receives the same information. For example, if you are using 10 processes, while process 0 communicates with process 1, processes 2 through 9 are idle. While process 0 communicates with process 2, processes 3 through 9 are still idle. And so on. This may not be a big problem with a half dozen processes, but if you are running on 1,000 machines, it can result in a lot of wasted time. Fortunately, MPI provides an alternative, MPI_Bcast.
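To see the pattern, here is a sketch of that point-to-point approach. This is a reconstruction for illustration rather than the exact code from the last solution, and it assumes that a loop variable dest and an MPI_Status variable status have been declared.

   if (processId == 0)          /* rank 0 sends the parameters ... */
   {  for (dest = 1; dest < noProcesses; dest++)
      {  MPI_Send(&numberRects, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
         MPI_Send(&lowerLimit, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
         MPI_Send(&upperLimit, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
      }
   }
   else                         /* ... to one process at a time */
   {  MPI_Recv(&numberRects, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(&lowerLimit, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(&upperLimit, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
   }

Because process 0 works through the destinations one at a time, the other processes spend most of this phase waiting.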

13.5.1.1 MPI_Bcast

MPI_Bcast provides a mechanism to distribute the same information among a communication group or communicator. MPI_Bcast takes five arguments. The first three define the data to be transmitted. The first argument is the buffer that contains the data; the second argument is the number of items in the buffer; and the third argument, the data type. (The supported data types are the same as with MPI_Send, etc.)

The next argument is the rank of the process that is generating the broadcast, sometimes called the root of the broadcast. In our example, this is 0, but this isn't a requirement. All processes use identical calls to MPI_Bcast. By comparing its rank to the rank specified in the call, a process can determine whether it is sending or receiving data. Consequently, there is no need for any additional control structures with MPI_Bcast. The final argument is the communicator, which effectively defines which processes will participate in the broadcast. When the call returns, the data in the root's communications buffer will have been copied to each of the remaining processes in the communicator.
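For reference, the prototype looks essentially like this:

   int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
                 int root, MPI_Comm comm);

On the root, buffer supplies the data to send; on every other process, it names the location where the received data will be stored.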

Here is our numerical integration code using MPI_Bcast (and MPI_Reduce, a function we will discuss next). The new code is the three calls to MPI_Bcast and the single call to MPI_Reduce.

#include "mpi.h"

#include <stdio.h>

   

/* problem parameters */

#define f(x)            ((x) * (x))

   

int main( int argc, char * argv[  ] )

{

   /* MPI variables */

   int noProcesses, processId;

   

   /* problem variables */

   int         i, numberRects;

   double      area, at, height, lower, width, total, range;

   double      lowerLimit, upperLimit;

   

   /* MPI setup */

   MPI_Init(&argc, &argv);

   MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);

   MPI_Comm_rank(MPI_COMM_WORLD, &processId);

    

   if (processId = = 0)         /* if rank is 0, collect parameters */

   {

      fprintf(stderr, "Enter number of steps:\n");

      scanf("%d", &numberRects);

      fprintf(stderr, "Enter low end of interval:\n");

      scanf("%lf", &lowerLimit);

      fprintf(stderr, "Enter high end of interval:\n");

      scanf("%lf", &upperLimit);

   }

   

   MPI_Bcast(&numberRects, 1, MPI_INT, 0, MPI_COMM_WORLD);

   MPI_Bcast(&lowerLimit, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

   MPI_Bcast(&upperLimit, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    

   /* adjust problem size for subproblem*/

   range = (upperLimit - lowerLimit) / noProcesses;

   width = range / numberRects;

   lower = lowerLimit + range * processId;

   

   /* calculate area for subproblem */

   area = 0.0;

   for (i = 0; i < numberRects; i++)

   {  at = lower + i * width + width / 2.0;

      height = f(at);

      area = area + width * height;

   }

   

   MPI_Reduce(&area, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

   

   /* collect information and print results */

   if (processId = = 0)         /* if rank is 0, print results */

   {  fprintf (stderr, "The area from %f to %f is: %f\n",

               lowerLimit, upperLimit, total );

   }

   

   /* finish */

   MPI_Finalize( );

   return 0;

}

Notice that we have eliminated the control structures as well as the need for separate MPI_Send and MPI_Recv calls.

13.5.1.2 MPI_Reduce

You'll also notice that we have used a new function, MPI_Reduce. The process of collecting data is so common that MPI includes functions that automate it. The idea behind MPI_Reduce is to specify a data item to be accumulated, a storage location or variable to accumulate it in, and an operator to use when accumulating. In this example, we want to add up all the individual areas, so area is the data to accumulate, total is the location where we accumulate it, and the operation is addition, or MPI_SUM.

More specifically, MPI_Reduce has seven arguments. The first two are the addresses of the send and receive buffers. The third is the number of elements in the send buffer, while the fourth gives the type of the data. Both the send and receive buffers hold the same number of elements of the same type. The next argument identifies the operation used to combine elements; MPI_SUM is used to add them. MPI defines a dozen different operators. These include operators to find the sum of the data values (MPI_SUM), their product (MPI_PROD), and the largest and smallest values (MPI_MAX and MPI_MIN), as well as operators for both logical and bitwise combinations using AND, OR, and XOR (MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, and MPI_BXOR). The data type must be compatible with the selected operation.
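Putting the seven arguments together, the prototype looks essentially like this:

   int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, int root,
                  MPI_Comm comm);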

The next-to-last argument identifies the root of the communication, i.e., the rank of the process that will accumulate the final answer, and the last argument is the communicator. These must have identical values in every process. Notice that only the root process will have the accumulated result. If all of the processes need the result, there is an analogous function, MPI_Allreduce, that is used in much the same way.
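For example, if every process in our integration program needed the final total, the call to MPI_Reduce could be replaced with something like this (note that MPI_Allreduce drops the root argument, since every process receives the result):

   MPI_Allreduce(&area, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);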

Notice how the use of MPI_Reduce has simplified our code. We have eliminated a control structure, and, apart from the single root parameter in our call to MPI_Reduce, we no longer need to distinguish among processes. Keep in mind that it is up to the implementer to determine the best way to implement these functions, and details will vary. For example, the "broadcast" in MPI_Bcast simply means that the data is sent to all the processes. It does not necessarily imply that an Ethernet-style broadcast will be used, although that is one obvious implementation strategy. When implementing for other networks, other strategies may be necessary.

In this chapter we have introduced the six core MPI functions (MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv, and MPI_Finalize) as well as several others that simplify MPI coding. These six core functions have been described as the indispensable MPI functions, the ones you really can't do without. Conversely, most MPI programs, with a little extra work, could be rewritten using just these six functions. Congratulations! You are now an MPI programmer.
