
17.6 MPE

If gprof and gcov seem too complicated for routine use, or if you just want to investigate all your possibilities, there is another alternative you can consider: the Multi-Processing Environment (MPE). If you built MPICH manually on your cluster, you already have MPE. If you installed MPICH as part of OSCAR, you'll need to add MPE. Fortunately, this is straightforward and is described in Chapter 9. Although MPE is supplied with MPICH, it can be used with other versions of MPI.

MPE provides several useful resources. First and foremost, it includes several libraries useful to MPI programmers. These include a library of routines that create logfiles for profiling MPI programs. It also has a tracing library and a real-time animation library that are useful when analyzing code. MPE also provides a parallel X graphics library, routines that can be used to ensure that a section of code runs sequentially, and debugger setup routines. While this section focuses on using logfiles to profile MPI program performance, remember that this other functionality is available should you need it.
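As an illustration of the last of these, the sequential-section routines can be used to force output from all processes to appear in rank order. The sketch below is not one of the MPICH demonstration programs; it simply assumes MPE's MPE_Seq_begin and MPE_Seq_end routines, declared in mpe.h, are available on your system.

```c
/* Illustrative sketch: serialize a section of code with MPE.
   Compile (under MPICH, for example): mpicc seq.c -lmpe -o seq */
#include <stdio.h>
#include "mpi.h"
#include "mpe.h"

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The second argument is the number of processes admitted at a time.
       With 1, each process enters this section alone, so the printed
       lines come out in rank order instead of interleaved. */
    MPE_Seq_begin(MPI_COMM_WORLD, 1);
    printf("Process %d reporting in\n", rank);
    fflush(stdout);
    MPE_Seq_end(MPI_COMM_WORLD, 1);

    MPI_Finalize();
    return 0;
}
```

Run with mpirun as usual; without the MPE_Seq_begin/MPE_Seq_end pair, the ordering of the output lines would be unpredictable.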

MPE's logging capabilities can generate three different logfile formats: ALOG, CLOG, and SLOG. ALOG is an older ASCII-based format that is now deprecated. CLOG is the current default format, while SLOG is an emerging standard. Unlike SLOG, CLOG does not scale well and should be avoided for large files.

MPE includes four graphical visualization tools that allow you to examine the logfiles that MPE creates: upshot, nupshot, jumpshot-2, and jumpshot-3. The primary differences between these four tools are the file formats they read and their implementation languages.


upshot

This tool reads and displays ALOG files and is implemented in Tcl/Tk.


nupshot

This tool reads and displays CLOG files. Because it is implemented in an older version of Tcl/Tk, it is not automatically installed.


jumpshot-2

This tool reads and displays CLOG files and is implemented in Java 1.1. (Unlike jumpshot-3, jumpshot-2 is not compatible with newer versions of Java.)


jumpshot-3

This tool reads and displays SLOG files and is implemented in Java.

To build each of these, you will need the appropriate version of Tcl/Tk or Java on your system.

Finally, MPE provides several utilities that simplify dealing with logfiles.


clog2slog

This utility converts CLOG files into SLOG files.


clog2alog

This utility converts CLOG files into ALOG files.


slog_print and clog_print

These are print programs for SLOG and CLOG files, respectively.


viewers

This utility invokes the appropriate visualization tool needed to display a logfile based on its format.

There are two basic approaches to generating logfiles with MPE. When you link to the appropriate MPE library, logfiles will be generated automatically using the PMPI profiling interface described earlier in this chapter. Alternatively, you can embed MPE commands in a program to manually collect information. It is also possible to combine these approaches in a single program.
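As a sketch of the second, manual approach, the core MPE logging calls look something like the following. This is a simplified outline, not the full cpilog.c from the MPICH examples; the event numbers are obtained from MPE at runtime, and the state name and color shown here are arbitrary choices.

```c
/* Illustrative outline of manual MPE logging. Under MPICH, compile with
   the -mpilog flag; otherwise link -llmpe -lmpe explicitly. */
#include "mpi.h"
#include "mpe.h"

int main(int argc, char *argv[])
{
    int ev_start, ev_end;

    MPI_Init(&argc, &argv);
    MPE_Init_log();

    /* Ask MPE for unused event numbers, then describe the state
       (the interval they bracket) for the viewer's legend. */
    ev_start = MPE_Log_get_event_number();
    ev_end   = MPE_Log_get_event_number();
    MPE_Describe_state(ev_start, ev_end, "compute", "red");

    MPE_Log_event(ev_start, 0, "start compute");
    /* ... the work you want to profile goes here ... */
    MPE_Log_event(ev_end, 0, "end compute");

    /* Writes the logfile (e.g., manual.clog for the default CLOG format). */
    MPE_Finish_log("manual");
    MPI_Finalize();
    return 0;
}
```

Each process records its own events; MPE merges them into a single logfile when MPE_Finish_log is called.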

17.6.1 Using MPE

In order to use MPE, you'll need to link your programs to the appropriate libraries. Since MPE has been integrated into the MPICH distribution, using MPICH is the easiest way to go because MPICH provides compiler flags that simplify compilation.

If you are using another version of MPI, instead of or in addition to MPICH, your first order of business will be locating the MPE libraries on your system and ensuring they are on your compile/link paths, typically /usr/local/lib. If in doubt, use whereis to locate one of the libraries. They should all be in the same place.

[sloanjd@amy sloanjd]$ whereis libmpe.a

libmpe: /usr/local/lib/libmpe.a

Once you've got your path set correctly, using MPE shouldn't be difficult.

MPICH includes several demonstration programs, so you may find it easier to test things out with these rather than with one of your own programs. In the next two examples, I'm using cpi.c and cpilog.c, which are found in the examples directory under the MPICH source tree. cpi.c is an ordinary MPI program that estimates the value of π. It does not contain any MPE commands. We'll use it to see how the automatic profiling library works.

To compile cpi.c under MPICH, use the -mpilog compiler flag.

[sloanjd@amy MPEDEMO]$ mpicc cpi.c -mpilog -o cpi

It is only slightly more complicated with LAM/MPI. You'll need to be sure that the libraries can be found, and you'll need to explicitly link both libraries, liblmpe.a and libmpe.a, as shown:

[sloanjd@amy MPEDEMO]$ mpicc cpi.c -llmpe -lmpe -o cpi

(Be sure you link them in the order shown.)

When you run the program, you'll notice that a logfile is created.

[sloanjd@amy MPEDEMO]$ mpirun -np 4 cpi

Process 0 of 4 on amy

pi is approximately 3.1415926544231239, Error is 0.0000000008333307

wall clock time = 0.005883

Writing logfile.

Finished writing logfile.

Process 2 of 4 on oscarnode2.oscardomain

Process 1 of 4 on oscarnode1.oscardomain

Process 3 of 4 on oscarnode3.oscardomain

By default, a CLOG file will be created. You can change the default behavior by setting the environment variable MPE_LOG_FORMAT.[3] For example,

[3] While setting MPE_LOG_FORMAT works fine with MPICH, it doesn't seem to work with LAM/MPI.

[sloanjd@amy MPEDEMO]$ export MPE_LOG_FORMAT=SLOG

You can view the CLOG file directly with jumpshot-2, or you can convert it to a SLOG file with the clog2slog utility and then view it with jumpshot-3. I'll use the latter approach since I haven't installed jumpshot-2 on this system.

[sloanjd@amy MPEDEMO]$ clog2slog cpi.clog

[sloanjd@amy MPEDEMO]$ jumpshot cpi.slog

Remember that you'll need to execute that last command in an X Window System environment.

jumpshot-3 opens three windows. The first is the main window for jumpshot-3, which you can use to open other logfiles and change program defaults. If you close it, the other jumpshot-3 windows will all close as well. See Figure 17-1.

Figure 17-1. Main Jumpshot-3 window
figs/hplc_1701.gif


The next window to open will be the legend. This gives the color code for the data display window, which opens last. See Figure 17-2.

Figure 17-2. Legend
figs/hplc_1702.gif


Since cpi.c uses only two MPI commands, only two are shown. If other MPI functions had been used in the program, they would have been added to the window. If the colored bullets are not visible when the window opens, which is often the case, just resize the window and they should appear.

The last window, the View & Frame Selector, displays the actual profile information. The graph is organized vertically by process and horizontally by time. Once you have this window open, you can use the options it provides to alter the way your data is displayed. See Figure 17-3.

Figure 17-3. View & Frame Selector
figs/hplc_1703.gif


You can find an introductory tutorial on jumpshot-3 under the MPICH source tree in the directory mpe/viewers/jumpshot-3/doc. Both PDF and HTML versions are included.

As noted earlier, if you want more control over how your program is profiled, you can embed MPE profiling commands directly into the code. With MPICH, you'll compile it in exactly the same way, using the -mpilog flag. With LAM/MPI, you only need to link to the libmpe.a library.

[sloanjd@amy MPEDEMO]$ mpicc cpilog.c -lmpe -o cpilog

The file cpilog.c, compiled here, is an MPE demonstration program that includes embedded MPE commands. An explanation of these commands and an example are given in the next subsection of this chapter.

Before we leave compiling MPE programs, it is worth mentioning the other MPE libraries that are used in much the same way. With MPICH, the compiler flag -mpianim is used to link to the animation library, while the flag -mpitrace is used to link to the trace library. With LAM/MPI, you'll need to link these directly when you compile. For example, to use the trace library libtmpe.a, you might enter

[sloanjd@amy MPEDEMO]$ mpicc cpi.c -ltmpe -o cpi

With the trace library you'll get a trace printout for all MPI calls when you run the program. Here is a partial listing for cpi.c:

[sloanjd@amy MPEDEMO]$ mpirun -np 4 cpi

Starting MPI_Init...

Starting MPI_Init...

Starting MPI_Init...

Starting MPI_Init...

[0] Ending MPI_Init

[1] Ending MPI_Init

[2] Ending MPI_Init

[3] Ending MPI_Init

[1] Starting MPI_Comm_size...

[2] Starting MPI_Comm_size...

[3] Starting MPI_Comm_size...

[1] Ending MPI_Comm_size

[2] Ending MPI_Comm_size

[3] Ending MPI_Comm_size

[1] Starting MPI_Comm_rank...

[2] Starting MPI_Comm_rank...

[3] Starting MPI_Comm_rank...

[1] Ending MPI_Comm_rank

[2] Ending MPI_Comm_rank

[3] Ending MPI_Comm_rank

[2] Starting MPI_Get_processor_name...

[1] Starting MPI_Get_processor_name...

[3] Starting MPI_Get_processor_name...

[1] Ending MPI_Get_processor_name

[2] Ending MPI_Get_processor_name

[3] Ending MPI_Get_processor_name

Process 1 of 4 on oscarnode1.oscardomain

Process 2 of 4 on oscarnode2.oscardomain

Process 3 of 4 on oscarnode3.oscardomain

[2] Starting MPI_Bcast...

[1] Starting MPI_Bcast...

[3] Starting MPI_Bcast...

...

There's a lot more output, which has been omitted here. As you can see, the program output is interspersed with the trace. The number in square brackets is the process rank.
