
5.5 Using openMosix

At its simplest, openMosix is transparent to the user. You can sit back and reap the benefits. But at times, you'll want more control. At the very least, you may want to verify that it is really running properly. (You could just time applications with computers turned on and off, but you'll probably want to be a little more sophisticated than that.) Fortunately, openMosix provides some tools that allow you to monitor and control various jobs. If you don't like the tools that come with openMosix, you can always install other tools such as openMosixView.

5.5.1 User Tools

You should install the openMosix user tools before you start running openMosix. This package includes several useful management tools (migrate, mosctl, mosmon, mosrun, and setpe); openMosix-aware versions of ps and top called, suitably, mps and mtop; and a startup script, /etc/init.d/openmosix. (This is actually a link to the file /etc/rc.d/init.d/openmosix.)
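For example, once the package is installed, you can typically bring openMosix up or down on a node through that script (assuming the usual start and stop arguments for a Red Hat-style init script):

[root@fanny root]# /etc/init.d/openmosix start

[root@fanny root]# /etc/init.d/openmosix stop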

5.5.1.1 mps and mtop

Both mps and mtop will look a lot like their counterparts, ps and top. The major difference is that each has an additional column that gives the node number on which a process is running. Here is part of the output from mps:

[root@fanny sloanjd]# mps
  PID TTY NODE STAT TIME COMMAND
...
19766  ?     0 R    2:32 ./loop
19767  ?     2 S    1:45 ./loop
19768  ?     5 S    3:09 ./loop
19769  ?     4 S    2:58 ./loop
19770  ?     2 S    1:47 ./loop
19771  ?     3 S    2:59 ./loop
19772  ?     6 S    1:43 ./loop
19773  ?     0 R    1:59 ./loop
...

As you can see from the third column, process 19769 is running on node 4. It is important to note that mps must be run on the machine where the process originated. You will not see the process if you run ps, mps, top, or mtop on any of the other machines in the cluster even if the process has migrated to that machine. (Arguably, in this respect, openMosix is perhaps a little too transparent. Fortunately, a couple of the other tools help.)

5.5.1.2 migrate

The tool migrate explicitly moves a process from one node to another. Since there are circumstances under which some processes can't migrate, the system may be forced to ignore this command. You'll need the PID and the node number of the destination machine. Here is an example:

[sloanjd@fanny sloanjd]$ migrate 19769 5

This command will move process 19769 to node number 5. (You can use home in place of the node number to send a process back to the CPU where it was started.) It might be tempting to think you are reducing the load on node number 4, the node where the process was running, but in a balanced system with no other action, another process will likely migrate to node 4.
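For example, using the home keyword mentioned above, you could later send the same process back to the node where it started:

[sloanjd@fanny sloanjd]$ migrate 19769 home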

5.5.1.3 mosctl

With mosctl, you have greater control over how processes are run on individual machines. For example, you can block the arrival of guest processes to lighten the load on a machine. You can use mosctl with the setspeed option to override a node's idea of its own speed. This can be used to attract or discourage process migration to the machine. mosctl can also be used to display utilization or tune openMosix performance parameters. There are too many arguments to go into here, but they are described in the manpage.
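For example, blocking guest processes on a node and overriding its speed rating might look like the following (setspeed is named above, but the block option and the speed value shown here are illustrative, so check the manpage for the exact syntax on your version):

[root@fanny root]# mosctl block

[root@fanny root]# mosctl setspeed 15000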

5.5.1.4 mosmon

While mps won't tell you whether a process has migrated to your machine, the mosmon utility can give you a good idea of what is going on across the cluster. mosmon is an ncurses-based utility that displays a simple bar graph showing the load on each node in your cluster. Figure 5-1 shows mosmon in action.

Figure 5-1. mosmon
figs/hplc_0501.gif


In this example, eight identical processes are running on a six-node cluster. Obviously, the second and sixth nodes have two processes each while the remaining four machines are each running a single process. Of course, other processes could be mixed into this, affecting an individual machine's load. You can change the view to display memory, speed, and utilization as well as change the layout of the graph. Press h while the program is running to display the various options. Press q to quit the program.

Incidentally, mosmon goes by several different names, including mon and, less commonly, mmon. The original name was mon, and it is often referred to by that name in openMosix documentation. The shift to mosmon was made to eliminate a naming conflict with the network-monitoring tool mon. The local name is actually set by a compile-time variable.

5.5.1.5 mosrun

The mosrun command can also be used to advise the system to run a specific program on a specified node. You'll need the program name and the destination node number (or use -h for the home node). Actually, mosrun is one of a family of commands used to control node allocation preferences. These are listed and described on the manpage for mosrun.
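For example, the -h option starts a program on the home node, while supplying a node number instead directs the program to that node. Here is a sketch using the loop program from the mps output above; the exact way the destination node is passed is spelled out on the manpage, so verify the second form against your version:

[sloanjd@fanny sloanjd]$ mosrun -h ./loop

[sloanjd@fanny sloanjd]$ mosrun -3 ./loop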

5.5.1.6 setpe

The setpe command can be used to manually configure a node. (In practice, setpe is usually called from the script /etc/init.d/openmosix rather than used directly.) As root, you can use setpe to start or stop openMosix. For example, you could start openMosix with a specific configuration file with a command like

[root@ida sloanjd]# /sbin/setpe -w -f /etc/openmosix.map

setpe takes several options including -r to read the configuration file, -c to check the map's consistency, and -off to shut down openMosix. Consult the manpage for more information.
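For example, shutting openMosix down on a node looks like this:

[root@ida sloanjd]# /sbin/setpe -off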

5.5.2 openMosixView

openMosixView extends the basic functionality of the user tools while providing a spiffy X-based GUI. However, the basic user tools must be installed for openMosixView to work. openMosixView is actually a suite of seven applications that can be invoked from the main administration application.

If you want to install openMosixView, which is strongly recommended, download the package from http://www.openmosixview.com. Look over the documentation for any dependencies that might apply. Depending on what you have already installed on your system, you may need to install additional packages. For example, GLUT is one of more than two dozen dependencies. Fortunately (or annoyingly), rpm will point out what needs to be added.

Then, as root, install the appropriate packages.

[root@fanny root]# rpm -vih glut-3.7-12.i386.rpm
warning: glut-3.7-12.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
Preparing...                ########################################### [100%]
   1:glut                   ########################################### [100%]
[root@fanny root]# rpm -vih openmosixview-1.5-redhat90.i386.rpm
Preparing...                ########################################### [100%]
   1:openmosixview          ########################################### [100%]

As with the kernel, you'll want to repeat this on every node. The installation places its documentation in /usr/local.

Once installed, you are basically ready to run. However, by default, openMosixView uses RSH. It is strongly recommended that you change this to SSH. Make sure you have SSH set up on your system. (See Chapter 4 for more information on SSH.) Then, from the main application, select the Config menu.

The main application window is shown in Figure 5-2. You get this by running the command openmosixview in an X Window environment.

Figure 5-2. openMosixView
figs/hplc_0502.gif


This view displays information for each of the five nodes in this cluster. The first column identifies each node by number; its background is green if the node is available or red if it is unavailable. The second column, buttons labeled with IP addresses, allows you to configure individual systems. If you click on one of these buttons, a pop-up window will appear for that node, as shown in Figure 5-3. You'll notice that the configuration options are very similar to those provided by the mosctl command.

Figure 5-3. openMosix configuration window
figs/hplc_0503.gif


As you can see from the figure, you can control process migration, etc., with this window. The third column in Figure 5-2, the sliders, controls the node efficiencies used by openMosix when load balancing. By changing these, you alter openMosix's idea of the relative efficiencies of the nodes in the cluster. This in turn influences how jobs migrate. Note that the slider settings do not change the efficiency of the node, just openMosix's perception of the node's capabilities. The remaining columns provide general information about the nodes. These should be self-explanatory.

The buttons along the top provide access to additional applications. For example, the third button, which looks like a gear, launches the process viewer openMosixprocs. This is shown in Figure 5-4.

Figure 5-4. openMosixprocs
figs/hplc_0504.gif


openMosixprocs allows you to view and manage individual processes started on the node from which openMosixprocs is run. (Since it won't show you processes migrated from other systems, you'll need openMosixprocs on each node.) You can select a user in the first entry field at the top of the window and click on refresh to focus on a single user's processes. By double-clicking on an individual process, you can call up the openMosixprocs-Migrator, which provides additional statistics and allows some control over the process.

openMosixView provides a number of additional tools that aren't described here. These include a 3D process viewer (3dmosmon), a data collection daemon (openMosixcollector), an analyzer (openMosixanalyzer), an application for viewing process history (openMosixHistory), and a migration monitor and controller (openMosixmigmon) that supports drag-and-drop control on process migration.

5.5.3 Testing openMosix

It is unlikely that you will have any serious problems setting up openMosix. But you may want to confirm that it is working. You could just start a few processes and time them with openMosix turned on and off. Here is a simple C program that can be used to generate some activity.

#include <stdio.h>

int foo(int,int);

int main( void )
{
    int i,j;

    for (i=1; i<100000; i++)
        for (j=1; j<100000; j++)
            foo(i,j);

    return 0;
}

int foo(int x, int y)
{
        return(x+y);
}

This program does nothing useful, but it will take several minutes to complete on most machines. (You can adjust the loop count if it doesn't run long enough to suit you.) By compiling this (without optimizations) and then starting several copies running in the background, you'll have a number of processes you can watch.
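For example, assuming you've saved the program as loop.c (the names here are just for illustration), you could compile it without optimizations and start eight copies in the background like this:

[sloanjd@fanny sloanjd]$ gcc -O0 -o loop loop.c

[sloanjd@fanny sloanjd]$ for i in 1 2 3 4 5 6 7 8; do ./loop & done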

While timing will confirm that you are actually getting a speedup, you'll get a better idea of what is going on if you run mosmon. With mosmon, you can watch process migration and load balancing as it happens.

If you are running a firewall on your machines, the most likely problem you will have is getting connection privileges correct. You may want to start by disconnecting your cluster from the Internet and disabling the firewall. This will allow you to confirm that openMosix is correctly installed and that the firewall is the problem. You can use the command netstat -a to identify which connections you are using. This should give you some guidance in reconfiguring your firewall.

Finally, an openMosix stress test is available for the truly adventurous. It can be downloaded from http://www.openmosixview.com/omtest/. This web page also describes the test (actually a test suite) and has a link to a sample report. You can download sources or an RPM. You'll need to install expect before installing the stress test. To run the test, you should first change to the /usr/local/omtest directory and then run the script ./openmosix_stress_test.sh. A report is saved in the /tmp directory.
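Running the suite looks something like this:

[root@fanny root]# cd /usr/local/omtest

[root@fanny omtest]# ./openmosix_stress_test.sh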

The test takes a while to run and produces a very long report. For example, it took over an hour and a half on an otherwise idle five-node cluster of Pentium II's and produced an 18,224-line report. While most users will find this a bit of overkill for their needs, it is nice to know it is available. Interpretation of the results is beyond the scope of this book.
