
5.2 How openMosix Works

openMosix originated as a fork from the earlier MOSIX (Multicomputer Operating System for Unix) project. The openMosix project began when the licensing structure for MOSIX moved away from a General Public License. Today, it has evolved into a project in its own right. The original MOSIX project is still quite active under the direction of Amnon Barak (http://www.mosix.org). openMosix is the work of Moshe Bar, originally a member of the MOSIX team, and a number of volunteers. This book focuses on openMosix, but MOSIX is a viable alternative that can be downloaded at no cost.

As noted in Chapter 1, one approach to sharing a computation between processors in a single-enclosure computer with multiple CPUs is symmetric multiprocessor (SMP) computing. openMosix has been described, accurately, as turning a cluster of computers into a virtual SMP machine, with each node providing a CPU. openMosix is potentially much cheaper and scales much better than SMP hardware, but its communication overhead is higher. (openMosix works with both single-processor and SMP systems.) openMosix is an example of what is sometimes called single system image (SSI) clustering, since each node in the cluster runs a copy of a single operating system kernel.

The granularity for openMosix is the process. Individual programs, as in the compression example, may create the processes, or the processes may be the result of different forks from a single program. However, a computationally intensive task that does all of its work in a single process cannot be shared among processors, even if it uses multiple threads, because openMosix migrates whole processes. The best you can hope for is that the process will migrate to the fastest available machine in the cluster.

Not all processes migrate. For example, if a process lasts only a few seconds (very roughly, less than 5 seconds, depending on a number of factors), it will not have time to migrate. Currently, openMosix does not work with multiple processes using shared writable memory, such as web servers.[1] Similarly, processes doing direct manipulation of I/O devices won't migrate, nor will processes using real-time scheduling. If a process that has already migrated to another node attempts to do any of these things, it will migrate back to its unique home node (UHN), the node where it was initially created, before continuing.

[1] Actually, the migration of shared memory (MigSHM) patch is an openMosix patch that implements shared memory migration. At the time this was written, it was not part of the main openMosix tree. (Visit http://mcaserta.com/maask/.)

To support process migration, openMosix divides processes into two parts or contexts. The user context contains the program code, stack, data, etc., and is the part that can migrate. The system context, which contains a description of the resources the process is attached to and the kernel stack, does not migrate but remains on the UHN.

openMosix uses an adaptive resource allocation policy. That is, each node monitors and compares its own load with the loads on a portion of the other computers within the cluster. When a node finds a more lightly loaded computer (judged relative to each machine's overall capacity), it will attempt to migrate a process to that computer, thereby balancing the load between the two. As the loads on individual computers change, e.g., when jobs start or finish, processes migrate among the computers to rebalance loads across the cluster, adapting dynamically to the changes.

Individual nodes, acting as autonomous systems, decide which processes migrate. The load comparisons take place within small, randomly chosen sets of nodes, and it is this random element that lets clusters scale well. Because communication stays within subsets of the cluster, each node has limited but recent information about the state of the whole cluster, which keeps overhead and communication low.

While load comparison and process migration are generally automatic within a cluster, openMosix provides tools to control migration. It is possible to alter the cluster's perception of how heavily an individual computer is loaded, to tie processes to a specific computer, or to block the migration of processes to a computer. However, precise control for the migration of a group of processes is not practical with openMosix at this time.[2]

[2] This issue is addressed by a patch that allows the creation of process groups, available at http://www.openmosixview.com/miggroup/.

The openMosix API uses the values in the flat files in /proc/hpc to record and control the state of the cluster. If you need information about the current configuration, want to do really low-level management, or want to write management scripts, you can read or write these files.
