2.2 Determining Your Cluster's Mission

Defining what you want to do with the cluster is really the first step in designing it. For many clusters, the mission will be clearly understood in advance. This is particularly true if the cluster has a single use or a few clearly defined uses. However, if your cluster will be an open resource, then you'll need to anticipate potential uses. In that case, the place to start is with your users.

While you may think you have a clear idea of what your users will need, there may be little semblance between what you think they should need and what they think they need. And while your assessment may be the correct one, your users are still apt to be disappointed if the cluster doesn't live up to their expectations. Talk to your users.

You should also keep in mind that clusters have a way of evolving. What may be a reasonable assessment of needs today may not be tomorrow. Good design is often the art of balancing today's resources with tomorrow's needs. If you are unsure about your cluster's mission, answering the following questions should help.

2.2.1 What Is Your User Base?

In designing a cluster, you must take into consideration the needs of all users. Ideally this will include both the potential users as well as the obvious early adopters. You will need to anticipate any potential conflicting needs and find appropriate compromises.

The best way to avoid nasty surprises is to include representative users in the design process. If you have only a few users, you can easily poll the users to see what you need.

If you have a large user base, particularly one that is in flux, you will need to anticipate all reasonable, likely needs. Generally, this will mean supporting a wider range of software. For example, if you are the sole user and you only use one programming language and parallel programming library, there is no point in installing others. If you have dozens of users, you'll probably need to install multiple programming languages and parallel programming libraries.

2.2.2 How Heavily Will the Cluster Be Used?

Will the cluster be in constant use, with users fighting over it, or will it be used on an occasional basis as large problems arise? Will some of your jobs have higher priorities than others? Will you have a mix of jobs, some requiring the full capabilities of the cluster while others will need only a subset?

If you have a large user base with lots of potential conflicts, you will need some form of scheduling software. If your cluster will be lightly used or have very few users who are willing to work around each other, you may be able to postpone installing scheduling software.

2.2.3 What Kinds of Software Will You Run on the Cluster?

There are several levels at which this question can be asked. At a cluster management level, you'll need to decide which systems software you want, e.g., BSD, Linux, or Windows, and you'll need to decide what clustering software you'll need. Both of these choices will be addressed later in this chapter.

From a user perspective, you'll need to determine what application-level software to use. Will your users be using canned applications? If so, what are these applications and what are their requirements? Will your users be developing software? If so, what tools will they need? What is the nature of the software they will write and what demands will this make on your cluster? For example, if your users will be developing massive databases, will you have adequate storage? Will the I/O subsystem be adequate? If your users will carry out massive calculations, do you have adequate computational resources?

2.2.4 How Much Control Do You Need?

Closely related to the types of code you will be running is the question of how much control you will need over the code. There are a range of possible answers. If you need tight control over resources, you will probably have to write your own applications. User-developed code can make explicit use of the available resources.

For some uses, explicit control isn't necessary. If you have calculations that split nicely into separate processes and you'd just like them to run faster, software that provides transparent control may be the best solution. For example, suppose you have a script that invokes a file compression utility on a large number of files. It would be convenient if you could divide these file compression tasks among a number of processes, but you don't care about the details of how this is done.

openMosix, code that extends the Linux kernel, provides this type of transparent support. Processes automatically migrate among cluster computers. The advantage is that you may need to rewrite user code. However, the transparent control provided by openMosix will not work if the application uses shared memory or runs as a single process.

2.2.5 Will This Be a Dedicated or Shared Cluster?

Will the machines that comprise the cluster be dedicated to the cluster, or will they be used for other tasks? For example, a number of clusters have been built from office machines. During the day, the administrative staff uses the machines. In the evening and over the weekend, they are elements of a cluster. University computing laboratories have been used in the same way.

Obviously, if you have a dedicated cluster, you are free to configure the nodes as you see fit. With a shared cluster, you'll be limited by the requirements of the computers' day jobs. If this is the case, you may want to consider whether a dual-boot approach is feasible.

2.2.6 What Resources Do You Have?

Will you be buying equipment or using existing equipment? Will you be using recycled equipment? Recycled equipment can certainly reduce your costs, but it will severely constrain what you can do. At the very least, you'll need a small budget to adapt and maintain the equipment you have. You may need to purchase networking equipment such as a switch and cables, or you may need to replace failing parts such as disk drives and network cards. (See Chapter 3 for more information about hardware.)

2.2.7 How Will Cluster Access Be Managed?

Will you need local or remote access or both? Will you need to provide Internet access, or can you limit it to the local or campus network? Can you isolate the cluster? If you must provide remote access, what will be the nature of that access? For example, will you need to install software to provide a graphical interface for remote users? If you can isolate your network, security becomes less of an issue. If you must provide remote access, you'll need to consider tools like SSH and VNC. Or is serial port access by a terminal server sufficient?

2.2.8 What Is the Extent of Your Cluster?

The term cluster usually applies to computers that are all on the same subnet. If you will be using computers on different networks, you are building a grid. With a grid you'll face greater communications overhead and more security issues. Maintaining the grid will also be more involved and should be addressed early in the design process. This book doesn't cover the special considerations needed for grids.

2.2.9 What Security Concerns Do You Have?

Can you trust your users? If the answer is yes, this greatly simplifies cluster design. You can focus on controlling access to the cluster. If you can't trust your users, you'll need to harden each machine and develop secure communications. A closely related question is whether you can control physical access to your computers. Again, controlling physical access will simplify securing your cluster since you can focus on access points, e.g., the head node rather than the cluster as a whole. Finally, do you deal with sensitive data? Often the value of the data you work with determines the security measures you must take.

Table of Contents