Organization
This book is composed of 17 chapters, divided into four parts, plus
an appendix. The first part addresses background material; the second
deals with getting a cluster running quickly; the third goes into
more depth, describing how a custom cluster can be built; and the
fourth introduces cluster programming.
Depending on your background and goals, different parts of this book
are likely to be of interest. I have tried to provide information
here and at the beginning of each section to help you select the
parts of greatest interest. You should not need to read the entire
book for it to be useful.
- Part I, An Introduction to Clusters
Chapter 1 is a general introduction to
high-performance computing from the perspective of clusters. It
introduces basic terminology and provides a description of various
high-performance technologies. It gives a broad overview of the
different cluster architectures and discusses some of the inherent
limitations of clusters.
Chapter 2 begins with a discussion of how to
determine what you want your cluster to do. It then gives a quick
overview of the different types of software you may need in your
cluster.
Chapter 3 is a discussion of the hardware that
goes into a cluster, including both the individual computers and
network equipment.
Chapter 4 begins with a brief discussion of Linux
in general. The bulk of the chapter covers the basics of installing
and configuring Linux. This chapter assumes you are comfortable using
Linux but may need a quick review of some administrative tasks.
- Part II, Getting Started Quickly
Chapter 5 describes the installation,
configuration, and use of openMosix. It also reviews how to recompile
a Linux kernel.
Chapter 6 describes installing and setting up
OSCAR. It also covers a few of the basics of using OSCAR.
Chapter 7 describes installing Rocks. It also
covers a few of the basics of using Rocks.
- Part III, Building Custom Clusters
Chapter 8 describes tools you can use to
replicate the software installed on one machine onto others. Thus,
once you have decided how to install and configure the software on an
individual node in your cluster, this chapter will show you how to
duplicate that installation on a number of machines quickly and
efficiently.
Chapter 9 first describes programming software
that you may want to consider. Next, it describes the installation
and configuration of the software, along with additional utilities
you'll need if you plan to write the application
programs that will run on your cluster.
Chapter 10 describes tools you can use to manage
your cluster. Once you have a working cluster, you face numerous
administrative tasks, not the least of which is ensuring that the
machines in your cluster are running properly and configured
identically. The tools in this chapter can make life much easier.
Chapter 11 describes OpenPBS, open source
scheduling software. For heavily loaded clusters,
you'll need software to allocate resources, schedule
jobs, and enforce priorities. OpenPBS is one solution.
Chapter 12 describes setting up and configuring
the Parallel Virtual File System (PVFS) software, a high-performance
parallel file system for clusters.
- Part IV, Cluster Programming
Chapter 13 is a tutorial on how to use the MPI
library. It covers the basics. There is a lot more to MPI than what
is described in this book, but that's a topic for
another book or two. The material in this chapter will get you
started.
Chapter 14 describes some of the more advanced
features of MPI. The intent is not to make you proficient with any of
these features but simply to let you know that they exist and how
they might be useful.
Chapter 15 describes some techniques to break a
program into pieces that can be run in parallel. There is no silver
bullet for parallel programming, but there are several helpful ways
to get started. The chapter is a quick overview.
Chapter 16 first reviews the techniques used to
debug serial programs and then shows how the more traditional
approaches can be extended and used to debug parallel programs. It
also discusses a few problems that are unique to parallel programs.
Chapter 17 looks at techniques and tools that can
be used to profile parallel programs. If you want to improve the
performance of a parallel program, the first step is to find out
where the program is spending its time. This chapter shows you how to
get started.
- Part V, Appendix
The Appendix includes source information and
documentation for the software discussed in the book. It also
includes pointers to other useful information about clusters.