8.1 Configuring Systems

Cloning refers to creating a number of identical systems. In practice, you may not always want systems that are exactly alike. If you have several different physical configurations, you'll need to adapt your setup to match the hardware you have. It would make no sense to use identical partitioning schemes on hard disks with different capacities. Furthermore, each system will have parameters, e.g., an IP address or host name, that must be unique to it.

Setting up a system can be divided roughly into two stages: installing the operating system and then customizing it to fit your needs. This division is hazy at best. Configuration changes to the operating system could easily fall into either category. Nonetheless, many tools and techniques fall primarily into one of these stages, so the distinction is helpful. We'll start with the second stage, since you'll want to keep this ongoing process in mind when looking at tools designed for installing systems.

8.1.1 Distributing Files

The major part of the post-install configuration is getting the right files onto your system and keeping those files synchronized. This applies both to configuring the machine for the first time and to maintaining existing systems. For example, when you add a new user to your cluster, you won't want to log onto every machine in the cluster and repeat the process. It is much simpler if you can push the relevant accounting files to each machine in the cluster from your head node.

What you will want to copy will vary with your objectives, but Table 8-1 lists a few likely categories.

Table 8-1. Types of Files

Accounting files, e.g., /etc/passwd, /etc/shadow, /etc/group, /etc/gshadow
Configuration files, e.g., /etc/motd, /etc/fstab, /etc/hosts, /etc/printcap.local
Security configuration files such as firewall rulesets or public keys
Packages for software you wish to install
Configuration files for installed software
User scripts
Kernel images and kernel source files

Many of these are one-time copies, but others, like the accounting files, will need to be updated frequently.

You have a lot of options. Some approaches work best when moving sets of files but can be tedious when dealing with just one or two files. If you are dealing with a number of files, you'll need some form of repository. (While you could pack a collection of files into a single file using tar, this approach works well only if the files aren't changing.) You could easily set up your own HTTP or FTP server for both packages and customized configuration files, or you could put them on a floppy or CD and carry the disk to each machine. If you are putting together a repository of files, perhaps the best approach is to use NFS.

With NFS, you won't need to copy anything. But while this works nicely with user files, it can create problems with system files. For example, you may not want to mount a single copy of /etc using NFS since, depending on your flavor of Linux, there may be files in /etc that are unique to each machine, e.g., /etc/HOSTNAME. The basic problem with NFS is that its granularity (a directory) is too coarse. Nonetheless, NFS can be used as a first step in distributing files. For example, you might set up a shared directory with all the distribution RPMs along with any other software you want to add. You can then mount this directory on the individual machines. Once it is mounted, you can easily copy files where you need them or install them from that directory. For packages, this can easily be done with a shell script.
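As a rough illustration of such a script, here is a minimal sketch of a function that installs every RPM in an NFS-mounted package directory. The export headnode:/export/rpms, the mount point /mnt/rpms, the function name install_rpms_from, and the RPM_CMD override are all hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Sketch: install every RPM found in a shared package directory.
# Hypothetical setup (not from the text): the head node exports
# headnode:/export/rpms, mounted on each node with
#   mount -t nfs headnode:/export/rpms /mnt/rpms
# Set RPM_CMD=echo to print the rpm command instead of running it.

install_rpms_from() {
    dir=$1
    set --                      # reuse the positional parameters as a list
    for pkg in "$dir"/*.rpm; do
        # If nothing matches, the unexpanded pattern fails this test.
        [ -e "$pkg" ] && set -- "$@" "$pkg"
    done
    if [ "$#" -eq 0 ]; then
        echo "no RPMs found in $dir" >&2
        return 1
    fi
    # A single transaction lets rpm resolve dependencies among the packages.
    "${RPM_CMD:-rpm}" -Uvh "$@"
}

# Usage on each node (as root), after mounting the share:
#   install_rpms_from /mnt/rpms
```

Passing all the packages to one rpm -Uvh invocation lets rpm sort out dependencies among them in a single transaction, rather than failing on whichever package happens to be installed before the one it depends on.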

While any of these approaches will work on an occasional basis, they are a little clunky, particularly if you need to move only a file or two. Fortunately, there are also a number of commands designed specifically to move individual files between machines. If you have enabled the r-service commands, you could use rcp. A much better choice is scp, the SSH equivalent. You could also consider rdist. Debian users should consider apt-get. cpush, one of the tools supplied in C3 and described in Chapter 10, is another choice. One particularly useful command is rsync, which is described next.

8.1.1.1 Pushing files with rsync

rsync is free software, released under the GNU General Public License and written by Andrew Tridgell and Paul Mackerras. rsync is sometimes described as a faster, more flexible replacement for rcp, but it is really much more. rsync has several advantages. It can synchronize a set of files very quickly because it sends only the differences between files over the link. It can also preserve file attributes such as ownership, permissions, and modification times. Moreover, other tools described later in this book, such as SystemImager and C3, use it, so a quick review is worthwhile.

rsync is included in most Linux distributions. It is run as a client on the local machine and as a server on the remote machine. With most systems, before you can start the rsync daemon on the machine that will act as the server, you'll need to create both a configuration file and a password file.^[1]

^[1] Strictly speaking, the daemon is unnecessary if you have SSH or RSH.

A configuration file is composed of optional global commands followed by one or more module sections. Each module section begins with a module name and continues until the next module is defined. The module name associates a symbolic name with a directory. Within a module, parameters are assigned in the form option = value. An example should help clarify this.

# a sample rsync configuration file -- /etc/rsyncd.conf
#
[systemfiles]
# source/destination directory for files
path = /etc
# authentication -- users, hosts, and password file
auth users = root, sloanjd
hosts allow = amy basil clara desmond ernest fanny george hector james
secrets file = /etc/rsyncd.secrets
# allow read/write
read only = false
# UID and GID for transfer
uid = root
gid = root

There are no global commands in this example, only the single module [systemfiles]. The name is an arbitrary string (hopefully not too arbitrary) enclosed in square brackets. For each module, you must specify a path option, which identifies the target directory on the server accessed through the module.

The default is for files to be accessible to all users without a password, i.e., anonymous rsync. This is not what we want, so we use the next three options to limit access. The auth users option specifies a list of users that can access a module, effectively denying access to all other users. The hosts allow option limits the machines that can use this module; if it is omitted, all machines will have access. In place of a list of machines, an address/mask pattern can be used. The secrets file option specifies the name of a password file used for authentication. The file is used only if the auth users option is also set. The format of the secrets file is user:password, one entry per line. Here is an example:

root:RSpw012...

The secrets file should be readable only by root, e.g., owned by root with mode 600. With the default strict modes setting, rsync will refuse to use a secrets file that other users can read.
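A minimal sketch of creating such a file from the shell follows. The function name write_secrets_file and the sample passwords in the usage comment are hypothetical; the point is to create the file under a restrictive umask so it is never readable by other users, even briefly.

```shell
#!/bin/sh
# Sketch: append a user:password entry to an rsync daemon secrets file.
# The function name is hypothetical. Run it as root so the file ends up
# owned by root, the user the daemon normally runs as.
write_secrets_file() {
    secrets_path=$1
    # umask 077 in a subshell: a newly created file gets mode 600 from the
    # start. The explicit chmod also tightens a file that already existed.
    ( umask 077 && printf '%s:%s\n' "$2" "$3" >> "$secrets_path" ) &&
        chmod 600 "$secrets_path"
}

# Usage (hypothetical passwords):
#   write_secrets_file /etc/rsyncd.secrets root 'choose-a-password'
#   write_secrets_file /etc/rsyncd.secrets sloanjd 'another-password'
```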

By default, modules are read only; i.e., files can be downloaded from the server but not uploaded to it. Set the read only option to false if you want to allow writing, i.e., uploading files from clients to the server. Finally, the uid and gid options set the user and group identities the daemon uses when transferring files for this module. The configuration file is described in detail in the manpage rsyncd.conf(5). As you might imagine, there are a number of other options not described here.

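When it isn't talking to an rsync daemon, rsync uses a remote shell, rsh or ssh, for communications (current versions default to ssh). In that case, you'll need a working version of rsh or ssh on your system before using rsync. The host::module form used in the following examples connects directly to the rsync daemon instead.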

To move files between machines, you will issue an rsync command on a local machine, which will contact an rsync daemon on a remote machine. Thus, rsync must be installed on each client, and the remote server must be running the rsync daemon. The rsync daemon is typically started by xinetd but can also be run as a standalone process by starting rsync with the --daemon option. To start rsync from xinetd, edit the file /etc/xinetd.d/rsync, change the line disable = yes to disable = no, and reinitialize or restart xinetd. You can confirm the daemon is listening by using netstat.
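If you would rather script the xinetd change than edit the file by hand, something like the following sketch will do. It assumes the file contains a line of the form disable = yes (spacing may vary); the function name enable_xinetd_service is hypothetical, and the restart command in the usage comment is the Red Hat-style one, which may differ on your distribution.

```shell
#!/bin/sh
# Sketch: change "disable = yes" to "disable = no" in an xinetd service file.
# Uses plain sed (no GNU-only -i) and writes through a temporary copy.
enable_xinetd_service() {
    conf=$1
    sed 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes[[:space:]]*$/\1no/' \
        "$conf" > "$conf.tmp" &&
        cat "$conf.tmp" > "$conf" &&   # cat keeps the original owner and mode
        rm -f "$conf.tmp"
}

# Usage (as root):
#   enable_xinetd_service /etc/xinetd.d/rsync
#   service xinetd restart
```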

[root@fanny xinetd.d]# netstat -a | grep rsync
tcp        0      0 *:rsync                 *:*                     LISTEN

rsync uses TCP port 873 by default.

rsync can be used in a number of different ways. Here are a couple of examples to get you started. In the first, the file passwd is copied from fanny to george while preserving the group, owner, permissions, and time settings for the file (the -g, -o, -p, and -t options, respectively).

[root@fanny etc]# rsync -gopt passwd george::systemfiles
Password:

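Recall that systemfiles is the module name in the configuration file. Note that the system prompts for the password that is stored in the /etc/rsyncd.secrets file on george. You can avoid this step (useful in scripts) with the --password-file option, which reads the password from a file on the client. This is shown in the next example when copying the file shadow. Be aware that rsync uses the first line of that file, as is, as the password, so the file should contain only the password rather than a user:password entry like those in a daemon's secrets file, and it must not be accessible to other users.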

[root@fanny etc]# rsync -gopt --password-file=rsyncd.secrets shadow george::systemfiles

If you have the rsync daemon running on each node in your cluster, you could easily write a script that would push the current accounting files to each node. Just be sure you get the security right.
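A minimal sketch of such a push script follows. It assumes every node runs the [systemfiles] module shown earlier and that the head node has a client-side password file containing only the rsync password (the path /root/rsync.pass in the usage comment is hypothetical). The function name push_accounting_files and the RSYNC override, which lets you substitute echo for a dry run, are likewise illustrative.

```shell
#!/bin/sh
# Sketch: push the accounting files to each node's [systemfiles] module.
# Arguments: a client-side password file, then one or more node names.
# Set RSYNC=echo to print the commands instead of running them.
push_accounting_files() {
    passfile=$1
    shift
    status=0
    for node in "$@"; do
        # -g -o -p -t preserve group, owner, permissions, and times.
        if ! "${RSYNC:-rsync}" -gopt --password-file="$passfile" \
                /etc/passwd /etc/shadow /etc/group /etc/gshadow \
                "$node::systemfiles"; then
            echo "push to $node failed" >&2
            status=1            # remember the failure, keep going
        fi
    done
    return "$status"
}

# Usage from the head node (as root; password file path is hypothetical):
#   push_accounting_files /root/rsync.pass amy basil clara desmond
```

The loop keeps going when a node is unreachable and reports the failure through its exit status, so one down node doesn't leave the rest of the cluster with stale accounting files.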

In the preceding examples, rsync was used to push files. It can also be used to pull files. (fanny has the same configuration files as george.)

[root@george etc]# rsync -gopt fanny::systemfiles/shadow /etc/shadow

Notice that the source file is actually /etc/shadow but the /etc is implicit because it is specified in the configuration file.

rsync is a versatile tool. It is even possible to clone running systems with rsync. Other command forms are described in the manpage rsync(1).
