
1.4 Limitations

While clusters have a lot to offer, they are not panaceas. There is a limit to how much adding another computer to a problem will speed up a calculation. In the ideal situation, you might expect a calculation to go twice as fast on two computers as it would on one. Unfortunately, this is the limiting case and you can only approach it.

Any calculation can be broken into blocks of code or instructions that can be classified in one of two exclusive ways. Either a block of code can be parallelized and shared among two or more machines, or the code is essentially serial and the instructions must be executed in the order they are written on a single machine. Any code that can't be parallelized won't benefit from any additional processors you may have.

There are several reasons why some blocks of code can't be parallelized and must be executed in a specific order. The most obvious example is I/O, where the order of operations is typically determined by the availability, order, and format of the input and the format of the desired output. If you are generating a report at the end of a program, you won't want the characters or lines of output printed at random.

Another reason some code can't be parallelized comes from the data dependencies within the code. If you use the value of x to calculate the value of y, then you'll need to calculate x before you calculate y. Otherwise, you won't know what value to use in the calculation. Basically, to be able to parallelize two instructions, neither can depend on the other. That is, the order in which the two instructions finish must not matter.
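
If you want to see the difference concretely, the short Python sketch below (my illustration, not code from this chapter; the function and variable names are made up) runs two independent calculations on separate worker processes, while the dependent pair of calculations must run one after the other because the second needs the first's result.

    from concurrent.futures import ProcessPoolExecutor

    def heavy_calc(n):
        # Stand-in for an expensive, self-contained computation.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # Independent work: neither call depends on the other, so the two
        # can run at the same time on different workers (or machines).
        with ProcessPoolExecutor() as pool:
            future_a = pool.submit(heavy_calc, 1_000_000)
            future_b = pool.submit(heavy_calc, 2_000_000)
            a, b = future_a.result(), future_b.result()

        # Dependent work: y needs x, so these steps are inherently serial.
        # Adding processors does nothing for this part of the program.
        x = heavy_calc(3_000_000)
        y = heavy_calc(x % 1_000)

        print(a, b, x, y)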

Thus, any program can be seen as a series of alternating sections: sections that can be parallelized and effectively run on different machines, interspersed with sections that must be executed as written and that effectively can only be run on a single machine. If a program spends most of its time in code that is essentially serial, parallel processing will have limited value for this code. In this case, you will be better served with a faster computer than with parallel computers. If you can't change the algorithm, big iron is the best approach for this type of problem.

1.4.1 Amdahl's Law

As just noted, the amount of code that must be executed serially limits how much of a speedup you can expect from parallel execution. This idea has been formalized by what is known as Amdahl's Law, named after Gene Amdahl, who first stated the law in the late sixties. In a nutshell, Amdahl's Law states that the serial portion of a program will be the limiting factor in how much you can speed up the execution of the program using multiple processors.[2]

[2] While Amdahl's Law is the most widely known and most useful metric for describing parallel performance, there are others. These include Gustafson-Barsis's Law and Sun-Ni's Law, as well as the Karp-Flatt metric and the isoefficiency metric.

An example should help clarify Amdahl's Law. Let's assume you have a computation that takes 10 hours to complete on a currently available computer and that 90 percent of your code can be parallelized. In other words, you are spending one hour doing instructions that must be done serially and nine hours doing instructions that can be done in parallel. Amdahl's Law states that you'll never be able to run this code on this class of computers in less than one hour, regardless of how many additional computers you have available. To see this, imagine that you had so many computers that you could execute all the parallel code instantaneously. You would still have the serial code to execute, which has to be done on a single computer, and it would still take an hour.[3]

[3] For those of you who love algebra, the speedup factor is equal to 1/(s + p/N), where s is the fraction of the code that is inherently serial, p is the fraction of the code that can be parallelized, and N is the number of processors available. Clearly, p + s = 1. As the number of processors becomes very large, p/N becomes very small, and the speedup becomes essentially 1/s. So if s is 0.1, the largest speedup you can expect is a factor of 10, no matter how many processors you have available.
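
For those who would rather check the arithmetic than do the algebra, here is a small Python sketch (my own, not code from the book) that applies the footnote's formula to the 10-hour example above:

    def amdahl_speedup(s, n):
        # Speedup predicted by Amdahl's Law: 1 / (s + p/N), with p = 1 - s.
        return 1.0 / (s + (1.0 - s) / n)

    serial_fraction = 0.1   # 1 hour of the 10-hour job is inherently serial
    total_hours = 10.0

    for n in (1, 2, 4, 8, 16, 64, 1024):
        speedup = amdahl_speedup(serial_fraction, n)
        print(f"{n:5d} processors: speedup {speedup:6.2f}x, "
              f"runtime {total_hours / speedup:5.2f} hours")

    # No matter how large n gets, the runtime approaches but never drops
    # below the one hour of serial work.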

In practice, you won't have an unlimited number of processors, so your total time will always be longer. Figure 1-6 shows the amount of time needed for this example, depending on the number of processors you have available.

Figure 1-6. Execution time vs. number of processors


You should also remember that Amdahl's Law is an ideal. In practice, there is the issue of the overhead introduced by parallelizing the code. For example, coordinating communications among the various processes will require additional code, which adds to the overall execution time. And if there is contention for the network, processes can stall, further slowing the calculation. In other words, Amdahl's Law gives the best speedup you can hope for, not the actual speedup you'll see.
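
To get a feel for how overhead eats into the ideal speedup, the sketch below adds a made-up per-process communication cost to the Amdahl formula. The cost term and its value are assumptions chosen purely for illustration, not a model taken from this book.

    def ideal_speedup(s, n):
        # Amdahl's Law with serial fraction s and n processors.
        return 1.0 / (s + (1.0 - s) / n)

    def speedup_with_overhead(s, n, c=0.005):
        # c is a hypothetical per-process communication cost, added only to
        # illustrate the point; real overhead depends on the code and network.
        return 1.0 / (s + (1.0 - s) / n + c * n)

    s = 0.1
    for n in (2, 8, 32, 128):
        print(f"{n:4d} processors: ideal {ideal_speedup(s, n):5.2f}x, "
              f"with overhead {speedup_with_overhead(s, n):5.2f}x")

    # With the overhead term, the speedup peaks and then falls off as more
    # processes are added, instead of climbing toward the ideal 1/s limit.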

What can you do if you need to do this calculation in less than one hour? As I noted earlier, you have three choices when you want to speed up a calculation: better algorithms, faster computers, or more computers. If more computers won't take you all the way, your remaining choices are better algorithms and faster computers. If you can rework your code so that a larger fraction can be done in parallel, you'll see an increased benefit from a parallel approach. Otherwise, you'll need to dig deep into your pocket for faster computers.

Surprisingly, a fair amount of controversy still surrounds what should be obvious once you think about it. This stems in large part from the misapplication of Amdahl's Law over the years. For example, Amdahl's Law has been misused as an argument favoring faster computers over parallel computing.

The most common misuse is based on the assumption that the amount of speedup is independent of the size of the problem. Amdahl's Law simply does not address how problems scale. The fraction of the code that must be executed serially usually changes as the size of the problem changes, so it is a mistake to assume that a problem's speedup factor will be the same when the scale of the problem changes. For instance, if you double the length of a simulation, you may find that the serial portions of the simulation, such as the initialization and report phases, are basically unchanged, while the parallelizable portion of the code is what doubles. Hence, the fraction of the time spent in the serial code will decrease and Amdahl's Law will predict a greater speedup. This is good news! After all, it's when problems get bigger that we most need the speedup. Typically, the speedup will increase with the size of the problem: as the problem grows, it is not unusual to see a linear increase in the amount of time spent in the serial portion of the code and a quadratic increase in the amount of time spent in the parallelizable portion. Unfortunately, if you only apply Amdahl's Law to the smaller problem size, you'll underestimate the benefit of a parallel approach.
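
The sketch below puts numbers to this. It assumes, as suggested above, that the serial time grows linearly with problem size while the parallelizable time grows quadratically; the constants are invented for illustration.

    def speedup(serial_time, parallel_time, n):
        # Amdahl-style speedup when the serial and parallel times are known.
        return (serial_time + parallel_time) / (serial_time + parallel_time / n)

    n = 64  # processors
    for size in (1, 2, 4, 8):
        serial_time = 1.0 * size         # e.g., initialization and reporting
        parallel_time = 9.0 * size ** 2  # e.g., the main computation
        fraction = serial_time / (serial_time + parallel_time)
        print(f"problem size x{size}: serial fraction {fraction:.3f}, "
              f"speedup on {n} processors "
              f"{speedup(serial_time, parallel_time, n):5.2f}x")

    # As the problem grows, the serial fraction shrinks and the predicted
    # speedup climbs, even though the machine count stays the same.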

Having said this, it is important to remember that Amdahl's Law does clearly state a limitation of parallel computing. But this limitation varies not only from problem to problem, but with the size of the problem as well.

One last word about the limitations of clusters: the limitations are often tied to a particular approach. It is often possible to mix approaches and avoid limitations. For example, in constructing your clusters, you'll want to use the best computers you can afford. This will lessen the impact of inherently serial code. And don't forget to look at your algorithms!
