17.1 Why Profile?

You have probably heard it before-the typical program will spend over 90% of its execution time in less that 10% of the actual code. This is just a rule of thumb or heuristic, and as such, will be wildly inaccurate or totally irrelevant for some programs. But for many, if not most, programs, it is a reasonable observation. The actual numbers don't matter since they will change from program to program. It is the idea that is important-for most programs, most of the execution time spent is in a very small portion of the code.

This is extremely important to keep in mind in this critical portion of code. If your application spends 95% of its time in 5% of the code, there is little to be gained by optimizing the other 95% of the code. Even if you could completely eliminate it, you'd only see a 5% improvement. But if you can manage a 10% improvement in the critical 5% of your code, for example, you'll see a 9.5% overall improvement in your program. Thus, the key to improving your code's performance is to identify that crucial 5%. That's where you should spend your time optimizing code.^[1]

^[1] In this chapter optimization means optimizing the time a program spends executing. Space optimizations will be ignored.

Keep in mind that there is a point of diminishing returns when optimizing code. You'll need to balance the amount of time you spend optimizing code with the amount of improvement you actually get. There is a point where your code is good enough. The goals of profiling are two-fold-to decide how much optimization is worth doing and to identify which parts of code should be optimized.

The first step to optimizing code begins before you start writing it. To write the most efficient code, you should begin by selecting the most appropriate or efficient algorithm. As the program size grows, an unoptimized O(n log₂ n) algorithm will often outperform an optimized O(n2) algorithm. Of course, algorithm selection will depend on your specific application. Unfortunately, it can be problematic for parallel applications.

For serial algorithms, you can often make reasonable estimates on how time is being spent by simply examining and analyzing the algorithm. The standard approach characterizes performance using some measurement of the problem size. For example, when sorting an array of numbers, the problem size would be the number of elements in the array. Some problems are easily characterized by a single number while others may be more difficult to characterize or may depend on several parameters. Since the problem size often provides a bound for algorithmic performance, this approach is sometimes called asymptotic analysis .

Asymptotic analysis can be problematic with parallel programs for several reasons. First, it may be difficult to estimate the cost of communications required by a parallel solution. This can be further complicated by the need for additional code to coordinate communications among the processors. Second, there is often a less than perfect overlap among the communicating processes. A processor may be idle while it waits for its next task (as with our numerical integration programs in earlier chapters). In particular, it may be difficult to predict when a processor will be idle and what effect this will have on overall performance. For these and other reasons, an empirical approach to estimating performance is often the preferred approach for parallel programs. That is, we directly measure performance of existing programs.

Thus, with parallel programs, the most appropriate strategy is to select the best algorithm you can and then empirically verify its actual performance.

Table of Contents