17.2 Writing and Optimizing Code

Code optimization can be done by hand or by the compiler. While you should avoid writing obviously inefficient code, you shouldn't get carried away doing hand optimizations until you've let your compiler have a try at optimizing your code. You are usually much better off writing clean, clear, maintainable code than writing baroque code that saves a few cycles here or there. Most modern compilers, when used with the appropriate compiler options, are very good at optimizing code. It is often possible to have the best of both worlds: code that can be read by mere mortals but that compiles to a fully optimized executable.

With this in mind, take the time to learn what optimization options are available with your compiler. Because it takes longer to compile code when optimizing, because time-optimized code can be larger than unoptimized code, and because compiler optimizations may reorder instructions, making code more difficult to debug and profile, compilers typically will not optimize code unless specifically directed to do so.

With gcc, the optimization level is set with the -O compiler flag. (That's the letter O.) With the flag -O1, most basic optimizations are done. More optimizations are done when the -O2 flag is used and still more with the -O3 flag. (-O0 is used to suppress optimization and -Os is used to optimize for size.) In addition to these collective optimizations, gcc provides additional flags for other types of optimizations, such as loop unrolling, that might be useful in some situations. Consult your compiler's documentation for particulars.
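As a rough illustration of these flags, the sketch below lists a few typical gcc invocations in a comment and includes a small function that reports how the file was compiled. The file name prog.c and the function name build_mode are placeholders chosen for this example. The function relies on the __OPTIMIZE__ and __OPTIMIZE_SIZE__ macros, which gcc predefines when optimization is enabled; other compilers may not define them.

```c
/*
 * Typical invocations (prog.c is a placeholder file name):
 *
 *   gcc -O0 -g prog.c -o prog               no optimization, easiest to debug
 *   gcc -O1 prog.c -o prog                  basic optimizations
 *   gcc -O2 prog.c -o prog                  more optimizations
 *   gcc -O3 -funroll-loops prog.c -o prog   still more, plus loop unrolling
 *   gcc -Os prog.c -o prog                  optimize for size
 */

/*
 * Report how this file was compiled (hypothetical helper for illustration).
 * gcc defines __OPTIMIZE__ at -O1 and above, and additionally defines
 * __OPTIMIZE_SIZE__ at -Os, so the size check must come first.
 */
const char *build_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized for speed";
#else
    return "unoptimized";
#endif
}
```

Printing the result of build_mode() at startup is one simple way to confirm that a benchmark run really used the optimization level you intended.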

If you have selected your algorithm carefully and your compiler has done all it can for you, the next step in optimizing code is to locate which portions of the code may benefit from further attention. Locating the hot spots in your code doesn't guarantee that you'll be able to eliminate them or lessen their impact, since you may be working with an inherently time-consuming problem. On the other hand, if you don't look, you'll never know.

Larger problems that you may be able to identify and address include problems with memory access, I/O (I/O is always expensive), load balancing and task granularity, and communication patterns. Basically, anything that results in idle processors is worth examining.
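To make the memory-access item concrete, here is a minimal sketch of one common case: traversing a two-dimensional array stored in row-major order, which is how C lays out arrays. The function names and the flat-array layout are assumptions made for this example. Both functions compute the same sum, but the second strides through memory and, for arrays too large to fit in cache, will typically run more slowly.

```c
#include <stddef.h>

/*
 * Sum a rows x cols matrix stored row by row in a flat array.
 * The inner loop walks adjacent memory locations.
 */
double sum_by_rows(const double *a, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            total += a[i * cols + j];
        }
    }
    return total;
}

/*
 * Same sum, but the inner loop jumps cols elements at a time,
 * so consecutive accesses are far apart in memory.
 */
double sum_by_columns(const double *a, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < rows; i++) {
            total += a[i * cols + j];
        }
    }
    return total;
}
```

Whether the difference matters for your code depends on the array sizes and the machine, so it is worth measuring both versions before changing anything.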

Your extreme hotspots will be blocks of code that are executed repeatedly. These typically occur within loops or, especially, nested loops. For these, some hand optimization may be worthwhile. A number of techniques may be used, but they all boil down to eliminating unnecessary operations. Basically, you'll need to locate the instructions in question and look for ways to reduce the number of instructions or to replace them with less costly instructions. For example, moving instructions out of a loop will reduce the number of times the instructions are executed, while replacing an exponentiation with a multiplication can reduce the cost of an individual instruction.
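The two techniques just mentioned can be sketched in a few lines of C. Everything here is illustrative: the function names are invented for this example, and the small power helper is only a stand-in for a general-purpose exponentiation routine such as the C library's pow(), used so that the sketch needs no math library.

```c
#include <stddef.h>

/* Stand-in for a general exponentiation routine (hypothetical helper). */
static double power(double base, unsigned int exponent)
{
    double result = 1.0;
    while (exponent-- > 0) {
        result *= base;
    }
    return result;
}

/*
 * Straightforward version: every iteration makes a call to power()
 * and recomputes scale * offset, which never changes inside the loop.
 */
double sum_squares_naive(const double *x, size_t n,
                         double scale, double offset)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += power(x[i], 2) + scale * offset;
    }
    return total;
}

/*
 * Hand-tuned version: the loop-invariant product is computed once
 * before the loop, and the squaring is done with one multiplication.
 */
double sum_squares_tuned(const double *x, size_t n,
                         double scale, double offset)
{
    const double bias = scale * offset;   /* moved out of the loop */
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += x[i] * x[i] + bias;      /* multiplication, not a call */
    }
    return total;
}
```

An optimizing compiler may well perform both transformations on its own at higher optimization levels, which is one more reason to try the compiler first and to measure before and after any hand tuning.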

A detailed description of the various techniques that can be used is outside the scope of this book. Several sources are listed in Appendix A. The remainder of this chapter describes tools that will help you locate inefficient code.
