Book review: Code Complete
A review of Code Complete, by Steve McConnell, with extracts from the book
Code Complete, Steve McConnell (Microsoft Press, 1993)
A strong focus on a single topic
Code Complete is devoted to a single topic - writing code. This is more clearly defined than programming in general, because that is often taken to mean 'all of the different things that programmers do', including design, testing and integration.
Code Complete treats the subject intelligently, and is not distracted by side-issues, such as choice of programming language, or the latest buzz-word topics related to the subject. This is not a book about a particular programming language or technology; the examples in the book use whichever language best illustrates the point.
There is a difference between 'writing code' in general, and writing C++, say. For example, you wouldn't expect a book about C++ to deal with issues such as choice of variable names, because that isn't intrinsically a C++ issue. Of course, many computer books about a particular programming language do stray from the topic (just as I am doing now), and discuss programming in general.
A thorough treatment of writing code
Code Complete is a thick book because it gives a thorough treatment of a single subject, rather than just a shallow overview. This is an exception to the general rule that computer books that are more than 2 cm thick have little content.
See the extracts from some of the 33 chapters below, for some idea of the breadth and level of detail in the book.
Balanced presentation of multiple views and clear advice
Code Complete discusses alternative and conflicting viewpoints, set against both academic studies and the author's commercial experience. This lends a great deal of authority to the content of the book, and is a combination rarely found in the IT section of your average bookshop.
The author also offers explicit and clearly-worded advice, which is refreshing and useful. As well as daring to be prescriptive, he is careful to indicate which issues are controversial, giving a balanced presentation of both sides of the argument. All of this makes the various suggestions much easier to understand and swallow.
Examples of advice from the book
The following extracts taken from ten of Code Complete's 33 chapters illustrate its style and the nature of the advice. A disproportionate number of the extracts chosen quote academic studies, as the clear way that these results are presented and tied in with the rest of the text is one of the book's major strengths. Page numbers are given for each quotation.
From Chapter 5 - Characteristics of high-quality routines:
- 'For a procedure name, use a strong verb followed by an object', e.g. PrintReport(), CheckBatteryStatus(). (Except in object-oriented languages, where you don't need to include the name of the object.) (p80)
- 'For a function name, use a description of the return value [... e.g.] NextCustomerID(), PrinterReady()' (p80)
- 'Make routine names as long as necessary.' Good function and subroutine names tend to be longer than variable names; 15-20 characters is a realistic average length. (p81)
- 'Limit the number of a routine's parameters to about seven. [...] Psychological research has shown that people generally cannot keep track of more than about seven chunks of information at once (Miller 1956).' (p108)
  Miller, G. A. 1956. "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information." The Psychological Review 63, no. 2 (March): 81-87.
- '[When passing arguments to a routine, pass] only the parts of structured variables that the routine needs. [...] This is an aspect of information hiding: some information is hidden in routines; some is hidden from routines.' (p?)
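To make the Chapter 5 advice concrete, here is a minimal Python sketch of my own (it is not taken from the book, whose examples use other languages). The Employee record and the routine names gross_pay and print_pay_slip are hypothetical. The function is named for its return value, the procedure is named with a verb and an object, and each routine receives only the fields it needs rather than the whole record.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical record, invented purely for this illustration.
    name: str
    hourly_rate: float
    hours_worked: float
    home_address: str


def gross_pay(hourly_rate: float, hours_worked: float) -> float:
    """Function named after the value it returns; it never sees the address."""
    return hourly_rate * hours_worked


def print_pay_slip(name: str, pay: float) -> None:
    """Procedure named with a strong verb followed by an object."""
    print(f"{name}: {pay:.2f}")


employee = Employee("A. Programmer", 20.0, 37.5, "1 Example Street")
pay = gross_pay(employee.hourly_rate, employee.hours_worked)
print_pay_slip(employee.name, pay)
```

Because gross_pay takes two numbers instead of an Employee, the parts of the record it has no business with stay hidden from it.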
From Chapter 6 - Three out of four programmers surveyed prefer modules:
- '[...] Information hiding is one of the few theoretical techniques that has indisputably proven its value in practice (Boehm 1987a).' (p118)
From Chapter 7 - High-level design in construction:
- On software design: 'Horst Rittel and Melvin Webber defined a "wicked" problem as one that could be clearly defined only by solving it, or by solving part of it (1973). [...] One of the main differences between programs that you develop in school and those you develop as a professional is that the design problems solved by school programs are rarely, if ever, wicked. [...] A key to effective design is that it's a heuristic process. Design always involves some trial and error.' (p160)
From Chapter 8 - Creating data:
- In data processing (i.e. programming) the data is more important than the processing (i.e. the program). So understand data structures.
- 'Initialize each variable close to where it's used. [...] throwing all the initializations together creates the impression that all the variables are used throughout the whole routine [...]. This is an example of the Principle of Proximity: Keep related actions together.' (p180)
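The Principle of Proximity is easy to show in a few lines. The following Python sketch is my own illustration, not the book's; the function describe_readings and its variables are hypothetical, and it assumes a non-empty list of readings.

```python
def describe_readings(readings: list[float]) -> str:
    """Summarize a non-empty list of readings (assumption: at least one value)."""
    # 'total' is initialized directly above the loop that accumulates it.
    total = 0.0
    for value in readings:
        total += value
    mean = total / len(readings)

    # 'peak' is initialized here, next to its own loop, rather than in a
    # block of initializations at the top of the routine.
    peak = readings[0]
    for value in readings[1:]:
        if value > peak:
            peak = value

    return f"mean={mean:.1f} peak={peak:.1f}"
```

A reader can see at a glance that total plays no part in finding the peak, which would be less obvious if both variables were set up together at the top.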
From Chapter 9 - The power of data names:
- 'You can't give a variable a name the way you give a dog a name - because it's cute or it has a good sound. Unlike the dog and its name, which are different entities, a variable and a variable's name are essentially the same thing. [...] The most important consideration in naming a variable is that the name fully and accurately describe the entity the variable represents.' (p185)
- 'A good mnemonic [variable] name speaks to the problem rather than the solution. [...] In general, if a name refers to some aspect of computing rather than to the problem [such as InputRec instead of EmployeeData, or BitFlag instead of PrinterReady], it's a how rather than a what [and probably isn't the best name].' (p187)
- 'Gorla, Benander, and Benander found that the effort required to debug a COBOL program was minimized when variables had names that averaged 10 to 16 characters (1990). [this should only be taken to mean] that if you look over your code and see many names that are shorter, you should check to be sure that the names are as clear as they need to be.' (p188)
- 'When you find yourself "figuring out" a section of code, consider renaming the variables. It's OK to figure out murder mysteries, but you shouldn't need to figure out code. You should be able to read it.' (p193)
- 'Remember that names matter more to the reader of the code than to the writer', which means that being able to understand your own code is not a sufficient condition for it to be readable. (p209)
- 'Avoid numerals in names. If the numerals in a name are really significant, use an array instead of separate variables. If an array is inappropriate, numerals are even more inappropriate. [...] You can almost always think of a better way to differentiate between two variables than by tacking a 1 or a 2 onto the end of the name. I can't say never use numerals, but you should be desperate before you do.' (p210)
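A small before-and-after comparison shows how much work names do. This Python sketch is my own and every name in it is hypothetical; both functions compute the same thing, but only the second can be read without "figuring out".

```python
# Harder to read: the names describe the computing mechanism ("rec", "flag")
# and use numerals to tell two values apart.
def calc(rec1, rec2, flag):
    if flag:
        return rec1 + rec2
    return rec1


# Easier to read: the names describe the problem being solved.
def invoice_total(net_price, sales_tax, customer_is_taxable):
    if customer_is_taxable:
        return net_price + sales_tax
    return net_price
```

The writer of calc knows what rec1 and rec2 hold; the reader does not, which is the point of the remark that names matter more to the reader than to the writer.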
From Chapter 11 - Fundamental data types:
- 'Think of arrays as sequential structures. Some of the brightest people in computer science have suggested that arrays never be accessed randomly, but only sequentially (Mills and Linger 1986). Their argument is that random accesses in arrays are similar to random "gotos" in a program.' (p251)
From Chapter 12 - Complex data types:
- 'Traditionally, programming books [and university courses] wax mathematical when they arrive at the topic of abstract data types [and ...] provide some boring examples of how to write access routines for a stack, a queue, or a list. [...] Such dry explanations of abstract data types completely miss the point [...] because you can use them to manipulate real-world entities rather than computer-science entities.' (p288)
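As a rough sketch of what an abstract data type for a real-world entity might look like, here is a Python class of my own devising. The Thermostat class and its operations are hypothetical and are not an example from the book. The rest of a program works with the entity through these operations and never touches the stored number directly.

```python
class Thermostat:
    """Hypothetical abstract data type modelling a real-world thermostat."""

    def __init__(self, target_celsius: float) -> None:
        # The representation is private by convention (leading underscore).
        self._target_celsius = target_celsius

    def raise_target(self, degrees: float) -> None:
        """Increase the target temperature by the given number of degrees."""
        self._target_celsius += degrees

    def target(self) -> float:
        """Return the current target temperature in degrees Celsius."""
        return self._target_celsius

    def needs_heating(self, room_celsius: float) -> bool:
        """Report whether the room is colder than the target."""
        return room_celsius < self._target_celsius
```

Client code asks thermostat.needs_heating(room_celsius) and thinks about heating, not about comparing two floats.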
From Chapter 13 - Organizing straight-line code:
- Minimise a variable's 'live time': the number of statements between the first and last references to the variable in the source code. A short live time makes the code easier to read and reduces the likelihood of bugs being introduced when the code is modified, for example when loops are added to straight-line code. (p305)
- Group related lines of code. 'They can be related because they operate on the same data, perform similar tasks, or depend on each other's being performed in order.' (p308)
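The idea of live time can be seen by writing the same routine twice. This is a Python sketch of my own, with hypothetical names and made-up pricing rules; the two functions return the same result and differ only in where discount is introduced and how the statements are grouped.

```python
def order_price_long_live_time(prices):
    discount = 0.1  # first reference, far from the only place it is used
    subtotal = 0.0
    for price in prices:
        subtotal += price
    shipping = 5.0 if subtotal < 50 else 0.0
    return subtotal * (1 - discount) + shipping  # last reference


def order_price_short_live_time(prices):
    # Group 1: everything that builds the subtotal.
    subtotal = 0.0
    for price in prices:
        subtotal += price

    # Group 2: shipping depends only on the subtotal.
    shipping = 5.0 if subtotal < 50 else 0.0

    # Group 3: the discount is introduced immediately before its single use.
    discount = 0.1
    return subtotal * (1 - discount) + shipping
```

In the second version, someone who later inserts code between the groups has far fewer statements in which to disturb discount by accident.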
From Chapter 14 - Using conditionals:
- IF statements: 'Write the nominal path through the code first; then write the exceptions. [...] Make sure you branch correctly on equality. [...] Put the nominal case after the IF rather than after the ELSE. [...] Follow the IF clause with a meaningful statement.' (p312)
- IF statements: 'If you think you need a plain IF statement, consider whether you don't actually need an IF-THEN-ELSE statement. In a General Motors analysis of code written in PL/1, only 17 percent of IF statements had an ELSE clause. Further analysis of the code showed that 50 to 80 percent should have had one (Elshoff 1976).' (p314)
- IF statements: 'Use boolean [sic] variables to simplify complicated tests. [... or] Simplify complicated tests with boolean [sic] function calls.' (p315)
- case statements: 'You might sometimes have only one case remaining and decide to code that case as the default clause. Though sometimes tempting, that's dumb. [...] Use the default clause to detect errors.' (p319)
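Two of these points translate directly into code. The Python sketch below is my own, with hypothetical function names and rules; it uses an if/elif/else chain where the book talks about case statements. The first function uses named boolean variables to simplify a complicated test, and the second reserves the final else for detecting errors rather than for the last legitimate case.

```python
def can_print(printer_online: bool, pages_left: int, job_pages: int,
              is_admin: bool, quota_left: int) -> bool:
    # Each named boolean documents one part of the overall test.
    printer_ready = printer_online and pages_left >= job_pages
    user_allowed = is_admin or quota_left >= job_pages
    return printer_ready and user_allowed


def handle_command(command: str) -> str:
    if command == "start":
        return "starting"
    elif command == "stop":
        return "stopping"
    elif command == "pause":
        return "pausing"
    else:
        # The default branch detects errors; "pause" gets its own explicit
        # branch above instead of being coded as the fall-through case.
        raise ValueError(f"unknown command: {command!r}")
```

Had "pause" been left to the else branch, a mistyped command would have silently paused the program instead of being reported.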
From Chapter 15 - Controlling loops:
- 'Make each loop perform only one function. [...] If it seems inefficient to use two loops where one would suffice, write the code as two loops, comment that they could be combined for efficiency, and then wait until benchmarks show that the section of the program poses a performance problem before changing the two loops into one.' (p334)
- 'Studies have shown that the ability of programmers to comprehend a loop deteriorates significantly beyond three levels of nesting (Yourdon 1986a). If you're going beyond that number of levels, make the loop shorter (conceptually) by breaking part of it into a routine or simplifying the control structure.' (p341)
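One way to follow the nesting advice is to move the innermost loop into a routine of its own. The Python sketch below is my own illustration with hypothetical names; the data is assumed to be a list of tables, each a list of rows of numbers.

```python
def row_has_negative(row) -> bool:
    """The innermost loop, extracted into a routine with a descriptive name."""
    for value in row:
        if value < 0:
            return True
    return False


def count_rows_with_negatives(tables) -> int:
    """Count the rows, across all tables, that contain a negative value."""
    count = 0
    for table in tables:
        for row in table:
            # What would have been a third level of nesting is now a call.
            if row_has_negative(row):
                count += 1
    return count
```

The outer routine now reads as two levels of loop plus a question, and each loop performs only one function.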