July 2005 - Continuous Build Methodology
By Andy Bruce
Email: andy@softwareab.net
http://www.softwareab.net/
From Floundering to Professional in 10 Easy Steps
A quick overview of how a disciplined and detail-oriented /process/ turned
a floundering and badly written system into a commercial reality.
In 2002 I became an independent consultant and started working with an
ex-manager of mine on some thoroughly prosaic VB6 software extending the M$ Great
Plains accounting system. When I came on board, the previous team
lead had discouraged even full compiles "since they just keep the
developers from moving forward on the deadlines." There was no process
defined, no technical design documents (other than my manager friend's work for
the then-single customer), and no thought given to integration, installation,
deployment, or maintenance. The system was hopelessly bottlenecked in a purely
reactive mode, with no thought given past the current set of fires, and the
client was very unhappy.
Principles regarding a nightly build and continuous integration are ones I have
absorbed organically over my career. It took around 12 years, but after X number
of products and Y number of Friday-3am
bug-fix marathons, the absurdity of the typical development process becomes
apparent. Even strong companies where I've worked, like EMC in Hopkinton, MA or
Landmark Systems Corporation in Tyson's Corner, VA, had an emphasis on keeping
testing separate from the coding.
In a word--the single most important aspect of successful development is: Have
Respect For The Process. In our case, following the
process has led to a reliable product with very fast turnaround of bug patches
and easily maintained ongoing development. We have automated builds, regression
tests, integration between defect tracking and source control, quick
"state of the system" statuses, a well-defined set of developer and
customer documents, automated installations and program upgrades, and much
more.
The Process is what allows you not only to develop reliable and well-tested
code in the beginning, but also allows you to respond to the inevitable fires
that occur after code is released and new development is fighting for resources
with maintenance patches.
In my product's case, success came through following these points:
- Ensure that development can occur
anywhere, at any time. The existing software environment assumed that
everyone was on-site (a typical M$ SourceSafe drive-mapping-based system).
Although in my case I used CVS, the important point is that your version
control needs to free your developers from the tyranny of location. And by
leveraging SSH as a poor-man's VPN, I was able to free us from the
bottleneck of the corporate IT department (they trusted SSH well enough,
and simply opened up port 22 on the firewall). By doing this, we were able
to use all our development resources (source control, common programs,
database tools, etc.) remotely.
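As a sketch of the idea (the host, repository path, and module name below are all hypothetical, and the command is built but not executed since it needs a live server), tunnelling CVS over SSH requires nothing more than an environment variable:

```python
import os

def cvs_checkout_command(host, repo_path, module, user="build"):
    """Build a CVS-over-SSH checkout command. Setting CVS_RSH=ssh
    makes CVS shell out to ssh instead of rsh, so port 22 is the
    only hole the IT department has to open."""
    env = dict(os.environ)
    env["CVS_RSH"] = "ssh"
    root = ":ext:%s@%s:%s" % (user, host, repo_path)
    cmd = ["cvs", "-d", root, "checkout", module]
    return cmd, env

# In practice this would be handed to subprocess.call(cmd, env=env).
cmd, env = cvs_checkout_command("cvs.example.com", "/var/cvsroot",
                                "greatplains-ext")
```

The same `:ext:` trick covers commits, updates, and tags, so every CVS operation works identically from the office or a hotel room.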
- Understand where you're coming from.
In my case, no system documentation existed at all. My next step was to
ensure that the major processes (build, deploy, maintain) were written
down. Besides serving as a manual install guide, this laid the framework
for the automated processes to follow.
- Code standards matter. Another
tough one. Many developers have the mistaken opinion that how code is
formatted has little bearing on the end result. Others believe that
following a standard is onerous and a drag on creativity. Nothing could be
further from the truth. Just because one follows the strict form of a
sonnet or a contrapuntal fugue does not mean that the sonnet or fugue is
restricted (in the real sense of the word) at all. Instead, it just means
that others can recognize the form and have a much better chance of
understanding the progressions and applying changes. It is just so with
software development. By following well-defined and concise formatting
standards, one can reduce the chance of errors (as in, with the old C
compilers, writing "if( 0 == x )" rather than "if( x = 0 )", where the
second form silently invites the assignment-instead-of-comparison
error). Moreover, it makes company code equally easy
to read regardless of the initial author. While to many this is a
difficult concept to grasp, the fact is that we all grow and move on. It's
also inevitable that in any successful code the ongoing maintenance team
is almost never the original development team (and, in many cases, simply
not as good technically as the original development team). One must give
the next person the best chance at understanding and working on code
modules, unless one's goal is stagnation and a reduced role as "the
permanent foobar maintainer".
- Compile for complaints. Turn up
your compiler warnings and errors to the highest possible level. If using
C++ or C, get a good lint and apply a strict
corporate-wide policy. Do not tolerate any messages from your compiler.
Enable any type of memory and/or logic checking tools that can be compiled
with your application (even the old M$ C++ compilers offer the ability to
do simple memory verification during program runs; more advanced tools
like BoundsChecker simply do a better job).
- Assume that they're all out to get
you. In your modules, assume that your input parameters are not only
wrong, but wrong in strange ways. When dealing with database parameters,
assume that every variable passed in is a mistaken NULL and that every
data lookup fails and that every memory operation is wrong and that every
OS call dies miserably. As to what happens when failures do occur: I'm of
the school that you capture the error, log it, and keep going. Others are
of the school that you fail quickly and fail hard. But in either
case, the key point here is that everything does go wrong at some time or
another. Your job is to be aware of that fact.
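A minimal Python sketch of that defensive posture (the lookup, the logger, and the function name are stand-ins for illustration, not our actual VB6 code): every parameter is presumed NULL or malformed, every failure is logged, and the caller gets a safe default instead of a crash:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("defensive")

def lookup_customer_name(db, customer_id):
    """Assume every input is wrong: a missing db handle, a NULL or
    non-numeric id, a lookup that finds nothing, a query that dies.
    Capture the error, log it, and keep going."""
    if db is None:
        log.warning("lookup_customer_name: no database handle")
        return "<unknown>"
    if not isinstance(customer_id, int) or customer_id <= 0:
        log.warning("lookup_customer_name: bad id %r", customer_id)
        return "<unknown>"
    try:
        row = db.get(customer_id)  # stand-in for the real query
    except Exception:
        log.exception("lookup_customer_name: query failed")
        return "<unknown>"
    if row is None:
        log.warning("lookup_customer_name: id %d not found", customer_id)
        return "<unknown>"
    return row

# Bad inputs never raise; they degrade and leave a trail in the log.
assert lookup_customer_name(None, 5) == "<unknown>"
assert lookup_customer_name({}, "five") == "<unknown>"
```

The fail-fast school would replace each `return "<unknown>"` with a raise; either way, the failure paths are explicit rather than accidental.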
- Software deployment and upgrades make
or break the system. Even before anything else, one must consider the
frightening fact that software, if successful, will be installed. It will
be installed on many machines, and most users will not have admin privs. It's critical to take the time up-front to have
a plan such that one's product can not only install itself once, but keep
itself maintained automatically. In our case, the initial client install
simply puts a stub on the machine. During the initial login to the server
database is when all the important stuff happens. The client itself can be
upgraded, new libraries can be installed, client extensions can be loaded
dynamically, menu configuration options can be
set all based on configuration files located on the server. The key point
here is that we design the system to assume wildly
successful sales; as we all know, the only thing that can kill a
project faster than failure to sell anything is the ability to sell a lot
of things.
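A sketch of the stub-plus-server idea in Python (the configuration fields and version scheme are invented for illustration): at login the stub compares its local state against the server's configuration and works out what to upgrade, install, or load:

```python
def plan_upgrade(local, server_cfg):
    """Given the stub's local state and the server-side configuration,
    return the actions the client should take at login. All field
    names here are hypothetical."""
    actions = []
    # Newer client binary available? Replace the stub itself.
    if server_cfg["client_version"] > local["client_version"]:
        actions.append(("upgrade_client", server_cfg["client_version"]))
    # Shared libraries whose versions don't match get reinstalled.
    for lib, ver in server_cfg.get("libraries", {}).items():
        if local.get("libraries", {}).get(lib) != ver:
            actions.append(("install_library", lib, ver))
    # Client extensions are always loaded fresh from server config.
    for ext in server_cfg.get("extensions", []):
        actions.append(("load_extension", ext))
    return actions

local = {"client_version": (1, 2), "libraries": {"report": "1.0"}}
server = {"client_version": (1, 3),
          "libraries": {"report": "1.1"},
          "extensions": ["payroll"]}
```

Because every decision is driven by server-side configuration, rolling an upgrade out to a thousand desktops means changing one file on the server.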
- Go out of your way to automate.
The key point here is: /Not all time is created equal./
A script that automates a key process otherwise requiring alert and full
attention is never a waste of time. So, take the time to get the product
build automated using the tool of your choice. However, even an automated
build is just the beginning. Example: When
building software there may be a set of instructions between the
"make build" and "make deploy" that need to be typed
manually. All together, these instructions take less than ten minutes to
run while automating the instructions may require several hours due to
complexity. However, when one is working under high pressure with multiple
deadlines all popping at once, one finds that /not having to think/ during
repetitive but crucial processes is a life-saver. This concept generally
raises programmers' hackles: after all, why can't you just be alert and
focused when executing a well-defined set of commands? The only answer to
that is hard experience and the sad realization
that mistakes do happen; and mistakes occurring
during the final "quick rebuild of the system for the last
patch" ultimately occur to all of us. Those are the painful lessons.
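The not-having-to-think idea can be sketched as a small driver that captures the steps otherwise typed between "make build" and "make deploy" (the step commands below are placeholders); it stops dead on the first failure instead of letting a tired human plough on:

```python
import subprocess
import sys

# The manual steps between "make build" and "make deploy", written
# down once so nobody has to remember them at 3am. These particular
# commands are placeholders standing in for the real ones.
STEPS = [
    [sys.executable, "-c", "print('stamping version')"],
    [sys.executable, "-c", "print('packaging client stub')"],
]

def run_steps(steps):
    """Run each step in order; abort on the first non-zero exit so a
    half-finished build never goes out the door."""
    for cmd in steps:
        print("==>", " ".join(cmd))
        rc = subprocess.call(cmd)
        if rc != 0:
            print("step failed (rc=%d), aborting" % rc, file=sys.stderr)
            return False
    return True
```

Spending the several hours to script ten minutes of typing pays for itself the first time a release is cut under deadline pressure.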
- Every line of code is a mistake.
This is another controversial statement, but it means one simple thing: In
many conditions, human beings simply cannot write very good computer code.
Under pressure, humans make mistakes; we get tired and cranky, and in
general we don't do repetitive and precise actions very well. The answer
to the statement is: Minimize your
mistakes. In other words, write less code. The best way to do this is
to identify where in one's process one has well-defined sets of software
modules that must be kept in synchronization with each other, and to
automate that generation. In our case, this was the database interface as
well as the database upgrade scripts. I ensured we had both commercial and
in-house custom tools to generate our database layers and our database
upgrade and installation scripts automatically. I can trust that the code
generation works equally well under all types of deadline pressure.
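A toy version of that generation step in Python (the table and column names are made up): the schema is the single source of truth, and every data-access stub is emitted from it rather than hand-written, so the layers cannot drift out of sync:

```python
# Invented schema standing in for the real database definition.
SCHEMA = {"customer": ["id", "name", "credit_limit"]}

def generate_select(table, columns):
    """Emit the SELECT statement for one table's accessor. Because
    every accessor comes from the same template, a schema change
    regenerates them all identically -- no tired human retyping."""
    return "SELECT %s FROM %s WHERE id = ?" % (", ".join(columns), table)

def generate_layer(schema):
    """Generate the whole database-access layer from the schema."""
    return {table: generate_select(table, cols)
            for table, cols in schema.items()}
```

The same template approach extends to upgrade scripts: diff the old schema against the new one and emit the ALTER statements mechanically.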
- Build your regression tests early and
make them complex. Regression tests are not a panacea by any means,
but they are highly effective in ensuring a minimum level of reliability.
But, developers really don't like to write them. And, some managers have
the odd idea that, just because something works when it was first written,
the same software should work months later. Sadly enough, that simply
isn't the case. As systems evolve the underlying modules change their
interfaces and assumptions in subtle ways. Regressions are simply the only
way to ensure that modules that used to work,
continue to work. And the more complex and detail-oriented the regressions
are, the better (developers and managers despise such tests because of the
initial flurry of false positives they create). In our project we have
around 500 regressions that run with each build, simply because I insisted
we build them. And while I do have trouble getting anyone to write any
more of these tests, I do feel a sense of relief and satisfaction each
time our existing test suite passes at 100%.
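A stripped-down Python illustration of the build-time regression run (the two tests are trivial stand-ins for real ones); the point is the pass/fail summary at the end of every build, with anything short of 100% treated as a broken build:

```python
def test_lookup_roundtrip():
    # Stand-in for a real module-interface regression.
    assert int("42") == 42

def test_null_handling():
    # Stand-in for a real NULL-parameter regression.
    assert ("" or "<default>") == "<default>"

def run_regressions(tests):
    """Run every test, collect failures, and report counts the way
    a nightly build would. An empty failure list means 100% pass."""
    failures = []
    for t in tests:
        try:
            t()
        except AssertionError as exc:
            failures.append((t.__name__, exc))
    passed = len(tests) - len(failures)
    print("%d/%d regressions passed" % (passed, len(tests)))
    return failures
```

Wiring `run_regressions` into the automated build means an interface change that breaks an old assumption fails the build the same night, not months later at a customer site.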
- The Development Install is the same as
the Customer Install. I know three different major products on which I
worked, all of which strictly segregated the developers’ configuration
setup from the customers’. This inevitably led to much wasted time and lots
of finger pointing at the end. And in all three cases, there ended up
being a full team of folks working on the product installation suite. In
our product, I simply did not tolerate that approach. We have One Way to
setup our system. This means that our installer is tested every day on
numerous machines by our automated build process. When we get to the end
of the cycle, we don't have to worry about whether we've updated the
installer to handle new shared libraries or system registry entries; we've
caught that low-hanging fruit very early in the process.
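One way to sketch the One Way rule in Python (the file names and scratch-directory scheme are invented): the nightly build's smoke test and the customer setup both call the same install routine, so the installer itself is exercised on every build:

```python
import os
import tempfile

def install_product(target_dir):
    """The single install path. Whether it's a developer box, the
    nightly build machine, or a customer machine, this is the only
    code that lays the product down. File names are placeholders."""
    os.makedirs(target_dir, exist_ok=True)
    for name in ("client_stub.exe", "config.ini"):
        with open(os.path.join(target_dir, name), "w") as f:
            f.write("installed\n")
    return sorted(os.listdir(target_dir))

def nightly_build_smoke_test():
    """The automated build installs into a scratch directory every
    night, so installer breakage surfaces within a day."""
    scratch = tempfile.mkdtemp()
    return install_product(scratch)
```

A new shared library gets added in exactly one place, and the next nightly run proves the customer install still works.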
There are lots more points to consider (automated
data loading, managing multiple ongoing development branches, a code promotion
strategy of "Never Break The Build"), but this is enough for one
article!