July 2005 - Continuous Build Methodology
By Andy Bruce
Email: andy@softwareab.net
http://www.softwareab.net/
From Floundering to Professional in 10 Easy Steps
A quick overview of how a disciplined and detail-oriented /process/ turned
a floundering and badly written system into a commercial reality.
In 2002 I became an independent consultant and started working with an
ex-manager of mine on some thoroughly prosaic VB6 software extending the M$ Great
Plains accounting system. When I came on board, the previous team
lead had discouraged even full compiles "since they just keep the
developers from moving forward on the deadlines." There was no process
defined, no technical design documents (other than my manager friend's work for
the then-single customer), and no thought given to integration, installation,
deployment, or maintenance. The system was hopelessly bottlenecked in a purely
reactive mode, with no thought given past the current set of fires, and the
client was very unhappy.
Principles regarding a nightly build and continuous integration are ones I have
absorbed organically over my career. It took around 12 years, but after X number
of products and Y number of Friday-3am
bug-fix marathons, the absurdity of the typical development process becomes
apparent. Even strong companies where I've worked, like EMC in Hopkinton, MA or
Landmark Systems Corporation in Tyson's Corner, VA, had an emphasis on keeping
testing separate from the coding.
In a word--the single most important aspect of successful development is: Have
Respect For The Process. In our case, following the
process has led to a reliable product with very fast turnaround of bug patches
and easily maintained ongoing development. We have automated builds, regression
tests, integration between defect tracking and source control, quick
"state of the system" statuses, a well-defined set of developer and
customer documents, automated installations and program upgrades, and much
more.
The Process is what allows you not only to develop reliable and well-tested
code in the beginning, but also allows you to respond to the inevitable fires
that occur after code is released and new development is fighting for resources
with maintenance patches.
In my product's case, success came through following these points:
- Ensure that development can occur
anywhere, at any time. The existing software environment assumed that
everyone was on-site (a typical M$ SourceSafe drive-mapping-based system).
Although in my case I used CVS, the important point is that your version
control needs to free your developers from the tyranny of location. And by
leveraging SSH as a poor-man's VPN, I was able to free us from the
bottleneck of the corporate IT department (they trusted SSH well enough,
and simply opened up port 22 on the firewall). By doing this, we were able
to use all our development resources (source control, common programs,
database tools, etc.) remotely.
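As a sketch of the idea (the host, repository path, and module name below are all hypothetical, and the command is built but not executed since it needs a live server), tunnelling CVS over SSH requires nothing more than an environment variable:

```python
import os

def cvs_checkout_command(host, repo_path, module, user="build"):
    """Build a CVS-over-SSH checkout command. Setting CVS_RSH=ssh
    makes CVS shell out to ssh instead of rsh, so port 22 is the
    only hole the IT department has to open."""
    env = dict(os.environ)
    env["CVS_RSH"] = "ssh"
    root = ":ext:%s@%s:%s" % (user, host, repo_path)
    cmd = ["cvs", "-d", root, "checkout", module]
    return cmd, env

# In practice this would be handed to subprocess.call(cmd, env=env).
cmd, env = cvs_checkout_command("cvs.example.com", "/var/cvsroot",
                                "greatplains-ext")
```

The same `:ext:` trick covers commits, updates, and tags, so every CVS operation works identically from the office or a hotel room.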
- Understand where you're coming from.
In my case, no system documentation existed at all. My next step was to
ensure that the major processes (build, deploy, maintain) were written
down. Besides serving as a manual install guide, this laid the framework
for the automated processes to follow.
- Code standards matter. Another
tough one. Many developers have the mistaken opinion that how code is
formatted has little bearing on the end result. Others believe that
following a standard is onerous and a drag on creativity. Nothing could be
further from the truth. Just because one follows the strict form of a
sonnet or a contrapuntal fugue does not mean that the sonnet or fugue is
restricted (in the real sense of the word) at all. Instead, it just means
that others can recognize the form and have a much better chance of
understanding the progressions and applying changes. It is just so with
software development. By following well-defined and concise formatting
standards, one can reduce the chance of errors (as in, with the old C
compilers, writing "if( 0 == x )" rather than "if( x = 0 )", where the
second form silently invites the assignment-instead-of-comparison
error). Moreover, it makes company code equally easy
to read regardless of the initial author. While to many this is a
difficult concept to grasp, the fact is that we all grow and move on. It's
also inevitable that in any successful code the ongoing maintenance team
is almost never the original development team (and, in many cases, simply
not as good technically as the original development team). One must give
the next person the best chance at understanding and working on code
modules, unless one's goal is stagnation and a reduced role as "the
permanent foobar maintainer".
- Compile for complaints. Turn up
your compiler warnings and errors to the highest possible level. If using
C++ or C, get a good lint and apply a strict
corporate-wide policy. Do not tolerate any messages from your compiler.
Enable any type of memory and/or logic checking tools that can be compiled
with your application (even the old M$ C++ compilers offer the ability to
do simple memory verification during program runs; more advanced tools
like BoundsChecker simply do a better job).
- Assume that they're all out to get
you. In your modules, assume that your input parameters are not only
wrong, but wrong in strange ways. When dealing with database parameters,
assume that every variable passed in is a mistaken NULL and that every
data lookup fails and that every memory operation is wrong and that every
OS call dies miserably. As to what happens when failures do occur: I'm of
the school that you capture the error, log it, and keep going. Others are
of the school that you fail quickly and fail hard. But in either
case, the key point here is that everything does go wrong at some time or
another. Your job is to be aware of that fact.
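A minimal Python sketch of that defensive posture (the lookup, the logger, and the function name are stand-ins for illustration, not our actual VB6 code): every parameter is presumed NULL or malformed, every failure is logged, and the caller gets a safe default instead of a crash:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("defensive")

def lookup_customer_name(db, customer_id):
    """Assume every input is wrong: a missing db handle, a NULL or
    non-numeric id, a lookup that finds nothing, a query that dies.
    Capture the error, log it, and keep going."""
    if db is None:
        log.warning("lookup_customer_name: no database handle")
        return "<unknown>"
    if not isinstance(customer_id, int) or customer_id <= 0:
        log.warning("lookup_customer_name: bad id %r", customer_id)
        return "<unknown>"
    try:
        row = db.get(customer_id)  # stand-in for the real query
    except Exception:
        log.exception("lookup_customer_name: query failed")
        return "<unknown>"
    if row is None:
        log.warning("lookup_customer_name: id %d not found", customer_id)
        return "<unknown>"
    return row

# Bad inputs never raise; they degrade and leave a trail in the log.
assert lookup_customer_name(None, 5) == "<unknown>"
assert lookup_customer_name({}, "five") == "<unknown>"
```

The fail-fast school would replace each `return "<unknown>"` with a raise; either way, the failure paths are explicit rather than accidental.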
- Software deployment and upgrades make
or break the system. Even before anything else, one must consider the
frightening fact that software, if successful, will be installed. It will
be installed on many machines, and most users will not have admin privs. It's critical to take the time up-front to have
a plan such that one's product can not only install itself once, but keep
itself maintained automatically. In our case, the initial client install
simply puts a stub on the machine. During the initial login to the server
database is when all the important stuff happens. The client itself can be
upgraded, new libraries can be installed, client extensions can be loaded
dynamically, menu configuration options can be
set all based on configuration files located on the server. The key point
here is that we design the system to assume wildly
successful sales; as we all know, the only thing that can kill a
project faster than failure to sell anything is the ability to sell a lot
of things.
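A sketch of the stub-plus-server idea in Python (the configuration fields and version scheme are invented for illustration): at login the stub compares its local state against the server's configuration and works out what to upgrade, install, or load:

```python
def plan_upgrade(local, server_cfg):
    """Given the stub's local state and the server-side configuration,
    return the actions the client should take at login. All field
    names here are hypothetical."""
    actions = []
    # Newer client binary available? Replace the stub itself.
    if server_cfg["client_version"] > local["client_version"]:
        actions.append(("upgrade_client", server_cfg["client_version"]))
    # Shared libraries whose versions don't match get reinstalled.
    for lib, ver in server_cfg.get("libraries", {}).items():
        if local.get("libraries", {}).get(lib) != ver:
            actions.append(("install_library", lib, ver))
    # Client extensions are always loaded fresh from server config.
    for ext in server_cfg.get("extensions", []):
        actions.append(("load_extension", ext))
    return actions

local = {"client_version": (1, 2), "libraries": {"report": "1.0"}}
server = {"client_version": (1, 3),
          "libraries": {"report": "1.1"},
          "extensions": ["payroll"]}
```

Because every decision is driven by server-side configuration, rolling an upgrade out to a thousand desktops means changing one file on the server.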
- Go out of your way to automate.
The key point here is: /Not all time is created equal./
A script that automates a key process otherwise requiring alert and full
attention is never a waste of time. So, take the time to get the product
build automated using the tool of your choice. However, even an automated
build is just the beginning. Example: When
building software there may be a set of instructions between the
"make build" and "make deploy" that need to be typed
manually. All together, these instructions take less than ten minutes to
run while automating the instructions may require several hours due to
complexity. However, when one is working under high pressure with multiple
deadlines all popping at once, one finds that /not having to think/ during
repetitive but crucial processes is a life-saver. This concept generally
raises programmers' hackles: after all, why can't you just be alert and
focused when executing a well-defined set of commands? The only answer to
that is hard experience and the sad realization
that mistakes do happen; and mistakes occurring
during the final "quick rebuild of the system for the last
patch" ultimately occur to all of us. Those are the painful lessons.
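The not-having-to-think idea can be sketched as a small driver that captures the steps otherwise typed between "make build" and "make deploy" (the step commands below are placeholders); it stops dead on the first failure instead of letting a tired human plough on:

```python
import subprocess
import sys

# The manual steps between "make build" and "make deploy", written
# down once so nobody has to remember them at 3am. These particular
# commands are placeholders standing in for the real ones.
STEPS = [
    [sys.executable, "-c", "print('stamping version')"],
    [sys.executable, "-c", "print('packaging client stub')"],
]

def run_steps(steps):
    """Run each step in order; abort on the first non-zero exit so a
    half-finished build never goes out the door."""
    for cmd in steps:
        print("==>", " ".join(cmd))
        rc = subprocess.call(cmd)
        if rc != 0:
            print("step failed (rc=%d), aborting" % rc, file=sys.stderr)
            return False
    return True
```

Spending the several hours to script ten minutes of typing pays for itself the first time a release is cut under deadline pressure.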
- Every line of code is a mistake.
This is another controversial statement, but it means one simple thing: In
many conditions, human beings simply cannot write very good computer code.
Under pressure, humans make mistakes; we get tired and cranky, and in
general we don't do repetitive and precise actions very well. The answer
to the statement is: Minimize your
mistakes. In other words, write less code. The best way to do this is
to identify where in one's process one has well-defined sets of software
modules that must be kept in synchronization with each other, and to
automate that generation. In our case, this was the database interface as
well as the database upgrade scripts. I ensured we had both commercial and
in-house custom tools to generate our database layers and our database
upgrade and installation scripts automatically. I can trust that the code
generation works equally well under all types of deadline pressure.
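A toy version of that generation step in Python (the table and column names are made up): the schema is the single source of truth, and every data-access stub is emitted from it rather than hand-written, so the layers cannot drift out of sync:

```python
# Invented schema standing in for the real database definition.
SCHEMA = {"customer": ["id", "name", "credit_limit"]}

def generate_select(table, columns):
    """Emit the SELECT statement for one table's accessor. Because
    every accessor comes from the same template, a schema change
    regenerates them all identically -- no tired human retyping."""
    return "SELECT %s FROM %s WHERE id = ?" % (", ".join(columns), table)

def generate_layer(schema):
    """Generate the whole database-access layer from the schema."""
    return {table: generate_select(table, cols)
            for table, cols in schema.items()}
```

The same template approach extends to upgrade scripts: diff the old schema against the new one and emit the ALTER statements mechanically.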
- Build your regression tests early and
make them complex. Regression tests are not a panacea by any means,
but they are highly effective in ensuring a minimum level of reliability.
But, developers really don't like to write them. And, some managers have
the odd idea that, just because something works when it was first written,
the same software should work months later. Sadly enough, that simply
isn't the case. As systems evolve the underlying modules change their
interfaces and assumptions in subtle ways. Regressions are simply the only
way to ensure that modules that used to work,
continue to work. And the more complex and detail-oriented the regressions
are, the better (developers and managers despise such tests because of the
initial flurry of false positives they create). In our project we have
around 500 regressions that run with each build, simply because I insisted
we build them. And while I do have trouble getting anyone to write any
more of these tests, I do feel a sense of relief and satisfaction each
time our existing test suite passes at 100%.
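A stripped-down Python illustration of the build-time regression run (the two tests are trivial stand-ins for real ones); the point is the pass/fail summary at the end of every build, with anything short of 100% treated as a broken build:

```python
def test_lookup_roundtrip():
    # Stand-in for a real module-interface regression.
    assert int("42") == 42

def test_null_handling():
    # Stand-in for a real NULL-parameter regression.
    assert ("" or "<default>") == "<default>"

def run_regressions(tests):
    """Run every test, collect failures, and report counts the way
    a nightly build would. An empty failure list means 100% pass."""
    failures = []
    for t in tests:
        try:
            t()
        except AssertionError as exc:
            failures.append((t.__name__, exc))
    passed = len(tests) - len(failures)
    print("%d/%d regressions passed" % (passed, len(tests)))
    return failures
```

Wiring `run_regressions` into the automated build means an interface change that breaks an old assumption fails the build the same night, not months later at a customer site.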
- The Development Install is the same as
the Customer Install. I know three different major products on which I
worked, all of which strictly segregated the developers’ configuration
setup from the customers’. This inevitably led to much wasted time and lots
of finger pointing at the end. And in all three cases, there ended up
being a full team of folks working on the product installation suite. In
our product, I simply did not tolerate that approach. We have One Way to
setup our system. This means that our installer is tested every day on
numerous machines by our automated build process. When we get to the end
of the cycle, we don't have to worry about whether we've updated the
installer to handle new shared libraries or system registry entries; we've
caught that low-hanging fruit very early in the process.
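One way to sketch the One Way rule in Python (the file names and scratch-directory scheme are invented): the nightly build's smoke test and the customer setup both call the same install routine, so the installer itself is exercised on every build:

```python
import os
import tempfile

def install_product(target_dir):
    """The single install path. Whether it's a developer box, the
    nightly build machine, or a customer machine, this is the only
    code that lays the product down. File names are placeholders."""
    os.makedirs(target_dir, exist_ok=True)
    for name in ("client_stub.exe", "config.ini"):
        with open(os.path.join(target_dir, name), "w") as f:
            f.write("installed\n")
    return sorted(os.listdir(target_dir))

def nightly_build_smoke_test():
    """The automated build installs into a scratch directory every
    night, so installer breakage surfaces within a day."""
    scratch = tempfile.mkdtemp()
    return install_product(scratch)
```

A new shared library gets added in exactly one place, and the next nightly run proves the customer install still works.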
There are lots more points to consider (automated
data loading, managing multiple ongoing development branches, a code promotion
strategy of "Never Break The Build"), but this is enough for one
article!