Distributed Memory: The hidden delights of Unit testing

Friday, May 18, 2007

One reason this blog has been light on content of late is that I now have an internal blog at work, where I try to enlighten my colleagues. Often the entries on that blog are collations of links to and through other blogs I read. Some are from experience. Like this one.

One part of the current development project has brought home to me quite how much serious unit testing results in cleaner code -- and that the closer you get to 100% coverage in the testing, the more incentive there is to write that clean code.

The closer to 100% you strive to get, especially with a coverage tool (such as gcov) that does branch coverage rather than just line coverage, the more it squeezes your code. At its most brutal: the more code you have, the more tests there are to write to cover it all -- so the incentive is there to make the code tighter, just to reduce the amount of work needed for completion.

Much of the C code being written consists of routines that are each explicitly a little state machine. As such, the structure of a routine is along the lines of

check preconditions
determine "state"
"switch" on the current state
tidy
return outcome
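As a rough sketch of that shape in C: the names `conn`, `conn_step` and the three states are hypothetical, invented for illustration rather than taken from the project.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states and context, invented for this sketch. */
enum state { STATE_IDLE, STATE_BUSY, STATE_DONE };

struct conn {
    int started;   /* non-zero once work has begun */
    int remaining; /* units of work left */
};

/* Returns -1 on a failed precondition, 1 when finished, 0 otherwise. */
int conn_step(struct conn *c)
{
    enum state s;
    int outcome = 0;

    /* check preconditions: one exit-on-failure, one assert() */
    if (c == NULL)
        return -1;
    assert(c->remaining >= 0);

    /* determine "state" */
    if (!c->started)
        s = STATE_IDLE;
    else if (c->remaining > 0)
        s = STATE_BUSY;
    else
        s = STATE_DONE;

    /* "switch" on the current state */
    switch (s) {
    case STATE_IDLE:
        c->started = 1;
        break;
    case STATE_BUSY:
        c->remaining--;
        break;
    case STATE_DONE:
        outcome = 1;
        break;
    }

    /* tidy (nothing to release in this toy example) */

    /* return outcome */
    return outcome;
}
```

Every arm of the switch, plus the early return, is a path that a test has to reach before coverage hits 100%.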

Some of the precondition checks are assert() calls, but others cannot be (so will include an exit-on-failure); and the switch may not be a simple flat one -- some cases may have sub-cases, and some might overlap, in a structure where two conditions a and b are independent.
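One plausible reading of such an overlap, sketched with the flags `a` and `b` from the text and an invented bit-mask result (`handle`, `DID_A` and `DID_B` are hypothetical names): each condition contributes its own step, so four combinations need exercising.

```c
#include <assert.h>

/* Hypothetical bits recording which steps ran. */
#define DID_A 0x1
#define DID_B 0x2

/* a and b are independent, so all four combinations are reachable. */
int handle(int a, int b)
{
    int done = 0;

    if (a)
        done |= DID_A; /* work specific to a */
    if (b)
        done |= DID_B; /* work specific to b */

    return done;
}

/* One test per combination of the two independent conditions. */
void test_handle(void)
{
    assert(handle(0, 0) == 0);
    assert(handle(1, 0) == DID_A);
    assert(handle(0, 1) == DID_B);
    assert(handle(1, 1) == (DID_A | DID_B));
}
```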

However, even if you're not explicitly thinking of the code as a state machine as such, the routine structure is still quite generic.

Coverage testing

A set of unit tests can make sure that expected inputs map to expected outputs, both positive and negative; coverage helps tell you if you have "enough" tests. It answers the question "Has all that code been exercised yet?" (if not the related "If it doesn't get used, why did you write it in the first place?").

The first thing that a set of obvious positive tests will show is which bits are difficult to reach. The obvious ones are the paths handling exception states -- and here automation and good mocks in the test framework, or your own wrapper to it, are essential. After all, exceptions are meant to be, well, exceptional, but here need to be generated on demand.

With those out of the way, the really difficult-to-reach corner cases of the logic stand out -- and with only those to concentrate on, you're faced with either writing a lot of tests to reach them, or figuring out a way to simplify the code so you don't have to.
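One way to generate such failures on demand in C, sketched here under assumptions: the code under test calls its allocator through a function pointer that a test can swap out. The names `alloc_fn`, `failing_alloc` and `make_buffer` are hypothetical and not part of any framework.

```c
#include <assert.h>
#include <stdlib.h>

/* Indirection point: production code uses malloc, tests may replace it. */
static void *(*alloc_fn)(size_t) = malloc;

/* Test double that simulates memory exhaustion. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Code under test: returns 0 on success, -1 if allocation fails. */
int make_buffer(char **out, size_t n)
{
    char *p = alloc_fn(n);

    if (p == NULL)
        return -1; /* the hard-to-reach error path */
    *out = p;
    return 0;
}

/* Exercises both the normal path and the injected failure. */
void test_make_buffer(void)
{
    char *buf = NULL;

    assert(make_buffer(&buf, 16) == 0);
    free(buf);

    alloc_fn = failing_alloc; /* inject the failure */
    assert(make_buffer(&buf, 16) == -1);
    alloc_fn = malloc;        /* restore the real allocator */
}
```

The same indirection can be wrapped around any call whose failure is otherwise awkward to provoke.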

It is often tempting to write code that handles a special case c separately under each of two conditions a and b; but, damn it, if arranging the case a and c is hard work, you don't want to go through the slog with b as well. Factoring out the special case goes from being something you could do, if you had the enthusiasm, to something you want to do, because it's less work than writing the extra tests. "Don't Repeat Yourself" becomes positively encouraged.
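To illustrate the kind of duplication meant, here is a hypothetical before-and-after. The conditions `a`, `b` and `c` follow the text; the function names and return codes are invented. In the first version the awkward special case `c` has to be reached once under `a` and again under `b`; in the second its outcome needs reaching only once.

```c
#include <assert.h>

enum { NOTHING = 0, PLAIN_A = 1, PLAIN_B = 2, SPECIAL = 3 };

/* Before: the special case c is repeated in both branches. */
int dispatch_repeated(int a, int b, int c)
{
    if (a) {
        if (c)
            return SPECIAL;
        return PLAIN_A;
    } else if (b) {
        if (c)
            return SPECIAL;
        return PLAIN_B;
    }
    return NOTHING;
}

/* After: the special case is factored out and handled in one place. */
int dispatch_factored(int a, int b, int c)
{
    if ((a || b) && c)
        return SPECIAL;
    if (a)
        return PLAIN_A;
    if (b)
        return PLAIN_B;
    return NOTHING;
}

/* Checks that both versions agree on every combination of inputs. */
void test_dispatch_equivalent(void)
{
    int a, b, c;

    for (a = 0; a <= 1; a++)
        for (b = 0; b <= 1; b++)
            for (c = 0; c <= 1; c++)
                assert(dispatch_repeated(a, b, c) ==
                       dispatch_factored(a, b, c));
}
```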

Coverage types

The coverage metric you use is important in how much benefit you can derive. For a first pass, NCover isn't too bad. But it only counts line visits, being as it is built on the profiling API for .NET. In particular, if you have code that does something only when a > 0, NCover will never show you that you're missing the case of zero and negative values of a.

One of the upsides of working in C on a *nix platform is that gcov is available. That will take code that does something when a || b holds, and distinguish between whether a or b triggered the do something -- 100% in NCover usually isn't more than 90-odd% in gcov.
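A small sketch of the two situations just described, with hypothetical function names. A line-oriented tool sees each `if` line as visited whichever way the condition goes, whereas building with `gcc --coverage` and running `gcov -b` on the source reports the branch outcomes separately.

```c
#include <assert.h>

/* One line, two outcomes: a single call satisfies line coverage. */
int count_positive(int a, int count)
{
    if (a > 0) count++;
    return count;
}

/* Short-circuit ||: branch coverage separates the a and b outcomes. */
int either(int a, int b)
{
    int did = 0;

    if (a || b) did = 1;
    return did;
}

/* The first assertion for each function already visits every line;
   the remaining assertions are needed only for the branches. */
void test_branches(void)
{
    assert(count_positive(5, 0) == 1);  /* a > 0 */
    assert(count_positive(0, 0) == 0);  /* a == 0 */
    assert(count_positive(-3, 0) == 0); /* a < 0 */

    assert(either(1, 0) == 1); /* a decides */
    assert(either(0, 1) == 1); /* b decides */
    assert(either(0, 0) == 0); /* neither */
}
```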

Squeezing out the logic

A real example of code improvement came in making the last step to 100% branch coverage on a routine driven by two flags, have_token and need_token.

I had 100% in NCover; but gcov reminds me that I don't cover all the bases -- because have_token and need_token aren't independent variables: if you don't need the token, you should never have one. So, what to do when aiming for the 100% mark?

The routine here started in a state where I just enumerated all possible cases (there are more than just these), handling them individually in some sort of logical order. Now, the unit tests I already have provide me with a framework to check that the code is still doing what I mean it to do when I refactor; so I can look at the code and see that what I have is actually of a much simpler form. Refactor, re-run the tests and see that the simpler code is still right.
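As a purely hypothetical reconstruction of the idea: only the flag names `need_token` and `have_token` come from the text, while the actions and function names are invented. The enumerated version has an arm that valid input never reaches; the collapsed version folds it into the fall-through.

```c
#include <assert.h>

enum action { PROCEED, FETCH_TOKEN };

/* Enumerated form: one arm per combination of the two flags. */
enum action next_enumerated(int need_token, int have_token)
{
    if (need_token && have_token)
        return PROCEED;
    else if (need_token && !have_token)
        return FETCH_TOKEN;
    else if (!need_token && !have_token)
        return PROCEED;
    else
        return PROCEED; /* have but don't need: should not arise */
}

/* Collapsed form: only one combination needs special treatment. */
enum action next_simplified(int need_token, int have_token)
{
    if (need_token && !have_token)
        return FETCH_TOKEN;
    return PROCEED;
}

/* The existing tests double as a check that the refactoring is safe. */
void test_next_equivalent(void)
{
    int need, have;

    for (need = 0; need <= 1; need++)
        for (have = 0; have <= 1; have++)
            assert(next_enumerated(need, have) ==
                   next_simplified(need, have));
}
```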

Similarly, code guarded by an if clause, where the else is never executed under any input you can generate, perhaps because the assert()-defined contract of the method or its callers enforces the constraint, can be simplified to an assert() of the condition and an unconditional block. And you get that better code because you've made yourself go the last little bit.
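A minimal sketch of that transformation, assuming a hypothetical routine whose contract guarantees a non-NULL argument (both function names are invented):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Before: the else arm is unreachable if callers honour the contract. */
size_t length_guarded(const char *s)
{
    if (s != NULL) {
        return strlen(s);
    } else {
        return 0; /* never executed by any contract-respecting test */
    }
}

/* After: the contract is asserted and the body is unconditional. */
size_t length_asserted(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```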

In the case above, user input could have the cookie without needing it, so we can't assert -- but we don't need to write another test case (though we can) to prove that this is harmless, because it is just another flavour of the "else".