Wednesday 4 February 2009

Theories in jUnit 4

One of the new features in jUnit 4 is support for theories. The idea is that you write your unit test as a series of methods, each of which states the assumptions under which it applies and then asserts something about the outcome. Substituting the words precondition and postcondition for assumption and assertion gives me a reassuring sense of déjà vu.

This article is about my attempts to write a little theory, and how I got stuck. If anyone out there can help me understand this better, I would be very grateful!

An example
Sun SPOTs use IEEE 802.15.4 radio communications. This standard uses 64-bit addresses. People usually like to see these addresses as four quartets of hex digits separated by full stops (e.g. "0041.1DAF.078E.0042"), but the internals of the Java comms stack would rather see them as 64-bit longs. In a few places they also appear as strings containing the decimal value of the address.
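To make the three representations concrete, here is a minimal sketch of the conversions between them. The class and method names are hypothetical; this is not the real IEEEAddress. It also assumes Java 8 or later for Long.parseUnsignedLong, which lets addresses with the top bit set parse without overflowing.

```java
import java.util.Locale;

// Hypothetical helper class for illustration; not the SPOT SDK's IEEEAddress.
public class AddressFormats {

    // Formats a 64-bit address as four dot-separated quartets of upper-case hex digits.
    // %016X zero-pads to 16 digits and prints negative longs as their unsigned value.
    static String toDottedHex(long address) {
        String hex = String.format(Locale.ROOT, "%016X", address);
        return hex.substring(0, 4) + "." + hex.substring(4, 8) + "."
                + hex.substring(8, 12) + "." + hex.substring(12, 16);
    }

    // Parses the dotted form, assuming the input has already been validated.
    // parseUnsignedLong also accepts values above 7FFF.FFFF.FFFF.FFFF.
    static long fromDottedHex(String dotted) {
        return Long.parseUnsignedLong(dotted.replace(".", ""), 16);
    }

    public static void main(String[] args) {
        long address = fromDottedHex("0041.1DAF.078E.0042");
        System.out.println(address);              // the decimal-string view
        System.out.println(toDottedHex(address)); // prints 0041.1DAF.078E.0042
    }
}
```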

In the code there's a class called IEEEAddress, which, startlingly enough, represents one of these addresses and takes care of the conversions. The code inside this class for converting a String to a long looks something like this:
...
if (containsOnlyDecimalDigits(aString)) {
    return Long.parseLong(aString, 10);
} else if (isFourQuartetsOfHexDigitsSeparatedByFullStops(aString)) {
    // strip the dots and read the remaining 16 hex digits as one number
    // (note: parseLong rejects values above 7FFF.FFFF.FFFF.FFFF)
    return Long.parseLong(aString.replace(".", ""), 16);
} else {
    throw new IllegalArgumentException(aString + " isn't a valid IEEE address");
}
...

There are a couple of helper methods here: containsOnlyDecimalDigits(aString) and isFourQuartetsOfHexDigitsSeparatedByFullStops(aString). These are moderately complex, but we shouldn't need to look at them to understand what follows.

The existing tests throw a load of test cases at the code, each of which looks something like this:
...
IEEEAddress address = new IEEEAddress("0123.4567.89ab.cdef");
assertEquals("Long should be parsed correctly", 0x123456789ABCDEFL, address.asLong());
...

So far, so good. Next, I tried to rewrite the tests as a theory. Here's my first attempt:
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;
import static org.junit.Assume.assumeTrue;

import org.junit.experimental.theories.DataPoint;
import org.junit.experimental.theories.Theories;
import org.junit.experimental.theories.Theory;
import org.junit.runner.RunWith;

// The helper predicates used below are shared with IEEEAddress (not shown).
@RunWith(Theories.class)
public class IEEEAddressTheory {
    @DataPoint public static String dp1 = "0123.4567.89ab.cdef";
    @DataPoint public static String dp2 = "0123456789";
    @DataPoint public static String dp3 = "0123.456789.23"; // invalid example to test exceptions

    @Theory public void dottedStringsContainHexDigits(String aString) {
        assumeTrue(isFourQuartetsOfHexDigitsSeparatedByFullStops(aString));
        String withoutDots = aString.replace(".", "");
        long expectedResult = Long.parseLong(withoutDots, 16);
        assertEquals(expectedResult, new IEEEAddress(aString).asLong());
    }

    @Theory public void undottedStringsContainDecimalDigits(String aString) {
        assumeTrue(containsOnlyDecimalDigits(aString));
        long expectedResult = Long.parseLong(aString, 10);
        assertEquals(expectedResult, new IEEEAddress(aString).asLong());
    }

    @Theory public void invalidStringsThrowIllegalArgumentException(String aString) {
        assumeTrue(
            !isFourQuartetsOfHexDigitsSeparatedByFullStops(aString) &&
            !containsOnlyDecimalDigits(aString));
        try {
            new IEEEAddress(aString).asLong();
            fail("Should get exception for bad syntax");
        } catch (IllegalArgumentException e) {
            // expected
        }
    }
}

The idea of a theory is that each data point is thrown at each theory method in turn, so in my case there are nine executions (three data points times three theory methods). Where the condition passed to assumeTrue(...) is false, that execution is abandoned and ignored rather than counted as a failure: the data point just doesn't apply to that clause of the theory. Where the assumption holds, the rest of the method executes like a normal jUnit test.
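The combinatorics can be illustrated with a toy model of the runner in plain Java. This is a simplification written for illustration, not jUnit's actual implementation: the AssumptionFailed class stands in for jUnit's assumption exception, and the regex predicates are hypothetical substitutes for the real helpers. One difference from the real Theories runner is that jUnit reports a failure if no data point at all satisfies a theory's assumptions, whereas this model simply counts it as skipped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Toy model of the Theories runner: every theory is applied to every data point.
public class TheoryModel {

    // Stands in for jUnit's assumption-violated exception.
    static class AssumptionFailed extends RuntimeException {}

    static void assume(boolean condition) {
        if (!condition) throw new AssumptionFailed();
    }

    // Hypothetical regex versions of the two helper predicates.
    static boolean isDottedHex(String s) {
        return s.matches("[0-9A-Fa-f]{4}(\\.[0-9A-Fa-f]{4}){3}");
    }

    static boolean isDecimal(String s) {
        return s.matches("[0-9]+");
    }

    // Returns {applied, skipped}. An AssertionError thrown by a theory body
    // is not caught here, so it propagates as a real failure.
    static int[] run(List<String> dataPoints, List<Consumer<String>> theories) {
        int applied = 0;
        int skipped = 0;
        for (Consumer<String> theory : theories) {
            for (String dataPoint : dataPoints) {
                try {
                    theory.accept(dataPoint);
                    applied++;
                } catch (AssumptionFailed e) {
                    skipped++; // the data point doesn't apply: not a failure
                }
            }
        }
        return new int[] {applied, skipped};
    }

    public static void main(String[] args) {
        List<String> dataPoints =
                Arrays.asList("0123.4567.89ab.cdef", "0123456789", "0123.456789.23");
        // Only the assumptions are modelled; the assertion bodies are omitted.
        List<Consumer<String>> theories = Arrays.asList(
                s -> assume(isDottedHex(s)),
                s -> assume(isDecimal(s)),
                s -> assume(!isDottedHex(s) && !isDecimal(s)));
        int[] counts = run(dataPoints, theories);
        System.out.println("applied=" + counts[0] + " skipped=" + counts[1]); // applied=3 skipped=6
    }
}
```

Each data point satisfies exactly one theory's assumption, so three of the nine executions run to completion and six are skipped.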

Now, I can add a few extra data points, and the theory will exercise the code as thoroughly as the existing unit test. It works, runs all the cases, and it's kind of neat. However, here's my problem with it: the theory basically replicates the code under test, so it's hardly surprising that it passes!

My first thought was to implement the checks differently in the tests and in the implementation. A bit of poking around suggests that I could probably download hamcrest-text-patterns, express "all digits" and the dotted hex notation as regular expressions, and use those. But then, if I thought that was the right way to do the parsing, I could do much the same inside my live code too. And once again, what exactly am I testing? Or am I supposed to implement the code twice, once optimally and once using a poorer algorithm that I invent for testing?
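For comparison, here is roughly what the regular-expression route looks like using plain java.util.regex instead of hamcrest-text-patterns. The class and method names are hypothetical, and the exact patterns are assumptions about what the real helpers accept (for instance, that lower-case hex digits are allowed). It also shows the dilemma: these two patterns would serve just as well inside IEEEAddress itself.

```java
import java.util.regex.Pattern;

// Hypothetical test-side predicates, written independently of IEEEAddress.
public class AddressPatterns {

    // One or more decimal digits and nothing else.
    private static final Pattern DECIMAL = Pattern.compile("[0-9]+");

    // Exactly four groups of four hex digits (either case), separated by full stops.
    private static final Pattern DOTTED_HEX =
            Pattern.compile("[0-9A-Fa-f]{4}(?:\\.[0-9A-Fa-f]{4}){3}");

    static boolean looksDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }

    static boolean looksDottedHex(String s) {
        return DOTTED_HEX.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0123.4567.89ab.cdef", "0123456789", "0123.456789.23"}) {
            System.out.println(s + " decimal=" + looksDecimal(s) + " dottedHex=" + looksDottedHex(s));
        }
    }
}
```

A pattern like `[0-9]+` also accepts decimal strings too long to fit in a long, so even this independent oracle does not fully decide validity by itself.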

Then I read http://shareandenjoy.saff.net/tdd-specifications.pdf (an interesting paper about theories). This paper suggests that it might be better to write theories as identities based on some sort of round trip. Here's a fragment of an alternative theory based on that idea:
...
@Theory public void roundTripsWorkForDottedHex(String aDottedHexString) {
    assumeTrue(isFourQuartetsOfHexDigitsSeparatedByFullStops(aDottedHexString));
    long numberValue = new IEEEAddress(aDottedHexString).asLong();
    assertEquals(aDottedHexString, new IEEEAddress(numberValue).asDottedHex());
}
...

This should hold for every string that satisfies the assumption, and the theory doesn't need to know anything about how to convert between strings and longs, which is a step forward. But...

1. It is vulnerable to a pair of matching errors in the code. For example, imagine a bug where IEEEAddress.asLong() yields a result with the wrong sign. If IEEEAddress.asDottedHex() also reversed the sign, the two errors would cancel out and the theory would still pass. For this reason, such a theory would need to be paired with a few of the more traditional known-answer tests.

2. More significantly, though, the complexity in the class under test is inside the helper methods isFourQuartetsOfHexDigitsSeparatedByFullStops(aString) and containsOnlyDecimalDigits(aString), and the theory still shares those with the code under test.
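The first objection can be demonstrated with a deliberately broken, hypothetical codec. The method names below are invented for illustration and are not IEEEAddress's. The round-trip identity holds for every input, while a single known-answer check in the style of the original unit tests catches the bug. The comparison uses equalsIgnoreCase on the assumption that the formatter emits upper-case hex while the data points are written in lower case.

```java
import java.util.Locale;

// Hypothetical codec with a matched pair of bugs: each direction negates the
// value, so the two errors cancel out on a round trip.
public class MatchedBugDemo {

    static long buggyAsLong(String dotted) {
        return -Long.parseUnsignedLong(dotted.replace(".", ""), 16); // BUG: wrong sign
    }

    static String buggyAsDottedHex(long value) {
        String hex = String.format(Locale.ROOT, "%016X", -value); // BUG: flips the sign back
        return hex.substring(0, 4) + "." + hex.substring(4, 8) + "."
                + hex.substring(8, 12) + "." + hex.substring(12, 16);
    }

    // Round-trip identity, ignoring case because the formatter emits upper-case hex.
    static boolean roundTripHolds(String dotted) {
        return dotted.equalsIgnoreCase(buggyAsDottedHex(buggyAsLong(dotted)));
    }

    // Traditional known-answer check, like the existing unit tests.
    static boolean knownAnswerHolds() {
        return buggyAsLong("0123.4567.89ab.cdef") == 0x123456789ABCDEFL;
    }

    public static void main(String[] args) {
        System.out.println("round trip:   " + roundTripHolds("0123.4567.89ab.cdef")); // true
        System.out.println("known answer: " + knownAnswerHolds());                    // false
    }
}
```

Because two's-complement negation is its own inverse for every long, the round trip cannot detect this pair of bugs for any input, however many data points are supplied.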

I'm not sure what to do next with this. Maybe theories just aren't a good fit for code whose parameters are highly constrained. Or maybe I'm missing some obvious next step. Any ideas?