Clearly I'm not inspired to write about software at the moment, so instead, I think I'll use this blog to supplement my increasingly creaky memory and record my thoughts on things I see.
I just got back from holiday in South Africa (fantastic, thanks for asking) but while there I saw Toumani Diabate and Wanlov the Kubulor perform as part of a Cape Town "musical intervention" called, rather splendidly, the Pan-African Space Station.
Toumani D. is a relatively familiar name, but first, the support... Wanlov come from Ghana and comprise vocals, balafon, shekere, djembe, one-string bass, some sort of lute, trumpet, box drum and various other bits of percussion. All this plus quirky lyrics ranging from jokey love songs through to full-on political stuff, and a rather magnetic lead singer. I really enjoyed this gig, lots of fun. The CD isn't quite as good as the live band: it has a rather westernised or manufactured feel compared to the loose and swinging live sound. 8/10 for the gig.
Toumani Diabate plays solo on the Kora (a 21 string African harp). He gave us a quick explanation of the instrument: you have two thumbs to play the bass, one index finger to play the melody, and the other index finger to "improvise". He's a spell-binding master. Most of his songs last ten minutes and it often feels like that isn't long enough. It puts you in mind of someone like Sonny Rollins playing solos: he establishes a theme and then plays endless variations on it, drifting away before returning, but never losing the structure. 10/10: I'd watch him five times a week if he played in my street.
A word about the venue: the Slave Church Museum in Cape Town was at full capacity of a few hundred, the acoustics were excellent, and this may have swelled the scores I've given both acts.
Tuesday, 6 October 2009
Thursday, 16 April 2009
Theories in jUnit revisited
A minor update on this stuff (see earlier post).
I've just got back from SPA2009, where there was a lot of excitement about Haskell. This seems interesting to me too, as I'm old enough to have Lisp and Prolog on the CV. One of the things I got to play with in one session was QuickCheck, a testing tool for Haskell.
Because Haskell is purely functional (for the most part), it lends itself to unit testing. Just call a function with some arguments and make some assertions about the result; no side-effects are even possible, so you're done. QuickCheck allows you to constrain the process for generating sets of arguments and then you just let the system rip. So you can have, for example, one test that generates valid sets of arguments and checks for a correct result, and another that generates invalid sets and checks for appropriate errors. Much like theories in jUnit, but with the addition of automatic generation of sets of test parameters.
However, it seems clear that the Haskellites amongst us use this tool in an exploratory fashion. Hunting down that tricky-to-find bug that pops up now and then in live code? Not quite sure what the library in front of you does? Want to stress-test a piece of code that's going to have to cope with noisy data? For actual unit (regression) testing of real code, it's much less useful, not least because there's some randomness in the arguments, which makes regression problematic.
So there it is: QuickCheck for poking around, and HUnit (it had to exist) for your unit tests.
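To make the point about randomness concrete, here is a rough plain-Java sketch of generated arguments with a fixed seed, which is one way the same arguments could be replayed on every run. This is not QuickCheck and not part of jUnit; the class SeededPropertyCheck and its method are made-up names, and the hex round-trip property is just an example I picked.

```java
import java.util.Random;
import java.util.function.LongPredicate;

// Hypothetical helper, not part of QuickCheck or jUnit: it checks a property
// against pseudo-random longs drawn from a seeded generator.
class SeededPropertyCheck {

    // Returns the first generated value that breaks the property, or null if
    // all 'runs' generated values satisfy it.
    static Long firstCounterexample(long seed, int runs, LongPredicate property) {
        Random random = new Random(seed); // same seed, same sequence of arguments
        for (int i = 0; i < runs; i++) {
            long candidate = random.nextLong();
            if (!property.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Example property: formatting a long as hex and parsing it back is an identity.
        LongPredicate hexRoundTrip =
            value -> Long.parseUnsignedLong(Long.toHexString(value), 16) == value;
        System.out.println(firstCounterexample(42L, 1000, hexRoundTrip));
    }
}
```

Because the generator is seeded, two runs with the same seed see the same arguments, so a failure found once can be found again. The price is that you only ever explore the one sequence unless you change the seed.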
Wednesday, 4 February 2009
Theories in jUnit 4
Part of what's new in jUnit4 is theory functionality. The notion here is that you write your unit test as a series of methods, each of which states the assumptions under which it applies, and then asserts something about the outcome of the test. Substituting the words precondition and postcondition for assumption and assertion gives me a reassuring sense of deja vu.
This article is about my attempts to write a little theory, and how I got stuck. If anyone out there can help me understand this better, I would be very grateful!
An example
SUN Spots use IEEE 802.15.4 radio communications. This standard uses 64-bit addresses. People usually like to see these addresses as four quartets of hex numbers (e.g. "0041.1DAF.078E.0042"), but the internals of the Java comms stack would rather see them as 64-bit numbers. And in a few places, they appear as strings containing the decimal value of the address.
In the code there's a class called IEEEAddress, which, startlingly enough, represents one of these addresses and takes care of the conversions. The code inside this class for determining the long from a String looks something like this:
...
if (containsOnlyDecimalDigits(aString)) {
    // plain decimal form of the address
    return Long.parseLong(aString, 10);
} else if (isFourQuartetsOfHexDigitsSeparatedByFullStops(aString)) {
    // dotted hex form: strip the full stops, then parse as hex
    return Long.parseLong(aString.replace(".", ""), 16);
} else {
    throw new IllegalArgumentException(aString + " isn't a valid IEEE address");
}
...
There are a couple of helper methods here: containsOnlyDecimalDigits(aString) and isFourQuartetsOfHexDigitsSeparatedByFullStops(aString). These are moderately complex, but we shouldn't need to look at them to understand what follows.
The existing tests throw a load of test cases at the code, each of which looks somewhat like this:
...
IEEEAddress address = new IEEEAddress("0123.4567.89ab.cdef");
assertEquals("Long should be parsed correctly", 0x123456789ABCDEFL, address.asLong());
...
So far, so good. So next, I tried to write the tests as a theory instead. Here's my first try at that:
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;
import static org.junit.Assume.assumeTrue;

import org.junit.experimental.theories.DataPoint;
import org.junit.experimental.theories.Theories;
import org.junit.experimental.theories.Theory;
import org.junit.runner.RunWith;

// The two helper predicates are assumed to be visible here,
// e.g. via a static import from the code under test.
@RunWith(Theories.class)
public class IEEEAddressTheory {
    @DataPoint public static String dp1 = "0123.4567.89ab.cdef";
    @DataPoint public static String dp2 = "0123456789";
    @DataPoint public static String dp3 = "0123.456789.23"; // invalid example to test exceptions

    @Theory public void dottedStringsContainHexDigits(String aString) {
        assumeTrue(isFourQuartetsOfHexDigitsSeparatedByFullStops(aString));
        String withoutDots = aString.replace(".", "");
        long expectedResult = Long.parseLong(withoutDots, 16);
        assertEquals(expectedResult, new IEEEAddress(aString).asLong());
    }

    @Theory public void undottedStringsContainDecimalDigits(String aString) {
        assumeTrue(containsOnlyDecimalDigits(aString));
        long expectedResult = Long.parseLong(aString, 10);
        assertEquals(expectedResult, new IEEEAddress(aString).asLong());
    }

    @Theory public void invalidStringsThrowIllegalArgumentException(String aString) {
        assumeTrue(
            !isFourQuartetsOfHexDigitsSeparatedByFullStops(aString) &&
            !containsOnlyDecimalDigits(aString));
        try {
            new IEEEAddress(aString).asLong();
            fail("Should get exception for bad syntax");
        } catch (IllegalArgumentException e) {
            // expected: the string is neither decimal nor dotted hex
        }
    }
}
The idea of a theory is that each data point is thrown at each theory method in turn. Thus in my case, we'll get nine test executions (three data points times three theory methods). Where the condition passed to "assumeTrue(...)" is false, the test is abandoned and ignored (that's not a failure, the data point just doesn't apply to that clause of the theory). Where the assumption passes, the rest of the method executes like a normal jUnit test.
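The pairing of data points with theory methods can be sketched in plain Java. This is only an illustration of the idea, not how the jUnit runner is actually implemented: TheoryCrossProduct and Clause are made-up names, the regular expressions are my guess at the two syntaxes, and the assertions are trivial stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java illustration of how a theory runner pairs data points with
// theory methods. This is NOT the jUnit implementation; the names are made up.
class TheoryCrossProduct {

    // A stand-in for one @Theory method: an assumption plus an assertion.
    static final class Clause {
        final Predicate<String> assumption;
        final Predicate<String> assertion;

        Clause(Predicate<String> assumption, Predicate<String> assertion) {
            this.assumption = assumption;
            this.assertion = assertion;
        }
    }

    // Returns {executions, skipped, passed, failed}.
    static int[] run(List<String> dataPoints, List<Clause> clauses) {
        int executions = 0, skipped = 0, passed = 0, failed = 0;
        for (Clause clause : clauses) {
            for (String dataPoint : dataPoints) {
                executions++;
                if (!clause.assumption.test(dataPoint)) {
                    skipped++; // assumption fails: ignored, not a failure
                } else if (clause.assertion.test(dataPoint)) {
                    passed++;
                } else {
                    failed++;
                }
            }
        }
        return new int[] {executions, skipped, passed, failed};
    }

    // Three data points against three clauses, mirroring the shape of the theory.
    static int[] example() {
        Predicate<String> decimal = s -> s.matches("[0-9]+");
        Predicate<String> dottedHex =
            s -> s.matches("([0-9a-fA-F]{4}\\.){3}[0-9a-fA-F]{4}");
        List<String> dataPoints =
            Arrays.asList("0123.4567.89ab.cdef", "0123456789", "0123.456789.23");
        List<Clause> clauses = Arrays.asList(
            new Clause(dottedHex, s -> s.replace(".", "").length() == 16),
            new Clause(decimal, s -> Long.parseLong(s, 10) >= 0),
            new Clause(decimal.or(dottedHex).negate(), s -> true));
        return run(dataPoints, clauses);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(example())); // prints [9, 6, 3, 0]
    }
}
```

Each data point satisfies exactly one assumption, so of the nine executions six are skipped and three actually assert something.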
Now, I can write a few extra data points, and this thing will exercise the code as effectively as the existing unit test. It even works, runs all the cases and it's kind of neat. However, and here's my issue with all this, the theory basically replicates the code under test - and so it's hardly surprising that it passes!
My first thought was to write the code differently between the tests and the implementation. A bit of poking around reveals that I could probably download hamcrest-text-patterns, express the notion of all digits and/or the dotted hex notation as regular expressions, and use that. But then, if I thought that was the way to do the parsing, I could do much the same inside my live code too. And once again, what exactly am I testing? Or am I supposed to implement the code twice, once optimally, and once using a poorer algorithm that I invent for testing?
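For what it's worth, here is roughly what the regular-expression version of the two checks might look like, using java.util.regex from the standard library rather than a third-party pattern library. The class name AddressSyntax is made up, and the patterns are my guess at the syntax rules, not the real helper methods.

```java
import java.util.regex.Pattern;

// Hypothetical regex versions of the two helper predicates. These are a guess
// at the intended syntax rules, not the real IEEEAddress helpers.
class AddressSyntax {

    // One or more decimal digits and nothing else.
    private static final Pattern DECIMAL = Pattern.compile("[0-9]+");

    // Four groups of four hex digits, separated by full stops.
    private static final Pattern DOTTED_HEX =
        Pattern.compile("[0-9A-Fa-f]{4}(\\.[0-9A-Fa-f]{4}){3}");

    static boolean containsOnlyDecimalDigits(String aString) {
        return DECIMAL.matcher(aString).matches();
    }

    static boolean isFourQuartetsOfHexDigitsSeparatedByFullStops(String aString) {
        return DOTTED_HEX.matcher(aString).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFourQuartetsOfHexDigitsSeparatedByFullStops("0123.4567.89ab.cdef"));
        System.out.println(containsOnlyDecimalDigits("0123456789"));
        System.out.println(containsOnlyDecimalDigits("0123.456789.23"));
    }
}
```

Which rather makes the point: this is short enough that it could just as easily be the implementation as the test.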
Then I read http://shareandenjoy.saff.net/tdd-specifications.pdf (an interesting paper about theories). This paper suggests that it might be better to write theories as identities based on some sort of round trip. Here's a fragment of an alternative theory based on that idea:
...
@Theory public void roundTripsWorkForDottedHex(String aDottedHexString) {
    assumeTrue(isFourQuartetsOfHexDigitsSeparatedByFullStops(aDottedHexString));
    long numberValue = new IEEEAddress(aDottedHexString).asLong();
    assertEquals(aDottedHexString, new IEEEAddress(numberValue).asDottedHex());
}
...
This should be true for absolutely all cases where the assumption holds, and it doesn't know anything about how to do the conversions between strings and longs, which is a step forward. But...
1. It is vulnerable to a pair of matching errors in the code. For example, imagine a bug where IEEEAddress.asLong() yields a result with the wrong sign. If IEEEAddress.asDottedHex() also reversed the sign, the theory would pass. For this reason, such a theory would need to be coupled with a couple of the more traditional tests.
2. More significantly, though, the complexity in the class under test is inside the helper methods isFourQuartetsOfHexDigitsSeparatedByFullStops(aString) and containsOnlyDecimalDigits(aString), and the theory still shares those with the code under test.
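The first of these points is easy to demonstrate. Below is a deliberately broken stand-in for the address class, written only for illustration (SignFlippedAddress is a made-up name and has nothing to do with the real IEEEAddress code). It has the pair of matching sign errors described in point 1.

```java
// Deliberately broken stand-in for an address class, for illustration only.
// It is not the real IEEEAddress; both bugs are planted on purpose.
class SignFlippedAddress {
    private final long value;

    SignFlippedAddress(String dottedHex) {
        this.value = Long.parseUnsignedLong(dottedHex.replace(".", ""), 16);
    }

    SignFlippedAddress(long value) {
        this.value = value;
    }

    long asLong() {
        return -value; // bug 1: wrong sign
    }

    String asDottedHex() {
        // bug 2: the sign is reversed again, which cancels bug 1 in a round trip
        String hex = String.format("%016x", -value);
        return hex.substring(0, 4) + "." + hex.substring(4, 8) + "."
            + hex.substring(8, 12) + "." + hex.substring(12, 16);
    }

    public static void main(String[] args) {
        String input = "0123.4567.89ab.cdef";
        long numberValue = new SignFlippedAddress(input).asLong();
        // The round trip succeeds despite both bugs...
        System.out.println(input.equals(new SignFlippedAddress(numberValue).asDottedHex()));
        // ...while a traditional test with a known expected value catches bug 1.
        System.out.println(numberValue == 0x0123456789ABCDEFL);
    }
}
```

The round-trip comparison comes out true and the comparison against the known value comes out false, which is why the identity-style theory still needs a few traditional tests alongside it.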
I'm not sure what to do next with this. Maybe theories just aren't good with code where the values of parameters are highly constrained. Or maybe I'm missing some obvious next step. Any ideas?
Sunday, 4 January 2009
Ten thousand hours...
A friend of mine is a sports physiotherapist. He's pretty good at this and has worked with Premiership footballers, Belgian international squash players and others. Apparently it's common knowledge in his world that all top sportspeople put in around 10,000 hours of practice before they reach the top. That's about 20 hours a week for 10 years. He emphasises that this is practising, as distinct from playing the sport. One key difference is the continuous involvement of a coach to make you do it right.
What does this mean for programmers? I think that pair programmers have a coach present, and so they're building their 10,000 hours. On this basis, 10,000 hours of pair programming makes you an international. No amount of lone programming gets you there though - you're just programming.
There's no comparison
XP Day 2008. XP Day isn't really about XP any more - it's about whatever the people that go every year are currently obsessed with. On this occasion, that seemed to be two topics. One was that old favourite, how shall we know good code when we see it? Not much to say about that except that I was pleased to find lots of other people in the middle of Bob Martin's "Clean Code", which is focused thereabouts.
The other obsession was lean thinking. This is one of those myriad attempts to improve software practice by copying something that worked somewhere else. In this case, Toyota came up with a bunch of practices that helped them to make cars more cheaply in the immediate post-war era. I mentioned this over the weekend to a friend whose job consists of trying to make various car factories run more efficiently. As you might expect, he knows a lot about value streams, kanbans, and the rest of this stuff, and he immediately came back with "but what has all that got to do with software?".
I was reminded of an OOPSLA in the late nineties. At that time, pattern languages were all the vogue, and Chris Alexander - the architect who invented pattern languages to capture architectural knowledge - had been invited to give the keynote. He shambled on stage, and expressed himself vaguely bemused to be there. Why would you all be interested in something designed for use with architecture? he asked us.
It seems to me that we do this a lot in software. Apart from lean thinking and architecture, other past efforts have included the SEI models (one of many other attempts to compare software with manufacturing), making movies, mountaineering, oil exploration, collaborative game playing, building buildings, and plenty more besides.
These can be sources of entertainment. I've spent some happy hours annoying movie-making friends by trying to make comparisons. But mostly, I don't think these analogies help much with the goal of getting better at writing software.
Years ago I saw Bertrand Meyer give a talk in which he said that the significance of a piece of advice was inversely linked to the plausibility of the opposite. If someone advises you to choose readable variable names, that's not advice: you were unlikely to make an effort to choose unreadable names. On the other hand, if you're advised to "choose names that express what is to be done rather than how" then you have real advice, because it's not immediately obvious that "how" is the wrong answer.
Much of the advice that comes from these analogies seems to me to fail the Meyer test. I was initially taken by someone talking at XP Day about applying the principles of lean to software. One outcome was the notion of not having any more than a small number of tasks on the go at once. But then I took a step back to consider the alternative. Deliberately strive to do lots of things at once? Was I about to do that? About the best you can say is that this changes the emphasis, but then you have to wonder, away from what?
Other debates about the management process had this same unsatisfactory feel. Reminders of lots of common sense stuff, but nothing that might change your practice. Breakthroughs in software development - the move from waterfall to iterative, for example - I don't think come from analogies. All the "good stuff" for me at XP Day happened where the topic under discussion was software development, in and of itself. I sat for an hour in an open space discussion about what makes software programs easy to read. I came away with some genuinely useful insights that will make me do something different in future. From here on in, I think I'm going to try and get better at software by doing and talking about software.
Tuesday, 9 December 2008
Clean code
I've recently started reading a book called Clean Code by Bob Martin. I like this book's attitude: right up front you're left in no doubt that this is a book written by and for programmers. You're going to have to study real code examples hard to get the most out of it. So far, so good.
However, on page 29, we get the first example with more than ten lines. The book starts by presenting this listing:
private void printGuessStatistics(char candidate, int count) {
    String number;
    String verb;
    String pluralModifier;
    if (count == 0) {
        number = "no";
        verb = "are";
        pluralModifier = "s";
    } else if (count == 1) {
        number = "1";
        verb = "is";
        pluralModifier = "";
    } else {
        number = Integer.toString(count);
        verb = "are";
        pluralModifier = "s";
    }
    String guessMessage = String.format("There %s %s %s%s", verb, number, candidate, pluralModifier);
    print(guessMessage);
}
The book suggests that this listing's local variables have unclear context at first: you have to read through to the end of the listing to work out what they are for. The book then suggests replacing the listing with this.
----------------------
public class GuessStatisticsMessage {
    private String number;
    private String verb;
    private String pluralModifier;

    public String make(char candidate, int count) {
        createPluralDependentMessageParts(count);
        return String.format("There %s %s %s%s", verb, number, candidate, pluralModifier);
    }

    private void createPluralDependentMessageParts(int count) {
        if (count == 0) {
            thereAreNoLetters();
        } else if (count == 1) {
            thereIsOneLetter();
        } else {
            thereAreManyLetters(count);
        }
    }

    private void thereAreManyLetters(int count) {
        number = Integer.toString(count);
        verb = "are";
        pluralModifier = "s";
    }

    private void thereIsOneLetter() {
        number = "1";
        verb = "is";
        pluralModifier = "";
    }

    private void thereAreNoLetters() {
        number = "no";
        verb = "are";
        pluralModifier = "s";
    }
}
This might be an improvement in quickly grasping the meaning of the variable names, but it's still fantastically complex. Adding a class always has a cost, because the reader has to wonder whether this class has other uses, what its scope is, how long it's supposed to live, and so on. But, more important, what about this solution:
String getGuessStatistics(char candidate, int count) {
    switch (count) {
        case 0:
            return String.format("There are no %ss", candidate);
        case 1:
            return String.format("There is 1 %s", candidate);
        default:
            return String.format("There are %s %ss", Integer.toString(count), candidate);
    }
}
To me, this is a better solution. This is to do with a property of code that I'm going to call "glanceability". The shorter a listing, then all else being equal, the quicker it is to understand. However, it's not just that my version is shorter: for me at least, once a piece of code gets down to a certain size and complexity, its overall thrust is capable of being understood at a glance. The last listing achieves this: one glance and I can see that it returns a new String which formats the arguments somehow. In part this is about size, but it's more about simple structure, and in particular having just one control structure: if there were six cases this code would still be glanceable. To get to that "glanceability" I'm prepared to break lots of (perhaps all) other rules. In this case, I have multiple return points, multiple calls to the same static function, and multiple places where I record that all my result strings start with "There ...". All of which seems insignificant to me because of the glanceability.
The two listings from the book fail the glanceability test. Once they fail that, for me at least, size and complexity aren't everything. If it's going to take a few seconds to understand a listing anyway, then being normalised and split into lots of small pieces might outweigh the gains from being simply shorter. I think I do prefer Bob Martin's second listing over his first, even though it's longer, because neither is glanceable, and the second doesn't have so much code that has to be grasped in one go.
The second point I see here is about refactoring. The first listing from the book assumed that the best way to generate the statistics message was to generate a set of component parts according to the parameters, and then stitch those together in the same way in every case. The second listing from the book tried to fix the problems of the first while retaining the same algorithm. To get to my favourite listing, I had to stop refactoring and ditch the algorithm. It often seems to me when refactoring that I'm not radical enough. It's something to do with the refactoring tools found in IDEs, which mostly support code transformations that preserve the existing algorithm. This makes it much easier to improve the implementation of the existing algorithm than to replace it with a better algorithm: the latter requires pushing the keyboard away, sitting back and thinking hard for a while. Quite apart from being hard work, this isn't always the most sociable thing to do in a pair programming situation.
Finally, I didn't write this all in one go. In the meantime, I read some more of the book, and guess what? Just six pages further on, it tells me that "...functions should not be large enough to hold nested structures...". I think this might be a rule that'll guarantee glanceable functions.
And finally, finally, don't let this put you off the book. Everything else I've read so far strikes me as sensible and useful: and I'm confident I'll learn plenty more as I go on.
Friday, 28 November 2008
When it's OK to use a Utils class
What is a Utils class?
Many objects combine state and function: that's one of the key distinguishing features of object-oriented programming. But not all objects are that well-formed. Some objects have state and no function, just constructors, getters and setters. These look a lot like C structs. Ivan Moore suggests calling these NOJOs.
At the other extreme are objects that have no state, but do have function. You can always spot one of these because there are no references (implicit or otherwise) to "this": each method operates only on its arguments.
Because there's no state, there's no real need to instantiate classes like this, and so it's usually simpler to keep all the methods static. Classes like this are often called “utils”, and many experts dislike them.
When shouldn't I use one?
If you have lots of NOJOs, then you need somewhere to put the behaviour associated with them. So one style of programming is to put the data in NOJOs, and the behaviour in Utils classes. This is to miss the point of object-oriented programming, and is generally seen as a bad code smell.
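To make the smell concrete, here's a minimal sketch of the NOJO-plus-Utils style. The names Account and AccountUtils are made up for illustration; they aren't from any real project.

```java
// Hypothetical NOJO: state only, nothing but a getter and a setter.
final class Account {
    private long balanceInPence;

    long getBalanceInPence() {
        return balanceInPence;
    }

    void setBalanceInPence(long balanceInPence) {
        this.balanceInPence = balanceInPence;
    }
}

// Hypothetical Utils class: behaviour only. Note that no method refers
// to "this"; each one works purely on its arguments.
final class AccountUtils {
    private AccountUtils() {
        // never instantiated
    }

    static boolean isOverdrawn(Account account) {
        return account.getBalanceInPence() < 0;
    }

    static void deposit(Account account, long amountInPence) {
        account.setBalanceInPence(account.getBalanceInPence() + amountInPence);
    }
}
```

Both isOverdrawn() and deposit() take an Account as their first argument and do nothing but poke at its getters and setters, which is a strong hint that they'd be happier living on Account itself.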
When is it OK to use one?
The very short answer: not very often.
The slightly longer answer: it's ok to use a Utils class when the natural owner of the method is either a data type that isn't a class in your programming language, or the natural owner is a class, but you can't modify the class.
The class java.util.Arrays is a good example: methods like
Arrays.sort(byte[] a);
look like they ought to belong to whatever class a is an instance of. However, Java isn't quite as object-oriented as all that: a is not an instance of any class we can open up and add methods to. Array types are built into the language, much like the primitive data types. Some primitive types have their equivalents as first-class classes (think of int and Integer) but that’s really a clumsy workaround for the problem that primitive data types aren’t classes.
Things get worse though: suppose we want to add our own method to Arrays. A real example from the Sun SPOT project was a group of methods for putting numbers into and plucking them out of byte arrays. Here's an example:
public static void writeLittleEndShort(byte[] byteArray, int offset, int value);
We've already seen that we can't put this on the non-existent "byte[]" class. However, we can't even put it on the Arrays class with the existing system-supplied utility methods. J2SE developers can't put it there because you simply can't rebuild the system library. In the Sun SPOTs project, we authored the library, but we still weren't allowed to put it there because changing the interface of any of the system library classes would make it "not proper Java", which Sun wouldn't want to be in the business of shipping. So you end up with not one but two Utils classes holding methods that operate on byte arrays.
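For the curious, here's a sketch of what a method with that signature might look like once it lands in a Utils class of its own. Only the signature comes from the project; the class name ByteArrayUtils, the method body and the companion readLittleEndShort() are my assumptions for illustration, not the actual Sun SPOT code.

```java
// Hypothetical home for byte-array helpers that can't live on byte[] or Arrays.
final class ByteArrayUtils {
    private ByteArrayUtils() {
        // never instantiated
    }

    // Store the low 16 bits of value at offset, least significant byte first.
    static void writeLittleEndShort(byte[] byteArray, int offset, int value) {
        byteArray[offset] = (byte) (value & 0xFF);
        byteArray[offset + 1] = (byte) ((value >> 8) & 0xFF);
    }

    // Assumed companion: read the two bytes back as an unsigned 16-bit value.
    static int readLittleEndShort(byte[] byteArray, int offset) {
        return (byteArray[offset] & 0xFF) | ((byteArray[offset + 1] & 0xFF) << 8);
    }
}
```

Every method takes the byte array as its first argument, which is exactly the sign that the array is the natural owner and the Utils class is only there because nothing better is available.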
Here's another example from the Sun SPOT project:
/**
 * Convert an Enumeration to a Vector
 * @param items the Enumeration to convert
 * @return the Vector
 */
public static Vector enumToVector(Enumeration items) {
    Vector result = new Vector();
    while (items.hasMoreElements()) {
        result.addElement(items.nextElement());
    }
    return result;
}
This really wants to be an instance method called toVector() on the class Enumeration: but for the reasons above, it can't be.
In summary
When you meet a Utils class, ask yourself: for each of its methods, could any of the arguments own this method? If yes, then move it. If there are methods left after this, you're stuck with a Utils class.
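Here's a rough sketch of how that test plays out, with invented names (Rectangle and ShapeUtils are hypothetical). One method has an argument that can own it, so it moves; the other operates on an array, so it stays.

```java
// Hypothetical class we own, so it can take on behaviour.
final class Rectangle {
    private final int width;
    private final int height;

    Rectangle(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Imagine this started life as ShapeUtils.area(Rectangle r).
    // Its argument could own it, so it moved here.
    int area() {
        return width * height;
    }
}

// What's left of the hypothetical Utils class after the move.
final class ShapeUtils {
    private ShapeUtils() {
        // never instantiated
    }

    // Stuck here: the only argument is an int[], and we can't add
    // methods to array types.
    static int sum(int[] values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }
}
```

After the move, ShapeUtils holds only the method that has nowhere better to go, which is the one situation where I'd put up with it.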
And finally, could we avoid this?
Well, in Java, today, we can't.
What the world needs is a programming environment where you can add methods to system classes, so you can just add the toVector() method to the Enumeration class. A nice addition to that would be a code repository that doesn't impose the restriction that all of one class has to be in one module, but lets you store a group of methods in a separate module, so that you can manage your toVector() method independently. And finally, it'd be nice if the programming language treated everything as an object, because then we could put that writeLittleEndShort() method directly on the "byte[]" class where it belongs. Given those three things, there really wouldn't be any need for Utils classes.
Those with memories as long as mine will be muttering "Smalltalk and the ENVY/Developer code repository" at this point. Perhaps we'll reinvent that wheel eventually....
