Tuesday 9 December 2008

Clean code

I've recently started reading a book called Clean Code by Bob Martin. I like this book's attitude: right up front you're left in no doubt that this is a book written by and for programmers. You're going to have to study real code examples hard to get the most out of it. So far, so good.

However, on page 29, we get the first example with more than ten lines. The book starts by presenting this listing:

private void printGuessStatistics(char candidate, int count) {
String number;
String verb;
String pluralModifier;
if (count == 0) {
number = "no";
verb = "are";
pluralModifier = "s";
} else if (count == 1) {
number = "1";
verb = "is";
pluralModifier = "";
} else {
number = Integer.toString(count);
vderb = "are";
pluralModifier = "s";
}
String guessMessage = String.format("There %s %s %s%s", verb, number, candidate, pluralModifier);
print(guessMessage);


The book suggests that this listing's local variables have unclear context at first: you have to read through to the end of the listing to work out what they are for. The book then suggests replacing the listing with this.
----------------------

public class GuessStatisticsMessage {
private String number;
private String verb;
private pluralModifier;

public String make(char candidate, int count) {
createPluralDependentMessageParts(count);
return String.format("There %s %s %s%s", verb, number, candidate, pluralModifier);
}

private void createPluralDependentMessageParts(int count) {
if (count == 0) {
thereAreNoLetters();
} else if (count == 1) {
thereIsOneLetter();
} else {
thereAreManyLetters(count);
}
}

private void thereAreManyLetters(int count) {
number = Integer.toString(count);
verb = "are";
pluralModifier = "s";
}

private void thereIsOneLetter() {
number = "1";
verb = "is";
pluralModifier = "";
}

private void thereAreNoLetters() {
number = "no";
verb = "are";
pluralModifier = "s";
}


This might be an improvement in quickly grasping the meaning of the variable names, but it's still fantastically complex. Adding a class always has a cost, because there's the overhead for understanding of wondering whether this class has other uses what it's scope is, how long it's supposed to live, and so on. But, more important, what about this solution:


String getGuessStatistics(char candidate, int count) {
switch (count) {
case : 0
return String.format("There are no %ss", candidate);
case : 1
return String.format("There is 1 %s", candidate);
default:
return String.format("There are %s %ss", Integer.toString(count), candidate);
}
}


To me, this is a better solution. This is to do with a property of code that I'm going to call "glanceability". The shorter a listing, then all else being easier, the quicker it is to understand. However, it's not just that my version is shorter: for me at least, once a piece of code gets down to a certain size and complexity, its overall thrust is capable of being understood at a glance. The last listing achieves this: one glance and I can see that it returns a new String which formats the arguments somehow. In part this is about size, but it's more about simple structure, and in particular having just one control structure: if there were six cases this code would still be glanceable. To get to that "glanceability" I'm prepared to break lots of (perhaps all) other rules. In this case, I have multiple return points, multiple calls to the same static function, and multiple places where I record that all my result strings start "There ..." string. All of which seems insignificant to me because of the glanceability.

The two listings from the book fail the glanceability test. Once they fail that, for me at least, then size and complexity aren't everything for me. If it's going to take a few seconds to understand a listing anyway, then being normalised and split into lots of small pieces might outweigh the gains from being simply shorter. I think I do prefer Bob Martin's second listing over his first, even though it's longer, because neither is glanceable, and the second doesn't have so much code that has to grasped in one go.

The second point I see here is about refactoring. The first listing from the book assumed that the best way to generate the statistics message was to generate a set of component parts according to the parameters, and then stitch those together in the same way in every case. The second listing from the book tried to fix the problems of the first while retaining the same algorithm. To get to my favourite listing, I had to stop refactoring and ditch the algorithm. It often seems to me when refactoring that I'm not radical enough. It's something to do with the refactoring tools found in IDEs, which mostly support code transformations that preserve the existing algorithm. This makes it much easier to improve the implementation of the existing algorithm than to replace it with a better algorithm: the latter requires pushing the keyboard away, sitting back and thinking hard for a while. Quite apart from being hard work, this isn't always the most sociable thing to do in a pair programming situation.

Finally, I didn't write this all in one go. In the meantime, I read some more of the book, and guess what? Just six pages further on, it tells me that "...functions should not be large enough to hold nested structures...". I think this might be a rule that'll guarantee glanceable functions.

And finally, finally, don't let this put you off the book. Everything else I've read so far strikes me as sensible and useful: and I'm confident I'll learn plenty more as I go on.

Friday 28 November 2008

When it's OK to use a Utils class

What is a Utils class?
Many objects combine state and function: that's one of the key distinguishing features of object-oriented programming. But not all objects are that well-formed. Some objects have state and no function, just constructors, getters and setters. These look a lot like C structs. Ivan Moore suggests calling these NOJOs.

At the other extreme are objects that have no state, but do have function. You can always spot one of these because there are no references (implicit or otherwise) to "this": each method operates only on its arguments.

Because there's no state, there's no real need to instantiate classes like this, and so it's usually simpler to keep all the methods static. Classes like this are often called “utils”, and many experts dislike them.

When shouldn't I use one?
If you have lots of NOJOs, then you need somewhere to put the behaviour associated with them. So one style of programming is to put the data in NOJOs, and the behaviour in Utils classes. This is to miss the point of object-oriented programming, and is generally seen as a bad code smell.

When is it OK to use one?
The very short answer: not very often.

The slightly longer answer: it's ok to use a Utils class when the natural owner of the method is either a data type that isn't a class in your programming language, or the natural owner is a class, but you can't modify the class.

The class java.util.Arrays is a good example: methods like

Arrays.sort(byte[] a);


look like they ought to belong to whatever class a is an instance of. However, Java isn't quite as object-oriented as all that, and in fact a is not an instance of a class at all – arrays are a primitive data type. Some primitive types have their equivalents as first-order classes (think of int and Integer) but that’s really a clumsy workaround for the problem that primitive data types aren’t classes.

Things get worse though: suppose we want to add our own method to Arrays. A real example from the Sun SPOT project was a group of methods for putting numbers into and plucking them out of byte arrays. Here's an example:

public static void writeLittleEndShort(byte[] byteArray, int offset, int value);

We've already seen that we can't put this on the non-existent "byte[]" class. However, we can't even put it on the Arrays class with the existing system-supplied utility methods. J2SE developers can't put it there because you simply can't rebuild the system library. In the Sun SPOTs project, we authored the library, but we still weren't allowed to put it there because changing the interface of any of the system library classes would make it "not proper Java", which Sun wouldn't want to be in the business of shipping. So you end up with not one but two Utils classes holding methods that operate on byte arrays.

Here's another example from the Sun SPOT project:

/**
* Convert an Enumeration to a Vector
* @param items the Enumeration to convert
* @return the Vector
*/
public static Vector enumToVector(Enumeration items) {
Vector result = new Vector();
while (items.hasMoreElements()) {
result.addElement(items.nextElement());
}
return result;
}


This really wants to be be an instance method called toVector() on the class Enumeration: but for the reasons above, it can't be.

In summary
When you meet a Utils class, ask yourself: for each of its methods, could any of the arguments own this method? If yes, then move it. If there are methods left after this, you're stuck with a Utils class.

And finally, could we avoid this?
Well, in Java, today, we can't.

What the world needs is a programming environment where you can add methods to system classes, so you can just add the toVector() method to the Enumeration class. A nice addition to that would be a code repository that doesn't impose the restriction that all of one class has to be in one module, but lets you store a group of methods in a separate module, so that you can manage your toVector() method independently. And finally, it'd be nice if the programming language treated everything as an object, because then we could put that writeLittleEndShort() method directly on the "byte[]" class where it belongs. Given those three things, there really wouldn't be any need for Utils classes.

Those with memories as long as mine will be muttering "Smalltalk and the ENVY/Developer code repository" at this point. Perhaps we'll reinvent that wheel eventually....