Thursday, 28 January 2010

Learning about the Wart

When I first read about Hungarian notation, I was baffled that anyone could earn money thinking and writing about something like naming. I was even more baffled that anyone would suggest building a variable name from a string of consonants followed by a "descriptive" suffix such as wnd. I soon learned the importance of naming, but never grasped the benefits of Hungarian notation.

About a year later I found out that it had in fact been adopted in the Win32 API. Because it is such a widely used API, lots of people look up to it as exemplary code and follow its style slavishly, and so Hungarian notation spread quickly.

Today, to my great relief, Hungarian notation has vanished almost entirely, and there is a reassuring consensus among most programming communities that it was a horrible path that must never be followed again.

However, lately there has been an unhealthy outbreak of the Hungarian wart. Apparently inspired by this article by Joel Spolsky, people claim it has all been misinterpreted. The original idea was not to prefix by variable type but rather by variable purpose. This is called Apps Hungarian. The conveyed feeling is that there is the bad™ Hungarian notation as spread by the Win32 API, and then there is the forgotten good Hungarian notation as intended by its inventor. This is an encouraging thought. It is true that it was misinterpreted. It's also true that Apps Hungarian is superior to the much despised Systems Hungarian notation.

Still not quite right

This new mutation of the persistent Hungarian Wart is sadly also problematic.

Let's dive further into the example given by Joel Spolsky, since it is a very good one.

To recap, we have an HTML-rendering system that must always encode the strings it writes. The problem becomes more interesting once we find out that some strings are already encoded while others are not. The system must handle both cases, and the goal is of course to make the code easy to read and errors easy to spot.

Apps Hungarian comes in handy here. If you use clever prefixes such as s for encoded strings (safe to write) and u for nonencoded strings (unsafe to write), you can clearly see a programming error.

Write(uName);

This statement obviously contains an error, since a nonencoded string is written. The point is that each line can be inspected on its own to decide whether it might cause a nonencoded string to be written.
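To make the convention concrete, here is a minimal sketch of how the prefixes might look in practice. The HungarianDemo class and its Render method are made-up names for illustration, the StringBuilder stands in for whatever Write targets in the real system, and WebUtility.HtmlEncode is just one assumed way of doing the encoding.

```csharp
using System.Net;
using System.Text;

public static class HungarianDemo
{
    // uName: raw, unencoded input (unsafe to write).
    // Returns the HTML that would be sent to the browser.
    public static string Render(string uName)
    {
        var output = new StringBuilder();   // stand-in for the response stream

        // Encoding is the only step that turns a "u" value into an "s" value.
        string sName = WebUtility.HtmlEncode(uName);

        output.Append(sName);      // looks right: an s-prefixed value is written
        // output.Append(uName);   // looks wrong to a reader, yet it would compile

        return output.ToString();
    }
}
```

Note that both variables are plain strings, so the prefixes only help a human reviewer. Nothing stops the commented-out line from compiling.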

The twist

Here is the twist: rather than using prefixes to differentiate between two kinds of strings - use the type system! In this exact example I propose using two different classes - EncodedString and RawString. This way we can make the Write method take only an EncodedString as its parameter. If desired, we can even create an overloaded Write method that takes a RawString and automatically encodes it before writing it!
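A rough sketch of what this could look like follows. Only the names EncodedString, RawString and Write come from the idea above. The HtmlWriter class, the Encode factory method, the Result property and the use of WebUtility.HtmlEncode are assumptions made for the sake of a complete example.

```csharp
using System.Net;
using System.Text;

// Wraps text that has NOT been HTML-encoded yet.
public sealed class RawString
{
    public string Value { get; }
    public RawString(string value) { Value = value ?? ""; }
}

// Wraps text that is safe to write. The constructor is private, so the
// only way to obtain an instance from outside is through Encode.
public sealed class EncodedString
{
    private readonly string _s;
    private EncodedString(string s) { _s = s; }

    public static EncodedString Encode(RawString raw)
    {
        return new EncodedString(WebUtility.HtmlEncode(raw.Value));
    }

    public override string ToString() { return _s; }
}

// Hypothetical writer; the StringBuilder stands in for the real output.
public sealed class HtmlWriter
{
    private readonly StringBuilder _out = new StringBuilder();

    // The primary Write accepts encoded text only.
    public void Write(EncodedString s) { _out.Append(s.ToString()); }

    // Overload: raw text is encoded automatically before it is written.
    public void Write(RawString raw) { Write(EncodedString.Encode(raw)); }

    public string Result { get { return _out.ToString(); } }
}
```

With this in place, a call such as writer.Write("some plain string") does not compile at all, because there is no overload that accepts a bare string. The mistake that Apps Hungarian merely makes visible is now rejected by the compiler.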

Ok, so what are the cons of this approach? Yes, a tiny bit of extra boilerplate, since the EncodedString class and its raw counterpart each have an inner string field to which the required operations are delegated. Let's say encoded strings need to be concatenated. This can be implemented, for example, like this in the EncodedString class.

public static EncodedString operator +(EncodedString lhs, EncodedString rhs) { return new EncodedString(lhs._s + rhs._s); }

This also gives you a performance penalty for the extra method call. In an HTML-rendering program like this, and basically in all programs, I would say this is negligible.

In the extremely buzzed Domain-Driven Design lingo this is called a Value Object. Call it what you want, but it's a nice trick!

On the topic - I saw this declaration the other day:

class FooExtensionsClass

So it's a class that is a class that has the word "class" in the class name. Ok, ok, I get it. Watch out - the wart might be back again. True story.