Saving lost developer time with better hardware

October 4th, 2008

A common problem that I see on projects is that the computers available to the teams are mediocre. The obvious example of this is when the computers given to developers are mediocre, but I also think that there is a compelling point to be made around solving performance on build machines with hardware instead of software.

Developer Machines

I was once on a project where the local update and build process grew to an hour long. I won’t get into the details, but the delay was largely IO bound, with the processor as the bottleneck in some portions. We were using Dell 610 laptops. When some developers started getting Dell 620s (dual core laptops), we discovered that the local build time on those machines dropped by 33% to 50%. Whoa.

Think about that. A 60 minute build cut down to 30 minutes. Let’s assume that developers only build once per day and that each developer has an average cost of $100 per hour (total cost to the organization, not just wages). With those savings, getting every developer a Dell 620 instead of a Dell 610 pays for itself in a couple of weeks. And that only considers cutting a long build in half. There are many other situations where having a slow machine causes lost developer time.
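The break-even arithmetic can be sketched as below. The 30 minutes saved and the $100 per hour come from the example above; the $500 price difference between the two laptops is a made-up figure for illustration only, since I never wrote down the real one.

```java
public class HardwareBreakEven {

    /**
     * Working days until a hardware upgrade pays for itself.
     *
     * @param upgradeCost        extra cost of the faster machine in dollars (hypothetical input)
     * @param minutesSavedPerDay developer minutes saved per working day
     * @param hourlyCost         total cost of a developer per hour in dollars
     */
    static double breakEvenDays(double upgradeCost, double minutesSavedPerDay, double hourlyCost) {
        double savingsPerDay = (minutesSavedPerDay / 60.0) * hourlyCost;
        return upgradeCost / savingsPerDay;
    }

    public static void main(String[] args) {
        // 30 minutes saved per day at $100/hour is $50 saved per day.
        // The $500 upgrade cost is an assumed figure, not a real price.
        System.out.println(breakEvenDays(500.0, 30.0, 100.0) + " working days");
    }
}
```

Plug in your own price difference and build frequency; the point is only that the savings side of the equation grows every single working day.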

We lobbied for getting the developers better machines, and were mostly denied. I discovered that organizations measure the cost of people separate from the cost of hardware. In fact, they may be accounted for by different departments entirely, where an arbitrary budget is given to the department that issues employee computers.

I’ve seen this on every project I’ve been on. We are given slow machines, and time is lost. It may be lost because I’m running grep over a lot of files, or because the machine slows down when I have all my development tools open.

I think that it’s fine for organizations to start developers on cheap machines, but they should be quick to spend money at the first sign that it is needed by the developers. I believe that it is an aspect of agility that many organizations fall short on, where the ability to respond to constraints in the software development system is hindered by the structure and policies of the organization.

In fact, I think that IT organizations should run a few tests against their technology stack to see what kind of performance differences exist, and use those numbers to decide which machines give developers the best performance at a reasonable cost. This is especially true of Java J2EE projects, as most of the tools and applications are resource intensive, and building an entire application can take a long time.

Build Machines

If your project has any kind of continuous integration (and it should!) then you have probably felt the pain of long builds at some point. I’ve seen this on every project I’ve been on. There are two areas in particular that I’ve found to be painful: Long running regression or acceptance tests, and long compilation and deployment cycles due to heavyweight tools.

Often builds and tests are segmented into builds that are run locally on developer machines, and builds that are run by the build server. A typical approach is to have developers run unit tests and fast running integration tests locally while developing, but to have long running integration tests and acceptance tests run by a build server, where failures will be fixed later by developers when the build completes.

Many projects will find over time that the time it takes to run these large integration tests and acceptance tests becomes so long that the value is reduced. The time it takes to get feedback might be hours or even days. Often, these tests are failing, as by the time they complete, multiple developers or teams may have made changes that break a portion of the test suite.

I’ve seen or read of different approaches to this problem, from using in-memory databases, to manually splitting the regression suites into separate builds or “pipelines”, to distributed computing, to transparent parallelization of tests.

An example of a new tool for transparently running tests in parallel is Selenium Grid, which attempts to run Selenium tests in parallel. While I think there is merit in exploring these tools, they are non-trivial to set up and maintain, and while they may cut the build/test time down to a fraction of its original time, they increase the complexity of the infrastructure that developers will have to maintain. There tend to be surprise issues with parallelization as well. You have to make sure that tests that write to the filesystem, query a database, or call other services can safely run in parallel.

One day, I hope to try a different approach. I would rather spend the money trying to use hardware to solve the problem instead of using some complicated tool. From a previous experience of dealing with an incredibly IO bound build, I’ve long dreamt of building a hard drive out of RAM. I’m not talking about using flash memory; I’m talking about using DDR RAM instead of a traditional hard drive.

I recently looked into this concept, and I found that there are a few manufacturers out there that provide devices to do this very thing, such as the Hyper Drive 4. There are a few other devices out there that can achieve this, but I liked the information/propaganda on the Hyper Drive page the best.

The stats claimed by using a RAM based hard disk are nothing short of sexy.

I won’t reiterate the numbers here, but depending on the usage, it ranges from an order of magnitude to several orders of magnitude in increased performance. Even in builds and test suites that are not predominately IO bound, I am willing to bet that the performance boost to the operating system will translate into large gains for the performance of the tests. My favorite statistic was that Windows XP booted in 2 seconds with their test configuration, and that was only because of device polling.

I don’t think that the cost of such a system is unreasonable. One 16 gigabyte drive using the Hyper Drive system would probably cost around $5000. Assuming an average developer cost of $100 an hour, it pays for itself once 50 hours of developer time (a little over one developer-week) are saved, let alone considering the benefits to an entire team. Come to think of it, I would argue that developers should have similar setups for their local machines. For instance, grep would probably be instantaneous with a DDR based drive.

An “old school” technique for testing summation

September 20th, 2008

I wrote some tests a few weeks back where I had to sum up a bunch of values on a collection of objects and assert that the summation happened correctly. A naive way to do this is to build the test objects with all the values equal to 1, sum them, and assert that the total is the same as the number of objects created for the test.

    int calculateClaimTotal(List<Claim> claims) {
        int total = 0;
        for (Claim claim : claims) {
            total += claim.getAmount();
        }
        return total;
    }

In this example, you could test calculateClaimTotal by creating 3 claims with amounts = 1, and assert that the total is 3.

This may seem sufficient for a simple summation, but it becomes more flawed when the summation becomes based on business rules. Now, if the values to be summed are all 1, you can’t differentiate which rules were enforced by looking at the resulting number directly.

    int calculateClaimTotal(List<Claim> claims) {
        int total = 0;
        for (Claim claim : claims) {
            if (claim.isActive()) {
                total += claim.getAmount();
            }
        }
        return total;
    }

Now, we want to be able to differentiate that the active claims are the ones that get summed. This can become a bloated assertion to make. For instance, you can extend the test objects and count invocations on getAmount(). Then, you can assert that only the objects you expected were used to calculate the total.

A much simpler way of asserting that the objects that you expected were the ones that were used is to use amount values that, when summed, always have a distinct result. This is achieved by using numbers that map to distinct bitwise values:

00001 = 1
00010 = 2
00100 = 4
01000 = 8
10000 = 16

There is no way to add up one subset of these numbers and get the same total as a different subset. This is because adding two distinct numbers from this list is equivalent to an “or” operation on the bits of the numbers:

00001 = 1
00010 = 2
+__________
00011 = 3

This characteristic allows me to identify exactly which claims were used in the summation, because the resulting value could only occur from the values I expect:

Claim 1 is active with value = 1
Claim 2 is inactive with value = 2
Claim 3 is active with value = 4

I can assert that calculateClaimTotal will return 5 here, and that 5 will occur if and only if Claim 1 and Claim 3 are the ones used for the summation. Essentially, a bit location set to 1 instead of 0 will uniquely identify the object that the value came from.
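Put together as something you can run, the test might look like the sketch below. The Claim class here is a minimal stand-in I made up for the example (just an amount and an active flag); the real one would have more to it.

```java
import java.util.Arrays;
import java.util.List;

public class ClaimTotalExample {

    // Minimal stand-in for a Claim; only what the summation needs.
    static class Claim {
        private final int amount;
        private final boolean active;

        Claim(int amount, boolean active) {
            this.amount = amount;
            this.active = active;
        }

        int getAmount() { return amount; }
        boolean isActive() { return active; }
    }

    static int calculateClaimTotal(List<Claim> claims) {
        int total = 0;
        for (Claim claim : claims) {
            if (claim.isActive()) {
                total += claim.getAmount();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Claim> claims = Arrays.asList(
                new Claim(1, true),   // binary 001
                new Claim(2, false),  // binary 010
                new Claim(4, true));  // binary 100

        int total = calculateClaimTotal(claims);

        // 5 is binary 101: only the first and third claims can have contributed.
        if (total != 5) {
            throw new AssertionError("expected 5 but was " + total);
        }
        System.out.println("total = " + total);
    }
}
```

If the isActive() check were accidentally dropped, the total would be 7 instead of 5, and the failing number itself tells you which claim sneaked in.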

I call this an old school testing technique because these days, there is not a lot of bitwise math that occurs in the course of programming. Back in the day, high performance C programmers would use bitwise operations often, and APIs such as the Win32 API used bitwise math as part of their interface.

Even in today’s fancy world of high level abstraction, sometimes it takes a smooth low level trick like this one to achieve the most succinct code.

There’s more to consider than whether to return null

August 19th, 2008

With the previous discussions about whether or not we should return null from a method, I think that there are a few important points that were missed. I wanted to weigh in one last time, before I personally declare this topic beaten to death.

In the example that was given, it was asked what to do when we have something like:

    VendingMachine vendingMachine = new VendingMachine();
    vendingMachine.addMoney(Money.ONE_DOLLAR); // not enough money, drinks cost $1.50
    Drink drink = vendingMachine.getDrink();
    // What now? Oh Crap, we can’t have a drink at this point. We didn’t put enough money in the machine.

In the case that there isn’t enough money, we could have the getDrink() method return null or return a Drink.NULL. In either case, we are defining an invariant for the use of getDrink(). We are promising that, in the event that getDrink() is called and not enough money was given, we are going to do one of those two things. Put another way, we are saying that getDrink() will return a valid drink object if and only if enough money has been added.

Also, with this example, the assumption is that we are going to check the post-condition of the operation in order to detect failure:

    Drink drink = vendingMachine.getDrink();
    if (drink == null)  // Oh Crap!

or with a null object

    Drink drink = vendingMachine.getDrink();
    if (drink.isNull()) // Oh Crap!

Ideally, I don’t ever want my code to veer from the happy path. I don’t want to ever have my code reach an invalid situation where it cannot return a drink. I also don’t want to expend much effort as a developer handling the scenario. I would rather write tests that assert that it can’t happen in relation to the client object.

One such option is to provide a different invariant scenario - one where we check a pre-condition:

    VendingMachine vendingMachine = new VendingMachine();
    while (vendingMachine.needsMoreMoneyForADrink()) {
        vendingMachine.addMoney(Money.ONE_DOLLAR);
    }
    Drink drink = vendingMachine.getDrink(); // we know for sure that we have enough money for a drink here

or

    if (!vendingMachine.needsMoreMoneyForADrink()) {
        Drink drink = vendingMachine.getDrink();
    }

The invariant is now that if needsMoreMoneyForADrink() returns false, we will always get a drink.
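Here is a minimal runnable sketch of a machine that offers that precondition. To keep it self-contained I track money as plain integer cents instead of a Money class, and I throw an IllegalStateException if the precondition is ignored; both are choices made for this sketch only, and change handling is left out.

```java
public class PreconditionExample {

    static class Drink { }

    static class VendingMachine {
        private static final int COST_PER_DRINK_CENTS = 150;
        private int totalCents = 0;

        void addMoney(int cents) {
            totalCents += cents;
        }

        // Query: reports state and changes nothing.
        boolean needsMoreMoneyForADrink() {
            return totalCents < COST_PER_DRINK_CENTS;
        }

        // Only valid once needsMoreMoneyForADrink() returns false.
        Drink getDrink() {
            if (needsMoreMoneyForADrink()) {
                // Reaching this is a programming error in the caller, not a normal outcome.
                throw new IllegalStateException("not enough money added");
            }
            totalCents -= COST_PER_DRINK_CENTS;
            return new Drink();
        }
    }

    public static void main(String[] args) {
        VendingMachine vendingMachine = new VendingMachine();
        while (vendingMachine.needsMoreMoneyForADrink()) {
            vendingMachine.addMoney(100); // one dollar at a time
        }
        Drink drink = vendingMachine.getDrink();
        System.out.println("got a drink: " + (drink != null));
    }
}
```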

Personally, I like the precondition approach (in this particular example). It reads well, and it adheres to the principle of Command-Query separation. Also, in the post-condition example, null (or a wrapper) is being used to communicate that not enough money has been given to the vending machine, and that more should be added. Personally, I’d like to ask the vending machine if I’m supposed to give more money. Note that this situation is similar to the iterator pattern, where you ask hasNext() before calling next().

Along the way, Andy referred to using exceptions because they follow the Principle of “Tell, Don’t Ask.” I don’t believe that it qualifies as a good representation of that principle, but I have an example a little later that I believe is better.

Andy’s example of throwing an exception when getDrink() is called under conditions when it shouldn’t have been raises an important question for me. Should we have a method that fails, and then communicates failure back to the client object (such as with throwing an exception)? Perhaps we should not have a method that is expected to return a drink at all.

In the world of object to object communication, there are only a few ways for an object to respond to the communication of another object, with each having strengths and weaknesses.

1.) You can return a value, or an object reference. Null is the equivalent of responding with “No, I can’t or won’t give that to you.”

2.) You can return some kind of communication object or value, one that perhaps allows you to communicate a number of message types. I consider a Null object a variation of this. You could also return an enum, where each enum value represents a different type of message.

3.) You could set some kind of global variable to hold a message - something that can be read by the client object. This is generally considered terrible OO, but I see it from time to time in the form of Singletons.

4.) In languages like Java, you can throw an exception. Exceptions generally signify an exceptional error situation. Exceptions can be handled, but if not, will continue to bubble throughout the code, interrupting every called frame, finally interrupting the flow of execution if not handled.

5.) You can respond with a method call. This is the standard “Proper” way of having objects communicate. Of the different message/communication types available to object, this one is the richest and most intentional (most explicit). You can send a message (a method call) and send parameters as details with the message.

I generally avoid using exceptions to communicate between objects. I try to reserve them for really exceptional situations. I will spend time looking at important code and deciding on how to do it well, but many, many times before, I have written methods or functions that just return null, and I also wrote the code that used them. It’s simple, and people who look at that kind of logic will be familiar with it, so it’s not going to be a surprise.

Anyway, I wanted to weigh in on a different way of “handling nulls.” I created a sample solution to the VendingMachine::getDrink() dilemma; one that doesn’t assume that there is a getDrink() method at all. It’s an example of the VendingMachine responding to the client object (a Person) with a richer object communication (and a better example of “Tell, don’t ask” than using exceptions). Imagine that the interaction begins with someone calling Person::getARefreshingDrink() and passing in a VendingMachine.

    public class Person {
        VendingMachine vendingMachine;

        void mustAddMoreMoney() {
            vendingMachine.addMoney(Money.ONE_DOLLAR, this);
        }

        public void getARefreshingDrink(VendingMachine machine) {
            this.vendingMachine = machine;
            vendingMachine.addMoney(Money.ONE_DOLLAR, this);
        }

        void enoughMoneyGiven() {
            vendingMachine.enterDrinkSelection(DrinkType.COKE, this);
        }

        void givePersonChangeAndADrink(Money change, Drink drink) {
            // let’s guzzle that drink and pocket the change
        }
    }


    public class VendingMachine {
        private static final Money COST_PER_DRINK = Money.valueOf("$1.50");
        private Money totalMoney = Money.ZERO;

        public void addMoney(Money money, Person person) {
            totalMoney = totalMoney.add(money);
            if (notEnoughMoney()) {
                person.mustAddMoreMoney();
            } else {
                person.enoughMoneyGiven();
            }
        }

        public void enterDrinkSelection(DrinkType type, Person person) {
            if (notEnoughMoney()) {
                person.mustAddMoreMoney();
            } else {
                person.givePersonChangeAndADrink(getChange(), getDrinkFromType(type));
                totalMoney = Money.ZERO;
            }
        }

        private Drink getDrinkFromType(DrinkType type) {
            return new Drink(type);
        }

        private boolean notEnoughMoney() {
            return COST_PER_DRINK.greaterThan(totalMoney);
        }

        private Money getChange() {
            return totalMoney.minus(COST_PER_DRINK);
        }

    }

I’m not arguing that this is optimal either, but I’ve got other blog posts to write, and stuff to do. It’s meant to represent a way of having objects interact in a rich manner. I’m sure you can imagine other ways to make this better, such as moving the Person method definitions into a “Drinker” interface.
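For what it’s worth, the “Drinker” idea could look roughly like the runnable sketch below. It is a simplification, not a finished design: money is plain integer cents and the drink is just a String, both assumptions made so the example stands alone, and the Drinker method names simply mirror the Person methods above.

```java
public class DrinkerExample {

    // Callback interface: the machine only knows about this, not about Person.
    interface Drinker {
        void mustAddMoreMoney();
        void enoughMoneyGiven();
        void givePersonChangeAndADrink(int changeCents, String drink);
    }

    static class VendingMachine {
        private static final int COST_PER_DRINK_CENTS = 150;
        private int totalCents = 0;

        void addMoney(int cents, Drinker drinker) {
            totalCents += cents;
            if (notEnoughMoney()) {
                drinker.mustAddMoreMoney();
            } else {
                drinker.enoughMoneyGiven();
            }
        }

        void enterDrinkSelection(String drinkType, Drinker drinker) {
            if (notEnoughMoney()) {
                drinker.mustAddMoreMoney();
            } else {
                int change = totalCents - COST_PER_DRINK_CENTS;
                totalCents = 0;
                drinker.givePersonChangeAndADrink(change, drinkType);
            }
        }

        private boolean notEnoughMoney() {
            return totalCents < COST_PER_DRINK_CENTS;
        }
    }

    static class Person implements Drinker {
        private VendingMachine vendingMachine;
        int changeReceived = -1;
        String drinkReceived;

        void getARefreshingDrink(VendingMachine machine) {
            this.vendingMachine = machine;
            vendingMachine.addMoney(100, this);
        }

        public void mustAddMoreMoney() {
            vendingMachine.addMoney(100, this);
        }

        public void enoughMoneyGiven() {
            vendingMachine.enterDrinkSelection("COKE", this);
        }

        public void givePersonChangeAndADrink(int changeCents, String drink) {
            changeReceived = changeCents;
            drinkReceived = drink;
        }
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.getARefreshingDrink(new VendingMachine());
        System.out.println(person.drinkReceived + " with " + person.changeReceived + " cents change");
    }
}
```

The client never checks a return value here; the machine tells the Drinker what happened, and a test can pass in a fake Drinker to record which callbacks fired.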

Java Swing memory leak: JDialog, JDateChooser, and the evil of Singletons

August 16th, 2008

I recently solved a mysterious memory leak puzzle in a Java Swing application that I’m working on. The source of the memory leak was proving elusive, even with people on the team using JProbe in order to find the source of the leak.

At the root of the problem, the dialog was too complex and held too much logic for the cause to be easily pinpointed. Circularly referential object graphs were everywhere. I had to rip out most of the dialog in large chunks until I reduced it down to a simple JDialog, and then considered each possible factor.

I was a little tripped up along the way. Apparently, calling Runtime.getRuntime().gc() doesn’t necessarily collect memory, but if you push the JVM to the point of an OutOfMemoryError, it’ll do everything in its power to get all the available memory back. Pushing it to the edge was a sure way to get an honest answer.

Normally, when Java memory leaks happen in Swing, the culprit is a listener of some kind. It’s easy to register a listener and forget about it, while it continues to point to other objects with a strong reference. It’s further complicated by the fact that it’s not always obvious when a reference is created, like in the case of defining an anonymous inner class.

In my case I discovered that the culprit of the memory leak was a JDateChooser object that we are using. That object is defined as part of the JCalendar api. Specifically, inside the dialog, a PropertyChangeListener was anonymously created and registered to the JDateChooser, creating a circular reference between the JDateChooser and the dialog.

So, what was JDateChooser doing? Sure enough, in the constructor, it was registering a listener with a Swing singleton called MenuSelectionManager. That singleton never dies, and does not release the reference, so the garbage collector can’t do its magic.

The code in the JDateChooser constructor:

    // Corrects a problem that occured when the JMonthChooser’s combobox is
    // displayed, and a click outside the popup does not close it.

    // The following idea was originally provided by forum user
    // podiatanapraia:
    changeListener = new ChangeListener() {
        public void stateChanged(ChangeEvent e) {
            // do stuff that creates a reference back to the JDateChooser
        }
    };
    MenuSelectionManager.defaultManager().addChangeListener(changeListener);

I love the comment. It’s just too bad that the forum user didn’t recommend constructor injection of that manager, which would have allowed me an elegant means of preventing the memory leak without modifying the library.

For instance, I could make a WeakReferenceMenuSelectionManager that stores the listeners using WeakReferences - thus allowing the garbage collector to reclaim the objects in the event that there are no other objects with a strong reference to the listener.

Unfortunately, when the maker of JDateChooser realized they were causing memory leaks, they came up with this solution:

    /**
     * Should only be invoked if the JDateChooser is not used anymore. Due to popup
     * handling it had to register a change listener to the default menu
     * selection manager which will be unregistered here. Use this method to
     * cleanup possible memory leaks.
     */
    public void cleanup() {
        MenuSelectionManager.defaultManager().removeChangeListener(changeListener);
        changeListener = null;
    }

This is better than nothing, but is still far inferior to a WeakReference approach. This is an excellent example of why we shouldn’t write code that reaches out and calls a Singleton. Statically referencing an object does not allow me to reconfigure the object without changing the source code - something I would prefer not to do in a library. Luckily, the JCalendar api is covered under the LGPL, so at least changing the library was an option.

The better fix was this. First, I create a weak change listener as a proxy around the original change listener.

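    import java.lang.ref.WeakReference;
    import javax.swing.event.ChangeEvent;
    import javax.swing.event.ChangeListener;

    public class WeakChangeListenerProxy implements ChangeListener {

        // Weak, so the proxy alone never keeps the real listener alive.
        private final WeakReference<ChangeListener> reference;

        public WeakChangeListenerProxy(ChangeListener listener) {
            this.reference = new WeakReference<ChangeListener>(listener);
        }

        public void stateChanged(ChangeEvent e) {
            // get() returns null once the real listener has been collected.
            ChangeListener actualListener = reference.get();
            if (actualListener != null) {
                actualListener.stateChanged(e);
            }
        }
    }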

Then, in JDateChooser, I changed the constructor like so:

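    // The following idea was originally provided by forum user
    // podiatanapraia:
    // Note: something other than the proxy (for example a field on the
    // JDateChooser) must hold a strong reference to changeListener2, or it
    // can be collected while the chooser is still in use.
    ChangeListener changeListener2 = new ChangeListener() {
        // Change listener body
    };

    changeListener = new WeakChangeListenerProxy(changeListener2);
    MenuSelectionManager.defaultManager().addChangeListener(changeListener);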

Finally, I wanted to clean up the remaining WeakReference object. Now that the JDateChooser will get garbage collected, I can do that by creating a finalize method.

    protected void finalize() throws Throwable {
        super.finalize();
        MenuSelectionManager.defaultManager().removeChangeListener(changeListener);
    }

I posted this solution to the JCalendar forum; hopefully, it’ll get added to the next release. The cleanup() method is not an obvious solution. As a user of a widget api (especially in Java), I wouldn’t expect that I need to free up any resources, and indeed, a lot of time and money was wasted with developers looking into this issue.

Oh well, at least I got to have some fun with Weak References.
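If you want to see the proxy’s behavior in isolation, here is a small sketch. Garbage collection timing can’t be forced reliably, so instead of waiting for the collector, the demo calls WeakReference.clear() through a hypothetical simulateCollection() helper that exists only for this demonstration; it is not part of the fix itself.

```java
import java.lang.ref.WeakReference;
import javax.swing.event.ChangeEvent;
import javax.swing.event.ChangeListener;

public class WeakProxyDemo {

    static class WeakChangeListenerProxy implements ChangeListener {
        private final WeakReference<ChangeListener> reference;

        WeakChangeListenerProxy(ChangeListener listener) {
            this.reference = new WeakReference<ChangeListener>(listener);
        }

        public void stateChanged(ChangeEvent e) {
            ChangeListener actualListener = reference.get();
            if (actualListener != null) {
                actualListener.stateChanged(e);
            }
        }

        // Demo-only helper: stands in for the garbage collector clearing the reference.
        void simulateCollection() {
            reference.clear();
        }
    }

    // Returns how many events reached the real listener.
    static int run() {
        final int[] calls = {0};
        ChangeListener realListener = new ChangeListener() {
            public void stateChanged(ChangeEvent e) {
                calls[0]++;
            }
        };
        WeakChangeListenerProxy proxy = new WeakChangeListenerProxy(realListener);
        ChangeEvent event = new ChangeEvent(new Object());

        proxy.stateChanged(event);   // forwarded to the real listener
        proxy.simulateCollection();  // pretend the real listener was collected
        proxy.stateChanged(event);   // silently dropped
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("events delivered: " + run());
    }
}
```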

Use a Java Decompiler with your IDE

August 13th, 2008

If you work in Java, and you don’t use a decompiler, you haven’t lived.

Well, maybe that’s a bit excessive. Still, I’m shocked by the number of developers that don’t use a decompiler like Jad combined with their IDE, like with Jadclipse and Eclipse.

In Eclipse, you can download the Java API source code and attach it so that you can always look at the actual code - original comments and all.

However, you want to be able to look at the source code for any jars that your application depends on as well. I didn’t even realize how useful and freeing it is to be able to do that until I started. I would say that less than 30% of all developers that I’ve met use a decompiler, but it’s definitely one of the top plugins you have to have.

Programming by Contract considered excessive

August 11th, 2008

Recently, Andy Palmer posted a blog article called “returning null considered dishonest”.

I felt that the topic started to go down the road of enforcing a contract in order to prevent nulls. Checked exceptions are an example of explicitly adding new behavior to the contract. The contract details a response to certain parameters in the form of an exception.

Taking a step away from checked exceptions, you have unchecked exceptions. Typed unchecked exceptions are especially pointless, as they weren’t declared in the first place, and generally shouldn’t be explicitly handled. If you do want to enforce a contract with unchecked exceptions, you can throw RuntimeExceptions or do an Assert.true() at the start of the method to make sure that the method is called with the needed preconditions.

My issue is that you’re adding behavior - behavior that was probably added somewhere else already. Every line of code added incurs an extra bit of cost in terms of maintenance. Super defensive coding strategies should be justified by the merits of the given situation, but not used as a default behavior.

Nulls are okay when they are not meant to be handed out as a meaningful response. Once you try to use null values to convey a value or a meaning, then you start seeing a proliferation of nulls and null checks.

For instance, I’m not advocating this:

    public Object foo(Object parameter) {
        if (parameter == null) {
            return null;
        }
        // do stuff
    }

    Object result = foo(parameter);
    if (result == null) {
        // handle error situation
    }

Nor am I advocating this:

    public Object foo(Object parameter) {
        Assert.notNull(parameter);
        // do stuff
    }

    try {
        Object result = foo(parameter);
    } catch(Exception e) {
        // handle error
    }

I’m essentially advocating this:

    public Object foo(Object parameter) {
        // just do stuff, and trust the other developers to be good citizens
    }

    Object result = foo(parameter);

Don’t check for null, don’t throw exceptions. Just make sure that nulls don’t occur at the entry point to the system. Most of the time, you should just write your method to do stuff and make sure that the error conditions are handled in one place, as close as possible to their introduction into the system, such as when the user enters some data.

Most code bases are not published to the public. There is a big difference between published api’s and internal code bases. With a public api, you have to create more of a contract, because you want to give your consumers something that they can depend on. There are too many consumers for immediate communication and collaboration. In an internal codebase, you are your own consumer and supplier.
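As a rough sketch of what handling it “in one place” might look like, the example below checks raw user input once at the entry point and lets the internal method trust its caller. The method names and messages are invented for the illustration.

```java
public class EntryPointValidation {

    // Entry point: the one place where raw user input is checked.
    static String handleUserInput(String rawName) {
        if (rawName == null || rawName.trim().isEmpty()) {
            return "Please enter a name.";
        }
        return greet(rawName.trim());
    }

    // Internal method: no null check, it trusts its callers.
    static String greet(String name) {
        return "Hello, " + name.toUpperCase() + "!";
    }

    public static void main(String[] args) {
        System.out.println(handleUserInput(null));
        System.out.println(handleUserInput(" bob "));
    }
}
```

Everything behind handleUserInput() can then be written, and tested, for the interesting cases only.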

My point here is that I disagree with adding more code complexity to fix a problem that is generally a behavioral issue. Having to handle every possible unwanted condition of a method parameter becomes a method tax, creates duplication, and results in a lot of tests that aren’t testing the interesting stuff.

The Power of Culture

August 10th, 2008

I recently blogged about preventing the NNPP through better hiring practices. Though I feel that a strict hiring practice is key to ensuring consistently higher quality developers, I wanted to talk a little bit about what comes after an employee is hired.

Companies should take seriously the view of employees as an asset, one where individuals are nurtured and grown into something and someone greater. This is a large part of why I advocate pair programming, because there is no better way to raise the skill level of developers than to have them constantly working hands-on with other skilled members of their profession.

I think though, that there is a more important point here that is difficult to define: a company has to have a culture of wanting the best people and facilitating the continued growth of those people. It’s so intangible because it doesn’t come from a CEO proclaiming a set of values that came out of a management book. It’s about what the individual people in the company actually believe and embody.

Using Thoughtworks as an example again, I was initially very surprised at how ever-present the culture of the company is. It’s telling that we call ourselves Thoughtworkers. We identify as part of a group, not just as employees of a company. We want to be part of the group. I’ve met people that fit the culture and continue to refer to themselves as Thoughtworkers even after they’ve left the company to move on to some other career pursuit. In particular, an aspect of the culture is that of excellence.

How did this come to be? Well, I can’t say for sure - it’s an evolutionary process. However, I think that it happened for the same reason that many organizations develop a culture of excellence, whether it be great sports teams or great universities.

First, the organization has to try to select the best. There has to be some amount of pride in getting in. Then, once you get in, you meet many people that are more skilled than you (or perhaps, strongly skilled in different ways), and you see how you can improve. In fact, you want to improve - you want to live up to the standard of excellence that the organization has set. You don’t want to consider yourself below average in your group.

In teams and organizations with really strong and successful cultures, it’s not just the leadership that wants everyone on the team to be superlative, but it’s the team members as well. At Thoughtworks, I’ve met other thoughtworkers that give me suggestions on books that I should read, or technologies that I should learn. I’ve developed student/teacher relationships with some people, and I’ve developed teacher/student relationships with others. With many, it’s been peer to peer, but with us debating and sharing views and opinions.

I use Thoughtworks as an example of a strong culture, but I don’t want to make it sound like it’s perfect. It has its flaws too, but as an organization, it’s the best of what I’ve experienced. On the other side, I’ve worked with/at companies that have what I consider to be a poor or negative culture, but my ranting about that will have to wait for another day.

Preventing the Net Negative Producing Programmer

August 7th, 2008

Jay Fields recently published a blog post where (amongst many other points) he mentioned the concept of the Net Negative Producing Programmer, referred to as NNPP.

I’d never heard that specific term before and I enjoyed the read. Almost every project that I have worked on had a healthy number of people who’s efforts were borderline negative to the teams productivity. I say borderline because I’ve never actually measured it - but I have worked with people that would take a week to do a days work, and it would be defect ridden when they claimed that it is done.

The part that struck me about the paper though was that it said that dismissal of an NNPP is a last resort. I do agree with the paper that it argues against measuring the person with absolute statistics in making this decision, but on collaborative/agile projects, it’s usually easy to spot the weak links. I can understand that it can be expensive and risky to fire someone, but software teams must have a way of weeding out these individuals, and removing them. I don’t just mean moving them to another team; I believe that they should be laid off. I wouldn’t even feel too bad about it, IT is still a high demand industry, and even the worst of the NNPP will find some giant company to disappear into.

There is a better way, however, one that will prevent companies from getting into the position of having to worry about this in the first place: improve the hiring standards. If you can keep the NNPPs out of your company to begin with, you don’t have to worry about firing them.

At Thoughtworks, the interview process is pretty rigorous: there are phone interviews, a code submission, logic tests, and lengthy in-person interviews with people of varying roles. The highlight for me is the code submission. It’s pretty telling from looking at a code submission whether the candidate is worth pursuing in the first place. I don’t care how many languages and technologies someone claims to know. If they can’t create an elegant and working solution to a short problem when the expected outputs are provided, then they most likely aren’t top notch.

There is also a scale to the problems, from easy to hard. Personally, I consider botching the easy problem unacceptable, but I’m slightly more forgiving with the hardest problem. The reviewers are usually pretty harsh, so it generally takes a good solution to even be considered acceptable. Oh, and experience level is factored in: junior candidates have to write pretty good code, but an experienced candidate needs to nail the code submission.

It’s a HUGE differentiator. Every company that is serious about staffing the best talent (or even just good talent) should require a code submission.

I’ve even heard of some offices doing pair-programming during the interview. This is also a great idea. Someone may know how to write great code, but still not be good at actually doing it. I’ve paired with (non-Thoughtworks) people that were supposed to have 5+ years of experience and watched them struggle to write a method with a single for-loop and get it to compile. Alternatively, I’ve paired with people that are like some freakish melding of man and machine, writing with blinding speed, and what they write is beautiful. If I owned a company or ran an IT team, I’d do my best to get the freakish cyborg artist types.

Another best practice: Command query separation

August 6th, 2008

I came across some code the other day that made me wish more people were aware of the concept of Command Query Separation. It’s a great default behavior for developers to have, and it results in code that supports more reuse.

Since I think someone out there would be upset if I posted the actual code, here is something else I made up:

    class SomeForm extends Form {

        // bunch of other Form stuff

        void validate() {
            List errors = getValidationErrors();
            okButton.setEnabled(errors.isEmpty());
        }

    }

Looks simple enough. I’ve got a form, and when everything is valid, the button becomes enabled and the user can move on to something else. The getValidationErrors() method gets me a list of errors if the form has any error. Oh, but wait, it does more:

    List getValidationErrors() {
        List errors = new ArrayList();
        // Side effect: the menu is rebuilt every time someone asks for errors.
        this.menu = new Menu();
        if (!email.isValid()) {
            errors.add(INVALID_EMAIL);
            this.menu.add(new MenuItem(IMPORT_EMAIL_FROM_OTHER_ACCOUNT));
        }
        if (!name.isValid()) {
            errors.add(INVALID_NAME);
        }
        if (!address.isValid()) {
            errors.add(INVALID_ADDRESS);
            this.menu.add(new MenuItem(LOOKUP_ADDRESS_ON_MAP));
        }
        return errors;
    }

The code’s contrived, and only loosely based on what I saw. What I saw was much more egregious. The point is that the method getValidationErrors() is building a menu and collecting validation errors.

What if I want to find out if everything is valid later on? Well, I’ll be rebuilding the menu again. It gets worse when the code grows more complex, like so:

    if (!email.isValid()) {
        errors.add(INVALID_EMAIL);
        this.menu.add(new MenuItem(IMPORT_EMAIL_FROM_OTHER_ACCOUNT));
    } else {
        if (email.matchesBusinessRule()) {
            refreshAdvertisementAboveForm();
        }
    }

Sigh. I just want to know if the form is valid. I don’t want the advertisement to refresh above the form. What I really want is a getValidationErrors(), an updateMenu(), and a refreshAdvertisementAccordingToRules(). Alternatively, I would have accepted a getValidationErrors() and a doABunchOfUiChangesInResponseToFormValidation().
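
To make the split concrete, here is a rough sketch of what the separated version might look like. It is as made up as the code above: SeparatedForm, its plain String fields, the string error codes, and the trivial validity rules are all stand-ins I invented so the example runs on its own, and the menu and OK button are reduced to a list and a boolean. The advertisement refresh is left out; it would be one more command alongside updateMenu().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the form in the post. Fields are plain strings
// and the validity rules are deliberately trivial placeholders.
class SeparatedForm {
    String email = "";
    String name = "";
    String address = "";

    // Stand-ins for the UI state that the original method mutated.
    final List<String> menu = new ArrayList<>();
    boolean okEnabled = false;

    // Query: reports errors and changes nothing, so it is safe to call at any time.
    List<String> getValidationErrors() {
        List<String> errors = new ArrayList<>();
        if (!email.contains("@")) errors.add("INVALID_EMAIL");
        if (name.isEmpty()) errors.add("INVALID_NAME");
        if (address.isEmpty()) errors.add("INVALID_ADDRESS");
        return errors;
    }

    // Command: rebuilds the menu from a list of errors and returns nothing.
    void updateMenu(List<String> errors) {
        menu.clear();
        if (errors.contains("INVALID_EMAIL")) menu.add("IMPORT_EMAIL_FROM_OTHER_ACCOUNT");
        if (errors.contains("INVALID_ADDRESS")) menu.add("LOOKUP_ADDRESS_ON_MAP");
    }

    // Command: the caller decides which side effects follow a validation pass.
    void validate() {
        List<String> errors = getValidationErrors();
        okEnabled = errors.isEmpty();
        updateMenu(errors);
    }
}

class CqsDemo {
    public static void main(String[] args) {
        SeparatedForm form = new SeparatedForm();
        form.name = "Ada";
        System.out.println(form.getValidationErrors()); // asking has no side effects
        System.out.println(form.menu);                  // untouched until validate() runs
        form.validate();
        System.out.println(form.menu);
    }
}
```

With this shape, anything that only needs to know whether the form is valid can call getValidationErrors() as often as it likes, and the UI changes happen only where validate() is called on purpose.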

As you may have guessed, the code that I saw was not tested either.

I have a suspicion that some developers (even some experienced developers) create code like this because they feel that they are writing the least amount of code possible (you save lines by not creating more methods!). Developer awareness aside, probably the top reason this tends to happen is a lack of collective code ownership. A developer has to add a feature to a part of the system that “belongs” to someone else, so they try to add as few lines or if-statements as possible. They see that there is logic that they need, and they see that it is executed at the time they want their own code to execute, so they pile on their logic and finish their task.

From the example I’ve shown, it may look like a small evil, but the number of if-else statements and the amount of coupled logic tend to build up quickly.

Conway’s law

August 3rd, 2008

Conway’s law essentially states that the structure of a software system will reflect the structure of the organization that makes it.

I’ve seen this effect in small companies that have little structure and seem to be constantly “putting out fires.” This is probably the easiest case to point out: the code base is often highly chaotic and unstructured. The opposite holds as well: rigidly structured organizations have rigidly structured code.

I worked on a project once in a large, very hierarchically structured organization. Sure enough, the code base had a hierarchical structure where modules depended on other modules in a way that could almost be put into an organizational chart. Additionally, the company had divisions that interacted on a contractual basis as though they were separate companies, and that extended to the software teams. Each team would integrate on a software “contract” basis, and it looked bad if one team “broke” that contract.

I remembered this again the other day when another developer was telling me of a manufacturing company that he worked for. He described how data was often processed from one database and stored in another, and at some point, a cron job would start a process that would pick the data up, do something with it, and store the results somewhere else for some other application. It sounded a lot like how a manufacturing plant works.

I’m sure there is a wide variety of reasons why Conway’s law happens, but I see a couple of primary ones. One is that whatever theory of management a company implements will almost certainly extend to the management of software development, as in my example of a hierarchical organization. The manufacturing example is potentially different. In the case of manufacturing software, I think that it could be the domain that shapes the structure of both the management and the software. In the first case, the structure of the software was imposed upon it because it suited the management structure, and in the second case, both the management and the software are structured in order to support the business processes that need to happen.