One of the things that I have been thinking about for many years is how software is built. Great leaps in productivity have been achieved in the workplace via
- Focus on metrics
- Understanding of processes
- Computerization and automation of tasks and processes
One of the things that has been interesting is watching the three items interplay and reinforce each other. What do I mean? If you look at computers in the enterprise, it is a story of the continual interplay of these three drivers. The reason computers were brought into an enterprise was to capture information, especially around accounting, payroll, and other data-intensive activities. The accountants realized early on that computers could, and did, lead to better record keeping. The hallmark was being able to present management with more timely financial information, moving from quarterly to monthly and now even real-time information about revenue and costs.
I would assert that one of the things that has to occur as you automate a task is understanding what composes the steps of that task. So as computers spread throughout an organization, it was only natural that organizations came to understand their processes better. This in turn led to more data, which led to refining what to measure and manage as managers gained more understanding of the process and the organization. This in turn leads to better understanding and refinement of the processes.
I cannot prove this, but my intuition says it is as good an argument as any for the gains in productivity. It also fits much of the history of computing; for example, why, until recently, did the IT department report to the Treasurer?
One thing bugs me. Why has productivity skyrocketed everywhere except in one area: software development? Software productivity has remained pretty static. Several groups track project failure, the most famous being the Standish Group's Chaos Reports. The percentage of projects that failed or had significant issues was 84% in 1994 and 71% in 2004, a drop of about 1.3 percentage points per year. Not bad, until you find out that the majority of the change came in the two-year period from 1994 to 1996. So the real rate of improvement over the last eight years is less than 1% per year! I do not want to downplay the complexity of software, but if our business leaders failed this often, the economy would be much worse off.
Well, enough history. What is my point? My point is that the software industry does not do a very good job of creating software. Whether you are a Linux, Mac, Unix, or Windows developer is immaterial. We suck, and we need to admit it before we can begin to move forward. Put another way: if buildings were built as poorly as the software industry builds software, the major cities of the Earth would have buildings collapsing all the time. Clearly the world's skylines tell a much different story.
Still unconvinced? OK, let's look at another fact. In 2002, a study by the National Association of Software & Service Companies (NASSCOM), a detailed analysis of 249 mid-sized ISVs with revenues under US$100 million, found the following:
- the average size of the companies was US$39 million
- the net margin was -26%
- the companies jointly had revenues of US$9.7 billion
- they spent US$12.2 billion, losing US$2.5 billion
They would have been better off investing the US$12.2 billion in the stock market. At the end of the year they would have had around US$13 billion, a gain of around US$800 million. Basically, if you are thinking about investing your hard-earned money in a software startup, don't; you will just lose 20% of your money every year. You would be better off with a savings account or government-backed bonds. So if you had $100k to invest in a software startup, the odds are pretty good you would have lost 60-100% of your investment, while investing in bonds you would have made about $17k at a paltry 4% tax-free rate (assuming US T-Bills).
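The compounding arithmetic behind these numbers can be checked with a few lines of Python. The four-year horizon is my assumption; it is what makes the roughly $17k gain at 4% work out, and it pairs with the startup's -20% per year to land in the quoted 60-100% loss range.

```python
def compound(principal, rate, years):
    """Grow principal at a fixed annual rate, compounded yearly."""
    return principal * (1 + rate) ** years

# $100k in T-Bills at 4% for four years (horizon assumed, see above)
bonds_gain = compound(100_000, 0.04, 4) - 100_000
print(round(bonds_gain))  # 16986 -- the "about $17k" in the text

# The same $100k losing 20% per year in a struggling startup
startup = compound(100_000, -0.20, 4)
print(round(startup))  # 40960 -- a 59% loss, at the bottom of the 60-100% range
```

The point is not the exact figures but the asymmetry: a modest positive rate compounds into a real gain, while a steady -20% compounds into losing most of the stake.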
Now, you might say, oh no, Open Source is different. Nope, it is actually worse; just look on SourceForge at all the projects that were started and are no longer active, much less ever shipped a general release. OK, so let's write those off as someone's pet projects. You still have some pretty impressive failures; just look at how long it took to write Mozilla, and that was with an existing codebase to use as a foundation.
If you are still not convinced, well, good luck. For those who are convinced, maybe we should start looking at what software research has shown for years, even decades. Randomly pick up a software development book from any year or decade and you will probably see that the main reasons for software issues are some form of:
- One-off development and little to no re-use
- Monolithic systems that grow in complexity and isolation
- Low-level abstraction, if any, or as I like to say, huge NIH (not invented here)
- Process immaturity, and I would add people immaturity, as a lot of developers whine about how process slows them down when in fact a lot of evidence shows that not to be true
One idea would be to put a process in place and follow it religiously (basically no exceptions allowed, no matter what, for at least 3-5 years). A good example is the stock market. It is well known that the market has grown on average 10% per year. I have read books on investing that basically say that almost any system, applied consistently over time, will lead to better-than-average returns. Rather than jumping from one system to another, you need to follow a single system for several years. One book in particular compares a system applied over 1, 2, and 3 years, and the person who sticks with it for 3 years has a 90% chance of superior returns.
Yet most people do not have the discipline to do that. In fact, most developers would say that technology changes too fast to follow a single process for three years. And it is true that technology changes rapidly. I would assert that the development process does not, and in fact you could use the same process regardless of the language, tools, or technology being developed.
So now you are saying, Rabi, you are off your rocker. Why would I want process just for the sake of process? Well, go back and look at my first three points. You need to measure, understand, and automate. If you do not measure something, you cannot improve it. If you cannot measure something, you do not understand it. And if you do not understand it, you cannot automate it.
I would assert that if you get your development group to put metrics in place and track them over a couple of years, you will begin to see how to improve the process. At that point, with that understanding in hand, you can begin to improve the process. Then measure again for years and see if your metrics improve.
So what should you measure?
- Project success and failure. Use the Standish reports as a guide. I would not allow for grey areas either: either the project was completed on time, on budget, and feature-complete, or it was not.
- Cost of the project, including labor, overhead, etc.
- Amount of time spent on the project by everyone involved. Measure how long you needed subject matter experts and infrastructure support, not just development staff, i.e. developers and project leads/managers.
- Quality, as measured by reported defects in all output, including documentation. I would record the stage at which a defect is caught, the stage at which it is addressed, and even the time and cost to fix it. For example, if you have two products, which has the higher quality? The one with two bugs that took several weeks to find and fix, or the one with 20 bugs that took a couple of days to find and fix? I would assert the second has higher quality.
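As a minimal sketch of what the defect part of this might capture, here is one possible record shape. The stage names, field names, and the summary function are all my own illustrations, not a prescribed schema; the point is that each defect carries where it was caught, where it was addressed, and what it cost.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    # Stage names assumed for illustration; use whatever your process defines.
    REQUIREMENTS = 1
    DESIGN = 2
    IMPLEMENTATION = 3
    TESTING = 4
    PRODUCTION = 5

@dataclass
class Defect:
    caught_in: Stage      # stage where the defect was found
    fixed_in: Stage       # stage where it was addressed
    hours_to_fix: float   # effort spent on the fix
    cost_to_fix: float    # loaded cost, including non-developer time

def mean_hours_to_fix(defects):
    """Average repair effort -- one of many summaries you could track."""
    return sum(d.hours_to_fix for d in defects) / len(defects)

defects = [
    Defect(Stage.TESTING, Stage.TESTING, 2.0, 300.0),
    Defect(Stage.PRODUCTION, Stage.PRODUCTION, 40.0, 6000.0),
]
print(mean_hours_to_fix(defects))  # 21.0
```

Even a record this simple, collected consistently for a couple of years, lets you ask the questions above: which stage catches defects, and what a late catch costs relative to an early one.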
Metrics to avoid
- how many lines of code (LOC) are produced,
- defects per line of code.
While both of these have been used in the past, I believe they will actually make your worst developer show up as better than your most productive developers. Why? Well, LOC measures quantity without accounting for quality. Who is the better developer: the one who could win the Obfuscated C contest, i.e. wrote the most convoluted solution, or the one who solved the problem with a short, concise program?
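To make the LOC objection concrete, here is a toy contrast (both functions are my own illustrations): two correct solutions to the same problem, one of which "produces" several times more lines.

```python
# Task: sum the squares of a list of numbers.

def sum_squares_verbose(numbers):
    # Developer A: many lines, same result. By a LOC metric, A is
    # several times more "productive".
    total = 0
    for n in numbers:
        square = n * n
        total = total + square
    return total

def sum_squares_concise(numbers):
    # Developer B: one expression, easier to read and verify.
    return sum(n * n for n in numbers)

data = [1, 2, 3, 4]
assert sum_squares_verbose(data) == sum_squares_concise(data) == 30
```

Defects per line of code has the same failure mode in reverse: padding a program with harmless lines lowers the defect density without fixing a single bug.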
This is the key: you need to measure. And just like performance measurements, you have to have enough data points to get a meaningful idea of where the bottlenecks exist.
What do you think?