Thoughts on Software Development

Friday, December 24, 2004

Grady Booch has fired another missile in the great debate over software factories, and the idea's defenders have replied. Microsoft's view of the world is outlined in a series of articles by Jack Greenfield on the MSDN site:
http://msdn.microsoft.com/architecture/overview/softwarefactories/default.aspx?pull=/library/en-us/dnmaj/html/aj3softfac.asp
http://msdn.microsoft.com/architecture/overview/softwarefactories/default.aspx?pull=/library/en-us/dnbda/html/softwarefactwo.asp
http://msdn.microsoft.com/architecture/overview/softwarefactories/default.aspx?pull=/library/en-us/dnbda/html/softfact3.asp

The basic idea behind software factories is to move the production of software from a craft to an industry resembling manufacturing. This is not a new idea. In the second article, after reviewing why these efforts have failed in the past, Greenfield says:

"We are unable to achieve commercially significant levels of reuse beyond platform technology. The primary cause of this problem is that we develop most software products as individuals in isolation from other software products. We treat every software product as unique, although most are more similar to others than they are different from them. Return on investment in software development would be far higher if multiple versions or multiple products were taken into account during software product planning. Consequently, we rarely make commercially significant investments in identifying, harvesting, packaging and distributing reusable assets. The reuse that does occur is ad hoc, rather than systematic. Given the low probability that a component can be reused in a context other than the one for which it was designed, ad hoc reuse is almost an oxymoron. Reuse rarely occurs unless it is explicitly planned in advance."

I agree as far as he goes, but I do not think he fully comes to grips with why the idea of software factories has a long way to go. No doubt part of the reason is that the idea of a software factory resembles the idea of artificial intelligence. Every success redefines the goal. Nonetheless, I think there is a more fundamental reason.

Programming has always been a labor-intensive activity. As a result, from the very start people have tried to figure out how to automate as much of the process as possible. Compilers were one of the very first attempts to automate software development. Today we take them for granted, but well into the 1970s people were still arguing over whether a good human assembly language programmer could code better than a compiler. Debuggers, linkers, loaders, file systems, operating systems, and distributed transaction coordinators were all invented to automate parts of software development. How long did it take for the idea of a virtual machine (as in Java or .NET) to become practical for most software development?

You can, as Greenfield does, view these artifacts as improved abstractions. Abstractions are critical to software development. I view these developments differently. I see them as automating what we understand how to automate. Code libraries such as those for .NET and Java are in the same category. After years of experience we now understand enough of the critical elements of certain parts of software development to encapsulate them in libraries.

But software is not like other engineering pursuits such as bridge building. Most bridges, although they look different, are really one of a few basic kinds. Because you cannot copy a bridge like you can copy a program, you need to build a new bridge at every place you need to cross a river. Hence you can much more easily replicate what you did before, or learn from experience.

This really became clear to me when I was a graduate student in nuclear engineering. In the reactor design course final exam we were asked to design a cooling system for a nuclear reactor. We were not, however, to use our fundamental understanding of physics and engineering to do this. We were to apply the American Society of Mechanical Engineers (ASME) standards for cooling systems to do the design.

Why has this been so difficult to do with software? Since software is easy to copy, you only need to create a new piece of software when you need to do something new. Automation is about understanding what you have done in the past. You cannot automate what you do not understand. So much of software development is done for things we have not done before. We do not know, or do not fully understand, the domain models that Greenfield relies on for the idea of software factories. This is why the CASE tools and code generators of the past have not provided any reduction in cost and time. This is why I think UML-based code generators will not be wildly successful either.

Of course you need to understand the domain models. It is just that in a dynamic economic environment, you do not arrive at the knowledge in time to automate the process until it becomes yesterday's understanding. And once it is yesterday's understanding, it will take a while to see if it is fundamental enough to be worth automating.

So long as software is about innovation, or doing what we do not yet understand how to do, it will always have a large craft component. Maybe automation will decrease the need for programmers, and thus reduce the labor cost. So far this has not happened in 60 years. People may use cheaper programmers, but that is another story for another time.

12/24/2004 11:13:00 AM (Eastern Standard Time, UTC-05:00) | Comments [2] | All | Software Development

Thursday, December 16, 2004

Abstraction and Simplicity

Adam Bosworth has given a talk (discussed in his blog entry) that has received a lot of attention and comment. He argues that software programs and their tools are way too complex and should be simple.

The problem I have with his argument (and with similar arguments) is that it posits a false binary choice: either be complex or simple. Complexity is a continuum. Bosworth argues against sophisticated abstractions. But it is sophisticated abstractions that make simplicity possible.

After all, the computer is just atomic particles. Does any programmer worry about that? Or the gotos/branches that are all over the microcode? What about the instruction pipeline? That is all abstracted away in the "hardware". How many programmers worry about exactly how the operating system scheduler works? The whole idea behind class libraries that come with Java and .NET is to allow the programmer to concentrate on the business logic and not worry about the "plumbing code".

Occasionally we have to break through that abstraction and worry about exactly how things work. I discovered that when I wrote my first test code to test the performance of the first MIPS machines back in the 1980s. I found that if I did not return a value from my test routine, the loops would be optimized out. Most of the time we can remain blissfully ignorant of what lies beneath the abstractions. Performance, scalability, and, most important of all, security are classic examples of problems where we often have to worry about complexity and look beneath the abstractions. The solutions to those problems are sometimes simple, but more often than not messy.
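The benchmarking trap described above can be sketched in C. This is only an illustration under assumptions, not the original MIPS test code: the names sum_squares and time_sum_squares are hypothetical, and the workload is arbitrary. The point is that the routine returns its result and the caller uses it, so the compiler cannot treat the loop as dead code. A modern optimizer may still simplify the arithmetic, so treat any timings as rough.

```c
#include <time.h>

/* Hypothetical workload: sums i*i for i in [0, n).
   Returning the total makes the loop's work observable to the caller.
   If the function returned nothing and had no other side effects, an
   optimizing compiler would be free to delete the loop entirely.
   Unsigned arithmetic wraps on overflow, which is well defined in C. */
unsigned long sum_squares(unsigned long n) {
    unsigned long total = 0;
    for (unsigned long i = 0; i < n; i++) {
        total += i * i;
    }
    return total;
}

/* Hypothetical timing helper: returns the CPU seconds spent in
   sum_squares(n) and stores the computed value through result_out,
   so the caller can print or check it and keep the work "live". */
double time_sum_squares(unsigned long n, unsigned long *result_out) {
    clock_t start = clock();
    *result_out = sum_squares(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, call time_sum_squares from main with a large n and print both the elapsed time and the stored result. Printing the result matters as much as returning it: it is what tells the compiler the value is needed.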

You cannot divorce simplicity from abstraction. People dealing with complicated things need complicated abstractions. Engineers often make products and technologies that are too geeky, but sometimes things are too simple. After all, the Swiss Army knife comes in several sizes. You can match the level of simplicity that you need.

The Swiss Army knife analogy strikes at the heart of the issue for me. You need to keep it simple enough. Saint-Exupery's famous saying applies here: perfection is achieved in design when there is nothing more to take away, not when you have nothing more to add. In other words you have to keep it simple, but it still has to accomplish the task. The issue is to make it simple enough for your user, whether they be a writer of a blog, or a user of a class library. But even the simple user, to be effective, has to understand the limits of the tool, or to be more sophisticated, the abstractions and assumptions used. This applies to all sophisticated problems whether they be the accuracy of a medical test, the stability of Social Security, or the usefulness of Atom or RSS.

Bosworth speaks about the virtue of "keeping it simple and sloppy and its effect on computing on the internet." Well, if you have to be HIPAA compliant you cannot be sloppy and forgiving of human foibles and weaknesses. Human weaknesses and foibles are precisely the problem, and they cannot be abstracted or assumed away to achieve simplicity. If you do so, you will have a system so rigid, so bureaucratic, it would be unusable.

Bosworth concludes by talking about achieving simplicity in the information search space to avoid information overload. He talks about data mining and machine learning as the potential solutions. But they all rely on abstractions about what is important, and what is not. Users had better understand how they work. I cannot wait for the day when the social scientists start deconstructing data mining and machine learning for their social assumptions. At that point both humans and machines will prove once again what Hobbes argued so many years ago. Knowledge and the assumptions that go with it are the product of human actions. Knowledge is partly determined by our social relationships and what we assume. Simplicity results from assumptions and abstractions. But we cannot hide from the mess in the name of simplicity.

12/16/2004 10:52:47 AM (Eastern Standard Time, UTC-05:00) | Comments [0] | All | Software Development
