Wednesday, November 20, 2013

Software is getting better!

A friend made a joke about May's Law today. David May states his law as follows: “Software efficiency halves every 18 months, compensating for Moore’s Law.” Larry Page has made similar statements, naturally referred to as Page's Law. Sure, it's profound, funny, and good marketing to assert that software always seems to run slower. But the truth of software efficiency writ large is the opposite. Here's a report that addresses the assertion directly.

REPORT TO THE PRESIDENT AND CONGRESS DESIGNING A DIGITAL FUTURE: FEDERALLY FUNDED RESEARCH AND DEVELOPMENT IN NETWORKING AND INFORMATION TECHNOLOGY
http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-nitrd-report-2010.pdf

Kurzweil's book "How to Create a Mind" tipped me off to this report, and specifically to this section on page 71.
Progress in Algorithms Beats Moore’s Law
Everyone knows Moore’s Law – a prediction made in 1965 by Intel co-founder Gordon Moore that the density of transistors in integrated circuits would continue to double every 1 to 2 years. Fewer people appreciate the extraordinary innovation that is needed to translate increased transistor density into improved system performance. This effort requires new approaches to integrated circuit design, and new supporting design tools, that allow the design of integrated circuits with hundreds of millions or even billions of transistors, compared to the tens of thousands that were the norm 30 years ago. It requires new processor architectures that take advantage of these transistors, and new system architectures that take advantage of these processors. It requires new approaches for the system software, programming languages, and applications that run on top of this hardware. All of this is the work of computer scientists and computer engineers.

Even more remarkable – and even less widely understood – is that in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed.

The algorithms that we use today for speech recognition, for natural language translation, for chess playing, for logistics planning, have evolved remarkably in the past decade. It’s difficult to quantify the improvement, though, because it is as much in the realm of quality as of execution time.

In the field of numerical algorithms, however, the improvement can be quantified. Here is just one example, provided by Professor Martin Grötschel of Konrad-Zuse-Zentrum für Informationstechnik Berlin. Grötschel, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later – in 2003 – this same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30,000 for mixed integer programming between 1991 and 2008.

The design and analysis of algorithms, and the study of the inherent computational complexity of problems, are fundamental subfields of computer science.
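As a quick sanity check on the Grötschel numbers quoted above (the speed-up figures are the report's, not mine), here's a back-of-the-envelope sketch: 82 years expressed in minutes comes out to roughly 43 million, and the 1,000× hardware factor times the 43,000× algorithmic factor multiplies out to the same overall improvement.

```python
# Back-of-the-envelope check of the report's Grötschel example:
# an 82-year solve in 1988 versus a roughly 1-minute solve in 2003.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

solve_1988_minutes = 82 * MINUTES_PER_YEAR  # ~43.1 million minutes
solve_2003_minutes = 1                      # roughly one minute

overall_speedup = solve_1988_minutes / solve_2003_minutes
hardware_factor = 1000     # report's estimate for processor speed
algorithm_factor = 43000   # report's estimate for algorithmic gains

print("Overall speedup: ~{:,.0f}x".format(overall_speedup))
print("Hardware x algorithms: ~{:,}x".format(hardware_factor * algorithm_factor))
```

The two ways of arriving at roughly 43 million agree to within rounding, which is all the report claims.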
The report goes into more detail discussing research priorities. Those in an argumentative mood might point out that algorithms are just a subset of software and that overall software efficiency has been decreasing. I've made this point before when writing and talking about big data technologies. We've certainly seen a change in the way data technologies are developed: we've gone from cleverly conserving computing resources to squandering them creatively. But it's not as if we're doing the same thing, just less efficiently; whole new capabilities have been opened up. The data redundancy of the Hadoop Distributed File System (HDFS) means we can process larger sets of data and overcome hardware failures, which are inevitable on large "compute hour" jobs. When you're employing thousands of disks or cores in a job, the chance of some individual component failing goes up sharply. The inefficiency is a risk mitigation strategy: storing the same data three times (by default) certainly isn't efficient, but it makes very large jobs possible.
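To make the failure argument concrete, here's a minimal sketch; the per-disk failure probability below is an illustrative assumption, not a measured figure. With a few thousand disks, even a small chance of any one disk dying during a long job makes at least one failure very likely, which is exactly the situation HDFS's three-way replication is meant to survive.

```python
# Illustrative sketch: probability of at least one disk failure during a job,
# assuming each disk fails independently with the same small probability.
# The numbers are made-up placeholders for the sake of the argument.

def prob_at_least_one_failure(num_disks, per_disk_failure_prob):
    """P(at least one failure) = 1 - P(no disk fails)."""
    return 1.0 - (1.0 - per_disk_failure_prob) ** num_disks

# Assume a 0.1% chance that any given disk fails during one long job.
per_disk = 0.001

for n in (10, 100, 1000, 5000):
    p = prob_at_least_one_failure(n, per_disk)
    print("{:>5} disks -> P(>=1 failure) = {:.3f}".format(n, p))

# With 1,000 disks this is already ~63%, and with 5,000 it is ~99%:
# replication (3 copies by default in HDFS) is what lets the job keep going.
```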

The data processing technology improvements are just one example; there are many like this across the board. Remember XML? JSON is a leaner way to move the same data around. Machine learning implementations are getting faster. As a counter to the improvements, more people are attempting to misuse your personal computing resources, which may make things seem sluggish if you're incautious. If you're wondering why your Windows operating system seems slower, that's a whole different story. I think closed code bases might have more of an incentive to hang on to old things, leading to inefficiency, but that's just conjecture based on personal experience; read more about my experiences on data munge. Maybe somebody will do a study on closed vs. open execution speed over time. Until then, I'll hold suspect any piece of software where a large community of developers can't look at the code base and improve it. There's a law for that too.
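As a toy illustration of the XML-versus-JSON point (the record below is a made-up example, and exact byte counts will vary with the data and formatting), here's a quick comparison of how many bytes the same little record takes in each format:

```python
import json
import xml.etree.ElementTree as ET

# A made-up record, just to compare serialized sizes of the same data.
record = {"name": "Ada", "language": "Python", "scores": [1, 2, 3]}

# JSON: the structure maps directly onto the text representation.
json_bytes = json.dumps(record).encode("utf-8")

# XML: build an equivalent element tree by hand for the comparison.
root = ET.Element("record")
ET.SubElement(root, "name").text = record["name"]
ET.SubElement(root, "language").text = record["language"]
scores = ET.SubElement(root, "scores")
for s in record["scores"]:
    ET.SubElement(scores, "score").text = str(s)
xml_bytes = ET.tostring(root, encoding="utf-8")

print(len(json_bytes), "bytes as JSON")
print(len(xml_bytes), "bytes as XML")
# The XML version carries opening and closing tags for every field,
# so it comes out noticeably larger for the same information.
```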