Wednesday, May 15, 2013

Secret sauce algorithms

I think we too often take claims of algorithmic success at face value. Algorithms are discussed as though they should be kept secret to protect them, but in my opinion keeping algorithms locked up is at the very least irresponsible and, in rarer circumstances, intentionally misleading and unethical.

"The Importance of Not Being Different" is an essay Bruce Schneier wrote in 1999. That was 14 years ago, but I still find myself referencing the points he made when talking about data systems and algorithms, and not just cryptographic algorithms.

Imagine that we applied to medicine the same level of diligence that we apply to our information systems and missions. Suppose your doctor said: "I realize antibiotics have been proven through decades of research to cure your condition with no harmful side effects. But I've developed my own better cure that I've only tested on a few of my patients, and if you pay me I'll give it to you." Would you take the pill? If you wouldn't bet your life on it, why would you bet your organization's mission on it?

Algorithms, like medical research, need peer review and open, objective study before their claims can be validated and their results trusted. Don Knuth's series of books, "The Art of Computer Programming", which he began in 1962, chronicles programming algorithms along with their history and analysis. The latest volume was published in 2011 and more are on the way. When fishing for the right way to approach a data problem, these books are the most comprehensive resource I know of for finding good algorithms. If a method is too new to be included in a resource like TAOCP, you'll probably find it addressed in research papers or on blogs (like this one).

Once you know the name of the approach you need, a quick Google search will almost always turn up implementations of the algorithm in whatever programming language you are using. Thanks to open source software and people sharing solutions online, the availability of good solutions has never been better. All indicators seem to point to this trend continuing. What does that say to you about the future of software? To me it speaks of a more capable future where real value is shared freely and not locked up.

The next time somebody approaches you and claims their software does X better because of their "secret sauce" algorithm, be as skeptical as you would be of a doctor offering you a "secret ingredient" pill. Actually, there's a time-tested term for fraudulent health products: "snake oil". Maybe it's time for an algorithm-specific version of that term. If you spend money on closed-source software, buy it because the interface works and the implementation is good, not because the proprietary algorithms are advertised as "better".