Statistics for the masses

Enterprises these days collect such amazing amounts of data that petabytes, or even exabytes, are no longer the exclusive domain of nuclear physicists at CERN; they are discussed in boardrooms throughout the world. Everybody senses that this data contains a great deal of knowledge, but only a few know how to extract it. Big data startups have set out to address this problem.

Big data is hot, but infrastructure-level platforms such as Hadoop, which focus on storage and processing, still need help to reach the mainstream. They need a killer app or two that will let companies analyze, visualize and act on all that data without hiring a team of Stanford Ph.D.s.


Big data is just a sexier name for statistics (you can't raise money from your favorite VC, or get that gorgeous blonde into bed, if you introduce yourself as a statistician). As with statistics, the biggest problem with big data is that you can use it to prove whatever you have set out to prove. Just pick the right angle, the right "anomalies" to disregard, and a fitting confidence level, and the proof is all yours. In my experience, when dealing with data, you should primarily listen to the data. Asking questions is just a tool for making the data speak. And, most importantly, you should be honest with yourself. If the data don't fit your mental model, change your mental model, not the data.

I remember the mid-1980s, when Lotus 1-2-3 became popular among managers. At that time, my father was the CFO of a mid-sized company. He was technology-savvy and taught himself how to write formulas and build spreadsheets. Before long, he had implemented all the analytics he needed by himself, and he no longer depended on his subordinate to provide the data he needed to make informed decisions. If managers use big data applications in a similar way, I see a very bright future for big data. But if the "knowledge" is delivered to managers in the form of reports, without the managers getting their hands dirty with the data, I don't see any benefit for the company's bottom line or its customers. On the contrary, in such a scenario I think big data will be used mostly as heavy artillery in internal fights.