By Sunil Gupta
The Google Flu Trends article of November 2008 ushered in a new age for big data, one in which the vast amount of available data could speak for itself, without theory or expert knowledge of the subject matter. Within a short span, however, the pendulum swung from big data being a big deal to big data being a big hype. As is usually the case, the truth lies somewhere between these two extremes. This article highlights the potential impact of big data on business practices in three broad areas: applications, methods and infrastructure.
In November 2008, researchers at Google published an article in the journal Nature about Google Flu Trends (GFT), a model that drew on hundreds of billions of Google searches made by US consumers during 2003-2008 to predict the incidence of the flu. Google scientists did not use any prior knowledge of influenza; instead, they analysed more than 50 million of the most commonly used search queries and automatically selected the best-fitting search terms by estimating 450 million different models. The final model, which used a mere 45 search terms, was used to predict the rate of flu in several US regions, and these predictions were compared with the actual incidence of influenza as reported by the Centers for Disease Control (CDC). The paper reported remarkable accuracy, with correlations between actual and predicted influenza rates ranging from 0.90 to 0.97.¹
While proponents of big data had been around for many years before the Google study, this article soon became the poster child for the promise of big data. It came on the heels of a provocative article by Chris Anderson, editor of Wired magazine, who argued that big data heralds a new age in which theory and expert knowledge of a subject matter are irrelevant. The Google scientists were not experts in influenza, nor did they build any sophisticated theory about why the disease spreads. Instead, they let the vast amount of data "speak" for itself. This emboldened the champions of big data, and soon the popular press and technology blogs were filled with visions of a new era of big data.
An article in the March 2014 issue of the journal Science dealt a serious blow to this hype by reporting that, since August 2011, GFT had overestimated the rate of flu in 100 of the previous 108 weeks.² In some weeks, GFT predictions were almost double the actual incidence of flu. This article opened the floodgates to criticism of big data and the hype surrounding it. Articles with titles such as "Big data: are we making a big mistake?"³ or "Eight (no, Nine!) Problems with Big Data"⁴ began appearing everywhere.