There’s a picture circulating on the web (too irreverent to share here) that expresses a common exhaustion at the proliferation of the term “big data.” But the problem is not how often the term is used. It’s that many people still have no idea what it really means. As a result, the term has become a catchall for any sort of digital information.
I was inspired to write this post by the Big Data and Privacy report released on May 1st by the President’s Council of Advisors on Science and Technology (PCAST). The report spends most of its 57 pages on the privacy implications of big data. That focus matters, because the implications are huge and have not yet been adequately addressed. Just think: the most recent governing data privacy law passed Congress in 1986.
But this post is not about privacy. Here, I want to get back to some basics on big data. First, what it is. Then, where it lives — and why where big data lives matters.
Big Data: What It Is
In 2001, analyst Doug Laney (then at META Group, which Gartner later acquired) described the digital information we now call big data as characterized by Volume, Velocity, and Variety. The 3 Vs are now commonly accepted characteristics for describing big data:
- Volume: Literally, the amount of digital information being generated. Previously unthinkable volumes of digital information are generated every second. And with storage costs continuing to decrease, storing all the data is not a problem. Now the challenge is figuring out what to do with it, that is, how to glean value from the data.
- Velocity: In addition to the huge amount of digital information that’s being generated, it’s being generated fast, much of it in real time. Part of the power of big data analytics, then, lies in being able to gain insight in real time rather than days, months, or years after the data has been gathered.
- Variety: Big data sets contain both structured and unstructured information. That means data in the traditional sense of numbers neatly sorted in rows and columns, as well as text data that is not in a pre-defined rows-and-columns format. SAS explains, “Managing, merging and governing different varieties of data is something many organizations still grapple with.”
Examples of big data include digital information generated through:
- Transactions: Wal-Mart, for example, handles more than 1,000,000 customer transactions every hour, feeding databases estimated at more than 2.5 petabytes
- Social media: Facebook alone processes 500 terabytes of data every day
- Sensors: A Boeing jet, for example, generates 10 terabytes of information per engine every 30 minutes of flight
The ultimate differentiator between big data and not-big data is that the volume, velocity, and variety associated with big data sets make them very difficult or impossible to manage with traditional data tools. You can’t put big data in Excel.
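To put rough numbers on that, here is a back-of-the-envelope sketch in Python that turns the example figures above into per-second rates and compares the transaction example with the row limit of a single Excel worksheet (1,048,576 rows). It rests on two assumptions that are mine, not the sources': decimal units (1 terabyte = 1,000 gigabytes) and one spreadsheet row per transaction. The function names are made up for illustration.

```python
# Back-of-the-envelope rates derived from the figures quoted in the post.
# Assumptions (not from the post): decimal units (1 TB = 1,000 GB) and
# one spreadsheet row per transaction.

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single Excel worksheet (.xlsx)

SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
GB_PER_TB = 1_000


def per_second(amount, period_seconds):
    """Convert an amount accumulated over a period into a per-second rate."""
    return amount / period_seconds


def hours_to_fill_sheet(rows_per_hour, max_rows=EXCEL_MAX_ROWS):
    """Hours of activity that fit in one worksheet at one row per record."""
    return max_rows / rows_per_hour


# Figures from the examples: 1,000,000 transactions per hour,
# 500 TB per day, and 10 TB per engine per 30 minutes of flight.
walmart_tx_per_sec = per_second(1_000_000, SECONDS_PER_HOUR)
facebook_gb_per_sec = per_second(500 * GB_PER_TB, SECONDS_PER_DAY)
engine_gb_per_sec = per_second(10 * GB_PER_TB, 30 * 60)
sheet_hours = hours_to_fill_sheet(1_000_000)

print(f"Transactions: {walmart_tx_per_sec:,.0f} per second")
print(f"Social media: {facebook_gb_per_sec:.1f} GB per second")
print(f"Jet engine:   {engine_gb_per_sec:.1f} GB per second")
print(f"One worksheet holds about {sheet_hours:.2f} hours of transactions")
```

Under those assumptions, the transaction example works out to roughly 278 transactions per second, and a single worksheet would fill up after just over one hour of them. The social media and sensor examples each come to more than 5 gigabytes per second.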
Big Data: Where It Lives
Granted, the PCAST Big Data and Privacy report was designed to put forth recommendations for protecting individuals’ privacy in the era of big data. But out of 57 pages, only one covered a critically important element in the big data conversation. That is, where big data lives.
In the data center.
The ability to gather, analyze, and benefit from the treasure trove of data that’s generated every second depends on the digital infrastructure in which that data lives. And the systems that support the digital infrastructure. Big data depends on the data center. The cables that bring data into and out of the data center. The servers on which the applications that analyze data run. The cooling systems, and backups, that keep the air around the servers at optimal operating temperature. The power systems, and backups, that keep the cooling systems running, and keep the power flowing to the servers. The security systems, and backups (physical and logical), that keep the servers secure.
The insights that can be gleaned from big data do indeed have tremendous potential to make our world a better, cleaner, safer place. Certainly there are issues, like who owns the data generated by an individual’s activities, that need to be reconciled. But in our conversations about big data, let’s all be cognizant (and yes, we’re biased) of the fact that big data doesn’t exist in the ether. It exists in, and depends on, the data center.