A blog entry from Big Data Track Chair Jeremy Edberg.
We’re introducing a new program at the upcoming Cloud Connect Chicago on “BIG DATA: FUNDAMENTALS TO BEST PRACTICES”. But what is Big Data? I get asked that question often. Just the other day, I did a search on Google News for [Big Data]. What I found was a long list of blogs and news stories, many of which were trying to define Big Data.
I’ll make the answer easy: we don’t have a definition because it means too many things to too many people. To some it means using the latest NoSQL technology; to others Big Data is just a repackaging of data mining and business intelligence.
Instead of trying to define Big Data, let’s talk about what data can do for you. Previously I’ve mentioned that it isn’t the concept of using large datasets that is new. What is new is having cheap and easy access to hundreds or thousands of machines and massive storage with just a simple API call, allowing for the analysis of much more data at one time than ever before.
One of my favorite examples is Google Flu Trends, which uses hotspots of search terms to find flu outbreaks worldwide.
That is just the beginning, though. For the idly curious, there are a host of public data sets to mine for juicy new findings, such as the data sets [Amazon makes available for free] to its EC2 customers. Those include census data, genome data, and a general knowledge base.
Then there are websites that utilize user content, which more and more sites are doing each day. User data is a treasure trove of useful unstructured data. Sites like reddit and Twitter can even use their data to predict news cycles and figure out what will soon be popular.
That is not all, though. If you’re a business, you can’t afford not to be collecting and working with data. Various studies have shown that data-driven businesses (ones that base their strategic decisions primarily on data) are anywhere from 10% to 40% more efficient than their competitors, measured as revenue per dollar spent. That is a huge variance in study results, but the trend is all in the same direction.
Businesses have been collecting what they call Business Intelligence for a long time, but with the resources available now, they can collect far more data and process it much faster. For example, a website can track its users through an entire visit from beginning to end: which pages they hit and how long they stay. With some simple code, the site can even track where the user’s mouse is on the screen throughout that whole session. The site can then analyze that data for a single user to make personal recommendations, or analyze the data in aggregate to make user interface decisions.
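To make the per-visit and aggregate views concrete, here is a minimal Python sketch. It assumes a made-up event log of (session, page, timestamp) records; the names `EVENTS`, `time_on_page`, and `average_time_per_page` are hypothetical and not taken from any real analytics product.

```python
from collections import defaultdict

# Hypothetical page-view events: (session_id, page, timestamp in seconds).
EVENTS = [
    ("s1", "/home", 0),
    ("s1", "/products", 30),
    ("s1", "/checkout", 90),
    ("s2", "/home", 5),
    ("s2", "/products", 15),
]


def time_on_page(events):
    """Return {session_id: [(page, seconds_spent), ...]} for each visit.

    Time on a page is estimated as the gap until the next page view in the
    same session. The last page of a session has no following event, so its
    duration is reported as None.
    """
    by_session = defaultdict(list)
    for session_id, page, ts in events:
        by_session[session_id].append((ts, page))

    result = {}
    for session_id, views in by_session.items():
        views.sort()  # order each visit by timestamp
        durations = []
        for i, (ts, page) in enumerate(views):
            if i + 1 < len(views):
                durations.append((page, views[i + 1][0] - ts))
            else:
                durations.append((page, None))
        result[session_id] = durations
    return result


def average_time_per_page(per_session):
    """Aggregate view: mean seconds spent on each page across all sessions."""
    totals = defaultdict(lambda: [0, 0])  # page -> [total_seconds, count]
    for durations in per_session.values():
        for page, seconds in durations:
            if seconds is not None:
                totals[page][0] += seconds
                totals[page][1] += 1
    return {page: total / count for page, (total, count) in totals.items()}


per_session = time_on_page(EVENTS)
averages = average_time_per_page(per_session)
```

The per-session result is the kind of input a personal recommendation could use, while the averages are the aggregate figure that might inform a user interface decision. A real site would gather far more signals than this toy log.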
Shopping is another place where businesses with more data excel. Amazon is the classic example, processing every signal it gets to make recommendations to you. eBay uses its vast amount of transaction data (user A originally searched for this but then bought that) to return better search results, showing what you are most likely to buy instead of what might be the closest match to what you searched for.
It isn’t only online that shopping is improved by data, however. When you go to Target or the supermarket, they will print coupons on the back of your receipt for your next visit. Those aren’t just random items; they are targeted to you through your club rewards program or the credit card you just used to pay.
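The "searched for this, bought that" idea can be illustrated with a small Python sketch. This is not how eBay actually does it; it only assumes a hypothetical log of (query, purchased item) pairs and re-orders text-matched results by how often each item was bought after the same query. All names and data here are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical transaction log: (search query, item eventually bought).
PURCHASES = [
    ("phone charger", "usb-c cable"),
    ("phone charger", "usb-c cable"),
    ("phone charger", "wall adapter"),
    ("phone charger", "phone charger"),
]


def build_purchase_counts(purchases):
    """Count, per query, how many times each item was bought afterwards."""
    counts = defaultdict(Counter)
    for query, item in purchases:
        counts[query][item] += 1
    return counts


def rerank(query, candidates, counts):
    """Order candidates by how often they were bought after this query.

    `candidates` is assumed to arrive in text-match order. sorted() is
    stable, so items with equal purchase counts keep that original order.
    """
    bought = counts.get(query, Counter())
    return sorted(candidates, key=lambda item: -bought[item])


counts = build_purchase_counts(PURCHASES)
text_match_order = ["phone charger", "wall adapter", "usb-c cable"]
results = rerank("phone charger", text_match_order, counts)
```

With this toy log, the cable that shoppers bought most often moves to the top even though it was the weakest text match. A query with no purchase history leaves the original order untouched.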
So what is Big Data? Honestly, I don’t think we’ll ever have a good definition of what it is. Instead what we’ll have is a set of examples of how people are using data that we can all learn from.
Do you have your own great example? Add it in the comments!