Today, there are more than 4.6 billion mobile-phone subscribers, more than 2.4 billion people with access to the Internet, and more than a billion Facebook users. All of them are producing large amounts of data.
It has been estimated that the amount of data produced from the dawn of civilization until 2003 was 5 exabytes, whereas today we produce the same volume every two days. It is even expected that by this year the volume of the digital universe will reach 8 zettabytes. This flood of data, commonly referred to as Big Data, information overload or the data deluge, has become a challenge for many businesses.
Big Data is characterized by the three Vs: volume, variety and velocity. Together they compound the challenges of data analysis, data integration, search, information discovery and system maintenance, and they must be addressed in parallel.
- Volume: Data sets typically span multiple petabytes, which means their massiveness and growth outpace traditional storage mechanisms. Even though we are experiencing a revolution in storage capacity and cost, one issue is emerging: how can we extract relationships from such large volumes of data?
- Variety: Traditional data processing methods cannot cope with Big Data, which is heterogeneous in content: images, video, text, sensor data, logs, meter-collected data, financial transactions, etc. In fact, Big Data techniques are meant to deal with data that has not previously been mined for deep analysis.
- Velocity: Big Data is generated at high velocity and must be processed at a matching rate to meet demand. For example, RFID tags and smart meters are two sources that produce an ever-increasing torrent of data that needs to be analysed within seconds (or milliseconds).
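The velocity requirement above can be sketched with a small sliding-window example: each reading is summarized as it arrives, rather than being stored for later batch analysis. This is a hypothetical illustration in Python, not tied to any particular Big Data platform; the function name and the sample readings are invented for demonstration.

```python
from collections import deque

def rolling_average(stream, window_size=5):
    """Yield the average of the most recent `window_size` readings
    as each new reading arrives.

    A toy illustration of velocity: every value is processed on arrival
    instead of being accumulated for an offline batch job.
    """
    window = deque(maxlen=window_size)  # oldest readings are evicted automatically
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)

# Simulated meter readings arriving one by one
readings = [10, 12, 11, 13, 40, 12, 11]
averages = list(rolling_average(readings, window_size=3))
# e.g. averages[:3] == [10.0, 11.0, 11.0]
```

In a real deployment the same windowing idea is applied by stream-processing engines over unbounded data, with the window advancing continuously as new events arrive.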
Given the three Vs of Big Data, the challenge that large and medium-sized companies face is how to unlock the potential of Big Data and use it productively in running their business.
In addition to these three characteristics, a fourth dimension is often added:
- Variability: Data (as in stock markets) is constantly changing and can go through periodic daily, seasonal or event-triggered peaks. This adds a further challenge in terms of data management and manipulation.