The term “Big Data” is relative and highly context-dependent. For example, an organization that lacks the ability to store, handle and analyze its own data is, in a sense, already experiencing the Big Data “phenomenon”. However, this is not all there is to Big Data. Besides being larger by orders of magnitude in Volume, the data must also be of greater Variety and complexity, and be generated at a high Velocity. These are usually referred to as the three Vs of Big Data.
A better definition of Big Data might be: the processing, interpretation and representation of large volumes of data (typically petabytes or zettabytes) originating from different sources, in a way that makes the data meaningful and usable.
We can think of Big Data as a collection of collections of data. These collections (usually unstructured data such as Twitter tweets and Facebook likes) come in different formats and, contrary to what we might expect, cannot be fitted into the cells of traditional tables.
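To make the “collection of collections” idea concrete, here is a minimal Python sketch. The two records are hypothetical stand-ins for a tweet and a Facebook like; their field names are illustrative assumptions, not the real Twitter or Facebook API schemas. Forcing both records into one fixed set of columns leaves many cells empty, and the list-valued `hashtags` field does not fit into a single cell at all.

```python
# Hypothetical records from two sources; field names are assumptions for illustration only.
tweet = {"source": "twitter", "text": "Loving the new release!", "hashtags": ["launch"], "retweets": 12}
like = {"source": "facebook", "user_id": 42, "page": "AcmeCorp"}
collection = [tweet, like]


def fixed_columns(records):
    """Return the sorted union of all keys: the columns one fixed table would need."""
    cols = set()
    for record in records:
        cols.update(record.keys())
    return sorted(cols)


def to_rows(records, columns):
    """Force every record into the same columns; missing fields become None (empty cells)."""
    return [[record.get(col) for col in columns] for record in records]


columns = fixed_columns(collection)
rows = to_rows(collection, columns)
empty = sum(value is None for row in rows for value in row)

print(columns)
print(f"{empty} of {len(columns) * len(rows)} cells are empty")
```

Running it prints the six columns and reports that 5 of the 12 cells are empty; with more sources and more varied records, the fixed table grows even sparser.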
Two factors are driving many businesses to turn to Big Data:
- The huge volume of structured and unstructured, real-time and historical data coming from different sources.
- The emergence of new and powerful technologies such as Apache Hadoop, MapReduce and NoSQL databases, which can be used to analyze Big Data and make better use of it. Most of these technologies are released under open-source licenses, although some vendors also offer commercial distributions.
One of the principal objectives of Big Data is to analyze data coming from different sources in order to identify trends, business opportunities, customer sentiment, market shifts, potential customers, etc.
Big Data analytics has brought major improvements to decision making in areas such as health care, economic productivity, crime prevention and natural disaster response. For example:
a) Telecommunication companies use large volumes of call details and logs to improve customer retention and acquisition, as well as their margins.
b) Financial services companies use transaction data to prevent fraud, perform forensics, ensure compliance and better understand risk.
c) Consumer product companies use social network data to improve their marketing strategies.
d) Utility companies use meter data to build smart grids that provide information on usage, failures and theft.
Even though several companies have moved towards Big Data technologies, it is still unclear how they are implementing Hadoop and related technologies. Nevertheless, the market is growing fast, and these technologies have moved beyond experimental use.
Finally, before we start using Big Data, we must answer the following three questions:
- Where should we start and what are the computation paths to discovery?
- Which algorithms should we use?
- How can we visualize our findings?