Businesses have recognized the benefit rooted in data and have therefore put together people, technologies, and processes to collect, store and analyze huge amounts of data. A key element in deriving value from that data is analytics.
Big Data analytics is a business strategy that uses technology to gain deeper insights into customers, partners and the business itself, and thereby to achieve competitive advantage. It involves working with datasets that, because of their size and variety, lie beyond the ability of typical database management systems to store, manage and analyze.
The term “Big Data analytics” involves two dimensions. One is Big Data, which is characterized by the three ‘Vs’: data volume, velocity, and variety. The other is analytics, which refers to the ability to gain insights from data and make decisions by applying analytical methods from mathematics, statistics, data mining, machine learning, visualization, etc.
Because datasets are often too large to view or analyze using traditional reporting and mining tools, visualization has become an integral part of making analytics results useful. This increases the need for more advanced visualization capabilities using, for example, R, Neo4j, D3.js, or Tableau.
While “data analysis” has some similarity to “data analytics”, the former assumes that the people doing the analysis know what they want, where to look and how to find the answer. They have more knowledge about the data and how to process it. People working in data “analytics”, on the other hand, often do not know what they are looking for, where they should look or how to search. It is an exploration process that searches for relationships in non-specific data, often for qualitative results (in contrast to data analysis).
The nature of the data (structured, semi-structured and unstructured) and the analytical requirements force Big Data solutions to run on a spectrum of technology platforms, ranging from traditional relational databases to the Hadoop framework and NoSQL data stores (e.g. MongoDB and CouchDB). A typical business plan may require a solution that combines all of these apparently fragmented technologies. Let’s consider an integrated infrastructure solution as in the figure below (source: Big Data Analytics: Concepts, Technologies, and Applications).
Apache Hadoop is recognized as the framework most suitable for building data-intensive applications. It is based on MapReduce, a programming model developed by Google for processing very large datasets. However, it is not expected to take the place of relational databases or data warehouses, even though organizations do need to update their systems to suit the new sources of data. Moreover, Hadoop and similar platforms have clear shortcomings that limit their performance. This led to the advent of new frameworks such as Spark, YARN, Storm, S4 and Pregel to handle different processing requirements.
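To make the MapReduce model mentioned above more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. It shows the three conceptual steps of the model on the classic word-count task.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document (or file split) would be mapped on a
    # different node; here everything runs sequentially in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data analytics", "big data tools"]))
# prints {'big': 2, 'data': 2, 'analytics': 1, 'tools': 1}
```

Because each map call looks at one document only and each reduce call looks at one key only, both steps can in principle be spread across many machines, which is the property that frameworks such as Hadoop exploit.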
Big Data technologies fall into two groups. One group implements batch processing: performing analytics on data at rest, where the data does not always need to stay in memory and instead resides in databases or data warehouses. The other implements stream processing: performing analytics on data in motion. Sensors, satellites, the Internet, and broadcasts are major sources of streaming data. In both cases, large memory and high-bandwidth input/output are required, even though the processes themselves are not computationally intensive, particularly when parallel techniques are incorporated.
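The difference between the two groups can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions: the sensor values are invented, and the function names batch_mean and streaming_mean are hypothetical. The batch version needs the whole dataset before it can answer, while the streaming version updates its answer as each value arrives and keeps only a count and a running mean in memory.

```python
def batch_mean(readings):
    """Batch: the full dataset is at rest, so it is processed in one go."""
    return sum(readings) / len(readings)


def streaming_mean(readings):
    """Stream: yield an updated mean after every incoming value.

    Only the count and the current mean are kept, never the full history.
    """
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


# Invented sensor readings, for illustration only.
sensor_values = [20.0, 22.0, 21.0, 25.0]

print(batch_mean(sensor_values))                    # prints 22.0
print(list(streaming_mean(iter(sensor_values))))    # prints [20.0, 21.0, 21.0, 22.0]
```

Both approaches arrive at the same final value here; the streaming version simply makes an intermediate result available after every reading.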
Why analyze Big Data? (i) Governments can benefit greatly from Big Data analytics to improve existing processes and operations and to develop entirely new types of analyses. For example, investigating user records, text documents, audio, video, and images provides holistic views of citizens, their activities, and their relations. Analytics in this way provides immediate access to data, rapid delivery of information (including insights), and immediate retrieval of streaming data. (ii) Security professionals can also use analytics on financial transactions, log files and network traffic to find anomalies and suspicious activities. (iii) For companies, the ability to analyze entire datasets rather than subsets, every interaction rather than every transaction, and multi-structured data can produce additional insights and so reveal opportunities that never existed before. For them, analyzing streaming data can improve responsiveness and reduce risk.
That is not all! Big Data analytics will not only help businesses, governments, and security sectors. (iv) Collecting, storing and analyzing data related to motion patterns from smartphones can help in planning new roads, electricity networks, railroads and public transport. (v) The health sector can use analytics results to predict and control the spread of diseases by matching the terms used on the Internet with those found in health reports.
From the business intelligence (BI) point of view, the success of Big Data analytics solutions depends on a number of conditions. (BI is a broad term that covers the applications, technologies, and processes important for gathering and analyzing data to support decision making.) These conditions include a clear business need, a supportive organizational culture, harmony with the business strategy, a decision-making process that is based on facts, a strong data infrastructure, the use of appropriate analysis tools, and skilled people. Mastering analytical and modelling skills is critical for data scientists in order to understand data and reveal hidden relationships.
Many analytics techniques used in Big Data analytics (e.g. machine learning, simulation, regression) have been used in real-life applications for years. What is new are the recent advances in computer technologies, particularly those related to CPUs, data storage, data centers, cloud computing, and distributed computing systems (e.g. the Hadoop framework, which allows users to store large amounts of data). The superiority of Big Data analytics over traditional analytics is attributed to these advances. Added to them are new business opportunities and new sources of data, in particular social media. Collectively, they led to the birth of a new discipline of techniques, tools, and technologies that make use of Big Data, called “data science.”
It is useful here to distinguish between four levels of analytics. Descriptive analytics: this is the most common and best-understood type of analytics and is usually regarded as the simplest one. In this type, historical data is used to understand and analyze past business performance. Diagnostic analytics: this type looks at past performance and tries to understand what happened and why it happened. Predictive analytics: this type helps make predictions about future events by finding trends and behavioral patterns in the data. Prescriptive analytics: this is a complementary step to predictive analytics because it plots the best courses of action in response to what predictive analytics says about the future.
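The four levels above can be sketched on a toy dataset in Python. This is a deliberately simplified illustration, not a recommended method: the monthly figures are invented, the names fit_line, predict_sales and best_spend are hypothetical, a least-squares line stands in for both the diagnostic and the predictive step, and the profit rule (a fixed margin on sales minus the amount spent) is an assumption made only for this example.

```python
from statistics import mean

# Invented monthly figures, for illustration only.
ad_spend = [10, 12, 14, 16, 18, 20]
sales = [100, 110, 120, 130, 140, 150]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Descriptive: what happened? Summarize past performance.
average_sales = mean(sales)

# Diagnostic: why did it happen? Here, how strongly sales move with ad spend.
slope, intercept = fit_line(ad_spend, sales)


# Predictive: what is likely to happen at a given spend level?
def predict_sales(spend):
    return slope * spend + intercept


# Prescriptive: which action should be taken? Pick the candidate spend
# with the highest predicted profit under the assumed margin.
def best_spend(candidates, margin=0.5):
    return max(candidates, key=lambda s: margin * predict_sales(s) - s)


print(average_sales)              # prints 125
print(slope, intercept)           # prints 5.0 50.0
print(predict_sales(22))          # prints 160.0
print(best_spend([18, 22, 26]))   # prints 26
```

Each step builds on the previous one: the prescriptive choice is only as good as the prediction behind it, which in turn depends on how well the fitted relationship explains the historical data.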
Still, there are some inhibitors to analyzing Big Data. These include the shortage of skilled people, confusion about which technology to use, the large investment required to use analytics, privacy and security issues, and the lack of a business case for analytics.