The Big Data Landscape

In a previous post, we addressed the Hadoop ecosystem and the set of tools that operate alongside the two core components of Hadoop (i.e. MapReduce and HDFS) to help them store and manage data and perform various analytic tasks. However, the Big Data landscape is more than Hadoop alone.

In this post, we will expand the circle a little bit and address the many technologies that are involved in Big Data processes. The Big Data landscape can be daunting. The vast proliferation of technologies in this competitive market means there’s no single go-to solution. However, it is possible to group the different tools and frameworks based on similarity in goal and functionality into a number of main components:

big data ecosystem

  1. Distributed file systems: file systems that run on multiple servers and allow access to files from multiple hosts, so that multiple users can share files and storage resources.
Click to read the full post

What Does Big Data Analytics Need from ICT to Develop?

Big Data is data that, in addition to being massive in size, is of greater variety and complexity and is generated at high velocity. Collectively, these are referred to as the three Vs of Big Data. However, the concept is relative and depends on context: organizations that lack the ability to handle, store or analyze their own data sets are in fact experiencing the Big Data phenomenon.

Big Data analytics, on the other hand, is a business strategy that uses technology to gain deeper insights into customers, partners, and businesses, and hence achieve competitive advantage. It involves working with data that, because of its size and variety, lies beyond the ability of typical database management systems to store, manage and analyze.… Click to read the full post

Big Data: Types of Data Used in Analytics

Many data types are involved in Big Data analytics: structured, unstructured, geographic, real-time media, natural language, time series, event, network and linked. It is necessary here to distinguish between human-generated data and device-generated data, since human-generated data is often noisier, less clean and less trustworthy.

A brief description of each type is given below.

Types of data used in big data analytics

  1. Structured data: data stored in rows and columns, mostly numerical, where the meaning of each data item is defined. This type of data constitutes about 10% of today’s total data and is accessible through database management systems.
Click to read the full post

Big Data Analytics: A Closer Look

Businesses have recognized the value rooted in data and have therefore put together people, technologies, and processes to collect, store and analyze huge amounts of data. A key element in deriving value from data is analytics.

Big Data analytics is a business strategy that uses technology to gain deeper insights into customers, partners and businesses and so achieve competitive advantage. It involves working with data that, because of its size and variety, lies beyond the ability of typical database management systems to store, manage and analyze.

The term “Big Data analytics” involves two dimensions: one is Big Data, which is characterized by the three ‘Vs’: data volume, velocity, and variety.… Click to read the full post

Four Flavours of Business Analytics

Image source: http://www.iskonsystems.com/Solutions-Business-Analytics.php

Data analytics, as defined in “Competing on Analytics: The New Science of Winning” by Thomas H. Davenport and Jeanne G. Harris, refers to the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions. It also denotes the discovery and communication of meaningful patterns in data.

Business analytics implies the efficient use of quantitative analysis, statistics and information modelling to shape business decisions. In this context, people working in business analytics can be classified into three levels: analytics scientists who build complex models to extract insights from data, analytics experts who apply the models from the first level to real business problems, and analytics specialists who can build insights based on the output of the previous two levels.… Click to read the full post

Big Data as Part of Internet of Things Solutions

Internet Of Things

The Internet of Things, or IoT, is a complex network that seamlessly connects people and things together through the Internet. Theoretically, anything that can be connected (smart watches, cars, homes, thermostats, vending machines, servers…) will be connected in the near future using sensors and RFID tags. This allows connected objects to continuously send data over the Web from anywhere. The term was first used in 1999 by Kevin Ashton, a co-founder of the Auto-ID Center at MIT, which worked on a global standard for RFID.

Internet Of Things

Source: http://inoviagroup.se

IoT promises to bring us smart cities with smart cars, secure and efficient buildings, and smart traffic management systems.… Click to read the full post

The Future Belongs to the NoSQL Data Storage Model!

NOSQL

Databases come in a variety of flavours, such as relational (e.g. Postgres, Oracle and MySQL), document-oriented (e.g. MongoDB, CouchDB and SimpleDB), columnar (e.g. BigTable and HBase), key-value (e.g. MemcacheDB, Redis and Riak), XML (e.g. MarkLogic, BaseX and eXist) and graph (e.g. Neo4j, GraphDB and Giraph). All data stores support writing and retrieving data, but they differ in indexing, schema, query format, data sharding, replication, scalability and other respects.
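
To make the difference in schema concrete, here is a minimal sketch that stores the same record in three of the models listed above. It uses only the Python standard library: an in-memory SQLite table stands in for a relational store, and plain dictionaries stand in for a document store and a key-value store. The table name, keys and sample values are hypothetical, chosen only for illustration.

```python
import json
import sqlite3

# Relational model: a fixed schema is declared before any row is written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "Ada", "London"))
relational_row = conn.execute(
    "SELECT name, city FROM users WHERE id = ?", (1,)
).fetchone()

# Document model (dictionary used as a stand-in): each record is a
# self-describing document, so fields may be nested or differ per record.
documents = {}
documents[1] = json.dumps(
    {"name": "Ada", "city": "London", "tags": ["math", "computing"]}
)
document = json.loads(documents[1])

# Key-value model (dictionary used as a stand-in): the store maps a key
# to a value and knows nothing about the structure behind the keys.
key_value = {"user:1:name": "Ada", "user:1:city": "London"}

print(relational_row)
print(document["tags"])
print(key_value["user:1:city"])
```

The relational table rejects a row that does not fit its declared columns, while the document and key-value stand-ins accept whatever shape the application writes, which is one source of the flexibility discussed here.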

Although the relational model and the Structured Query Language (SQL) were for decades the de facto standard for storing data, it has become established that relational databases are no longer the winners when it comes to flexibility and scalability.… Click to read the full post

Hadoop Ecosystem: an Integrated Environment for Big Data

Hadoop_Ecosystem

Hadoop is currently the most common single Big Data platform. However, other technologies still play a role. While there are proprietary distributions of Hadoop developed by giant Big Data companies, such commercial products rely heavily on open source projects.

The Hadoop ecosystem includes a set of tools that function alongside MapReduce and HDFS (the two main Hadoop core components) and help the two store and manage data, as well as perform analytic tasks. As an increasing number of new technologies surround Hadoop, it is important to realize that certain products may be more appropriate than others for fulfilling certain requirements.… Click to read the full post

Hadoop as the Backbone of Big Data Technologies

Hadoop_Ecosystem

Apache Hadoop is an emerging technology that was designed to address the specific requirements of Big Data. It can deal with petabytes of structured and unstructured data. The project was started by Doug Cutting and Mike Cafarella around 2005, was developed largely at Yahoo!, and got its name from a toy elephant. However, Hadoop does not work alone. Rather, it is part of an increasing number of associated technologies such as HBase, Hive, Pig, Oozie, and ZooKeeper.

Apache Hadoop Ecosystem (source: quantfarm.com)

Hadoop:

  • Is a fault-tolerant open-source software framework that can deal with software and hardware failures.
  • Scales well to any increase in processors, memory or storage devices.
Click to read the full post

Obstacles to the Adoption of Big Data

Obstacles to Big Data Implementation (Source: http://www.eweek.com/)

Because customer relationships constitute an important part of any strategic decision-making process, shifting towards Big Data technologies would enable executives to keep up with customer service expectations. A top concern for them is how to achieve faster access to data in order to overcome the many obstacles they would encounter.

Typically, data in organizations can be in the following three forms:

  • Structured Data. Such data is stored in database tables and can be accessed using database management systems such as Oracle, DB2 and MySQL. This data constitutes only 10% of all data today.
  • Unstructured Data. Such data cannot be stored using traditional relational databases.
Click to read the full post