Big Data: Types of Data Used in Analytics

Big Data analytics involves many types of data: structured, unstructured, geographic, real-time media, natural language, time series, event, network and linked. It is necessary here to distinguish between human-generated and device-generated data, since human-generated data is often noisy, unclean and less trustworthy.

A brief description of each type is given below.

Types of data used in big data analytics

  1. Structured data: data stored in rows and columns, mostly numerical, where the meaning of each data item is defined. This type of data constitutes about 10% of today’s total data and is accessible through database management systems.

Big Data Analytics: A Closer Look

Businesses have recognized the benefit rooted in data and have therefore put together people, technologies, and processes to collect, store and analyze huge amounts of data. A key element in deriving value from data is the use of analytics.

Big Data analytics is a business strategy that uses technology to gain deeper insights into customers, partners and businesses, and so achieve competitive advantage. It involves working with data that, because of its size and variety, lies beyond the ability of typical database management systems to store, manage and analyze.

The use of “Big Data analytics” involves two dimensions: one is Big Data, which is characterized by the three ‘Vs’: data volume, velocity, and variety.…

“I do big data for the food & beverage industry” – OK, but for which problem?

I was reading through the 2013 report “Formula for growth: Innovation, big data and analytics”, produced by the US Grocery Manufacturers Association (GMA) and Deloitte Consulting LLP. It is a very interesting report, since it outlines the opportunities big data offers to food and beverage manufacturers and brings the industry perspective to the big data discussion. The report also discusses how data mining technologies are starting to transform the consumer packaged goods marketplace, and outlines what companies may do to use these technologies to improve performance.

A GMA video summarizing the report findings (find it here: http://www.gmaonline.org/issues-policy/collaborating-with-retailers/big-data-analytics/welcome)

A very interesting Deloitte video with Prof. Tom Davenport summarises the report findings

There are a couple of recommendations in the report that rang a bell:

  • Business context is required to operationalize big data, analytics, and innovation
  • The majority of the consumer packaged goods (CPG) industry is lagging in data and analytical capabilities
  • Rapid-fire pace of innovation requires data & analytics competency

Yes, I agree that context is king.…

What on earth are Mars and IBM doing?

There are many discussions around the identification, registration and description of big datasets for agriculture, food and environment. But moving on from such introductory or academic exercises, how is big data actually used in practice to serve actual causes and solve real problems? A Wired article shows a number of examples; however, we were looking for something bigger than that.

We recently came across the collaboration between IBM and Mars, which is expected to change the way that food safety is perceived. These two giants (in technology and food, respectively) are working on an index that will be a gold standard for food and health officials globally, helping them understand what triggers contamination and the spread of foodborne diseases.…

“It’s the variety, stupid”

Well, I should have suspected it. But it was good to see more than 40 experts from around the world highlighting and explaining it: the special thing about big data in agriculture is its extreme variety.

This is what you get if you contrast the four V’s of big data (as IBM suggests) with the data types and sources that are typically used in agricultural, food and environmental research. We are not talking about an extremely large Volume; other domains have much more voluminous data. Nor does the data arrive with a particularly high Velocity, especially compared to other domains.…

Big data in Europe calls on Agro-Know

We have been following Big Data Europe since its early stages, learning about the recent advances and trends in big data from prestigious partners like Fraunhofer (yes, the guys that invented MP3). We have become more and more involved in this flagship big data initiative for Europe, sharing our understanding of data-related challenges in the agri-food sector, the kind of big data our communities work with, and how cutting-edge solutions using big data analytics may be developed to serve their needs.

Now is the time to take an important step forward, and beautiful Paris is the place where this will happen.…

So, what is big data again?

One of the hottest topics nowadays is big data. Everyone seems to talk about the huge amounts of data being produced (Volume); the many sources, types and formats of that data, both structured and unstructured (Variety); and the fast pace at which data flows in from sources like business processes, machines, networks and human interaction with things like social media sites and mobile devices (Velocity). These are the three V’s of big data; a fourth is Veracity, which refers to the biases, noise and abnormality in data.

Source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Let us remind you that if you want a quick and dirty introduction to the basics of big data, you can check out the first three posts in the series on our blog by our guest author Mohammed Zuhair Al Taie.


Big Data as Part of Internet of Things Solutions

The Internet of Things, or IoT, is basically a complex network that seamlessly connects people and things together through the Internet. Theoretically, anything that can be connected (smart watches, cars, homes, thermostats, vending machines, servers…) will be connected in the near future using sensors and RFID tags. This allows connected objects to send data continuously over the Web from anywhere. The term was first used in 1999 by Kevin Ashton, a co-founder of the Auto-ID Center at MIT, which worked on RFID standards.

Source: http://inoviagroup.se

IoT promises to bring us smart cities with smart cars, secure and efficient buildings, and smart traffic management systems.…

The Future is for NoSQL Data Storage model!

Databases come in a variety of flavors, such as relational (e.g. Postgres, Oracle and MySQL), document-oriented (e.g. MongoDB, CouchDB and SimpleDB), columnar (e.g. BigTable and HBase), key-value (e.g. MemcacheDB, Redis and Riak), XML (e.g. MarkLogic, BaseX and eXist) and graph (e.g. Neo4J, GraphDB and Giraph). All data stores support writing and retrieving data, but they differ in indexing, schema, query format, data sharding, replication, scalability and other respects.
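As a rough illustration of how these models differ, the sketch below stores the same record three ways. Everything in it is an assumption made for the example: the `product` table, its fields and the `product:1` key are invented, and Python's built-in `sqlite3` module and a plain dictionary merely stand in for a real relational or key-value store.

```python
import json
import sqlite3

# Relational: the schema (columns and types) is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (?, ?, ?)", (1, "milk", 0.99))
row = conn.execute("SELECT name, price FROM product WHERE id = ?", (1,)).fetchone()

# Document-oriented: each record describes itself and may carry nested or
# extra fields (here, "tags") without any schema change.
document = json.loads(
    '{"id": 1, "name": "milk", "price": 0.99, "tags": ["dairy", "chilled"]}'
)

# Key-value: the store only maps a key to an opaque value; interpreting the
# value is left to the application.
kv_store = {}
kv_store["product:1"] = json.dumps(document)
restored = json.loads(kv_store["product:1"])

print(row)               # the relational row as a tuple
print(document["tags"])  # a nested field from the document
print(restored == document)
```

The relational version would need an extra table or column to hold the tags, while the document and key-value versions accept them as they are; that flexibility is traded against the fixed structure and query support of the relational model.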

Although the relational model and the Structured Query Language (SQL) were for decades the de facto standard for storing data, it has become established that relational databases are no longer the winners when it comes to flexibility and scalability.…

Hadoop Ecosystem: an Integrated Environment for Big Data

Hadoop is currently the most common single Big Data platform. However, other technologies still play a role. While there are proprietary Hadoop distributions developed by giant Big Data companies, such commercial products rely heavily on open source projects.

The Hadoop ecosystem includes a set of tools that work alongside MapReduce and HDFS (the two core Hadoop components) and help them store and manage data, as well as perform analytic tasks. As an increasing number of new technologies surround Hadoop, it is important to realize that certain products may be more appropriate than others for certain requirements.…
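To make the MapReduce side more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps in plain Python, using word counting as the task. It does not use Hadoop's API; the function names are illustrative only, and in a real job these phases would run in parallel across a cluster over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data big insight", "data beats opinion"])
print(counts)
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both steps can be spread over many machines, which is the property MapReduce relies on.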