Databases come in a variety of flavors: relational (e.g. Postgres, Oracle and MySQL), document-oriented (e.g. MongoDB, CouchDB and SimpleDB), columnar (e.g. BigTable and HBase), key-value (e.g. MemcacheDB, Redis and Riak), XML (e.g. MarkLogic, BaseX and eXist) and graph (e.g. Neo4j, GraphDB and Giraph). All data stores support writing and retrieving data, but they differ in indexing, schema, query format, data sharding, replication, scalability and other respects.
Although the relational model and the Structured Query Language (SQL) were the de facto standard for storing data for decades, it has become widely accepted that relational databases are no longer the clear winners when it comes to flexibility and scalability. This became especially true with the advent of online social networking and the Internet of Things (IoT), which generate huge amounts of data every minute and require new methods for data processing, storage and analytics, far from the traditional Relational Database Management Systems (RDBMS) that were not designed to deal with such variety and volume of data. In this regard, NoSQL arose as a non-traditional paradigm for dealing with large data (at Web scale) and for meeting the challenges posed by Big Data implementations within time and cost constraints.
Relational databases, which are grounded in set theory and keep data in two-dimensional tables of rows and columns, are good for storing data when its layout is known in advance and the data is fairly regular. They are clearly the best option to consider when, for example, customers make bank transactions. But when the data is highly variable or deeply hierarchical, the traditional table is not the best option.
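To make the bank-transaction case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, its columns, and the `transfer` and `balances` helpers are hypothetical names chosen for illustration. The sketch shows a schema fixed in advance and a transfer that either applies fully or not at all.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return a {owner: balance} mapping for all accounts."""
    return dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY id"))


transfer(conn, 1, 2, 30)   # succeeds
transfer(conn, 1, 2, 500)  # rejected: would overdraw the source account
print(balances(conn))      # {'alice': 70, 'bob': 80}
```

The rejected transfer leaves both rows untouched, which is the kind of guarantee that makes the tabular, schema-first model attractive for regular, well-understood data.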
Compared to relational data stores, NoSQL databases, which do not rely on SQL as their primary interface, offer several advantages over RDBMS: high scalability, easier management and administration, low cost, schemaless data representation, shorter development time, speed and flexible data models. The term NoSQL was first coined by Carlo Strozzi, who used it in 1998 to describe a database that he built and used without the intervention of SQL queries. Today, vendors and projects such as MongoDB, Couchbase and Cassandra are bringing innovations to the database market and achieving surprising results. Let’s take a look at the advantages offered by popular types of NoSQL databases:
- Key-value databases, which pair keys with values as a hash table (or hash map) does in most programming languages, need little or no maintenance and are suitable for cases where the data is not heavily related (like session data in web applications).
- Columnar databases store data in columns, which are inexpensive to add. They are well suited to Big Data problems and provide support for data compression and versioning.
- Document databases can implement replication and sharding relatively easily, which makes them a good fit for distributed environments. Data is stored in the form of documents that consist of a unique ID field and values of a variety of data types. Such databases are appropriate when we do not know exactly what the data will look like, or when the problem involves highly variable domains. MongoDB and CouchDB are the two major open source products.
- Graph databases are a natural fit for online social network scenarios, where nodes represent entities and edges (ties) represent the interconnections among them. This makes it possible to traverse the graph by following relationships. Graph databases have been used for problems such as recommender systems and access control lists, making use of their ability to handle highly interconnected data, where data is stored in both nodes and edges in the form of key-value pairs.
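
The data models in the list above can be imitated with plain Python structures. The sketch below is not how any of the named products work internally. It only mirrors the shape of the data for three of the four models (key-value, document and graph), and every name in it (`sessions`, `documents`, `find`, `nodes`, `edges`, `reachable`) is a hypothetical one introduced for illustration.

```python
from collections import deque

# Key-value: a value is stored and fetched by its key, like a hash map.
sessions = {}
sessions["session:42"] = {"user": "alice", "cart": ["book"]}

# Document: every document has a unique ID, but fields may differ
# from one document to the next (no fixed schema).
documents = {
    "u1": {"name": "Alice", "email": "alice@example.com"},
    "u2": {"name": "Bob", "languages": ["en", "fr"], "address": {"city": "Paris"}},
}


def find(docs, field, value):
    """Return the IDs of documents whose `field` equals `value`.

    Documents that lack the field are simply skipped.
    """
    return sorted(doc_id for doc_id, doc in docs.items() if doc.get(field) == value)


# Graph: nodes and edges both carry key-value properties.
nodes = {
    "alice": {"age": 30},
    "bob": {"age": 25},
    "carol": {"age": 35},
    "dave": {"age": 41},
}
edges = [
    ("alice", "bob", {"type": "follows"}),
    ("bob", "carol", {"type": "follows"}),
    ("alice", "dave", {"type": "follows"}),
]


def reachable(start, edges):
    """Breadth-first traversal along outgoing edges, excluding `start`."""
    adjacency = {}
    for src, dst, _props in edges:
        adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(sessions["session:42"]["user"])    # alice
print(find(documents, "name", "Bob"))    # ['u2']
print(reachable("alice", edges))         # ['bob', 'dave', 'carol']
```

Real stores add persistence, indexing, replication and sharding on top of these shapes; the sketch only shows why session data suits a key lookup, why documents tolerate differing fields, and how a graph is queried by following relationships.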