Distributed Data and NoSQL

mercykiria

MercyMburu

Posted on February 12, 2024

Relational databases are a huge help when it comes to analysis. Data is arranged in rows and columns in one or more tables (relations), following the relational model of data proposed by E. F. Codd in 1970.
However, with the rapid growth of information such as web transactions and machine-generated data, and the rise of large-scale analytics (problems that require finding meaningful patterns in data sets so large that they need leading-edge processing and storage capability), it is becoming hard to rely on relational databases alone.

A relational database image from Bing AI

Components of a Relational Database System

  • Database management system software
  • Physical servers on which the software is loaded
  • Disks where the data items are stored

Imagine a scenario where more and more data keeps coming in, and ever more powerful servers are needed to process the growing databases. Eventually a limit will be reached.
Alternatively, cloud-based distributed processing takes a large volume of data and breaks it into pieces. Voilà! The solution to our storage problems, ladies and gentlemen! 🙌
These smaller pieces of data are distributed among many computers in different locations. Basically, each computer has its own task. However, after processing, the data still has to be stored somewhere 🤔. Another type of database is needed.
Note: Relational databases can still be distributed this way, but to really take advantage of this distribution, or to handle unstructured data, a NoSQL database is often needed.
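
To make the "break it into pieces" idea concrete, here is a minimal sketch in Python. It only simulates the workers inside one program; the function names (`split_into_chunks`, `count_words`, `merge`) and the sample log lines are made up for illustration, and a real cluster would run each piece on a different machine.

```python
from collections import Counter

def split_into_chunks(records, num_workers):
    """Split a list of records into roughly equal pieces, one per worker."""
    return [records[i::num_workers] for i in range(num_workers)]

def count_words(chunk):
    """The task each (simulated) computer runs on its own piece of the data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def merge(partial_results):
    """Combine the partial results from all workers into one answer."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total

# Hypothetical machine-generated data
log_lines = ["user login", "user logout", "sensor reading", "user login"]

chunks = split_into_chunks(log_lines, num_workers=2)
# Here the pieces are processed one after another; on a real cluster
# each piece would be handled by a separate computer in parallel.
partial_counts = [count_words(chunk) for chunk in chunks]
totals = merge(partial_counts)

print(totals["user"])  # 3
```

The merged result still has to be written somewhere, which is where the choice of database comes in.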

NoSQL and how it works

You should know:

  1. A NoSQL database stores and accesses data differently than relational ones.
  2. NoSQL is sometimes called non-relational because it doesn't organize data into tables conforming to a fixed schema. Instead, data is stored in an unstructured or semi-structured format, which can make database design simpler.

There are 4 main types of NoSQL databases:

Key-value stores
This type stores just a key and its value, and each key is unique. For example, a key-value store can hold all the contents of a shopping cart for one session on an online retail website, or a session ID that identifies all the activity of a single user during one session on a website. Remember, it's still possible to do this in a relational database, but key-value stores are optimized to store billions of these keys and to retrieve the data very quickly. The values can have a totally different structure from one key to another.

example of a key and value
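
As a rough illustration of the idea, a plain Python dictionary behaves like a tiny in-memory key-value store. This is only a sketch: the `put` and `get` helpers and the key names such as `cart:8f3a` are invented for the example and are not the API of any particular product.

```python
# A plain dict stands in for the key-value store.
store = {}

def put(key, value):
    """Save a value under a key; writing to an existing key overwrites it."""
    store[key] = value

def get(key, default=None):
    """Look a value up directly by its key."""
    return store.get(key, default)

# Values can have a totally different structure from one key to another.
put("session:8f3a", {"user": "guest", "pages_viewed": 3})  # a dict
put("cart:8f3a", ["laptop sleeve", "usb-c cable"])         # a list
put("theme:8f3a", "dark")                                  # a string

# Each key is unique, so adding an item means replacing the cart's value.
put("cart:8f3a", get("cart:8f3a") + ["mouse"])

print(get("cart:8f3a"))
print(get("cart:unknown"))  # None, because that key was never stored
```

Notice that the only way in is through the key. There is no query like "find every cart that contains a mouse", which is part of why lookups can be so fast.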

Document databases
These store documents in a machine-readable format such as JSON or XML. They aren't as efficient as relational databases at managing the relationships between documents, but they are efficient at reading and searching them.
The structure of one document doesn't need to match that of the others. Elements can therefore be added easily without changing any tables or schemas.

An example photo of a document database from Cisco Networking Essentials Data Analytics course
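
Here is a small sketch of that flexibility using plain Python dictionaries as JSON-like documents. The collection, the field names, and the `find` helper are all hypothetical; real document databases have their own query languages.

```python
import json

# Each document is a JSON-like dict. Note that the fields differ per document.
documents = [
    {"_id": 1, "name": "Amina", "email": "amina@example.com"},
    {"_id": 2, "name": "Brian", "address": {"city": "Nairobi"}},
]

def find(collection, **criteria):
    """Return every document whose top-level fields match all the criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

# A new document with a brand-new field needs no table or schema change.
documents.append({"_id": 3, "name": "Chao", "loyalty_points": 120})

matches = find(documents, name="Brian")
print(json.dumps(matches, indent=2))
```

Documents that lack a field are simply skipped by the search instead of causing an error, which is what makes mixed structures workable.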

Graph databases
These databases store nodes that are connected to other nodes in a network. An example of a node is a user on social media who is connected to all of their friends. Another example is points on a map that are connected in real life, so it's easy to find routes between them.
Graph databases are optimized to query these networks and navigate the connections much faster and more efficiently than a relational database can join tables to find the same relationships. They are designed to store huge numbers of interrelated nodes.

Example of a graph node database system
The image above is a fragment of a movies graph database, where we can see movie nodes (purple), person nodes (orange), and the relationships between them (arrows).
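
To get a feel for "navigating the connections", here is a minimal sketch that stores a tiny friendship network as an adjacency list and finds the shortest chain of friends between two people with a breadth-first search. The names and the `shortest_path` function are made up for illustration; a real graph database would do this with its own query language and storage engine.

```python
from collections import deque

# Each node maps to the nodes it is directly connected to.
friends = {
    "Ann": ["Ben", "Cy"],
    "Ben": ["Ann", "Dee"],
    "Cy": ["Ann", "Dee"],
    "Dee": ["Ben", "Cy", "Eli"],
    "Eli": ["Dee"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: returns the shortest list of nodes from start to goal."""
    queue = deque([[start]])  # each queue entry is a path explored so far
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two nodes are not connected

route = shortest_path(friends, "Ann", "Eli")
print(route)  # ['Ann', 'Ben', 'Dee', 'Eli']
```

The search only ever follows existing connections from one node to the next, which is the kind of access pattern graph databases are built around.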

Wide column stores
These store information in tables, but the difference is that the columns are not fixed attributes; they are values. Imagine a huge table from a streaming platform with a row for each user in the system and a column for each movie on the platform, where we record whether a user has watched that movie and some information about that viewing. Keeping a table like that in a relational database or in a flat file would be impractical when we are talking about millions of rows and columns. Most of those “cells” would be empty, because a user has usually watched just a few of the videos. These databases are designed to store these huge tables with “sparse” information very efficiently and to retrieve it very quickly.
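
The streaming example can be sketched in Python with a dictionary of dictionaries, where only the cells that actually hold data are stored. This is just an illustration of the sparse-table idea; the names `views`, `record_view`, and `has_watched` are invented and do not reflect how any specific wide column store is implemented.

```python
from collections import defaultdict

# Row key (user) -> {column name (movie) -> viewing info}.
# Empty cells are simply never stored.
views = defaultdict(dict)

def record_view(user, movie, info):
    """Store one cell: the row is the user, the column is the movie."""
    views[user][movie] = info

def has_watched(user, movie):
    """A missing column means the user has not watched that movie."""
    return movie in views.get(user, {})

record_view("user_1", "Movie A", {"finished": True})
record_view("user_1", "Movie C", {"finished": False})
record_view("user_2", "Movie B", {"finished": True})

# 2 users x 3 movies would be 6 cells in a full table; only 3 are stored.
stored_cells = sum(len(columns) for columns in views.values())

print(has_watched("user_1", "Movie A"))  # True
print(has_watched("user_2", "Movie A"))  # False
print(stored_cells)                      # 3
```

With millions of users and movies, the gap between the full table and the cells actually stored becomes enormous, which is the saving these databases are designed around.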

Important points to note
Relational databases are known for reliability, correctness, and version control, but NoSQL databases leave open many more possibilities for error, including retrieving data that is not from the most recent version of the database. This needs to be considered when deciding whether to use a NoSQL solution.
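
One way reading out-of-date data can happen is when a write lands on one copy of the data and has not yet been copied to another. The toy simulation below shows that situation; it assumes a simple two-copy setup with hypothetical names (`Replica`, `write`, `replicate`) and is not a description of how any particular NoSQL system replicates data.

```python
class Replica:
    """One copy of the data, held on one (simulated) machine."""
    def __init__(self):
        self.data = {}

primary = Replica()
secondary = Replica()
pending = []  # writes accepted by the primary but not yet copied over

def write(key, value):
    """Writes go to the primary first and are queued for the secondary."""
    primary.data[key] = value
    pending.append((key, value))

def replicate():
    """Copy all queued writes to the secondary, in order."""
    while pending:
        key, value = pending.pop(0)
        secondary.data[key] = value

write("stock:book", 10)
replicate()

write("stock:book", 9)                 # someone buys a book
stale = secondary.data["stock:book"]   # read before replication catches up
replicate()
fresh = secondary.data["stock:book"]   # read after replication

print(stale, fresh)  # 10 9
```

An application reading from the second copy during that window would see the old stock level, so it has to be designed with that possibility in mind.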

NoSQL is usually a good choice when there are large amounts of data that change frequently, or when working with flexible formats that don't fit into a relational database model. Common NoSQL database systems include MongoDB, Apache Cassandra, and Amazon DynamoDB.

The advantages and challenges of NoSQL databases as compared to relational databases are as follows:

Advantages:

  • Designed for large, unstructured datasets.
  • Able to add new data that is structured differently from the data already in the database, which is difficult with flat files or an RDBMS without changing the schema.
  • Can scale quickly to support rapid data growth.

Challenges:

  • Validating input fields against existing data, as SQL databases do, is often not available out of the box.
  • Temporal inconsistencies that allow different versions of the data to be confused.
  • Less application support for NoSQL.
  • No standardization of the ways to query NoSQL databases.

This article has been written with 🩶.
Please feel free to add any comments about the NoSQL approach, or literally any other random stuff 🌝.
