RDBMS: Basics

Table of Content:

OLTP && OLAP
CAP theorem
ACID
Normalization
Indexes
Triggers
Stored procedures
Views
Scaling
- Vertical
- Horizontal
Joins
Optimistic/Pessimistic locks
Nosql Types
New Sql

OLTP and OLAP

Basically all the databases divided into OLTP and OLAP types.
OLTP - Online transaction processing.
OLAP - Online analytical processing

RDMBS supported both types of processing.
For example, when you try to INSERT, UPDATE, DELETE it's OLTP.
But when You try to select with aggregation operation, like GROUP BY, COUNT, SUM. This is
OLAP.
If application works perfect with a thousand of OLTP queries, it can be impossible to work with
a thousand of OLAP operations. For example: Your application can have a feature, which allows finding goods which
users with similar preferences with you are buying.

For solving such problems NoSQL was invented.

Cap theorem

In normal operations, your data store provides all three functions.
But the CAP theorem maintains that when a distributed database experiences a network failure,
you can provide either consistency or availability.

ACID

ACID is a set of guiding principles that ensure database transactions are processed reliably.

A - Atomicity. Commits finish an entire operation successfully or roll back to it's prior state
C - Consistency. Any change maintains data integrity or is cancelled completely
I - Isolation. Any read or write will not be impacted by other reads or writes of separate transaction
D - Durability. Successful commits will survive permanently. ***

Normalization

Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them using relationship.
The normal form is used to reduce redundancy from the database table.

Possible normal forms:

Initial data
1NF
2NF
3NF
EKNF
4NF
ETNF
5NF
DKNF
6NF

Indexes

Indexing is an object, which is used to optimize the performance of a search query.

Main types of indexes are:

Clustered	Non-clustered
* Physically change structure of data storing to B-tree	* Don't change structure of data physically
* Can be created the single per tab;e	* May be a lot per table
* Performance is higher the non-clustered index	Performance is less then clustered index

Other index types:

Bitmap index
Dense index
Sparse index
Reverse index
Primary index
Secondary index ***

Triggers

Trigger is a special type of stored procedure, which user don't execute directly, but which executed by INSERT,
UPDATE or DELETE operations. Allow achieving data consistency and implements complex business logic.

Use case: You have 2 data storages, one of them is hot storage which use actual data and another storage is
cold storage, which store archived data. When user try to delete data from hot storage, in trigger you can implement
moving data to cold storage and after that delete from hot storage.

Stored procedures

Stored procedures can contain some part of business logic.
Pros:
* Can have high performance
* Can be secured
Cons:
* Maintainability and Supportability are decreasing, due to business logic present as in application
as in database. So debug and fixing can take much more time

Views

Views - is a logical structure of database, which allow executing query like to a simple table,
but can aggregate data from different tables.

View types:

Views - logical structure. CAn be pretty slow, due to doesn't store data and executing after each user's query. Data always actual.
Materialized views - the result is stored on disk. It takes some time for building materialized view, but performance is high. Data can be not actual, due to you need to rebuild view after adding, but it is expensive operation.

The ways of updating materialized views:

Manual
By schedule
In case if one of the base tables has been updated.

Scaling

Vertical

Vertical scaling allow increasing computing capabilities by upgrading current hardware.
In most cases add second machine is cheaper than upgrade the exists one with equal characteristics.

Horizontal

Partitioning

A partition is a division of a logical database or its constituent elements into distinct independent
parts. Database partitioning is normally done for manageability, performance or availability reasons,
or for load balancing. It is popular in distributed database management systems, where each partition
may be spread over multiple nodes, with users at the node performing local transactions on the
partition. This increases performance for sites that have regular transactions involving certain
views of data, whilst maintaining availability and security.

Example: On info sites it makes sense to partition records by publish date, due to recent news
more interesting for users, then old ones.

Partitioning criteria:

Range partitioning
List partitioning
Composite partitioning
Round-robin partitioning
Hash partitioning

Replication

The central database is called the publication database because it provides the data for users at
other sites. The data in the publication database is copied (replicated) to subscription databases at
other locations. All users whether connected to the publisher server (the Publisher) or to a server
at one of the remote sites (a Subscriber) see the same data and work on the same records.
A Subscriber can be a Subscriber such as a file server or a disconnected Subscriber such as a laptop.

Basically, Publisher instance used for write operations and Subscriber instances for reading. So, in standard,
configuration there is one publisher and a couple of subscribers.
When you have a heavy update operation and publisher server work with this operation, your application
works without problems and any slow downs, because you have subscriber instances which allow read data
for users.

Sharding

is a horizontal partition of data in a database or search engine. Each shard is held on a separate
database server instance, to spread load. Some data within a database remains present in all shards, but some appears only in a single shard.
Each shard (or server) acts as the single source for this subset of data.

Example: in social network USer ID can be the good candidate for sharding key, due to all the data regarding
user will be stored on the same server

Joins

Optimistic/Pessimistic locks

We use locks to protect data integrity and atomicity in concurrent applications where a record
could get read/write requests.

Optimistic Locking is when you check if the record was updated by someone else before you commit the transaction.
Pessimistic locking is when you take an exclusive lock so that no one else can start modifying the record.

Nosql Types

Here are the four main types of NoSQL databases:

Document databases. In example: MongoDB
Key-value stores. In example: Redis
Column-oriented databases. In example: ClickHouse, MariaDB, Apache Hbase
Graph databases. In example: Neo4j

New Sql

NewSQL is a new approach to relational databases that wants to combine transactional
ACID (atomicity, consistency, isolation, durability) guarantees of good ol’ RDBMSs and
the horizontal scalability of NoSQL. It sounds like a perfect solution, the best of both worlds.
Examples: VoltDB, MemSQL, Tarantool, Cosmos DB, Cloud Spanner.

If you like an article, you can support me

by buying me a coffee