RDBMS: Basics
LehaUchicha
Posted on August 24, 2021
Table of Content:
- OLTP && OLAP
- CAP theorem
- ACID
- Normalization
- Indexes
- Triggers
- Stored procedures
- Views
- Scaling
- Joins
- Optimistic/Pessimistic locks
- Nosql Types
- New Sql
OLTP and OLAP
Basically all the databases divided into OLTP and OLAP types.
OLTP - Online transaction processing.
OLAP - Online analytical processing
RDMBS supported both types of processing.
For example, when you try to INSERT, UPDATE, DELETE it's OLTP.
But when You try to select with aggregation operation, like GROUP BY, COUNT, SUM. This is
OLAP.
If application works perfect with a thousand of OLTP queries, it can be impossible to work with
a thousand of OLAP operations. For example: Your application can have a feature, which allows finding goods which
users with similar preferences with you are buying.
For solving such problems NoSQL was invented.
Cap theorem
In normal operations, your data store provides all three functions.
But the CAP theorem maintains that when a distributed database experiences a network failure,
you can provide either consistency or availability.
ACID
ACID is a set of guiding principles that ensure database transactions are processed reliably.
- A - Atomicity. Commits finish an entire operation successfully or roll back to it's prior state
- C - Consistency. Any change maintains data integrity or is cancelled completely
- I - Isolation. Any read or write will not be impacted by other reads or writes of separate transaction
- D - Durability. Successful commits will survive permanently. ***
Normalization
- Normalization is the process of organizing the data in the database.
- Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
- Normalization divides the larger table into the smaller table and links them using relationship.
- The normal form is used to reduce redundancy from the database table.
Possible normal forms:
- Initial data
- 1NF
- 2NF
- 3NF
- EKNF
- 4NF
- ETNF
- 5NF
- DKNF
- 6NF
Indexes
Indexing is an object, which is used to optimize the performance of a search query.
Main types of indexes are:
Clustered | Non-clustered |
---|---|
* Physically change structure of data storing to B-tree | * Don't change structure of data physically |
* Can be created the single per tab;e | * May be a lot per table |
* Performance is higher the non-clustered index | Performance is less then clustered index |
Other index types:
- Bitmap index
- Dense index
- Sparse index
- Reverse index
- Primary index
- Secondary index ***
Triggers
Trigger is a special type of stored procedure, which user don't execute directly, but which executed by INSERT,
UPDATE or DELETE operations. Allow achieving data consistency and implements complex business logic.
Use case: You have 2 data storages, one of them is hot storage which use actual data and another storage is
cold storage, which store archived data. When user try to delete data from hot storage, in trigger you can implement
moving data to cold storage and after that delete from hot storage.
Stored procedures
Stored procedures can contain some part of business logic.
Pros:
* Can have high performance
* Can be secured
Cons:
* Maintainability and Supportability are decreasing, due to business logic present as in application
as in database. So debug and fixing can take much more time
Views
Views - is a logical structure of database, which allow executing query like to a simple table,
but can aggregate data from different tables.
View types:
- Views - logical structure. CAn be pretty slow, due to doesn't store data and executing after each user's query. Data always actual.
- Materialized views - the result is stored on disk. It takes some time for building materialized view, but performance is high. Data can be not actual, due to you need to rebuild view after adding, but it is expensive operation.
The ways of updating materialized views:
- Manual
- By schedule
- In case if one of the base tables has been updated.
Scaling
Vertical
Vertical scaling allow increasing computing capabilities by upgrading current hardware.
In most cases add second machine is cheaper than upgrade the exists one with equal characteristics.
Horizontal
Partitioning
A partition is a division of a logical database or its constituent elements into distinct independent
parts. Database partitioning is normally done for manageability, performance or availability reasons,
or for load balancing. It is popular in distributed database management systems, where each partition
may be spread over multiple nodes, with users at the node performing local transactions on the
partition. This increases performance for sites that have regular transactions involving certain
views of data, whilst maintaining availability and security.
Example: On info sites it makes sense to partition records by publish date, due to recent news
more interesting for users, then old ones.
Partitioning criteria:
- Range partitioning
- List partitioning
- Composite partitioning
- Round-robin partitioning
- Hash partitioning
Replication
The central database is called the publication database because it provides the data for users at
other sites. The data in the publication database is copied (replicated) to subscription databases at
other locations. All users whether connected to the publisher server (the Publisher) or to a server
at one of the remote sites (a Subscriber) see the same data and work on the same records.
A Subscriber can be a Subscriber such as a file server or a disconnected Subscriber such as a laptop.
Basically, Publisher instance used for write operations and Subscriber instances for reading. So, in standard,
configuration there is one publisher and a couple of subscribers.
When you have a heavy update operation and publisher server work with this operation, your application
works without problems and any slow downs, because you have subscriber instances which allow read data
for users.
Sharding
is a horizontal partition of data in a database or search engine. Each shard is held on a separate
database server instance, to spread load. Some data within a database remains present in all shards, but some appears only in a single shard.
Each shard (or server) acts as the single source for this subset of data.
Example: in social network USer ID can be the good candidate for sharding key, due to all the data regarding
user will be stored on the same server
Joins
Optimistic/Pessimistic locks
We use locks to protect data integrity and atomicity in concurrent applications where a record
could get read/write requests.
- Optimistic Locking is when you check if the record was updated by someone else before you commit the transaction.
- Pessimistic locking is when you take an exclusive lock so that no one else can start modifying the record.
Nosql Types
Here are the four main types of NoSQL databases:
Document databases. In example: MongoDB
Key-value stores. In example: Redis
Column-oriented databases. In example: ClickHouse, MariaDB, Apache Hbase
Graph databases. In example: Neo4j
New Sql
NewSQL is a new approach to relational databases that wants to combine transactional
ACID (atomicity, consistency, isolation, durability) guarantees of good ol’ RDBMSs and
the horizontal scalability of NoSQL. It sounds like a perfect solution, the best of both worlds.
Examples: VoltDB, MemSQL, Tarantool, Cosmos DB, Cloud Spanner.
If you like an article, you can support me
Posted on August 24, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 29, 2024
November 29, 2024