SQL Concepts
Roshansahurk
Posted on February 22, 2023
This article gives an overview of SQL concepts and databases.
The concepts discussed are ACID, the CAP theorem, joins, aggregations and filters in queries, normalization, indexes, locking mechanisms, database isolation levels, and triggers.
ACID
ACID is an acronym for:
- A - Atomicity
- C - Consistency
- I - Isolation
- D - Durability
Before diving into ACID, we must first see what a transaction is.
Transaction
A single task is the smallest unit of processing and cannot be divided any further. A set of such tasks executed together is referred to as a transaction.
A transaction typically involves one or more database operations, such as inserting, updating, or deleting records in one or more tables. All of these operations are executed as a single unit of work, either all succeeding or all failing, and are usually wrapped in a BEGIN TRANSACTION and COMMIT TRANSACTION statement.
If an error occurs during the transaction, the changes made so far can be rolled back or undone, using a ROLLBACK TRANSACTION statement. This ensures that the database remains in a consistent state and prevents data corruption or loss.
Transactions can also harm performance since they require additional resources and can result in the locking and blocking of data. Therefore, it is important to use transactions judiciously and optimize them for maximum efficiency.
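As a sketch of this flow, the following example uses Python's built-in sqlite3 module; the accounts table and the transfer amounts are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

# A transfer wrapped in a transaction: both updates succeed or neither does.
try:
    conn.execute("BEGIN TRANSACTION")
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    conn.execute("COMMIT TRANSACTION")
except sqlite3.Error:
    conn.execute("ROLLBACK TRANSACTION")  # undo every change made in the transaction

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 70, 2: 80}
```

If either UPDATE raised an error, the ROLLBACK would leave both balances untouched.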
Atomicity
The principle of atomicity is commonly referred to as the 'All or nothing rule', which means that a transaction must be executed entirely or not at all.
When a transaction executes, there are only two possible outcomes: either it completes entirely, or it has no effect at all.
It involves the following two operations.
- Abort - Changes made to the database are invisible if a transaction aborts.
- Commit - Changes are shown after a transaction commits.
Consistency
- Consistency refers to the idea that a transaction must take the database from one valid state to another, preserving the correctness of the data at all times.
- For example, bank statements: banks enforce consistent rules on accounts so that no transaction can leave the books in an invalid state.
- To maintain consistency in the database, integrity constraints must be enforced before and after each transaction.
Isolation
- It is not guaranteed that only one transaction happens at a time.
What if there are numerous concurrent transactions? Without isolation, they could interfere with one another and break consistency, defeating the whole purpose of a consistent database.
Isolation guarantees that numerous transactions can take place simultaneously without changing the database state to become inconsistent. Transactions take place without interruption and independently.
Transactions are isolated from each other to ensure that any changes made to data in one transaction are not visible to other transactions until they are committed to the database.
This property ensures that executing transactions concurrently produces a state equivalent to one that would result from executing them serially in some order.
Durability
- Durability means that once a transaction has been committed, its changes are stored in the database permanently.
- Data should be durable so that it survives system failures, malfunctions, and crashes.
- If committed data is lost, the recovery manager is in charge of restoring it and ensuring the data remains safe.
CAP Theorem
CAP stands for **C**onsistency, **A**vailability, and **P**artition Tolerance.
- Consistency - Consistency means that every read gets the most recent write or an error.
- Availability - Availability means that every request receives a response without guaranteeing that it contains the most recent write.
- Partition Tolerance - Partition Tolerance means that the system continues to function despite messages being dropped or delayed between nodes.
The CAP theorem states that a distributed data store can provide only two of the following three guarantees: consistency, availability, and partition tolerance.
In the absence of a network partition, both consistency and availability can be provided. When a partition occurs, one must be traded for the other:
- Consistency comes at the expense of availability.
- Availability comes at the expense of consistency.
When responding to a query, a highly available service may return whatever value the responding server currently holds, but there is no guarantee that it is the most recent value written to the database, because updates may still be in transit between nodes.
To ensure high consistency, we must wait for the new write or return an error to the query. This means sacrificing availability to ensure the data returned by the query is consistent.
Joins
- The JOIN clause is used in SQL to combine rows from two or more tables based on a related column.
- The different types of join are INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
SQL keywords are case-insensitive; uppercase is used here to make the keywords in the queries stand out.
INNER JOIN - It returns only the matching records from both tables based on the condition provided in the ON clause.
SELECT table1.column, table2.column FROM table1 INNER JOIN table2 ON table1.column = table2.column;
LEFT JOIN - It returns all the records from the left table and matching records from the right table. If there are no matching records in the right table, it returns NULL.
SELECT table1.column, table2.column FROM table1 LEFT OUTER JOIN table2 ON table1.column = table2.column;
RIGHT JOIN - It returns all the records from the right table and matching records from the left table. If there are no matching records in the left table, it returns NULL.
SELECT table1.column, table2.column FROM table1 RIGHT OUTER JOIN table2 ON table1.column = table2.column;
FULL OUTER JOIN - It returns all the records from both tables and matching records where they exist. If there are no matching records in either table, it returns NULL.
SELECT table1.column, table2.column FROM table1 FULL OUTER JOIN table2 ON table1.column = table2.column;
NOTE - Here table1 and table2 refer to table names, and column to column names; substitute the actual table and column names from your own database.
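The following sketch runs an INNER JOIN and a LEFT JOIN with Python's sqlite3 module; the customers and orders tables are invented for illustration. (RIGHT and FULL OUTER JOIN are omitted because SQLite only added them in version 3.39.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 250);
""")

# INNER JOIN: only customers that have a matching order.
inner = conn.execute("""
    SELECT customers.name, orders.total
    FROM customers INNER JOIN orders ON customers.id = orders.customer_id
""").fetchall()

# LEFT JOIN: every customer; NULL (None in Python) where no order matches.
left = conn.execute("""
    SELECT customers.name, orders.total
    FROM customers LEFT JOIN orders ON customers.id = orders.customer_id
""").fetchall()

print(inner)  # [('Ada', 250)]
print(left)   # [('Ada', 250), ('Bob', None)]
```

Bob has no orders, so he appears only in the LEFT JOIN result, with NULL for the order column.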
Aggregations, Filters in Queries
- An aggregate function in SQL returns a single value computed from multiple values in a column.
The various types of aggregate functions are:
SUM(arg) - The sum function returns the total sum of the column.
SELECT SUM(column_name) FROM table_name WHERE column_name2 = 'value';
COUNT(arg) - The COUNT function returns the number of rows (or of non-NULL values in a column) that match the condition.
SELECT COUNT(*) FROM table_name WHERE column_name = 'value';
AVG(arg) - Average function calculates the average of the set of values.
SELECT AVG(column_name) FROM table_name WHERE column_name2 = 'value';
MIN(arg) - The Min function returns the minimum value available in the column.
SELECT MIN(column_name) FROM table_name WHERE column_name2 = 'value';
MAX(arg) - Max function returns the maximum value available in the column.
SELECT MAX(column_name) FROM table_name WHERE column_name2 = 'value';
Filters select the subset of rows that satisfy the given conditions; the queries above all use one.
WHERE is the filter clause used in the queries above:
WHERE (CONDITION1 OR CONDITION2)
More than one condition can be combined in the WHERE clause to filter the data in the database.
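As a sketch, the aggregate functions and a WHERE filter can be run together with sqlite3; the sales table and its values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 100), ('east', 300), ('west', 50);
""")

# All five aggregates in one query, filtered by a WHERE clause with OR.
row = conn.execute("""
    SELECT SUM(amount), COUNT(*), AVG(amount), MIN(amount), MAX(amount)
    FROM sales
    WHERE region = 'east' OR region = 'west'
""").fetchone()
print(row)  # (450, 3, 150.0, 50, 300)

# Narrowing the filter changes which rows are aggregated.
east_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east'"
).fetchone()[0]
print(east_total)  # 400
```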
Normalization
- Normalization is the process of restructuring a relational database by a series of so-called normal forms to reduce data redundancy and improve data integrity.
- If you have a messy database, or one that has grown with poor integrity, database normalization is the solution.
- Large tables can be divided into smaller tables through the process of normalization.
- The goal is to minimize data duplication and eliminate data inconsistencies.
- Applying normalization principles yields efficient, robust databases that are easier to maintain and update.
There are different levels of normalization, ranging from the first normal form (1NF) up to the fifth normal form (5NF), with Boyce-Codd normal form (BCNF) as a stricter refinement of 3NF.
- 1NF requires that each column in a table contains indivisible values. Each column must have a unique name and no two rows in the table can be identical.
- 2NF requires that the table is in 1NF and that each non-key column in the table is dependent on the entire primary key, not just part of it.
- 3NF requires that the table is in 2NF and that there are no non-key columns that are dependent on other non-key columns. Specifically, in normalization, non-key columns should only depend on the primary key.
- 4NF requires that the table is in BCNF and that there are no multi-valued dependencies, which arise when one table stores two or more independent multi-valued facts about the same entity.
- 5NF requires that the table is in 4NF and cannot be non-trivially decomposed any further: any split into smaller tables that can be rejoined without losing information must already be implied by the table's candidate keys.
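A minimal sketch of the idea, with invented table names: a denormalized orders table repeats the customer's name on every row, while the normalized form stores each customer exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: customer_name is duplicated for every order.
    CREATE TABLE orders_flat (order_id INTEGER, customer_name TEXT, total INTEGER);
    INSERT INTO orders_flat VALUES (1, 'Ada', 250), (2, 'Ada', 90), (3, 'Bob', 40);

    -- Normalized: each customer appears exactly once.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total INTEGER);
    INSERT INTO customers (name) SELECT DISTINCT customer_name FROM orders_flat;
    INSERT INTO orders
        SELECT f.order_id, c.id, f.total
        FROM orders_flat f JOIN customers c ON c.name = f.customer_name;
""")

# A rename now touches one row instead of every order, avoiding inconsistency.
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE name = 'Ada'")
rows = conn.execute("""
    SELECT c.name, o.total
    FROM orders o JOIN customers c ON o.customer_id = c.id
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [('Ada L.', 250), ('Ada L.', 90), ('Bob', 40)]
```

In the flat table the same rename would have to update three rows, and missing one would leave the data inconsistent.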
Indexes
- Indexes are used to improve the performance of queries by providing quick access to data: instead of scanning the entire table, the database engine can locate matching rows directly.
- They can be created on one or more columns and can be created as unique or non-unique.
- A unique index ensures that each value in the indexed column is unique.
- A non-unique index can have duplicate values.
- When a query is executed, the database engine uses the index to quickly locate the relevant rows, reducing the time and resources needed to retrieve data.
- The main disadvantage is that indexes take up storage space and require additional processing time to maintain as data is inserted, updated, and deleted.
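A short sketch of creating an index and checking that the query planner actually uses it; the table and index names are invented, and EXPLAIN QUERY PLAN is SQLite-specific:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email)")

# EXPLAIN QUERY PLAN reports how SQLite intends to execute the query;
# with the index in place it searches the index instead of scanning the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = 'a@example.com'"
).fetchall()
print(plan)  # the detail column mentions idx_users_email
```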
Locking Mechanism
- Locking is a mechanism used by databases to manage concurrent access to data by multiple users or processes.
It prevents conflicts and ensures data integrity by allowing only one user or process to modify a piece of data at a time.
There are two types of locks:
- Shared locks - Allow multiple users to read a piece of data simultaneously.
- Exclusive locks - Allow only one user to modify the data at a time.
When a user attempts to modify a piece of data, the database will first obtain an exclusive lock on the data to ensure that no other user is modifying it at the same time.
Upon committing or rolling back a transaction, the lock is released.
Locking can impact database performance since it requires additional resources and can result in the blocking of data.
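Lock contention can be sketched with two sqlite3 connections to the same database file; timeout=0 makes a blocked write fail immediately instead of waiting, and the table name is hypothetical:

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn1 = sqlite3.connect(path, timeout=0)
conn2 = sqlite3.connect(path, timeout=0)
conn1.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn1.commit()

# conn1 starts a write transaction, taking the database's write lock.
conn1.execute("BEGIN IMMEDIATE")
conn1.execute("INSERT INTO accounts VALUES (1, 100)")

# conn2's write cannot proceed while conn1 holds the lock.
try:
    conn2.execute("INSERT INTO accounts VALUES (2, 200)")
    blocked = False
except sqlite3.OperationalError:
    blocked = True
    conn2.rollback()  # release anything conn2 acquired while trying

conn1.commit()  # releasing the lock lets conn2's write succeed
conn2.execute("INSERT INTO accounts VALUES (2, 200)")
conn2.commit()
print(blocked)  # True
```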
Database isolation levels
Database isolation levels are a set of rules that determine how transactions interact with one another in a multi-user database environment.
The isolation level determines how concurrent transactions can access and modify the same data, and how changes made by one transaction are visible to others.
There are four commonly used isolation levels in databases: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
Read Uncommitted - The Read Uncommitted isolation level allows transactions to read uncommitted data, which means a transaction can read data modified by another uncommitted transaction.
- The highest level of concurrency.
- The lowest level of data consistency.
Read Committed - The Read Committed isolation level ensures that a transaction reads only committed data, which means a transaction will not read data modified by another uncommitted transaction.
- A higher level of consistency than Read Uncommitted, but comes with reduced concurrency.
Repeatable Read - The Repeatable Read isolation level ensures that a transaction reads the same set of data throughout its lifetime, even if other transactions modify the data in the meantime.
- A higher level of consistency than Read Committed, but reduces concurrency even further.
Serializable - The Serializable isolation level provides the highest level of consistency, ensuring that transactions execute in a serializable order and that all transactions are isolated from one another.
- Strongest guarantees of data consistency, but also the lowest level of concurrency.
Choosing the appropriate isolation level depends on the specific requirements of the application.
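As a sketch of isolation in practice, SQLite by default prevents dirty reads: one connection cannot see another's uncommitted changes. The events table here is invented.

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "iso.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)
writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
writer.commit()

# The writer inserts a row but does not commit yet.
writer.execute("BEGIN")
writer.execute("INSERT INTO events (note) VALUES ('pending')")

# The uncommitted row is invisible to the reader (no dirty read).
before = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]

writer.commit()
after = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(before, after)  # 0 1
```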
Triggers
Triggers are database objects that are used to automatically execute a set of SQL statements in response to certain database events or data changes.
They can be used to enforce business rules, audit data changes, maintain referential integrity, and perform other automated tasks.
Triggers are typically defined to execute in response to one of the following events: INSERT, UPDATE, or DELETE. They can be defined to execute either before or after the triggering event occurs.
Triggers can be defined at either the table level or the database level and can be used to enforce complex business rules and data validation.
While triggers can be powerful tools for enforcing data integrity and automating tasks, they can also have an impact on database performance if they are not used appropriately.
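A small sketch of an audit-logging trigger using sqlite3; the employees table, audit_log table, and trigger name are all invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE audit_log (employee_id INTEGER, action TEXT);

    -- Fires automatically after every INSERT on employees.
    CREATE TRIGGER log_new_employee
    AFTER INSERT ON employees
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, 'INSERT');
    END;
""")

conn.execute("INSERT INTO employees (name) VALUES ('Ada')")
log = conn.execute("SELECT employee_id, action FROM audit_log").fetchall()
print(log)  # [(1, 'INSERT')]
```

The application code never touched audit_log; the trigger recorded the change automatically.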