How data is replicated in salve nodes in SQL databases
Ankit malik
Posted on September 22, 2024
Introduction
While going through the chapter-5
of book Designing Data Intensive Applications
. I came across these replication strategies which are being used in SQL databases for creating the replicas while ensuring data consistency and availability across multiple nodes. It involves copying data from a primary node (master) to one or more secondary nodes (slaves). This redundancy provides fault tolerance and scalability, allowing for uninterrupted operations even if the primary node fails.
In this article, we will explore four common replication strategies used in SQL databases:
1. Statement-based replication
- The primary node sends SQL statements to the slave nodes.
- The slave nodes execute these statements, resulting in the same data changes as on the primary node.
Advantages:
- Simple to implement and manage.
- Can be used with a variety of database engines.
Disadvantages:
- Can be inefficient for complex queries or large data sets.
- May not handle certain types of data changes accurately (e.g., triggers, stored procedures).
2. Write-ahead log (WAL) shipping
- The primary node records all data changes in a WAL files.
- The WAL is shipped to the slave nodes, which apply the changes to their local data.
- It is used by PostgreSQL and Oracle db.
Advantages:
- Highly efficient for large data sets and complex queries.
- Can handle a wide range of data changes accurately.
Disadvantages:
- The main disadvantage is that the it contains the data on a very low level a WAL like bytes. Due to this replication is coupled with storage engines. So, it's not possible to change the storage engine of slave nodes.
- Requires more complex infrastructure and management.
- May be less tolerant to network failures.
3. Logical (row-based) log replication
It contains sequence of records describing writes to database tables at the granularity of a row:
- The primary node records data changes at the row level.
- The slave nodes apply these changes to their local data, ensuring consistency even for complex data transformations.
- For insert operation, the log contains the new values of all columns.
- For a deleted row, the log contains enough information to uniquely identify the row that was deleted.
- For an updated row, the log contains enough information to uniquely identify the updated row, and the new values of all columns (or at least the new values of all columns that changed).
- Mysql's binlog user this approach.
Advantages:
- Provides fine-grained control over data replication.
- Can handle complex data transformations effectively.
Disadvantages:
- Can be more complex to implement and manage.
- May be less efficient for large data sets.
4. Trigger-based replication
- So far all these replication doesn't involve application code but certain cases may be there where flexibility is needed like CDC(change data capture).
- Triggers are defined on the primary node to capture data changes.
- These triggers send the changes to the slave nodes, which apply them to their local data.
Advantages:
- Provides flexibility in defining replication rules.
- Can be used for specific use cases where triggers are already in place.
Disadvantages:
- Can be less efficient and more complex to manage than other methods.
- May introduce additional overhead and potential performance issues.
Choosing the right replication strategy:
The best replication strategy for a given database depends on various factors, including:
- Data volume and complexity: WAL shipping or logical replication may be more efficient for large data sets or complex queries.
- Replication frequency: Statement-based replication may be sufficient for less frequent updates.
- Data consistency requirements: Logical replication or trigger-based replication may be better suited for ensuring data integrity in complex scenarios.
- Infrastructure and management considerations: WAL shipping or statement-based replication may be simpler to implement and manage.
By carefully considering these factors, database administrators can select the most appropriate replication strategy to meet the specific needs of their applications.
Posted on September 22, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.