Universal Data Migration: Using Slingdata to Transfer Data Between Databases
Shahab Ranjbary
Posted on May 24, 2024
Overview
Data management can be complex, especially when dealing with high-volume data pipelines. Whether you're moving data between databases, from file systems to databases, or vice versa, the Extract & Load (EL) approach can streamline these processes. Enter Slingdata, a powerful, free CLI tool built with Go that offers a straightforward solution for creating and maintaining robust data pipelines.
What is Slingdata?
Slingdata is a passion project turned into a practical tool that simplifies data movement across various systems. Its key focus is on the Extract & Load (EL) process, enabling efficient data transfer between:
- Database to Database
- File System to Database
- Database to File System
Key Features of Slingdata
- Single Binary deployment (built with Go). See the installation page.
- Use Custom SQL as a stream.
- Manage / View / Test / Discover your connections with the
sling conns
sub-command. - Use Environment Variables for connections.
- Provide YAML or JSON configurations (perfect for git version control).
- Powerful Replication logic, to replicate many tables with a wildcard (my_schema.*).
- Reads your existing DBT connections.
- Use your environment variable in your YAML / JSON config (
SELECT * from my_table where date = '{date}'
). - Convenient Transformations, such as the flatten option, which auto-creates columns from your nested fields.
- Run Pre & Post SQL commands.
- And many more!
Sample EL Process: ClickHouse to PostgreSQL
To showcase Slingdata’s capabilities, let's walk through a sample Extract & Load process moving data from ClickHouse to PostgreSQL.
Prerequisites
Ensure you have:
- ClickHouse installed and running.
- PostgreSQL installed and running.
- Slingdata installed.
Step 1: Configure the Source Database
Let's assume our source database is ClickHouse. We can export a new environment variable or add the connection to the env.yaml
of Slingdata files:
export MARKETING_URL="postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable"
To test this connection:
sling conns test MARKETING_URL
You should see the result:
INF success!
We will do the same for ClickHouse:
export DATAWAREHOUSE_URL="http://sling_user:sling_pass@clickhouse:8123/marketing"
To test this connection:
sling conns test DATAWAREHOUSE_URL
You should see the result:
INF success!
Now, let's create a database and a table in ClickHouse:
create database marketing;
CREATE TABLE `marketing`.users (
id UInt64,
name String,
family String,
age UInt8
) ENGINE = MergeTree()
ORDER BY id;
And in PostgreSQL, create a table with the same schema:
create table users(
id serial not null primary key,
name text not null,
family text,
age text not null
);
Insert data into ClickHouse:
insert into marketing.users values(1,'Shahab', 'Ranjbary', 28);
insert into marketing.users values(2,'Sepher', 'Ranjbary', 18);
insert into marketing.users values(3,'Mostafa', 'Fekri', 38);
Now, to move data to PostgreSQL, you can either create a replication file or run the following command:
sling run --src-conn DATAWAREHOUSE_URL --src-stream marketing.users --tgt-conn MARKETING_URL --tgt-object public.users
Conclusion
Slingdata offers a powerful and flexible way to manage data pipelines with ease. Its extensive features, combined with simple configurations and robust performance, make it an excellent tool for anyone dealing with high-volume data transfers.
Posted on May 24, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.