Moving data from CockroachDB🪳 to PostgreSQL🐘 or YugabyteDB🚀
Franck Pachot
Posted on December 16, 2022
A user asked on the YugabyteDB community slack how to migrate from CockroachDB. You may think that "wire-compatibility" with PostgreSQL makes it easy, it is not. The ways we usually export data from PostgreSQL doesn't work with CRDB. However because PostgreSQL is powerful and YugabyteDB benefits from all those SQL features, there is an easy solution with Foreign Data Wrapper.
Start the lab
For this test I'm starting the two databases in Docker, initialize the TPCC demo database in CockroachDB and create an empty one in YugabyteDB
# start containers
docker run --network yb -d --rm --name yb -p 5433:5433 -p 7000:7000 -p 15433:15433 yugabytedb/yugabyte:latest bash -c "
yugabyted start --background false --tserver_flags="ysql_beta_features=true"
"
docker run --network yb -d --rm --name cr -p 26257:26257 cockroachdb/cockroach:latest bash -c "
cockroach start-single-node --insecure
"
# create demo tpcc tables in 🪳
docker exec -it cr cockroach workload init tpcc
# create empty db in 🚀
PGHOST=localhost PGPORT=5433 PGUSER=yugabyte PGDATABASE=yugabyte
until pg_isready ; do sleep 1 ; done | uniq &&
psql -c "
create database tpcc
"
PGDATABASE=tpcc
DDL: Migrate the schema
CRDB accepts connections though the PostgreSQL protocol but is not compatible with the PostgreSQL catalog views, and then doesn't support pg_dump
:
Franck@YB:~ $ pg_dump -U root -h localhost -p 26257 -d tpcc -s
pg_dump: error: query failed: ERROR: column "x.tableoid" does not exist
pg_dump: error: query was: SELECT x.tableoid, x.oid, x.extname, n.nspname, x.extrelocatable, x.extversion, x.extconfig, x.extcondition FROM pg_extension x JOIN pg_namespace n ON n.oid = x.extnamespace
However, they provide a proprietary way to generate the CREATE TABLE statements:
# export ddl
psql -U root -h localhost -p 26257 -d tpcc -Ato ddl.sql -c "
show create all tables
"
I like this simple command, even if not compatible with PostgreSQL, it is a good workaround for the absence of pgdump --schema-only
.
I have extracted the DDL in a ddl.sql
file:
As you can see, the DDL generated is not PostgreSQL-compatible. I have been working with a lot of SQL databases and have never seen this. In SQL we create indexes with CREATE INDEX. Indexes can be created implicitly from CREATE TABLE to enforce unique constraints, but declaring an index with the logical attributes of a table (list of columns and constraints) is not common.
I've quickly written a small awk
script to get it back to PostgreSQL-compatible DDL:
awk '
# create extensions that are default in CR
NR==1{
print "create extension if not exists pgcrypto;" > "tab-"FILENAME
}
# remove all reference to schema public
/TABLE public[.]/{
$0=gensub(/(REFERENCES |TABLE )public[.]/,"\\1","g")
}
# move commas at the end of lines to the begining of next line
nextline!=""{
$0=gensub(/(^\t*)(.*)$/,"\\1"nextline"\\2",1)
nextline=""
}
/,$/{
sub(/,$/,"")
nextline=","
}
# INDEX clause in the CREATE TABLE is not a SQL syntax
/^CREATE TABLE/{
table=gensub(/^CREATE TABLE (.*) [(]/,"\\1",1)
}
/^\t*,(UNIQUE )?INDEX/{
indexes=indexes"\n"gensub(/ STORING /," INCLUDE ",1,gensub(/^\t*,(UNIQUE )?(INDEX)([^(]+)(.*)( STORING)?(.*)$/,"create \\1\\2 \\3 on "table" \\4 \\5 \\6;",1))
$0=gensub(/(^\t*),(.*)$/,"\\1--\\2",1)
}
# validate constraints at creation
/^ALTER TABLE.*ADD CONSTRAINT.*/{
$0=gensub(/(.*)(NOT VALID)?(;)$/,"\\1\\3",1)
{print > "ref-"FILENAME}
$0="--"$0
}
/^ALTER TABLE.*VALIDATE CONSTRAINT.*;$/{
$0="--"$0
}
# print that to the create table file
{print > "tab-"FILENAME}
END{
print indexes > "ind-"FILENAME
}
' ddl.sql
This does a few additional transformations:
- remove references to
public
schema to be able to create into a dedicated one - move the ending comma (
,
) to the beginning of next line to make it easier to comment out - comment the INDEX declaration and generate the CREATE INDEX statement to run after the import of data
- add create extension for the functions that CRDB use without declaration (
gen_random_uuid()
here) - I move the creation of referential integrity constraints into another script to run it at the end, instead of creating them immediately as NOT VALID. By the way, with COPY in YugabyteDB you can create them before and import DISABLE_FK_CHECK if you want to bypass the check
The result goes in a tab-ddl.sql
file for the table creation, ind-ddl.sql
for the secondary indexes and ref-ddl.sql
for the foreign keys.
This is easy to run on the YugabyteDB target, I'm defining search_path
to create those tables in a tpcc
schema (this is the reason I removed references to public
in the DDL script):
psql -v ON_ERROR_STOP=1 \
-c 'create schema tpcc' -c 'set search_path=tpcc,"$user", public' \
-ef tab-ddl.sql | tee tab-ddl.log
I'm using the PostgreSQL psql
here with failure on the first error because when you migrate data, you don't want any unseen surprise. Better stop than expecting you to read the log.
I can see the tables from the YugabyteDB web console on http://localhost:15433/
They are empty. Ready to import data.
DML: Migrate the data
Dump
CockroachDB has no tool to dump data in a format that is easy to import elsewhere. You may be surprised by that, but I'm not. Oracle was famous by providing a SQL*Loader
with no SQL*Unloader
(the rumors say that it was ready to be released when Larry Ellison decided to remove a tool that makes it too easy to move out of his major source of revenues).
The same with CockroachDB:
- In 2017, supporting
pg_dump
is not a priority and the solution iscockroach dump
: https://github.com/cockroachdb/cockroach/issues/20296 - In 2020, this alternative
cockroach dump
is deprecated: https://github.com/cockroachdb/cockroach/issues/54040 - Later, exporting to CSV seem to be the solution but there is no reliable tool provided like PostgreSQL COPY: https://github.com/cockroachdb/cockroach/issues/85571
UPDATE: COPY is now supported - I've added a \copy
method below
Export to CSV
In the issues mentionned above, the answers to the absence of dump mention an export to CSV with cockroach sql --format csv
. I tried it, but it just displays the result with commas to separate fields. It doesn't provide the quality needed for a migration. For example, I tried with the TPCC schema but encountered many errors in timestamp format or even worse: no distinction between a NULL value and the 'NULL' character string:
as you can see, " ("") ",NULL,NULL,0
is the CSV for (' (") ','NULL',null,0) where it should be " ("") ",NULL,,0
For a migration, you cannot rely on such a tool.
As a comparison, here is what PostgreSQL or YugabyteDB provides:
\copy to CSV
The PostgreSQL-compatible way to export to CSV with COPY is finally supported and it may be a preferred way
I've generated the export and import commands with a query on pg_tables
:
tpcc=> -- Export to CSV
tpcc=> select format('\copy %I to %L with csv',tablename,tablename||'.csv')
tpcc-> from pg_tables where schemaname='public';
format
-----------------------------------------------
\copy warehouse to 'warehouse.csv' with csv
\copy district to 'district.csv' with csv
\copy customer to 'customer.csv' with csv
\copy history to 'history.csv' with csv
\copy "order" to 'order.csv' with csv
\copy new_order to 'new_order.csv' with csv
\copy item to 'item.csv' with csv
\copy stock to 'stock.csv' with csv
\copy order_line to 'order_line.csv' with csv
(9 rows)
tpcc=> -- Import from CSV
tpcc=> select format('\copy %I from %L with csv',tablename,tablename||'.csv')
tpcc-> from pg_tables where schemaname='public';
format
-------------------------------------------------
\copy warehouse from 'warehouse.csv' with csv
\copy district from 'district.csv' with csv
\copy customer from 'customer.csv' with csv
\copy history from 'history.csv' with csv
\copy "order" from 'order.csv' with csv
\copy new_order from 'new_order.csv' with csv
\copy item from 'item.csv' with csv
\copy stock from 'stock.csv' with csv
\copy order_line from 'order_line.csv' with csv
(9 rows)
I've run those commands to export from CockroachDB and import to YugabyteDB and did not encounter the same problems as with the EXPORT command. Then, this is probably a better method than the next one, using Foreign Data Wrapper, which were the only reliable one in previous versions.
Foreign Data Wrapper
As YugabyteDB is PostgreSQL compatible, we can use the Foreign Data Wrapper. Of course it also depend on the compatibility of the remote database. I tried this in psql
:
psql <<'SQL'
CREATE EXTENSION postgres_fdw;
CREATE SERVER cr FOREIGN DATA WRAPPER postgres_fdw OPTIONS (
host 'cr', port '26257', dbname 'tpcc'
);
CREATE USER MAPPING FOR yugabyte SERVER cr OPTIONS (
user 'root', password ''
);
IMPORT FOREIGN SCHEMA public FROM SERVER cr INTO public;
\det
SQL
and this was looking good...
...except that nothing is there:
yugabyte=# \det
List of foreign tables
Schema | Table | Server
--------+-------+--------
(0 rows)
Unfortunately nothing was imported from the schema. This is not a surprise as CockroachDB is not compatible with PostgreSQL catalog views.
Let's try with a table, declaring the structure:
CREATE foreign TABLE warehouse (
w_id INT8 NOT NULL
,w_name VARCHAR(10) NOT NULL
,w_street_1 VARCHAR(20) NOT NULL
,w_street_2 VARCHAR(20) NOT NULL
,w_city VARCHAR(20) NOT NULL
,w_state CHAR(2) NOT NULL
,w_zip CHAR(9) NOT NULL
,w_tax DECIMAL(4,4) NOT NULL
,w_ytd DECIMAL(12,2) NOT NULL
) server cr;
this looks good:
CREATE FOREIGN TABLE
tpcc=# \det
List of foreign tables
Schema | Table | Server
--------+-----------+--------
public | warehouse | cr
(1 row)
tpcc=# select * from warehouse;
w_id | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip | w_tax | w_ytd
------+--------+------------+------------+--------+---------+-----------+--------+-----------
0 | 8 | 17 | 13 | 11 | SF | 640911111 | 0.0806 | 300000.00
(1 row)
tpcc=# insert into tpcc.warehouse select * from warehouse;
INSERT 0 1
tpcc=# select * from tpcc.warehouse;
w_id | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip | w_tax | w_ytd
------+--------+------------+------------+--------+---------+-----------+--------+-----------
0 | 8 | 17 | 13 | 11 | SF | 640911111 | 0.0806 | 300000.00
(1 row)
tpcc=# truncate table tpcc.warehouse;
TRUNCATE TABLE
I was able to read data from CockroachDB and insert it into my YugabyteDB. Great. I'll do that for all tables. I have removed what I imported because I'll automate this for all tables.
As in YugabyteDB we are like in PostgreSQL, with the same catalog views, I'm generating the CREATE FOREIGN TABLE and INSERT from it:
with
n as (
select oid as relnamespace
, nspname as sch
from pg_namespace
where nspname ='tpcc'
), c as (
select oid as attrelid
, relnamespace
, relname as tab
from pg_class
where relkind='r'
), a as (
select attrelid
, attname as col
, format('%I %s',attname,format_type(atttypid, atttypmod)) as def
from pg_attribute
where attnum>0 and not attisdropped
)
select
format(
'create foreign table %I (%s) server cr'
,tab
, string_agg(def,', ')
)
from n natural join c natural join a group by sch,tab
union all
select
format(
'insert into %I.%I(%s) select %s from %I'
,sch
,tab
,string_agg(col,',')
,string_agg(col,','),tab)
from n natural join c natural join a group by sch,tab
union all
select
format(
'drop foreign table %I'
,tab
)
from n natural join c
\gexec
This generates the statements reading the structure of the target tabels (in schema tpcc
) and executes the result with \gexec
:
My warehouse
foreign table was already created, the others are created here. Then rows are imported. And at the end I drop the foreign tables to be sure we don't use them anymore. The application can now work on the new tables in YugabyteDB after creating the secondary indexes and foreign keys:
psql -v ON_ERROR_STOP=1 \
-c 'set search_path=movr,"$user", public' \
-ef ind-ddl.sql | tee ind-ddl.log
psql -v ON_ERROR_STOP=1 \
-c 'set search_path=movr,"$user", public' \
-ef ref-ddl.sql | tee ref-ddl.log
All this is an example and can be customized. For large tables, you may want to pre-split them, but if not, YugabyteDB can auto-split when they grow. And if you prefer to go though CSV files, create the foreign tables only and use PostgreSQL COPY
to get a reliable CSV file. Here is how to generate the \copy
commands with the previous query WITH clause:
...
union all
select
format(
'\copy (select * from %I) to %L csv;'
,tab,tab||'.csv'
)
from n natural join c
;
Of course, your application must be down for this kind of migration. To reduce downtime (and stress) there are replication solutions like https://www.arcion.io which has all connectors.
If you are migrating from CockroachDB you may be positively surprised by all the possibilities offered by PostgreSQL-compatible ecosystem. All this was standard PostgreSQL queries, on pg_catalog
schema, with psql
and \gexec
. I didn't install any other tools here. And they are reliable because thousands of PostgreSQL users work with it daily on all kind of data, for years. YugabyteDB re-uses the same code for the SQL layer, changing only what is needed to scale-out with high performance.
Posted on December 16, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.