YugabyteDB cloud/region/zone
Franck Pachot
Posted on November 1, 2022
When you start YugabyteDB nodes (yb-master and yb-tserver), you tell them where they are in terms of cloud, region and zone. Those are just names for the three common levels of global infrastructure; they can equally map to your on-premises data center, failure zone and rack. The main point is that the database uses this information to place data (the tablet peers, which are the table and index shard replicas) to maximize availability and performance.
- To be resilient to cloud provider, region or zone failure, the tablet peers are spread across them
- To fulfill performance expectations or data governance rules, the tablets can be constrained to a specific subset of the cluster (with tablespaces)
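As a rough sketch of the second point, the helper below prints a CREATE TABLESPACE statement whose replica_placement option pins a single replica to one cloud, region and zone. The function name tablespace_ddl and the tablespace name are hypothetical, and one replica is assumed only because the examples in this post run a single node; a real cluster would list several placement blocks.

```shell
# Print a CREATE TABLESPACE statement that pins one replica to a single zone.
# Usage: tablespace_ddl <tablespace_name> <cloud> <region> <zone>
# (hypothetical helper; it only prints SQL and does not connect to anything)
tablespace_ddl() {
  printf "CREATE TABLESPACE %s WITH (replica_placement='{\"num_replicas\":1,\"placement_blocks\":[{\"cloud\":\"%s\",\"region\":\"%s\",\"zone\":\"%s\",\"min_num_replicas\":1}]}');\n" \
    "$1" "$2" "$3" "$4"
}

# Example, on a running cluster:
#   tablespace_ddl eu_west1a aws eu-west1 eu-west1a | ysqlsh
```

A table created with TABLESPACE eu_west1a would then keep its tablets in that zone.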
When connected to YSQL, the PostgreSQL endpoint, you can list all nodes with yb_servers(), which displays their cloud, region and zone. When querying a partitioned table, you can call yb_is_local_table() to filter rows from the cloud, region or zone you are connected to. I've posted an example here. You can also find out which node you are connected to by looking at its IP address with inet_server_addr(), or the interface address it is listening on with current_setting('listen_addresses').
If you simply want to know the cloud, region and zone you are connected to, YugabyteDB provides three simple functions: yb_server_cloud(), yb_server_region() and yb_server_zone().
Here is an example where I've not set any placement info:
docker run --rm yugabytedb/yugabyte:2.14.4.0-b26 bash -c "
yugabyted start --tserver_flags=''
until ysqlsh -c 'show server_version' ; do sleep 1 ; done 2>/dev/null
ysqlsh -e <<SQL
select current_setting('listen_addresses'),inet_server_addr();
select * from yb_servers();
select yb_server_cloud(), yb_server_region(), yb_server_zone();
SQL
"
The defaults, as seen in yb_servers() or the web console, are 'cloud1', 'datacenter1' and 'rack1', but the new functions just tell you that no placement info has been set:
Starting yugabyted...
server_version
---------------------
11.2-YB-2.14.4.0-b0
(1 row)
select current_setting('listen_addresses'),inet_server_addr();
current_setting | inet_server_addr
-----------------+------------------
0.0.0.0 | 127.0.0.1
(1 row)
select * from yb_servers();
host | port | num_connections | node_type | cloud | region | zone | public_ip
--------------+------+-----------------+-----------+--------+-------------+-------+-----------
7d79fe13d79b | 5433 | 0 | primary | cloud1 | datacenter1 | rack1 | 127.0.0.1
(1 row)
select yb_server_cloud(), yb_server_region(), yb_server_zone();
NOTICE: No cloud was set in placement_info setting at node startup.
NOTICE: No region was set in placement_info setting at node startup.
NOTICE: No zone was set in placement_info setting at node startup.
yb_server_cloud | yb_server_region | yb_server_zone
-----------------+------------------+----------------
| |
(1 row)
Note that the NOTICE may be annoying and will be removed: https://github.com/yugabyte/yugabyte-db/issues/14771
If I explicitly set the same name as the default, 'cloud1', with placement_cloud in yugabyted --tserver_flags:
docker run --rm yugabytedb/yugabyte:2.14.4.0-b26 bash -c "
yugabyted start --tserver_flags='placement_cloud=cloud1'
until ysqlsh -c 'show server_version' ; do sleep 1 ; done 2>/dev/null
ysqlsh -e <<SQL
select current_setting('listen_addresses'),inet_server_addr();
select * from yb_servers();
select yb_server_cloud(), yb_server_region(), yb_server_zone();
SQL
"
The name is displayed by yb_server_cloud():
Starting yugabyted...
server_version
---------------------
11.2-YB-2.14.4.0-b0
(1 row)
select current_setting('listen_addresses'),inet_server_addr();
current_setting | inet_server_addr
-----------------+------------------
0.0.0.0 | 127.0.0.1
(1 row)
select * from yb_servers();
host | port | num_connections | node_type | cloud | region | zone | public_ip
--------------+------+-----------------+-----------+--------+-------------+-------+-----------
ab439408352e | 5433 | 0 | primary | cloud1 | datacenter1 | rack1 | 127.0.0.1
(1 row)
select yb_server_cloud(), yb_server_region(), yb_server_zone();
yb_server_cloud | yb_server_region | yb_server_zone
-----------------+------------------+----------------
cloud1 | |
(1 row)
NOTICE: No region was set in placement_info setting at node startup.
NOTICE: No zone was set in placement_info setting at node startup.
In a real deployment, you set all three of them, like this for the eu-west1a zone in the eu-west1 region of the aws cloud:
docker run --rm yugabytedb/yugabyte:2.14.4.0-b26 bash -c "
yugabyted start --tserver_flags='placement_cloud=aws,placement_region=eu-west1,placement_zone=eu-west1a'
until ysqlsh -c 'show server_version' ; do sleep 1 ; done 2>/dev/null
ysqlsh -e <<SQL
select current_setting('listen_addresses'),inet_server_addr();
select * from yb_servers();
select yb_server_cloud(), yb_server_region(), yb_server_zone();
SQL
"
The three functions tell you which cloud, region and zone you are connected to:
Starting yugabyted...
server_version
---------------------
11.2-YB-2.14.4.0-b0
(1 row)
select current_setting('listen_addresses'),inet_server_addr();
current_setting | inet_server_addr
-----------------+------------------
0.0.0.0 | 127.0.0.1
(1 row)
select * from yb_servers();
host | port | num_connections | node_type | cloud | region | zone | public_ip
--------------+------+-----------------+-----------+-------+----------+-----------+-----------
a482bc2a786a | 5433 | 0 | primary | aws | eu-west1 | eu-west1a | 127.0.0.1
(1 row)
select yb_server_cloud(), yb_server_region(), yb_server_zone();
yb_server_cloud | yb_server_region | yb_server_zone
-----------------+------------------+----------------
aws | eu-west1 | eu-west1a
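When each node gets its location from deployment variables, the flag value passed to yugabyted can be assembled rather than typed by hand. This is only a minimal sketch: the helper name placement_flags is hypothetical, while the three flag names are the ones used in the command above.

```shell
# Build the value for yugabyted --tserver_flags from cloud, region and zone.
# Usage: placement_flags <cloud> <region> <zone>
# (hypothetical helper; it only formats the string shown earlier in this post)
placement_flags() {
  printf 'placement_cloud=%s,placement_region=%s,placement_zone=%s\n' "$1" "$2" "$3"
}

# Example:
#   yugabyted start --tserver_flags="$(placement_flags aws eu-west1 eu-west1a)"
```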
When using a load balancer or the YugabyteDB smart drivers, a single client connection string can lead to any node, and these functions are the right way to know where you are. The YugabyteDB query planner uses the same information internally to reduce latency when it has a choice, such as between duplicate covering indexes. In short, the node you are connected to is the PostgreSQL endpoint that processes your query (query planner and executor) and reads from or writes to the database storage layer (on local or remote nodes). Distributing connections helps to scale out the compute (work memory and processing) in the SQL layer, in addition to scaling the storage layer.
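To see how connections spread across zones behind a load balancer, one option is to open several sessions, ask each one where it landed, and tally the answers. The sketch below covers only the tallying part; the function name count_by_placement and the host name my-load-balancer in the usage comment are hypothetical.

```shell
# Count how many times each placement appears on standard input.
# Input: one "cloud.region.zone" string per line; empty lines are ignored.
# Output: "<count> <placement>" lines, sorted by placement name.
count_by_placement() {
  awk 'NF { n[$0]++ } END { for (p in n) print n[p], p }' | sort -k2
}

# Example, assuming a load balancer named my-load-balancer in front of the cluster:
#   for i in 1 2 3 4 5 6; do
#     ysqlsh -h my-load-balancer -tAc \
#       "select yb_server_cloud()||'.'||yb_server_region()||'.'||yb_server_zone()"
#   done | count_by_placement
```

If a node was started without placement info, the concatenation returns NULL and the resulting empty line is skipped by the tally.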