GBase 8s Sharded Tables and Index Sharding Strategies (1)

In the vast universe of databases, the efficiency of data storage and retrieval is a key performance metric. GBase 8s, with its unique sharding technology, provides robust support for efficient data management and quick access. This article will delve into the concept, role, and strategies of sharding in GBase 8s, exploring how sharding can enhance database performance and achieve optimized data storage.

1. Concept and Role of Sharding

Concept of Sharding

Sharding is a method of distributed data storage.
It applies to tables or indexes.
Data or indexes are distributed across different locations based on certain rules.

Role of Sharding

The sole goal is to enhance performance.
Fully utilize the engine's PDQ and parallel I/O.
Reduce the amount of data accessed by ignoring shards.
Efficiently handle large tables.
Facilitate the management of large tables.

2. Concept of Shard Ignoring

Shard ignoring refers to accessing only the table and index shards that meet the query conditions in an SQL statement, thus avoiding full table scans. This is effective for expression-based sharding and applies to both table and index shards.

Example of Shard Ignoring in GBase 8s:

CREATE TABLE tab0 (col1 integer, col2 date)
FRAGMENT BY expression
col1 >= 0 and col1 < 100 in datadbs1,
col1 >= 100 and col1 < 200 in datadbs2,
col1 >= 200 in datadbs3;

Execution Plan:

set explain on;
select * from tab0 where col1 between 105 and 190;  #  Accesses only datadbs2

Output Fragment:

 1) gbasedbt.tab0: SEQUENTIAL SCAN  (Serial, fragments: 1)
    Fragments Scanned: (1) datadbs2
    Filters: (gbasedbt.tab0.col1 <= 190 AND gbasedbt.tab0.col1 >= 105 )

3. Introduction to DATASKIP

DATASKIP allows you to specify which database spaces to skip if they are unavailable due to disk failures. It improves data availability and can be set to ON, OFF, or ALL. When set to ON, you must specify at least one database space. Setting to ALL skips all unavailable shards. The default setting is OFF.

Example:

CREATE TABLE tab1 (col1 int, col2 varchar(10))
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs1,
col1 >= 100 and col1 < 200 in datadbs2,
col1 >= 200 and col1 < 300 in datadbsbd;
insert into tab1 values(50,'a');
insert into tab1 values(60,'b');
insert into tab1 values(101,'c');
insert into tab1 values(290,'d');
insert into tab1 values(280,'e');

Manually corrupt a shard:

mv datadbsbd datadbsbd1

Set DATASKIP:

SET DATASKIP ON datadbsbd;

Restart the instance, the startup process is summarized below:

Opening primary chunks...oninit: Cannot open chunk '/home/xfj/aee350/storage/datadbsbd'. errno = 2
succeeded
Validating chunks...succeeded
Initialize Async Log Flusher...succeeded
......
Starting scheduling system...succeeded
Verbose output complete: mode = 5

--Query data in the original datadbsbd in the database. The query data is empty and no error is reported.
> select * from tab1 where col1 >= 200 and col1 < 300;

       col1 col2

No rows found.
---If do not set DATASKIP or set DATASKIP to off, an error will be reported when querying data in the original datadbsbd.
> select * from tab1 where col1 >= 200 and col1 < 300;

       col1 col2

  243: Could not position within a table (root.tab1).

  155: ISAM error: Primary and Mirror chunks are bad
Error in line 1
Near character position 50

4. Sharding Strategies and Index Sharding in GBase 8s

Tables created with the FRAGMENT BY clause specify their storage distribution schemes. GBase 8s supports two major sharding methods:

Round-Robin Sharding: Distributes data evenly using an internal hash algorithm.
Expression-Based Sharding: Distributes data based on user-defined expressions.

Indexes in GBase 8s can be:

Attached Indexes: Default behavior, where indexes are stored in the same dbspace as the data.
Detached Indexes: Indexes with a separate sharding strategy.

4.1 Introduction to Sharded Tables in GBase Mode

Round-robin Sharding

Round-robin sharding uses the internal hash algorithm provided by GBase 8s to evenly distribute data, without requiring knowledge of the data shard layout.
This method can only be applied to tables, not to indexes, as it would degrade system performance.
Round-robin sharding cannot leverage the shard table's data skipping feature to reduce disk scans during queries. However, for applications that need to scan most of the data, using the round-robin sharding method can enable parallel scanning with PDQ, thereby improving query performance.
One drawback of round-robin sharding is that during a query, all data shards must be scanned. Therefore, the round-robin scheme does not support DATASKIP. If DATASKIP is not supported and an error occurs in one data shard, the entire query will fail because it cannot be determined whether the data shard with the error contains any records that meet the query conditions.
In summary, the round-robin scheme is suitable for scenarios where users need to load data quickly, do not know the data access patterns in advance, frequently update data, or are unaware of the data distribution method.

Below is an example of a round-robin sharded table and the corresponding index shards:

database testdb;

CREATE TABLE frag_rb_tab (
    sale_time datetime year to second,
    product_id int,
    product_time datetime year to second,
    price float,
    sale_amount int
) FRAGMENT BY ROUND ROBIN IN
    datadbs1, datadbs2, datadbs3;

-- Create two indexes, one regular index and one unique index (the index location must be explicitly specified)
CREATE INDEX idx1_frag_rb_tab ON frag_rb_tab(product_time, product_id);
CREATE UNIQUE INDEX idx2_frag_rb_tab ON frag_rb_tab(sale_time, product_id) IN datadbs3;

-- Where are the indexes idx1_frag_rb_tab and idx2_frag_rb_tab stored? This can be viewed using oncheck.
!oncheck -ci testdb:frag_rb_tab
Validating indexes for testdb:root.frag_rb_tab...

Index idx1_frag_rb_tab
    Index fragment partition datadbs1 in DBspace datadbs1
    Index fragment partition datadbs2 in DBspace datadbs2
    Index fragment partition datadbs3 in DBspace datadbs3

Index idx2_frag_rb_tab
    Index fragment partition datadbs3 in DBspace datadbs3

It can be seen that the index idx1_frag_rb_tab is an attached index. By default, GBase8s creates non-unique indexes for round-robin sharded tables as attached indexes, while the unique index idx2_frag_rb_tab is a detached index.

Note: When creating a unique index on a round-robin sharded table, the index location must be explicitly specified (as in the definition above with IN datadbs3). In other words, only unique detached indexes can be created for round-robin sharding, otherwise an "872" error will be reported.

DROP INDEX idx2_frag_rb_tab;
CREATE UNIQUE INDEX idx2_frag_rb_tab ON frag_rb_tab(sale_time, product_id);
872: Invalid fragment strategy or expression for the unique index.

If a primary key is created on a round-robin sharded table, how does GBase8s handle it?

ALTER TABLE frag_rb_tab ADD CONSTRAINT PRIMARY KEY (product_id) CONSTRAINT pk1_frag_rb_tab;
!oncheck -ci testdb:frag_rb_tab
Validating indexes for testdb:root.frag_rb_tab...

Index idx1_frag_rb_tab
    Index fragment partition datadbs1 in DBspace datadbs1
    Index fragment partition datadbs2 in DBspace datadbs2
    Index fragment partition datadbs3 in DBspace datadbs3

Index 104_2
    Index fragment partition rootdbs in DBspace rootdbs

It can be seen that by default, GBase8s automatically creates a detached index for the primary key on a sharded table.

Expression-based Sharding

Expression-based sharding is the most commonly used method in actual database design.
It requires the user to have a certain understanding of the data distribution and store data according to user-defined expressions in predetermined tablespaces.
The data distribution method is entirely specified by the user based on the characteristics and access patterns of the data.
Expression-based sharding can be constructed based on one or multiple columns, providing possibilities for data skipping and performance improvement.
It can be used for both tables and indexes.
The expression-based sharding scheme can include relational operators such as >, <, >=, <=, = and logical operators like AND.

Below is an example of an expression-based sharded table:

database testdb;

CREATE TABLE frag_exp_tab (
    sale_time datetime year to second,
    product_id int,
    product_time datetime year to second,
    price float,
    sale_amount int
) FRAGMENT BY EXPRESSION
    sale_time < '2012-01-01 00:00:00' AND sale_time >= '2011-01-01 00:00:00' IN datadbs1,
    sale_time < '2011-01-01 00:00:00' AND sale_time >= '2010-01-01 00:00:00' IN datadbs2,
    sale_time < '2010-01-01 00:00:00' AND sale_time >= '2009-01-01 00:00:00' IN datadbs3;

CREATE UNIQUE INDEX idx1_frag_exp_tab ON frag_exp_tab(sale_time, product_id);
CREATE INDEX idx2_frag_exp_tab ON frag_exp_tab(product_time, product_id);

-- Where are the indexes idx1_frag_exp_tab and idx2_frag_exp_tab stored? This can be viewed using oncheck.
!oncheck -ci testdb:frag_exp_tab
Validating indexes for testdb:root.frag_exp_tab...

Index idx1_frag_exp_tab
    Index fragment partition datadbs1 in DBspace datadbs1
    Index fragment partition datadbs2 in DBspace datadbs2
    Index fragment partition datadbs3 in DBspace datadbs3

Index idx2_frag_exp_tab
    Index fragment partition datadbs1 in DBspace datadbs1
    Index fragment partition datadbs2 in DBspace datadbs2
    Index fragment partition datadbs3 in DBspace datadbs3

It can be seen that the indexes idx1_frag_exp_tab and idx2_frag_exp_tab are both attached indexes by default.

Note: For sharded tables (except round-robin sharded tables), to create a unique attached index, the shard key field must be included in the index, otherwise an "872" error will be reported.

DROP INDEX idx1_frag_exp_tab;
CREATE UNIQUE INDEX idx1_frag_exp_tab ON frag_exp_tab(product_id);
872: Invalid fragment strategy or expression for the unique index.

The order of fragment expressions is crucial since the row data is stored based on the first matching expression. You can use a remainder fragment to store data that doesn't match any of the fragment expressions. If a remainder fragment already exists, you can't add it again. If a remainder fragment is present and a new fragment is added, the database server will scan and recalculate all records in the remainder fragment; some records might move to the new fragment. The remainder fragment must be the last part of the fragment expression; otherwise, it will raise a syntax error: 871: Remainder fragment must be specified last. Here is an example:

CREATE TABLE tab1 (col1 int, col2 varchar(10))
FRAGMENT BY EXPRESSION
  col1 >= 0 and col1 < 100 in datadbs1,
  col1 >= 100 and col1 < 200 in datadbs2,
  remainder in datadbs3;

INSERT INTO tab1 VALUES(50,'a');
INSERT INTO tab1 VALUES(60,'b');
INSERT INTO tab1 VALUES(101,'c');
INSERT INTO tab1 VALUES(301,'d');

Additionally, you can define custom partition names as shown below:

CREATE TABLE tab2 (col1 int, col2 varchar(10))
FRAGMENT BY EXPRESSION
  partition p1(col1 >= 0 and col1 < 100) in datadbs1,
  partition p2(col1 >= 100 and col1 < 200) in datadbs2,
  partition p3 remainder in datadbs3;

Principles of Expression-Based Fragmentation:

First Match Wins: Data is stored based on the first matching expression. Use a REMAINDER fragment at the end to store records that don't match any previous expressions.
Continuous Non-Overlapping Fragments: Create continuous fragments without overlapping areas to help with fragment elimination.
Simplify Expressions: Keep expressions as simple as possible to reduce CPU load.
Distribute Data Across Devices: Allocate data to different devices. For queries accessing small data areas frequently, fragment the frequently accessed data across multiple devices.
Order of Expressions: Place the most restrictive part of an expression first. Put high-selectivity expressions before low-selectivity ones, and frequently used expressions at the beginning.
Adjust Expressions: Modify expressions to reduce computation and data access per query, thereby reducing CPU load. Check execution plans and adjust as needed.

Examples of Other Expression-Based Fragmentation Methods in GBase8s

MOD in Expression

The MOD function divides id_num by 3 and stores data based on the remainder.

DATABASE testdb;

CREATE TABLE employee (
  id_num integer,
  name char(50),
  salary integer
)
FRAGMENT BY EXPRESSION
  MOD(id_num, 3) = 0 IN datadbs1,
  MOD(id_num, 3) = 1 IN datadbs2,
  MOD(id_num, 3) = 2 IN datadbs3;

CREATE INDEX idx_employee ON employee(id_num);

-- Check execution plan to verify fragmentation
SET EXPLAIN ON;
SELECT * FROM employee WHERE id_num = 2;

---The following is an output fragment of the execution plan
select * from employee where id_num=2
Estimated Cost: 1
Estimated # of Rows Returned: 1
1) root.employee: INDEX PATH
(1) Index Name: root.idx_empoyee
Index Keys: id_num (Serial, fragments: 2)
Fragments Scanned: (2) datadbs3c
Lower Index Filter: root.employee.id_num  = 2

List Fragmentation

List fragmentation improves the OR and IN operations with more flexibility and efficiency.

CREATE TABLE customer (
  cust_id integer,
  name varchar(128),
  street varchar(255),
  state char(2),
  zipcode char(5),
  phone char(15)
)
FRAGMENT BY LIST(state)
  PARTITION p0 VALUES('RS', 'IL') IN datadbs1,
  PARTITION p1 VALUES('CA', 'OR') IN datadbs2,
  PARTITION p2 VALUES(NULL) IN datadbs3;

Interval Fragmentation

Interval fragmentation is useful for time-based fragmentation with changing time ranges. It automatically extends fragments based on inserted records.

-- Create interval-based fragmentation with a 10-day interval across 3 dbspaces.
CREATE TABLE sales (
  amount int,
  id int,
  data_time datetime year to second
)
FRAGMENT BY RANGE(data_time) INTERVAL(10 UNITS DAY)
  STORE IN (datadbs1, datadbs2, datadbs3)
  PARTITION p_sales0 VALUES < '2011-01-01 00:00:00' IN datadbs4;

CREATE UNIQUE INDEX idx_sales ON sales(data_time, id);

-- Insert test data
INSERT INTO sales VALUES(100, 1, '2011-01-01 00:00:00');
INSERT INTO sales VALUES(100, 1, '2011-01-02 00:00:00');
INSERT INTO sales VALUES(102, 2, '2011-01-11 00:00:00');
INSERT INTO sales VALUES(100, 1, '2011-01-11 01:00:00');
INSERT INTO sales VALUES(100, 2, '2011-01-21 01:00:00');
INSERT INTO sales VALUES(100, 1, '2011-01-21 02:00:00');
INSERT INTO sales VALUES(100, 2, '2011-02-01 01:00:00');
INSERT INTO sales VALUES(100, 1, '2011-02-01 02:00:00');
INSERT INTO sales VALUES(100, 2, '2011-02-11 01:00:00');
INSERT INTO sales VALUES(100, 1, '2011-02-11 02:00:00');
INSERT INTO sales VALUES(100, 2, '2011-02-21 01:00:00');
INSERT INTO sales VALUES(100, 1, '2011-02-21 02:00:00');
INSERT INTO sales VALUES(100, 2, '2011-03-01 01:00:00');
INSERT INTO sales VALUES(100, 1, '2011-03-01 02:00:00');
INSERT INTO sales VALUES(100, 2, '2011-03-11 01:00:00');
INSERT INTO sales VALUES(100, 1, '2011-03-11 02:00:00');

SELECT COUNT(*) FROM sales;

-- Check auto-extended fragments
!oncheck -pt testdb:sales | grep DB
Table fragment partition p_sales0 in DBspace datadbs4
Table fragment partition sys_p1 in DBspace datadbs2
Table fragment partition sys_p2 in DBspace datadbs3
Table fragment partition sys_p3 in DBspace datadbs1
Table fragment partition sys_p4 in DBspace datadbs2
Table fragment partition sys_p5 in DBspace datadbs3
Table fragment partition sys_p6 in DBspace datadbs1
Table fragment partition sys_p7 in DBspace datadbs2
Index idx_sales fragment partition p_sales0 in DBspace datadbs4
Index idx_sales fragment partition sys_p1 in DBspace datadbs2
Index idx_sales fragment partition sys_p2 in DBspace datadbs3
Index idx_sales fragment partition sys_p3 in DBspace datadbs1
Index idx_sales fragment partition sys_p4 in DBspace datadbs2
Index idx_sales fragment partition sys_p5 in DBspace datadbs3
Index idx_sales fragment partition sys_p6 in DBspace datadbs1
Index idx_sales fragment partition sys_p7 in DBspace datadbs2
--Can also query shard details using system tables in the database where the tables reside :sysfragments:
select t.tabname,f.dbspace,f.partition,f.exprtext
from systables t, sysfragments f
where t.tabid=f.tabid and f.fragtype='T'
and t.tabname='sales';

4.2 Index Fragmentation in GBase Mode

From the examples of round-robin fragmentation and expression-based fragmentation above, we can see that when an index does not have a FRAGMENT BY clause, it inherits the table's fragmentation (Attach or Detach).

By default, GBase8s creates indexes for fragmented tables (round-robin, expression-based) as attached indexes.
By default, for fragmented tables, a primary key automatically creates a unique detached index.

Foreign keys and unique constraints also automatically create a unique detached index by default. Indexes can also specify the FRAGMENT BY clause to create either attached or detached indexes.

Attached Index Example

Attached means that the data and the index are stored in the same dbspace or follow the same fragmentation strategy.

Example 1: Index Fragmentation Handling (Stored in the same dbspace, but with a different fragmentation strategy than the table)

CREATE TABLE tab1 (col1 int, col2 date)
FRAGMENT BY round robin in datadbs1, datadbs2;
CREATE INDEX idx1 on tab1 (col1)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs1,
col1 >= 100 and col1 < 200 in datadbs2;

Example 2: Index Fragmentation Handling (Same fragmentation strategy as the table but different storage locations)

CREATE TABLE tab2 (col1 int, col2 date)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs1,
col1 >= 100 and col1 < 200 in datadbs2;
CREATE INDEX idx2 on tab2 (col1)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs3,
col1 >= 100 and col1 < 200 in datadbs4;

Example 3: Index Fragmentation Handling (Specifying FRAGMENT BY clause to create a unique attached index)

It is possible to create a unique attached index on an expression-based fragmented table that includes the fragmentation key field (round-robin can only create a unique detached index).

CREATE TABLE tab3 (col1 int, col2 int)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs1,
col1 >= 100 and col1 < 200 in datadbs2;
CREATE unique INDEX idx3 on tab3 (col1)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs3,
col1 >= 100 and col1 < 200 in datadbs4;

Note: To create a unique attached index on a fragmented table, the index must include the fragmentation key field; otherwise, an "872" error will be reported.

DROP INDEX idx3;
CREATE unique INDEX idx3 on tab3 (col2)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs3,
col1 >= 100 and col1 < 200 in datadbs4;
872: Invalid fragment strategy or expression for the unique index.

Detached Index Example

A detached index is an index with an independent fragmentation strategy. To create a detached index, you can use the CREATE INDEX statement and define the fragmentation strategy using the FRAGMENT BY EXPRESSION clause.

Example 1: Creating a Detached Index with Different Expressions and Storage Spaces

CREATE TABLE tab4 (col1 integer, col2 date)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs1,
col1 >= 100 and col1 < 200 in datadbs2;
-- Creating a detached index with different expressions and storage spaces
CREATE INDEX idx4 on tab4 (col1)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 200 in datadbs3,
col1 >= 200 in datadbs4;
!oncheck -pt testdb:tab4 | grep DB
Table fragment partition datadbs1 in DBspace datadbs1
Table fragment partition datadbs2 in DBspace datadbs2
Index idx4 fragment partition datadbs3 in DBspace datadbs3
Index idx4 fragment partition datadbs4 in DBspace datadbs4;

Example 2: Creating a Unique Detached Index

CREATE TABLE tab5 (col1 integer, col2 date)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 100 in datadbs1,
col1 >= 100 and col1 < 200 in datadbs2;
-- Creating a unique detached index with different expressions and storage spaces
CREATE unique INDEX idx5 on tab5 (col1)
FRAGMENT BY EXPRESSION
col1 >= 0 and col1 < 200 in datadbs3,
col1 >= 200 in datadbs4;
!oncheck -pt testdb:tab4 | grep DB
Table fragment partition datadbs1 in DBspace datadbs1
Table fragment partition datadbs2 in DBspace datadbs2
Index idx4 fragment partition datadbs3 in DBspace datadbs3
Index idx4 fragment partition datadbs4 in DBspace datadbs4;

Through this in-depth analysis, we have not only unveiled the mysteries of GBase 8s fragmentation technology but also provided practical strategies and techniques for database administrators and developers.

Blog