GBase 8s Database Data Compression Strategies

congcong

Cong Li

Posted on August 27, 2024

GBase 8s Database Data Compression Strategies

This article provides a detailed introduction to the data compression feature in GBase databases, covering the benefits, methods, commands, and limitations of data compression. GBase 8s data compression is used in scenarios where storage space is limited, there's a need to reduce data storage footprint, and data processing efficiency must be improved. It is recommended to compress tables and shards that are not frequently updated, offering an effective solution for saving storage space and enhancing I/O efficiency.

1. Benefits of Data Compression

1) Saving Storage Space and Associated Costs

2) Improving I/O Efficiency

  • The database reads compressed data, reducing I/O load.
  • The time saved on I/O operations outweighs the time required to decompress data in memory.

3) Enhancing Read Operation Performance

4) Slight Decrease in Write Operation Performance

2. Data Compression Methods

1) Automatic Compression for New Tables

a. Automatic Compression:

CREATE TABLE cust5 (...) COMPRESSED;
Enter fullscreen mode Exit fullscreen mode
  • Automatically generates a data dictionary.
  • Automatically compresses data.

b. Light Append Mode:

Light append mode bypasses the buffer pool for faster data loading. It is automatically enabled under the following conditions:

  • Row table.
  • No index.

2) Disabling Automatic Compression:

TASK/ADMIN(TABLE UNCOMPRESS);
Enter fullscreen mode Exit fullscreen mode

3) Re-enabling Compression:

TASK/ADMIN(TABLE COMPRESS);
Enter fullscreen mode Exit fullscreen mode

4) Controlling Fragment Decompression:

TASK/ADMIN(FRAGMENT COMPRESS/UNCOMPRESS);
Enter fullscreen mode Exit fullscreen mode

2. Compression for Existing Tables

1) Enabling Data Compression:

EXECUTE FUNCTION TASK('enable compression');
Enter fullscreen mode Exit fullscreen mode

2) Generating Data Dictionary Compression:

EXECUTE FUNCTION TASK('table create_dictionary', 'stdtab', 'test', 'gbasedbt');
Enter fullscreen mode Exit fullscreen mode

3) Consolidating Data Space: Merges unused space into the tail of the table or fragment.

EXECUTE FUNCTION TASK('table compress repack', 'stdtab', 'test', 'gbasedbt');
Enter fullscreen mode Exit fullscreen mode

4) Returning Unused Space to the Database Space:

EXECUTE FUNCTION TASK('table shrink', 'stdtab', 'test', 'gbasedbt');
EXECUTE FUNCTION TASK('fragment shrink', 'partn');
Enter fullscreen mode Exit fullscreen mode

3. Validating Data Compression Commands

1) Viewing the Data Dictionary with onstat -g ppd after Data Compression:

Example command:

EXECUTE FUNCTION TASK('table create_dictionary', 'table', 'db', 'gbasedbt');
Enter fullscreen mode Exit fullscreen mode

Example output:

onstat -g ppd
partnum  ColOffset  DbsNum  CrTS  CrLogID  CrLogPos  DrTS  DrLogID  DrLogPos
0x1001d5  -1        1       1393371661  4       16339024  0    0       0
0x1001d5  4         1       1393371661  4       16355408  0    0       0
Enter fullscreen mode Exit fullscreen mode
  • partnum: Partition number to which the compression dictionary is applied.
  • ColOffset: Byte offset of the BLOB column in a compressed partition; -1 if only rows are compressed.
  • DbsNum: Number of database spaces residing in the dictionary.
  • CrTS: Timestamp when the data dictionary was created.
  • CrLogID: Unique ID of the logical log created when the data dictionary was generated.
  • CrLogPos: Position of the logical log when creating the data dictionary.
  • DrTS: Timestamp when the data dictionary was cleared.
  • DrLogID: Unique ID of the logical log when the data dictionary was cleared.
  • DrLogPos: Position of the logical log when the data dictionary was cleared.

2) Estimating Compression Ratio:

EXECUTE FUNCTION TASK("table estimate_compression", "tablename", "dbname", "owner_name");
Enter fullscreen mode Exit fullscreen mode

Example output:

est   curr     change  partnum   coloff   table
----- -----    ------  ---------- ------- --------------------
75.7% 75.3%    +0.4    0x00200003  -1      insurance:bwilson.auto
75.7% 0.0%     +75.7   0x00300002  -1      insurance:pchang.home
Enter fullscreen mode Exit fullscreen mode
  • est: Existing compression ratio.
  • curr: Current compression ratio.
  • change: Change in compression from the current state.
  • partnum: Partition number.
  • coloff: Indicates if the data in the space is row data or blobs data.
  • table: Table name, formatted as database:owner.tablename.

3) Displaying Ongoing Compression Operations:

onstat –g dsk
                     Processed       Remaining  Duration 
Partnum   OP Pass   Rows     Blobs  Rows      Time(s)    Table Name
400002  Repack  1   6325     1752   1497      00:00:00    db:sl:t1
Enter fullscreen mode Exit fullscreen mode
  • partnum: Partition number.
  • OP: Compression operation (e.g., compress, repack, or shrink).
  • Pass: Relevant for repack operations, 1 indicates the first pass, 2 indicates the second pass.
  • Processed Rows: Number of rows processed by the compression operation.
  • Blobs: Number of blobs processed.
  • Remaining Rows: Number of rows not yet compressed.
  • Duration Time(s): Duration of the operation in seconds.
  • Table Name: Name of the table.

4. Data Compression Limitations

Compression cannot be applied in the following cases:

  • Data not stored in rows (e.g., blobs).
  • Indexes.
  • Temporary tables.
  • Catalog tables.
  • Tables in the system library.

GBase 8s data compression technology is a crucial tool for database administrators and developers to optimize storage and enhance performance. This detailed introduction aims to help you effectively utilize GBase 8s compression features. Thank you for reading!

💖 💪 🙅 🚩
congcong
Cong Li

Posted on August 27, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related