Enabling and Querying Slow Logs in GBase 8c

GBase 8c allows you to locate issues and diagnose problems using slow logs. Here is how to configure and use slow logs in GBase 8c:

1. Slow Log Configuration

(1) Relevant GUC Parameters

The main configuration parameters for GBase 8c slow logs are:

enable_stmt_track
- on: Default value, enables capture of Full/Slow SQL.
- off: Disables capture of Full/Slow SQL.
track_stmt_stat_level
- Composite parameter in the form of 'full sql stat level, slow sql stat level'.
- The first part is the full SQL tracking level, with values: OFF, L0, L1, L2.
- The second part is the slow SQL tracking level, with values: OFF, L0, L1, L2.
- Recommended value: OFF,L0.
log_min_duration_statement
- Time threshold: when a statement's execution time exceeds this value, its execution information is logged.
- Value range: -1 to 2147483647 milliseconds. Default value: 30min.
instr_unique_sql_count
- When the number of unique SQL entries generated in the system exceeds instr_unique_sql_count, automatic cleanup occurs or logging stops.
- Recommended value: 200000.
track_stmt_details_size
- Maximum size of a single statement's collected SQL, default value: 4096 byte.
track_stmt_retention_time
- Composite parameter controlling the retention time for full/slow SQL records, read every 60 seconds.
- Format: 'full sql retention time, slow sql retention time'.
- Full SQL retention time range: 0 to 86400.
- Slow SQL retention time range: 0 to 604800.
- Default value: 3600,604800.

(2) Method to Configure Slow Logs

First, log in as the database installation user:

su - gbase

Then execute:

gs_guc reload -Z coordinator -N all -I all -c "enable_stmt_track = ON"
gs_guc reload -Z coordinator -N all -I all -c "track_stmt_stat_level = 'OFF,L0'"
gs_guc reload -Z coordinator -N all -I all -c "log_min_duration_statement = 1000"
gs_guc reload -Z coordinator -N all -I all -c "instr_unique_sql_count = 200000"
gs_guc reload -Z coordinator -N all -I all -c "track_stmt_retention_time = '3600,10800'"

2. Querying Slow Logs

Log in to the database to query slow logs and full logs within the GBase 8c database. Note that queries need to be performed in the postgres system library.

Slow Log Query Views:

dbe_perf.get_global_slow_sql_by_timestamp(start_timestamp timestamp with time zone, end_timestamp timestamp with time zone);

Slow Log Query:

select * from dbe_perf.get_global_slow_sql_by_timestamp(start_timestamp, end_timestamp);

Full Log Query Views:

dbe_perf.get_global_full_sql_by_timestamp(start_timestamp timestamp with time zone, end_timestamp timestamp with time zone);

Full Log Query:

select * from dbe_perf.get_global_full_sql_by_timestamp(start_timestamp, end_timestamp);

3. Common SQL Optimization Techniques in GBase 8c

GUC Parameter Optimization

Optimize execution plans by setting GUC parameters.

Table Definition Optimization

Choose appropriate table definitions based on business needs, including storage models, table distribution keys, partition tables, suitable data types, and indexes.

Table definition optimization includes:

Storage Mode: GBase8c supports row storage, column storage, and in-memory storage. Choose the appropriate mode based on the business scenario.
Distribution Key: In a distributed structure, data is split according to the specified distribution key (column). If data needed for a join or operation is on different nodes, redistribution may be required, reducing efficiency. Keeping related data on the same node reduces data interaction during SQL execution, improving efficiency.
Field Types and Lengths: Design appropriate field types and lengths; create suitable indexes; use partition tables where necessary.

Example: Efficiency difference due to different distribution keys:

-- When join field is a distribution key, the execution is pushed down to DN nodes for higher efficiency.
postgres=# EXPLAIN SELECT * FROM td1,td2 WHERE td1.a=td2.c ORDER BY a;
                                 QUERY PLAN
------------------------------------------------------------------------------
Remote Subquery Scan on all (dn1,dn2,dn3)  (cost=2.04..2.05 rows=1 width=16)
  ->  Sort  (cost=2.04..2.05 rows=1 width=16)
        Sort Key: td1.a
        ->  Nested Loop  (cost=0.00..2.03 rows=1 width=16)
              Join Filter: (td1.a = td2.c)
              ->  Seq Scan on td1  (cost=0.00..1.01 rows=1 width=8)
              ->  Seq Scan on td2  (cost=0.00..1.01 rows=1 width=8)
(7 rows)

-- When join field is not a distribution key, data is pulled to CN node for execution, resulting in lower efficiency.
postgres=# EXPLAIN SELECT * FROM td1,td2 WHERE td1.b=td2.b ORDER BY a;
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn1,dn2,dn3)  (cost=2.04..2.05 rows=1 width=16)
  ->  Sort  (cost=2.04..2.05 rows=1 width=16)
        Sort Key: td1.a
        ->  Nested Loop  (cost=0.00..2.03 rows=1 width=16)
              Join Filter: (td1.b = td2.b)
              ->  Remote Subquery Scan on all (dn1,dn2,dn3)  (cost=100.00..101.02 rows=1 width=8)
                    Distribute results by H: b
                    ->  Seq Scan on td1  (cost=0.00..1.01 rows=1 width=8)
              ->  Materialize  (cost=100.00..101.03 rows=1 width=8)
                    ->  Remote Subquery Scan on all (dn1,dn2,dn3)  (cost=100.00..101.02 rows=1 width=8)
                          Distribute results by H: b
                          ->  Seq Scan on td2  (cost=0.00..1.01 rows=1 width=8)
(12 rows)

Statistical Information Optimization

GBase 8c generates optimal execution plans based on cost estimation using analyze collected statistics. Statistics are crucial for optimizer estimations and cost calculations. Analyze execution plans to find inefficient operators and optimize or force plan changes. Typically, GBase8c chooses the best plan during execution, with manual adjustments needed only if they improve efficiency.

Operator-Level Optimization

Queries go through multiple operator steps to produce the final result. Performance issues often arise from bottleneck operators. Use EXPLAIN ANALYZE or PERFORMANCE commands to identify bottlenecks and optimize accordingly.

SQL Query Rewriting

Adjust SQL queries following specific rules to improve execution efficiency while ensuring correct results.

Example: Using exists instead of in:

-- Inefficient query using IN
select st.id,st.name from stu st where st.age=5 and st.id not in (
select stu_id from stu_info where code in ('a','b','c','d')
);
test=# explain select st.id,st.name from stu st where st.age=5 and st.id not in (
test(# select stu_id from stu_info where code in ('a','b','c','d')
test(# );
                                           QUERY PLAN
--------------------------------------------------------------------------------------------------
LightProxy  (cost=0.00..0.00 rows=1000 width=222)
  Node/s: All datanodes
  ->  Nested Loop Anti Join  (cost=0.00..2.22 rows=1 width=10)
        Join Filter: ((st.id = stu_info.stu_id) OR (st.id IS NULL) OR (stu_info.stu_id IS NULL))
        ->  Seq Scan on stu st  (cost=0.00..1.06 rows=1 width=10)
              Filter: (age = 5)
        ->  Materialize  (cost=0.00..1.09 rows=4 width=4)
              ->  Seq Scan on stu_info  (cost=0.00..1.07 rows=4 width=4)
                    Filter: ((code)::text = ANY ('{a,b,c,d}'::text[]))
(9 rows)


-- Efficient query using EXISTS
select st.id,st.name from stu st where st.age=5 and not exists (
select 1 from stu_info si where si.stu_id=st.id and si.code in ('a','b','c','d')
);
test=# explain select st.id,st.name from stu st where st.age=5 and not exists (
test(# select 1 from stu_info si where si.stu_id=st.id and si.code in ('a','b','c','d')
test(# );
                              QUERY PLAN
-------------------------------------------------------------------------
LightProxy  (cost=0.00..0.00 rows=1000 width=222)
  Node/s: All datanodes
  ->  Hash Right Anti Join  (cost=1.07..2.15 rows=1 width=10)
        Hash Cond: (si.stu_id = st.id)
        ->  Seq Scan on stu_info si  (cost=0.00..1.07 rows=4 width=4)
              Filter: ((code)::text = ANY ('{a,b,c,d}'::text[]))
        ->  Hash  (cost=1.06..1.06 rows=1 width=10)
              ->  Seq Scan on stu st  (cost=0.00..1.06 rows=1 width=10)
                    Filter: (age = 5)
(9 rows)

Other optimizations include avoiding SELECT *, using LIMIT to restrict rows returned, avoiding functions in WHERE conditions, and combining multiple inserts into batch inserts.

Blog