Updated on 2025-05-29 GMT+08:00

Logical Decoding of DDL Statements

DDL statements can be properly executed on a GaussDB host and can be obtained using a logical decoding tool.

Table 1 Supported DDL statements

Table

Index

User-defined Function

User-defined Stored Procedure

Trigger

Sequence

View

Materialized View

Package

Schema

Comment on

CREATE TABLE [PARTITION | AS | SUBPARTITION | LIKE]

ALTER TABLE [PARTITION | SUBPARTITION]

DROP TABLE

RENAME TABLE

SELECT INTO

TRUNCATE

CREATE INDEX

ALTER INDEX

DROP INDEX

REINDEX

CREATE FUNCTION

ALTER FUNCTION

DROP FUNCTION

CREATE PROCEDURE

ALTER PROCEDURE

DROP PROCEDURE

CREATE TRIGGER

ALTER TRIGGER

DROP TRIGGER

CREATE SEQUENCE

ALTER SEQUENCE

DROP SEQUENCE

CREATE VIEW

ALTER VIEW

DROP VIEW

CREATE [INCREMENTAL] MATERIALIZED VIEW

ALTER MATERIALIZED VIEW

DROP MATERIALIZED VIEW

REFRESH [INCREMENTAL] MATERIALIZED VIEW

CREATE PACKAGE

ALTER PACKAGE

DROP PACKAGE

CREATE SCHEMA

ALTER SCHEMA

DROP SCHEMA

COMMENT ON

Table 2 DDL statements supported by M-compatible databases

Table

Index

Sequence

View

Schema

Comment on

ALTER TABLE

CREATE TABLE

DROP TABLE

ALTER TABLE PARTITION

CREATE TABLE PARTITION

ALTER TABLE SUBPARTITION

CREATE TABLE SUBPARTITION

TRUNCATE

ALTER TABLE TRUNCATE

ALTER INDEX

CREATE INDEX

DROP INDEX

REINDEX

ALTER TABLE DROP INDEX

ALTER TABLE ADD INDEX

ALTER SEQUENCE

CREATE SEQUENCE

DROP SEQUENCE

ALTER VIEW

CREATE VIEW

DROP VIEW

CREATE VIEW using the WITH option

ALTER VIEW using the WITH option

ALTER SCHEMA

CREATE SCHEMA

DROP SCHEMA

ALTER DATABASE

CREATE DATABASE

DROP DATABASE

COMMENT ON

Description

When DML statements are executed in the database, the storage engine generates DML logs for restoration. After decoding the DML logs, the storage engine restores the corresponding DML statements and generates logical logs. For DDL statements, the database does not record logs of original DDL statements. Instead, it records DML logs of system catalogs involved in DDL statements. DDL statements are of various types and complex syntax. It is difficult to support DDL statements in logical replication and decode original DDL statements based on DML logs of these system catalogs. DDL logs are added to record original DDL information. During decoding, original DDL statements can be obtained from DDL logs.

During the execution of a DDL statement, the SQL engine parser parses the syntax and lexicon of the original statement and generates a parsing tree. (Different DDL syntaxes generate different types of parsing trees, which contain all information about the DDL statement.) Then, the executor performs the corresponding operation based on the information to generate and modify the corresponding meta information.

In this section, DDL logs are added to support logical decoding of DDL statements. The content of DDL logs is generated based on the parser result (parsing tree) and executor result, and the logs are generated after the execution is complete.

DDL statements are reversely parsed from the syntax tree. In this way, DDL commands are converted to JSON statements and necessary information is provided to rebuild DDL commands in the target location. Compared with the original DDL command strings, the benefits of parsing DDL statements from the syntax tree include:

  1. Each parsed database object has a schema. Therefore, if different search paths are used, there is no ambiguity.
  2. Structured JSON statements and formatted output support heterogeneous databases. If you are using different database versions and some DDL syntaxes differ, you need to resolve these differences before applying them.

The output result of reverse parsing is in the normalized format. This result is equivalent to the user input but is not necessarily the same. For example:

Example 1: If the function body does not contain single quotation marks ('), the separator $$ of the function body is parsed as single quotation marks (').

Original SQL statement:

CREATE FUNCTION func(a INT) RETURNS INT AS
$$
BEGIN
a:= a+1;
CREATE TABLE test(col1 INT);
INSERT INTO test VALUES(1);
DROP TABLE test;
RETURN a;
END;
$$
LANGUAGE plpgsql;

Reverse parsing result:

CREATE  FUNCTION public.func ( IN a pg_catalog.int4 ) RETURNS pg_catalog.int4 LANGUAGE plpgsql VOLATILE CALLED ON NULL INPUT SECURITY INVOKER COST 100 AS '
BEGIN
a:= a+1;
CREATE TABLE test(col1 INT);
INSERT INTO test VALUES(1);
DROP TABLE test;
RETURN a;
END;
';

Example 2: "CREATE MATERIALIZED VIEW v46_4 AS SELECT a, b FROM t46 ORDER BY a OFFSET 10 ROWS FETCH NEXT 3 ROWS ONLY" will be reversely parsed as "CREATE MATERIALIZED VIEW public.v46_4 AS SELECT a, b FROM public.t46 ORDER BY a OFFSET 10 LIMIT 3."

Example 3: "ALTER INDEX "Alter_Index_Index" REBUILD PARTITION "CA_ADDRESS_SK_index2"" will be reversely parsed as "REINDEX INDEX public."Alter_Index_Index" PARTITION "CA_ADDRESS_SK_index2"."

Example 4: Create or modify a range partitioned table. The START END syntax is decoded and converted into the LESS THAN statement.
gaussdb=# CREATE TABLE test_create_table_partition2 (c1 INT, c2 INT) 
PARTITION BY RANGE (c2) (
    PARTITION p1 START(1) END(1000) EVERY(200) ,
    PARTITION p2 END(2000),
    PARTITION p3 START(2000) END(2500),
    PARTITION p4 START(2500),
    PARTITION p5 START(3000) END(5000) EVERY(1000) 
);

Reverse parsing result:

gaussdb=# CREATE TABLE test_create_table_partition2 (c1 INT, c2 INT) 
PARTITION BY RANGE (c2) (
    PARTITION p1_0 VALUES LESS THAN ('1'), PARTITION p1_1 VALUES LESS THAN ('201'), PARTITION p1_2 VALUES LESS THAN ('401'), PARTITION p1_3 VALUES LESS THAN ('601'), PARTITION p1_4 VALUES LESS THAN ('801'), PARTITION p1_5 VALUES LESS    THAN ('1000'), 
    PARTITION p2 VALUES LESS THAN ('2000'), 
    PARTITION p3 VALUES LESS THAN ('2500'), 
    PARTITION p4 VALUES LESS THAN ('3000'), 
    PARTITION p5_1 VALUES LESS THAN ('4000'), 
    PARTITION p5_2 VALUES LESS THAN ('5000')
);
Example 5: When adding a column to a table, use IF NOT EXISTS for judgment.
  • Original SQL statement:
    gaussdb=# CREATE TABLE IF NOT EXISTS tb5 (c1 int,c2 int) with (ORIENTATION=ROW, STORAGE_TYPE=USTORE);
    gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN IF NOT EXISTS c2 char(5) after c1; -- Can be decoded. Column c2 of the int type exists in the table and is skipped, and the type of column c2 in the reverse parsing result remains unchanged.

    Reverse parsing result:

    gaussdb=# ALTER TABLE IF EXISTS public.tb5 ADD COLUMN IF NOT EXISTS c2 pg_catalog.int4 AFTER c1;
  • Original SQL statement:
    gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN IF NOT EXISTS c2 char(5) after c1,  ADD COLUMN IF NOT EXISTS c3 char(5) after c1; -- Decoded. The type of the new column c3 in the reverse parsing result is correct.

    Reverse parsing result:

    gaussdb=# ALTER TABLE IF EXISTS public.tb5 ADD COLUMN IF NOT EXISTS c2 pg_catalog.int4 AFTER c1, ADD COLUMN IF NOT EXISTS c3 pg_catalog.bpchar(5) AFTER c1;
  • Original SQL statement:
    gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN c2 char(5) after c1,  ADD COLUMN IF NOT EXISTS c4 int after c1; -- Not decoded. An error occurs when the statement is executed.

Limitations

  • DDL specification constraints for logical decoding:
    • The logical decoding performance in the DDL-only scenario is about 100 MB/s in the standard environment, and that in DDL/DML hybrid transaction scenario is about 100 MB/s in the standard environment.
    • After this function is enabled (by setting wal_level to logical and enable_logical_replication_ddl to on), the performance of DDL statements decreases by less than 15%.
  • General decoding constraints (serial and parallel):
    • DDL operations on local temporary objects cannot be decoded.
    • DDL statement decoding in the FOREIGN TABLE scenario is not supported.
    • The DEFAULT of ALTER TABLE ADD COLUMN does not support stable or volatile functions. The CHECK constraint expression of CREATE TABLE and ALTER TABLE regarding columns does not support stable or volatile functions. If ALTER TABLE has multiple clauses and one of them has the preceding two situations, the entire ALTER TABLE statement is not parsed reversely.

      gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b1 TIMESTAMP DEFAULT NOW(); -- 's' NOT DEPARSE
      gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b2 INT DEFAULT RANDOM(); -- 'v' NOT DEPARSE
      gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b3 INT DEFAULT ABS(1); -- 'i' DEPARSE
    • If IF NOT EXISTS exists in the statement for creating an object and the object already exists, the statement is not decoded. If IF EXISTS exists in the statement for deleting an object but the object does not exist, the statement is not decoded.
    • The ALTER PACKAGE COMPILE statement is not decoded, but the DDL/DML statements contained in the instantiated content are decoded. If the package does not contain instantiated content involving DDL or DML statements, ALTER PACKAGE COMPILE will be ignored by logical decoding.
    • Only the commercial DDL syntax earlier than this version is supported. The following SQL statements do not support logical decoding:
      • Create a row-store table and set the ILM policy.

        Original SQL statement:

        gaussdb=# CREATE TABLE IF NOT EXISTS tb3 (c1 int) with (storage_type=USTORE,ORIENTATION=ROW) ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER 7 day OF NO MODIFICATION;

        Reverse parsing result:

        gaussdb=# CREATE TABLE IF NOT EXISTS public.tb3 (c1 pg_catalog.int4) WITH (storage_type = 'ustore', orientation = 'row', compression = 'no') NOCOMPRESS;
      • When creating a table, add the IDENTITY constraint to a column.
        CREATE TABLE IF NOT EXISTS tb4 (c1 int GENERATED ALWAYS AS IDENTITY (INCREMENT BY 2 MINVALUE 10 MAXVALUE 20 CYCLE SCALE)); 
    • Logical decoding does not support DDL/DCL/DML hybrid transactions. In hybrid transactions, DML statements after DDL statements cannot be decoded.
      -- No reverse parsing is performed. DCL statements are not supported and therefore are not parsed. DML statements after DCL statements are not parsed.
      gaussdb=# BEGIN;
      gaussdb=# GAINT ALL PRIVILEGES to u01;
      gaussdb=# INSERT INTO test1(col1) values(1);
      gaussdb=# COMMIT; 
      
      -- Only the first and third SQL statements are reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# CREATE TABLE mix_tran_t4(id int);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TABLE mix_tran_t5(id int);
      gaussdb=# COMMIT; 
      
      -- Only the first and second SQL statements are reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TABLE mix_tran_t6(id int);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# COMMIT; 
      
      -- Full reverse parsing
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TABLE mix_tran_t7(id int);
      gaussdb=# CREATE TABLE mix_tran_t8(id int);
      gaussdb=# COMMIT; 
      
      -- Only the first and third SQL statements are reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# CREATE TABLE mix_tran_t7(id int);
      gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text);
      gaussdb=# CREATE TABLE mix_tran_t8(id int);
      gaussdb=# COMMIT; 
      
      -- Full reverse parsing
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# COMMIT; 
      
      -- Only the first SQL statement is reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# COMMIT; 
      
      -- Only the first and third SQL statements are reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text);
      gaussdb=# CREATE TABLE mix_tran_t9(id int);
      gaussdb=# COMMIT; 
    • For CREATE TABLE AS SELECT, SELECT INTO, and CREATE TABLE AS statements, only CREATE TABLE statements can be decoded and INSERT statements cannot be decoded.

      For tables created by using CREATE TABLE AS, the ALTER and DROP statements are still decoded.

      Example:

      Original SQL statement:

      CREATE TABLE IF NOT EXISTS tb35_2 (c1 int) with (storage_type=USTORE,ORIENTATION=ROW);
      INSERT INTO tb35_2 VALUES (6);
      CREATE TABLE tb35_1 with (storage_type=USTORE,ORIENTATION=ROW) AS SELECT * FROM tb35_2;

      Reverse parsing result of the last SQL statement:

       CREATE  TABLE  public.tb35_1 (c1 pg_catalog.int4) WITH (storage_type = 'ustore', orientation = 'row', compression = 'no') NOCOMPRESS;
    • When a stored procedure, function, or advanced package is executed, if the stored procedure, function, or advanced package contains a DDL/DML hybrid transaction or the stored procedure, function, or advanced package and other statements in the same transaction form a DDL/DML hybrid transaction, decoding is performed based on the hybrid transaction principle.
    • Logical decoding does not support the ledger database feature because the decoding result of the DDL statement for creating a ledger database contains hash columns.
      • Original statement:
        CREATE SCHEMA blockchain_schema WITH BLOCKCHAIN;
        CREATE TABLE blockchain_schema.blockchain_table(mes int);
      • Decoding result:
        CREATE SCHEMA blockchain_schema WITH BLOCKCHAIN;
        CREATE TABLE blockchain_schema.blockchain_table (mes pg_catalog.int4, hash_a1d895 pg_catalog.hash16);  -- The statement cannot be replayed on the target end. You need to manually disable the tamper-proof attribute of blockchain_schema on the target end before replaying the statement. In this case, blockchain_table on the target end is equivalent to an ordinary table, and DML commands executed later can be replayed.

        SQL commands:

        ALTER SCHEMA blockchain_schema WITHOUT BLOCKCHAIN;
        CREATE TABLE blockchain_schema.blockchain_table (mes pg_catalog.int4, hash_a1d895 pg_catalog.hash16);
  • In B-compatible mode, the ALTER SCHEMA schema_name WITHOUT/WITH BLOCKCHAIN syntax cannot be parsed.
  • DDL-specific constraints for serial logical decoding:
    • The sql_decoding plug-in does not support DDL statements in JSON format.

Decoding Format

  • JSON format

    When a DDL statement is input, the SQL engine parser parses the syntax and lexicon of the DDL statement and generates a parsing tree. The parsing tree contains all DDL information and the executor modifies system metadata based on the parsing tree content. After the execution is complete, you can obtain the search path of the DDL objects. After the executor is successfully executed, this feature reversely parses the parsing tree information and executor result to restore all information about the original DDL statement. In this way, the entire DDL statement can be parsed and a DDL statement in JSON format is output to adapt to heterogeneous databases.

    Upon lexical and syntax analysis on the CREATE TABLE statement, the corresponding CreateStmt parsing tree node is obtained, containing table information, column information, distribution information (DistributeBy structure), and partition information (PartitionState structure). After reverse parsing, the result is output in JSON format as follows:

    {"JDDL":{"fmt":"CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D %{table_elements}s %{with_clause}s %{compression}s","identity":{"object_name":"test_create_table_a","schema_name":"public"},"compression":"NOCOMPRESS","persistence":"","with_clause":{"fmt":"WITH (%{with:, }s)","with":[{"fmt":"%{label}s = %{value}L","label":{"fmt":"%{label}I","label":"orientation"},"value":"row"},{"fmt":"%{label}s = %{value}L","label":{"fmt":"%{label}I","label":"compression"},"value":"no"}]},"if_not_exists":"","table_elements":{"fmt":"(%{elements:, }s)","elements":[{"fmt":"%{name}I %{column_type}T","name":"a","column_type":{"typmod":"","typarray":false,"type_name":"int4","schema_name":"pg_catalog"}}]}}}

    The output JSON string contains the search path of the object. In the string, the identity key indicates that the schema is public and the table name is test_create_table_a. %{persistence}s corresponds to the following field in the SQL statement (This SQL statement does not contain this field and therefore is empty.):

    [ [ GLOBAL | LOCAL ] [ TEMPORARY | TEMP ] | UNLOGGED ]

    %{if_not_exists}s corresponds to a field in the SQL statement. (This SQL statement does not contain this field and therefore the field is empty.)

    [ IF NOT EXISTS ]

    %{identity}D corresponds to the following field in the SQL statement:

    table_name  

    %{table_elements}s corresponds to the following field in the SQL statement:

     (column_name data_type)

    %{with_clause}s corresponds to the following field in the SQL statement:

    [ WITH ( {storage_parameter = value} [, ... ] ) ]

    %{compression}s corresponds to the following field in the SQL statement:

    [ COMPRESS | NOCOMPRESS ]
  • Format specified by decode-style

    The output format is controlled by the decode-style parameter. For example, when decode-style is set to 'j', the output format is as follows:

    {"TDDL":"CREATE  TABLE  public.test_create_table_a (a pg_catalog.int4) WITH (orientation = 'row', compression = 'no') NOCOMPRESS"}

    The statement also contains the schema name.

API Design

  • New control parameters
    1. Logical decoding control parameters are added to control the DDL statement reverse parsing process and output format. To enable them, you can use the JDBC or pg_logical_slot_peek_changes API.
      • enable-ddl-decoding: The default value is false, indicating that logical decoding of DDL statements is disabled. The value true indicates that logical decoding of DDL statements is enabled.
      • enable-ddl-json-format: The default value is false, indicating that the DDL statement reverse parsing result is output in text format. The value true indicates that the DDL statement reverse parsing result is output in JSON format.
    2. A GUC parameter is added.
      • enable_logical_replication_ddl: The default value is ON. If the value is ON, logical replication of DDL statements is supported. Otherwise, logical replication of DDL statements is not supported. The DDL statement execution result is reversely parsed and WALs of the DDL statements are generated only when this parameter is set to ON. Otherwise, no reverse parsing is performed and no WAL is generated.

        You can check the operation logs of enable_logical_replication_ddl and determine whether logical decoding of DDL statements is not supported because the parameter is modified by users.

  • New logs

    The xl_logical_ddl_message log is added for DDL statements. The log type is RM_LOGICALDDLMSG_ID. The definition is as follows:

    Name

    Type

    Description

    db_id

    OID

    Database ID

    rel_id

    OID

    Table ID

    csn

    CommitSeqNo

    CSN-based snapshot

    cid

    CommandId

    Command ID

    tag_type

    NodeTag

    DDL type

    message_size

    Size

    Length of the log content

    filter_message_size

    Size

    Length of log information filtered by whitelist

    message

    char *

    DDL content

Procedure

  1. For the logical decoding feature, set the wal_level GUC parameter to logical in advance. This parameter takes effect only after the system is restarted.

    gs_guc set -Z datanode -D $node_dir -c "wal_level = logical"

    In the preceding command, $node_dir indicates the database node path. Change it based on the actual situation.

  2. Log in to the primary node of GaussDB as a user with the REPLICATION permission and run the following command to connect to the database:

    gsql -U user1 -W password -d db1 -p 16000 -r

    In the preceding command, user1 indicates the username, password indicates the user password, db1 indicates the name of the database to be connected, and 16000 indicates the database port number. You can replace them as required.

  3. Create a logical replication slot named slot1.

    1
    2
    3
    4
    5
    db1=>SELECT * FROM pg_create_logical_replication_slot('slot1', 'mppdb_decoding');
     slotname | xlog_position 
    ----------+---------------
     slot1    | 0/3764C788
    (1 row)
    

  4. Create a package in the database.

    1
    2
    3
    4
    5
    6
    db1=> CREATE OR REPLACE PACKAGE ldp_pkg1 IS
        var1 int:=1;  -- Public variable
        var2 int:=2;
        PROCEDURE testpro1(var3 int);  -- Public stored procedure, which can be called by external systems.
    END ldp_pkg1;
    /
    

  5. Read the decoding result of replication slot 1. You can the JDBC or pg_logical_slot_peek_changes API to update the replication slot number.

    • For details about the logical decoding options, see Logical Decoding Options and new control parameters.
    • In parallel decoding, you can change the value of the decode_style parameter in the JDBC API to determine the decoding format.
      • Configure decode-style to specify the decoding format. The value can be 'j', 't', or 'b' of the char type, indicating the JSON, TEXT, or binary format, respectively.
    1
    2
    3
    4
    5
    6
    7
    db1=> SELECT data FROM pg_logical_slot_peek_changes('slot1', NULL, NULL, 'enable-ddl-decoding', 'true', 'enable-ddl-json-format', 'false') WHERE data not like 'BEGIN%' AND data not like 'COMMIT%' AND data not like '%dbe_pldeveloper.gs_source%';
    
                                                                                                          data                  
    
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- {"TDDL":"CREATE OR REPLACE PACKAGE public.ldp_pkg1 AUTHID CURRENT_USER IS var1 int:=1; -- Public variable\n    var2 int:=2;\n    
    PROCEDURE testpro1(var3 int);  -- Public stored procedure, which can be called externally.\nEND  ldp_pkg1; \n /"}
    (1 row)
    

  6. Delete the logical replication slot slot1 and package ldp_pkg1.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    db1=> SELECT * FROM pg_drop_replication_slot('slot1');
     pg_drop_replication_slot
    --------------------------
    
    (1 row)
    
    gaussdb=# DROP PACKAGE ldp_pkg1;
    NOTICE:  drop cascades to function public.testpro1(integer)
    DROP PACKAGE