Logical Decoding of DDL Statements
DDL statements can be properly executed on a GaussDB host and can be obtained using a logical decoding tool.
Table |
Index |
User-defined Function |
User-defined Stored Procedure |
Trigger |
Sequence |
View |
Materialized View |
Package |
Schema |
Comment on |
---|---|---|---|---|---|---|---|---|---|---|
CREATE TABLE [PARTITION | AS | SUBPARTITION | LIKE] ALTER TABLE [PARTITION | SUBPARTITION] DROP TABLE RENAME TABLE SELECT INTO TRUNCATE |
CREATE INDEX ALTER INDEX DROP INDEX REINDEX |
CREATE FUNCTION ALTER FUNCTION DROP FUNCTION |
CREATE PROCEDURE ALTER PROCEDURE DROP PROCEDURE |
CREATE TRIGGER ALTER TRIGGER DROP TRIGGER |
CREATE SEQUENCE ALTER SEQUENCE DROP SEQUENCE |
CREATE VIEW ALTER VIEW DROP VIEW |
CREATE [INCREMENTAL] MATERIALIZED VIEW ALTER MATERIALIZED VIEW DROP MATERIALIZED VIEW REFRESH [INCREMENTAL] MATERIALIZED VIEW |
CREATE PACKAGE ALTER PACKAGE DROP PACKAGE |
CREATE SCHEMA ALTER SCHEMA DROP SCHEMA |
COMMENT ON |
Table |
Index |
Sequence |
View |
Schema |
Comment on |
---|---|---|---|---|---|
ALTER TABLE CREATE TABLE DROP TABLE ALTER TABLE PARTITION CREATE TABLE PARTITION ALTER TABLE SUBPARTITION CREATE TABLE SUBPARTITION TRUNCATE ALTER TABLE TRUNCATE |
ALTER INDEX CREATE INDEX DROP INDEX REINDEX ALTER TABLE DROP INDEX ALTER TABLE ADD INDEX |
ALTER SEQUENCE CREATE SEQUENCE DROP SEQUENCE |
ALTER VIEW CREATE VIEW DROP VIEW CREATE VIEW using the WITH option ALTER VIEW using the WITH option |
ALTER SCHEMA CREATE SCHEMA DROP SCHEMA ALTER DATABASE CREATE DATABASE DROP DATABASE |
COMMENT ON |
Description
When DML statements are executed in the database, the storage engine generates DML logs for restoration. After decoding the DML logs, the storage engine restores the corresponding DML statements and generates logical logs. For DDL statements, the database does not record logs of original DDL statements. Instead, it records DML logs of system catalogs involved in DDL statements. DDL statements are of various types and complex syntax. It is difficult to support DDL statements in logical replication and decode original DDL statements based on DML logs of these system catalogs. DDL logs are added to record original DDL information. During decoding, original DDL statements can be obtained from DDL logs.
During the execution of a DDL statement, the SQL engine parser parses the syntax and lexicon of the original statement and generates a parsing tree. (Different DDL syntaxes generate different types of parsing trees, which contain all information about the DDL statement.) Then, the executor performs the corresponding operation based on the information to generate and modify the corresponding meta information.
In this section, DDL logs are added to support logical decoding of DDL statements. The content of DDL logs is generated based on the parser result (parsing tree) and executor result, and the logs are generated after the execution is complete.
DDL statements are reversely parsed from the syntax tree. In this way, DDL commands are converted to JSON statements and necessary information is provided to rebuild DDL commands in the target location. Compared with the original DDL command strings, the benefits of parsing DDL statements from the syntax tree include:
- Each parsed database object has a schema. Therefore, if different search paths are used, there is no ambiguity.
- Structured JSON statements and formatted output support heterogeneous databases. If you are using different database versions and some DDL syntaxes differ, you need to resolve these differences before applying them.
The output result of reverse parsing is in the normalized format. This result is equivalent to the user input but is not necessarily the same. For example:
Example 1: If the function body does not contain single quotation marks ('), the separator $$ of the function body is parsed as single quotation marks (').
Original SQL statement:
CREATE FUNCTION func(a INT) RETURNS INT AS $$ BEGIN a:= a+1; CREATE TABLE test(col1 INT); INSERT INTO test VALUES(1); DROP TABLE test; RETURN a; END; $$ LANGUAGE plpgsql;
Reverse parsing result:
CREATE FUNCTION public.func ( IN a pg_catalog.int4 ) RETURNS pg_catalog.int4 LANGUAGE plpgsql VOLATILE CALLED ON NULL INPUT SECURITY INVOKER COST 100 AS ' BEGIN a:= a+1; CREATE TABLE test(col1 INT); INSERT INTO test VALUES(1); DROP TABLE test; RETURN a; END; ';
Example 2: "CREATE MATERIALIZED VIEW v46_4 AS SELECT a, b FROM t46 ORDER BY a OFFSET 10 ROWS FETCH NEXT 3 ROWS ONLY" will be reversely parsed as "CREATE MATERIALIZED VIEW public.v46_4 AS SELECT a, b FROM public.t46 ORDER BY a OFFSET 10 LIMIT 3."
Example 3: "ALTER INDEX "Alter_Index_Index" REBUILD PARTITION "CA_ADDRESS_SK_index2"" will be reversely parsed as "REINDEX INDEX public."Alter_Index_Index" PARTITION "CA_ADDRESS_SK_index2"."
gaussdb=# CREATE TABLE test_create_table_partition2 (c1 INT, c2 INT)
PARTITION BY RANGE (c2) (
PARTITION p1 START(1) END(1000) EVERY(200) ,
PARTITION p2 END(2000),
PARTITION p3 START(2000) END(2500),
PARTITION p4 START(2500),
PARTITION p5 START(3000) END(5000) EVERY(1000)
);
Reverse parsing result:
gaussdb=# CREATE TABLE test_create_table_partition2 (c1 INT, c2 INT)
PARTITION BY RANGE (c2) (
PARTITION p1_0 VALUES LESS THAN ('1'), PARTITION p1_1 VALUES LESS THAN ('201'), PARTITION p1_2 VALUES LESS THAN ('401'), PARTITION p1_3 VALUES LESS THAN ('601'), PARTITION p1_4 VALUES LESS THAN ('801'), PARTITION p1_5 VALUES LESS THAN ('1000'),
PARTITION p2 VALUES LESS THAN ('2000'),
PARTITION p3 VALUES LESS THAN ('2500'),
PARTITION p4 VALUES LESS THAN ('3000'),
PARTITION p5_1 VALUES LESS THAN ('4000'),
PARTITION p5_2 VALUES LESS THAN ('5000')
);
- Original SQL statement:
gaussdb=# CREATE TABLE IF NOT EXISTS tb5 (c1 int,c2 int) with (ORIENTATION=ROW, STORAGE_TYPE=USTORE); gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN IF NOT EXISTS c2 char(5) after c1; -- Can be decoded. Column c2 of the int type exists in the table and is skipped, and the type of column c2 in the reverse parsing result remains unchanged.
Reverse parsing result:
gaussdb=# ALTER TABLE IF EXISTS public.tb5 ADD COLUMN IF NOT EXISTS c2 pg_catalog.int4 AFTER c1;
- Original SQL statement:
gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN IF NOT EXISTS c2 char(5) after c1, ADD COLUMN IF NOT EXISTS c3 char(5) after c1; -- Decoded. The type of the new column c3 in the reverse parsing result is correct.
Reverse parsing result:
gaussdb=# ALTER TABLE IF EXISTS public.tb5 ADD COLUMN IF NOT EXISTS c2 pg_catalog.int4 AFTER c1, ADD COLUMN IF NOT EXISTS c3 pg_catalog.bpchar(5) AFTER c1;
- Original SQL statement:
gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN c2 char(5) after c1, ADD COLUMN IF NOT EXISTS c4 int after c1; -- Not decoded. An error occurs when the statement is executed.
Limitations
- DDL specification constraints for logical decoding:
- The logical decoding performance in the DDL-only scenario is about 100 MB/s in the standard environment, and that in DDL/DML hybrid transaction scenario is about 100 MB/s in the standard environment.
- After this function is enabled (by setting wal_level to logical and enable_logical_replication_ddl to on), the performance of DDL statements decreases by less than 15%.
- General decoding constraints (serial and parallel):
- DDL operations on local temporary objects cannot be decoded.
- DDL statement decoding in the FOREIGN TABLE scenario is not supported.
-
The DEFAULT of ALTER TABLE ADD COLUMN does not support stable or volatile functions. The CHECK constraint expression of CREATE TABLE and ALTER TABLE regarding columns does not support stable or volatile functions. If ALTER TABLE has multiple clauses and one of them has the preceding two situations, the entire ALTER TABLE statement is not parsed reversely.
gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b1 TIMESTAMP DEFAULT NOW(); -- 's' NOT DEPARSE gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b2 INT DEFAULT RANDOM(); -- 'v' NOT DEPARSE gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b3 INT DEFAULT ABS(1); -- 'i' DEPARSE
- If IF NOT EXISTS exists in the statement for creating an object and the object already exists, the statement is not decoded. If IF EXISTS exists in the statement for deleting an object but the object does not exist, the statement is not decoded.
- The ALTER PACKAGE COMPILE statement is not decoded, but the DDL/DML statements contained in the instantiated content are decoded. If the package does not contain instantiated content involving DDL or DML statements, ALTER PACKAGE COMPILE will be ignored by logical decoding.
- Only the commercial DDL syntax earlier than this version is supported. The following SQL statements do not support logical decoding:
- Create a row-store table and set the ILM policy.
gaussdb=# CREATE TABLE IF NOT EXISTS tb3 (c1 int) with (storage_type=USTORE,ORIENTATION=ROW) ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER 7 day OF NO MODIFICATION;
Reverse parsing result:
gaussdb=# CREATE TABLE IF NOT EXISTS public.tb3 (c1 pg_catalog.int4) WITH (storage_type = 'ustore', orientation = 'row', compression = 'no') NOCOMPRESS;
- When creating a table, add the IDENTITY constraint to a column.
CREATE TABLE IF NOT EXISTS tb4 (c1 int GENERATED ALWAYS AS IDENTITY (INCREMENT BY 2 MINVALUE 10 MAXVALUE 20 CYCLE SCALE));
- Create a row-store table and set the ILM policy.
- Logical decoding does not support DDL/DCL/DML hybrid transactions. In hybrid transactions, DML statements after DDL statements cannot be decoded.
-- No reverse parsing is performed. DCL statements are not supported and therefore are not parsed. DML statements after DCL statements are not parsed. gaussdb=# BEGIN; gaussdb=# GAINT ALL PRIVILEGES to u01; gaussdb=# INSERT INTO test1(col1) values(1); gaussdb=# COMMIT; -- Only the first and third SQL statements are reversely parsed. gaussdb=# BEGIN; gaussdb=# CREATE TABLE mix_tran_t4(id int); gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# CREATE TABLE mix_tran_t5(id int); gaussdb=# COMMIT; -- Only the first and second SQL statements are reversely parsed. gaussdb=# BEGIN; gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# CREATE TABLE mix_tran_t6(id int); gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# COMMIT; -- Full reverse parsing gaussdb=# BEGIN; gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# CREATE TABLE mix_tran_t7(id int); gaussdb=# CREATE TABLE mix_tran_t8(id int); gaussdb=# COMMIT; -- Only the first and third SQL statements are reversely parsed. gaussdb=# BEGIN; gaussdb=# CREATE TABLE mix_tran_t7(id int); gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text); gaussdb=# CREATE TABLE mix_tran_t8(id int); gaussdb=# COMMIT; -- Full reverse parsing gaussdb=# BEGIN; gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# COMMIT; -- Only the first SQL statement is reversely parsed. gaussdb=# BEGIN; gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text); gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# COMMIT; -- Only the first and third SQL statements are reversely parsed. gaussdb=# BEGIN; gaussdb=# INSERT INTO mix_tran_t4 VALUES(111); gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text); gaussdb=# CREATE TABLE mix_tran_t9(id int); gaussdb=# COMMIT;
- For CREATE TABLE AS SELECT, SELECT INTO, and CREATE TABLE AS statements, only CREATE TABLE statements can be decoded and INSERT statements cannot be decoded.
For tables created by using CREATE TABLE AS, the ALTER and DROP statements are still decoded.
Example:
Original SQL statement:
CREATE TABLE IF NOT EXISTS tb35_2 (c1 int) with (storage_type=USTORE,ORIENTATION=ROW); INSERT INTO tb35_2 VALUES (6); CREATE TABLE tb35_1 with (storage_type=USTORE,ORIENTATION=ROW) AS SELECT * FROM tb35_2;
Reverse parsing result of the last SQL statement:
CREATE TABLE public.tb35_1 (c1 pg_catalog.int4) WITH (storage_type = 'ustore', orientation = 'row', compression = 'no') NOCOMPRESS;
- When a stored procedure, function, or advanced package is executed, if the stored procedure, function, or advanced package contains a DDL/DML hybrid transaction or the stored procedure, function, or advanced package and other statements in the same transaction form a DDL/DML hybrid transaction, decoding is performed based on the hybrid transaction principle.
- Logical decoding does not support the ledger database feature because the decoding result of the DDL statement for creating a ledger database contains hash columns.
- Original statement:
CREATE SCHEMA blockchain_schema WITH BLOCKCHAIN; CREATE TABLE blockchain_schema.blockchain_table(mes int);
- Decoding result:
CREATE SCHEMA blockchain_schema WITH BLOCKCHAIN; CREATE TABLE blockchain_schema.blockchain_table (mes pg_catalog.int4, hash_a1d895 pg_catalog.hash16); -- The statement cannot be replayed on the target end. You need to manually disable the tamper-proof attribute of blockchain_schema on the target end before replaying the statement. In this case, blockchain_table on the target end is equivalent to an ordinary table, and DML commands executed later can be replayed.
SQL commands:
ALTER SCHEMA blockchain_schema WITHOUT BLOCKCHAIN; CREATE TABLE blockchain_schema.blockchain_table (mes pg_catalog.int4, hash_a1d895 pg_catalog.hash16);
- Original statement:
- In B-compatible mode, the ALTER SCHEMA schema_name WITHOUT/WITH BLOCKCHAIN syntax cannot be parsed.
- DDL-specific constraints for serial logical decoding:
- The sql_decoding plug-in does not support DDL statements in JSON format.
Decoding Format
- JSON format
When a DDL statement is input, the SQL engine parser parses the syntax and lexicon of the DDL statement and generates a parsing tree. The parsing tree contains all DDL information and the executor modifies system metadata based on the parsing tree content. After the execution is complete, you can obtain the search path of the DDL objects. After the executor is successfully executed, this feature reversely parses the parsing tree information and executor result to restore all information about the original DDL statement. In this way, the entire DDL statement can be parsed and a DDL statement in JSON format is output to adapt to heterogeneous databases.
Upon lexical and syntax analysis on the CREATE TABLE statement, the corresponding CreateStmt parsing tree node is obtained, containing table information, column information, distribution information (DistributeBy structure), and partition information (PartitionState structure). After reverse parsing, the result is output in JSON format as follows:
{"JDDL":{"fmt":"CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D %{table_elements}s %{with_clause}s %{compression}s","identity":{"object_name":"test_create_table_a","schema_name":"public"},"compression":"NOCOMPRESS","persistence":"","with_clause":{"fmt":"WITH (%{with:, }s)","with":[{"fmt":"%{label}s = %{value}L","label":{"fmt":"%{label}I","label":"orientation"},"value":"row"},{"fmt":"%{label}s = %{value}L","label":{"fmt":"%{label}I","label":"compression"},"value":"no"}]},"if_not_exists":"","table_elements":{"fmt":"(%{elements:, }s)","elements":[{"fmt":"%{name}I %{column_type}T","name":"a","column_type":{"typmod":"","typarray":false,"type_name":"int4","schema_name":"pg_catalog"}}]}}}
The output JSON string contains the search path of the object. In the string, the identity key indicates that the schema is public and the table name is test_create_table_a. %{persistence}s corresponds to the following field in the SQL statement (This SQL statement does not contain this field and therefore is empty.):
[ [ GLOBAL | LOCAL ] [ TEMPORARY | TEMP ] | UNLOGGED ]
%{if_not_exists}s corresponds to a field in the SQL statement. (This SQL statement does not contain this field and therefore the field is empty.)
[ IF NOT EXISTS ]
%{identity}D corresponds to the following field in the SQL statement:
table_name
%{table_elements}s corresponds to the following field in the SQL statement:
(column_name data_type)
%{with_clause}s corresponds to the following field in the SQL statement:
[ WITH ( {storage_parameter = value} [, ... ] ) ]
%{compression}s corresponds to the following field in the SQL statement:
[ COMPRESS | NOCOMPRESS ]
- Format specified by decode-style
The output format is controlled by the decode-style parameter. For example, when decode-style is set to 'j', the output format is as follows:
{"TDDL":"CREATE TABLE public.test_create_table_a (a pg_catalog.int4) WITH (orientation = 'row', compression = 'no') NOCOMPRESS"}
The statement also contains the schema name.
API Design
- New control parameters
- Logical decoding control parameters are added to control the DDL statement reverse parsing process and output format. To enable them, you can use the JDBC or pg_logical_slot_peek_changes API.
- enable-ddl-decoding: The default value is false, indicating that logical decoding of DDL statements is disabled. The value true indicates that logical decoding of DDL statements is enabled.
- enable-ddl-json-format: The default value is false, indicating that the DDL statement reverse parsing result is output in text format. The value true indicates that the DDL statement reverse parsing result is output in JSON format.
- A GUC parameter is added.
- enable_logical_replication_ddl: The default value is ON. If the value is ON, logical replication of DDL statements is supported. Otherwise, logical replication of DDL statements is not supported. The DDL statement execution result is reversely parsed and WALs of the DDL statements are generated only when this parameter is set to ON. Otherwise, no reverse parsing is performed and no WAL is generated.
You can check the operation logs of enable_logical_replication_ddl and determine whether logical decoding of DDL statements is not supported because the parameter is modified by users.
- enable_logical_replication_ddl: The default value is ON. If the value is ON, logical replication of DDL statements is supported. Otherwise, logical replication of DDL statements is not supported. The DDL statement execution result is reversely parsed and WALs of the DDL statements are generated only when this parameter is set to ON. Otherwise, no reverse parsing is performed and no WAL is generated.
- Logical decoding control parameters are added to control the DDL statement reverse parsing process and output format. To enable them, you can use the JDBC or pg_logical_slot_peek_changes API.
- New logs
The xl_logical_ddl_message log is added for DDL statements. The log type is RM_LOGICALDDLMSG_ID. The definition is as follows:
Name
Type
Description
db_id
OID
Database ID
rel_id
OID
Table ID
csn
CommitSeqNo
CSN-based snapshot
cid
CommandId
Command ID
tag_type
NodeTag
DDL type
message_size
Size
Length of the log content
filter_message_size
Size
Length of log information filtered by whitelist
message
char *
DDL content
Procedure
- For the logical decoding feature, set the wal_level GUC parameter to logical in advance. This parameter takes effect only after the system is restarted.
gs_guc set -Z datanode -D $node_dir -c "wal_level = logical"
In the preceding command, $node_dir indicates the database node path. Change it based on the actual situation.
- Log in to the primary node of GaussDB as a user with the REPLICATION permission and run the following command to connect to the database:
gsql -U user1 -W password -d db1 -p 16000 -r
In the preceding command, user1 indicates the username, password indicates the user password, db1 indicates the name of the database to be connected, and 16000 indicates the database port number. You can replace them as required.
- Create a logical replication slot named slot1.
1 2 3 4 5
db1=>SELECT * FROM pg_create_logical_replication_slot('slot1', 'mppdb_decoding'); slotname | xlog_position ----------+--------------- slot1 | 0/3764C788 (1 row)
- Create a package in the database.
1 2 3 4 5 6
db1=> CREATE OR REPLACE PACKAGE ldp_pkg1 IS var1 int:=1; -- Public variable var2 int:=2; PROCEDURE testpro1(var3 int); -- Public stored procedure, which can be called by external systems. END ldp_pkg1; /
- Read the decoding result of replication slot 1. You can the JDBC or pg_logical_slot_peek_changes API to update the replication slot number.
- For details about the logical decoding options, see Logical Decoding Options and new control parameters.
- In parallel decoding, you can change the value of the decode_style parameter in the JDBC API to determine the decoding format.
- Configure decode-style to specify the decoding format. The value can be 'j', 't', or 'b' of the char type, indicating the JSON, TEXT, or binary format, respectively.
1 2 3 4 5 6 7
db1=> SELECT data FROM pg_logical_slot_peek_changes('slot1', NULL, NULL, 'enable-ddl-decoding', 'true', 'enable-ddl-json-format', 'false') WHERE data not like 'BEGIN%' AND data not like 'COMMIT%' AND data not like '%dbe_pldeveloper.gs_source%'; data -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- {"TDDL":"CREATE OR REPLACE PACKAGE public.ldp_pkg1 AUTHID CURRENT_USER IS var1 int:=1; -- Public variable\n var2 int:=2;\n PROCEDURE testpro1(var3 int); -- Public stored procedure, which can be called externally.\nEND ldp_pkg1; \n /"} (1 row)
- Delete the logical replication slot slot1 and package ldp_pkg1.
1 2 3 4 5 6 7 8 9
db1=> SELECT * FROM pg_drop_replication_slot('slot1'); pg_drop_replication_slot -------------------------- (1 row) gaussdb=# DROP PACKAGE ldp_pkg1; NOTICE: drop cascades to function public.testpro1(integer) DROP PACKAGE
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot