Updated on 2025-03-13 GMT+08:00

Merge Join

Description

A merge join is an efficient join method that depends on sorting operations. During a merge join, GaussDB sorts the join columns of the two tables and scans both tables at the same time to find matching rows. The time complexity of a merge join is O(n + m), where n and m indicate the numbers of rows in the two tables. However, if a sorting operation is required, the time complexity of the sorting operation may reach max(O(log n), O(log m)), which is usually more time-consuming than a direct merge join operation. In GaussDB, the optimizer prefers hash join, even if the two tables to be joined have been sorted.

Typical Scenarios

  • The two tables are similar in size.
  • The join columns of two tables have been sorted, for example, by index.
  • In addition to the equi-join (for example, using the = operator), the join condition also includes the range query (for example, using the <, <=, >, and >= operators).

Examples

Example: Two tables are joined and the sizes of the two tables are close.

-- Prepare data.
gaussdb=# CREATE TABLE t1(id int,info text); 
CREATE TABLE 
gaussdb=# CREATE TABLE t2(id int,info text); 
CREATE TABLE 
gaussdb=# INSERT INTO t2 SELECT generate_series(1,100000),'bill'||generate_series(1,100000); 
INSERT 0 100000 
gaussdb=# INSERT INTO t1 SELECT generate_series(1,100000),'bill'||generate_series(1,100000); 
INSERT 0 100000

-- Execution result.
gaussdb=# EXPLAIN SELECT /*+ MERGEJOIN(t2 t1) */ * FROM t2 JOIN t1 ON (t1.id=t2.id); 
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Merge Join  (cost=19421.64..21421.64 rows=100000 width=26)
   Merge Cond: (t2.id = t1.id)
   ->  Sort  (cost=9710.82..9960.82 rows=100000 width=13)
         Sort Key: t2.id
         ->  Seq Scan on t2  (cost=0.00..1406.00 rows=100000 width=13)
   ->  Sort  (cost=9710.82..9960.82 rows=100000 width=13)
         Sort Key: t1.id
         ->  Seq Scan on t1  (cost=0.00..1406.00 rows=100000 width=13)
(8 rows)

-- Drop.
gaussdb=# DROP TABLE t1,t2;

In the preceding example, the output of the merge join operator is as follows.

Item

Description

Merge Join

Operator name.

Merge Cond

Join predicate of the operator merge join. In the example, the condition is that t2.id is equal to t1.id. During query execution, rows that meet these conditions are included in the final result set.