RCR UB-Tree

RCR UB-Tree Multi-Version Management

The multi-version management of row consistency read (RCR) UB-tree is based on data row levels. XIDs are recorded in data rows, which increases the key size and causes index expansion by 5% to 20%. The latest and historical versions are on the B-tree, and the index does not record the undo information. Keys are inserted into or deleted in the sequence of key + TID. Tuples with the same index column are sorted based on their TIDs as the second keywords. The xmin and xmax are added to the end of the key. During index splitting, multi-version information is migrated with key migration, as shown in Figure 1.

Figure 1 RCR UB-tree multi-version management
Click to enlarge

RCR UB-Tree Visibility Mechanism

RCR UB-tree visibility is determined by xmin/xmax on the key, which is similar to xmin/xmax on Astore heap table data rows, as shown in .Figure 2

Figure 2 RCR UB-tree Visibility Mechanism
Click to enlarge

Adding, Deleting, Modifying, and Querying RCR UB-Tree

Insert: The insertion logic of UB-tree is basically not changed, except that you need to directly obtain the transaction information and fill in the xmin column during index insertion.
Delete: The index deletion process is added to the UB-tree. The main procedure of index deletion is similar to that of index insertion. That is, obtain the transaction information, fill in the xmax column, and update active_tuple_count on pages. If the value of active_tuple_count is reduced to 0, the system attempts to recycle the page. (The B-tree index does not maintain the version information and therefore no deletion operation is required.)
Update: In Ustore, data updates require different processing on UB-tree index columns compared to Astore. Data updates include index column updates and non-index column updates. Figure 3 illustrates the UB-tree processing during data updates.
Figure 3 UB-tree processing during data updates

Figure 3 shows the differences between UB-tree updates in index columns and non-index columns.
- In the case of non-index column updates, the index does not change. The index tuple still points to the data tuple inserted for the first time. No data tuple is inserted to the Uheap. Instead, the Uheap modifies the current data tuple and saves historical data to the undo segment.
- In the case of index column updates, a new index tuple is inserted into UB-tree and points to the same data line pointer and data tuple. To scan the historical data, you need to read it from the undo segment.
Scan: When reading data, you can use index scan to accelerate data read. The UB-tree supports multi-version management and visibility check of index data. The visibility check at the index layer improves the performance of index scan and index-only scan.
For index scan:
- If the index column contains all columns to be scanned (index-only scan), binary search is performed on indexes based on the scan conditions. If a tuple that meets the conditions is found, data is returned.
- If the index column does not contain all columns to be scanned (index scan), binary search is performed on indexes based on the scan conditions to find TIDs of the tuples that meet the conditions, and then the corresponding data tuples are found in data tables based on the TIDs, as shown in Figure 4.
  Figure 4 Corresponding data tuple

RCR UB-Tree Space Management

Currently, Astore indexes depend on AutoVacuum and Free Space Map (FSM) for space management, which may not be recycled in a timely manner. Ustore indexes use UB-tree Recycle Queue (URQ), a data structure based on cyclic queues, that is, dual-loop queues, to manage idle index space. The URQ contains two circular queues: potential empty page queue and available empty page queue. Completing space management of indexes in a DML process can effectively alleviate the sharp space expansion caused during the DML process. The index recycling queue is separately stored in the FSM file corresponding to the B-tree index.

Figure 5 Index pages flow in the URQ
Click to enlarge

As shown in Figure 5, the index pages flow in the URQ as follows:

From an empty page a potential queue
The index page tail column records the number of active tuples (activeTupleCount) on the page. During the DML process, all tuples on a page are deleted, that is, when activeTupleCount is set to 0, the index page is placed in the potential queue.

From the potential queue to an available queue
The flow from a potential queue to an available queue mainly achieves an income and expense balance for the potential queue and ensure that pages are available for the available queue. That is, after an index empty page is used up in an available queue, at least one index page is transferred from a potential queue to the available queue. Besides, if a new index page is added to a potential queue, at least one index page can be removed from the potential queue and inserted into the available queue.
From the available queue to an empty page
When obtaining an empty index page, the index first searches the available queue for an index page that can be reused. If a page is found, the page is reused. If no page can be reused, physical page extension is performed.

Parent topic: UB-Tree

Previous topic: UB-Tree

Next topic: PCR UB-Tree