Sunday, December 8, 2013

Indexes in Oracle Database - Part 1

This is Part 1 in 3 part series. Read Part 2Part 3 here.

In an Oracle Database, indexes are optional structures associated with database tables and clusters, used for speeding up the data retrieval. Just like an index in a book helps finding the information inside the book faster, an index in the oracle database provides a faster access path to the table data. Indexes are independent of the data in the associated table and indexes can be created or dropped without affecting the base table and its data. On the other hand, once created oracle automatically maintains the indexes and any changes in table data are reflected in the index automatically with no additional action from the users manipulating the table data. In order to support large data volumes, indexes are stored in disks using data structures that provide efficient retrieval of table data with minimum disk I/O. 

There are different types of indexes oracle supports and internal storage wise there are mainly 2 types, the b-tree indexes and bitmap indexes. A b-tree index uses a tree data structure called balanced tree (b-tree) to store the data to support quick retrieval of data.  In case of bitmap indexes, oracle stores a bitmap of rowid (unique identifier used to identify a row in Oracle) with distinct values the indexed column has. Here in this 2 blog series I am trying to illustrate the working of B-Tree and Bitmap indexes in Oracle database.

B-tree indexes

B-tree indexes are the standard/traditional tree indexes oracle has been using since its earlier versions, to provide improved speed for data retrieval operations from database tables. Oracle stores the indexes as balanced trees (B-Tree) in the disk in order to minimize the disk I/O while retrieving data. For the columns for which index is created, the data is sorted and stored as rowid – index key value pair for each row.  In oracle, the b-Tree indexes are stored as an ordered list of values divided into block-wide ranges. The end points of the ranges along with the pointers to the blocks are stored as a search tree where a value from a set of N entries can be found in log (n) time.

What is a B-tree ?

Tree structures support various kind of data manipulation operations like search, insert, delete on a set of data in a time proportional to the height of the tree. B-Tree or balanced tree is a variant of the tree data structure which is optimized for situations where it is not possible to keep the entire tree structure in the main memory and require a secondary storage such as disks to maintain the tree. In operations where large volume of data is consumed, systems spend most of the time in waiting for the data to be retrieved from disk which is the most expensive and time consuming of all the operations. A B-Tree stores the data in such way that, the data stored in the tree structure can be retrieved with least number of disk accesses, thus speeding up the data retrieval process.

In a b-tree, each node is designed to accommodate more than one keys and the number of sub trees originating from each node depends on the number keys stored in the node. This design characteristic of b-tree to branch out in number of directions and to store large number of keys in each node helps in keeping height of the tree relatively small which in turn reduces the I/O required while doing operations like searching an element or adding or deleting an element. Every B-tree is said to be of some order “n”, meaning each node in the b-tree can contain “n” to “2n” number of keys.


For example, following diagram (Figure 1) shows a b-tree of order 1 storing numbers from 0 - 9. Being a b-tree of order 1, each node in the tree can have 1 to 2 keys and a maximum of 3 pointers pointing to the sub trees.  On each node keys are kept in sorted order.


Figure 1


In this case any key in the tree can be searched with a maximum of 3 node look up. Here it may not look significant since the example shows a b-tree of order 1 storing 10 keys. In practical scenarios like an oracle index storing hundreds of thousands of keys, b-tree provides large savings in terms of disk I/O, hence speeding up the data retrieval significantly. For example, in a b-tree storing 10,000,000 keys with 50 keys in each node, in order to search a key, a maximum of 4 node look up is good enough.

Oracle’s implementation of b-tree index

The basic storage unit for an oracle database is a ‘data block’; meaning when oracle database processes read or write data from disk it is done for a data block. One data block corresponds to a specific number of bytes of physical database space on disk depending on the data block size mentioned while creating the database. The next level of logical database space is an extent. An extent is a specific number of contiguous data blocks allocated for storing a specific type of information. The level of logical database storage above an extent is called a segment and oracle allocates different type of segments to store different type of objects. When an index is created, index is stored by allocating an index segment in the tablespace specified in the index creation statement.

For B-tree indexes in oracle, each block allocated for the index is considered as a node in the b-tree data structure. B-tree indexes have two types of nodes, leaf nodes and branch nodes. Leaf nodes store the index keys, which is the basic unit of an index, in sorted order together with the rowid information which can be used to locate the row in the database table.  Branch nodes store a reduced list of index keys, and pointers to the child blocks containing the keys.  In branch nodes, oracle does not always store the complete index key, but stores only the minimum key prefix that is required to make a branching decision between two keys.

When an index is created on an empty table, a node is created which is called the root node (which is actually and empty data block) and as data gets inserted in the table, oracle stores index keys in the node in sorted order, along with the rowid information. At this point of time root node acts as a leaf node. A root node at this stage can be represented as in figure 2.
Figure 2

As more keys get added to the index, the root node becomes full and oracle splits the root node to create two leaf nodes to accommodate the additional entries.  While doing so, the root node becomes a branch node storing block address of 2 of the newly created leaf nodes along with first index key stored in those blocks.  All the index keys and their rowid information are distributed to the newly created leaf nodes. Figure 3 represents the state of the b-tree index at this point of time. 
Figure 3




As more and more data gets added to the table, more and more leaf nodes are created and more entries are made in the root node to point to the newly created leaf nodes and at one point, the root node may not have enough space to hold anymore entries to store the block addresses of the leaf nodes that are getting created. The root node is again split to create another level of branch nodes in order to accommodate the growing number of leaf nodes. This process is continued so that all the leaf blocks are maintained at the same depth, and retrieving any record from any part of the index will take roughly the same amount of time. Apart from the index key, rowid pair, leaf nodes also stores block addresses of the adjacent leaf nodes so that the leaf nodes are cyclically linked to each other.

Illustration from Oracle Database

To have a better understanding at Oracle’s way of storing indexes, a sample table called EMP was created and an index EMP_COMM_IDX was created on column COMMENTS which of size 3500 characters. A column of such a big size was chosen to produce an index layout that is spread across multiple leaf/branch nodes with minimum number of records. In this case 8KB being the data block size, Oracle will be able to store only 2 index keys in one data block and with 10 rows itself index will have 5 blocks associated to it. Following table shows the content of EMP table, the data set used in this example. Note that the comments column in the EMP table actually contains character data of size 3500 characters, and only a portion of the data is actually shown in the table given below because of space constraints.

ID
NAME
DEPT
COMMENTS
2
AAA
A
22222222222222222222…..
3
AAA
B
33333333333333333333…..
8
AAA
A
44444444444444444444…..
9
AAA
C
99999999999999999999…..
6
AAA
A
66666666666666666666…..
7
AAA
B
77777777777777777777….
4
AAA
A
88888888888888888888….
5
AAA
B
55555555555555555555….
0
AAA
A
00000000000000000000….
1
AAA
B
11111111111111111111…..

The index was created after populating 10 records in the table with comments column having strings of 3500 characters length.  Shown below is the block dump for the index EMP_COMM_IDX from the database generated using oracle dump utility.

     ----- begin tree dump
branch: 0x3c00b8a 62917514 (0: nrow: 5, level: 1)
   leaf: 0x3c00b8b 62917515 (-1: nrow: 2 rrow: 2)
   leaf: 0x3c00b8c 62917516 (0: nrow: 2 rrow: 2)
   leaf: 0x3c00b8d 62917517 (1: nrow: 2 rrow: 2)
   leaf: 0x3c00b8e 62917518 (2: nrow: 2 rrow: 2)
   leaf: 0x3c00b8f 62917519 (3: nrow: 2 rrow: 2)
----- end tree dump

The dump shows a total of six blocks allocated for the index EMP_COMM_IDX, out of which five are leaf nodes and the other one a branch node. Here each leaf node has 2 index keys stored in it and the branch node contains prefixes for five index keys and their respective child leaf block’s addresses. Note that in branch nodes, oracle does not store the entire index key, but only the prefixes of the index keys that are good enough for making the branching decision. Looking further into each block using oracle’s data block dump utility following diagram was developed with the real block addresses that Oracle has used to represent the index in the data file.

Figure 4
Here the branch node block address is 2954 and leaf nodes starts from block 2955 to 2959. In the branch node Oracle has stored only one character (2 for a string of 3500 2’s for example) for each index key which enough for making the branching decision while searching for a key. Apart from rowid/keys pairs leaf nodes also stores block address for adjacent leaf nodes.

Read Part 2 here

No comments:

Post a Comment