Thursday 31 October 2013

SQL Indexes

What is index?

Indexes in databases are similar to indexes in books. In a book, an index allows you to find information quickly without reading the entire book.
Also it is a list of words with the page numbers that contain each word.

Similarly in a database, an index allows the database program to find data in a table without scanning the entire table and is a list of values in a table with the storage locations of rows in the table that contain each value.

Indexes can be created on either a single column or a combination of columns in a table and are implemented in the form of B-trees.
An index contains an entry with one or more columns (the search key) from each row in a table. A B-tree is sorted on the search key, and can be searched efficiently on any leading subset of the search key.
For example, an index on columns A, B, C can be searched efficiently on A, on A, B, and A, B, C.

The employee table shown below has an index on the emp_id column, following illustration shows; how the index stores each emp_id value and points to the rows of data in the table with each value.

When SQL Server executes a select statement in the employee table based on a specified emp_id value, it recognizes the index for the emp_id column and uses the index to find the data.
If the index is not present, it performs a full table scan starting at the beginning of the table and scaning through each row, searching for the specified emp_id value.

There are two types of Indexes

Clustered


Clustered indexes sort and store the data rows in the table based on their key values. Because the data rows are stored in sorted order on the clustered index key. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order.
The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. If a table has no clustered index, its data rows are stored in a heap.

Indexes are organized as B-trees. Each page in an index holds a page header followed by index rows. Each index row contains a key value and a pointer to either a lower-level page or a data row. Each page in an index is called an index node. The top node of the B-tree is called the root node. The bottom layers of nodes in the index are called the leaf nodes. The pages in each level of the index are linked together in a doubly-linked list. In a clustered index, the data pages make up the leaf nodes. Any index levels between the root and the leaves are collectively known as intermediate levels.

For a clustered index, sysindexes.root points to the top of the clustered index. SQL Server navigates down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server navigates through the index to find the starting key value in the range, and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.

This illustration shows the structure of a clustered index.

































Nonclustered


Nonclustered indexes have a structure completely separate from the data rows. The lowest rows of a nonclustered index contain the nonclustered index key values and each key value entry has pointers to the data rows containing the key value. The data rows are not stored in order based on the nonclustered key.

The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or are clustered. For a heap, a row locator is a pointer to the row. For a table with a clustered index, the row locator is the clustered index key.

Nonclustered indexes can be defined on a table with a clustered index, a heap, or an indexed view; the row locators in nonclustered index rows have two forms:
  • If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The entire pointer is known as a Row ID.
  • If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
Because nonclustered indexes store clustered index keys as their row locators, it is important to keep clustered index keys as small as possible. Do not choose large columns as the keys to clustered indexes if a table also has nonclustered indexes.



































About Author:
Harshad Pednekar is budding technology geek, who actively contributes to Systems Plus with his creativity and research on technology. To read more interesting articles from him , please follow: http://harshadpednekar.blogspot.in

No comments:

Post a Comment