Tuesday, 21 May 2013

Indexes

  Indexes are used to speedup data access in the database. By using them one can quickly find the data in a table without having to read all the data from the tables. The index structure resembles an inverse tree similar to a directory structure. This tree begins with the 1st page of an index which is the root node which contains pointers to other pages in the index. Then comes the intermediate node or branch node which also contains pointers to leaf nodes or other branch nodes. The leaf node is the lowest level page in an index and it contains either an Row identifier (RowId) that points to the actual data in a table or it may even contain a clustering key itself. The following B-Tree (Balanced Tree) is a sample of an index page containing 1000 records and how they are separated into Root node, Branch node and Leaf node. Indexes_01 Leaf node Consider the above B-Tree structure and we are searching a field with value of 978. First the database engine would check the root node value of 1 to 1000. Since the value is 978 it uses the branch node 501 to 1000 and navigates further to reach 751 to 1000. Finally it would navigate further through the leaf node 876 to 1000 and find the desired record 978. Thus by using the index there were only fewer number of reads. When compared to scanning all the 1000 records to fetch the result, this usage of index has reduced the IO to a large extent. Clustered Index: A clustered index stores the table data at the leaf page of the index based on clustering key. Because the clustered index stores the data in sorted order there is no need to rearrange the page in the index. There can be only one clustered index in a table since the data is sorted. Consider the analogy of a dictionary where the words are sorted alphabetically and definitions appear next to the word. Similarly in a clustered index CI, the leaf page contains the entire data/records and is sorted. NonClustered Index: A nonclustered index is analogous to an index in the back of a book. We can use the books index to locate pages that match an index entry. There can be 249 nonclustered index NCI and 1 clustered index in any table. If table does not have a clustered index, then its unsorted and is called a HEAP. A NC created on a Heap contains pointers to table rows. Each entry in the index page contains a row ID (RID). The RID is a pointer to a table row in a heap. If the table has clustered index, the index pages of a NCI contain the CI keys rather than RIDs. An index pointer whether it is a RID or a CI key is called a lookup. Scan and Seek: The following are the operators related to indexes. They are available in the SQL Server query execution plans. Table scan – A table scan results in reading the entire datas in a table and returns the entire table or specific records. This is bad for performance, if the table has numerous records doing a table scan will affect the performance severely. In some cases if there are fewer records its fine to have table scan. So if you see that SQL Server has performed a Table Scan, take a note of how many rows are in the table. If there arent many, then in this case, a Table Scan is a good thing. In the below query in adventureworks database we dont have an index on customerid field and it results in table scan. Also when a Select * is done in a table it will fetch the entire results and hence will do a table scan.
1
2
SELECT * FROM sales.storecontact
SELECT * FROM sales.storecontact WHERE customerid=322
Indexes_02 Clustered Index scan This is nothing but tables scan in a table which has Clustered index. Since the clustered index leaf page contains the data itself, performing a clustered index scan will scan all the entire records which will affect performance. But as I mentioned earlier if the records are fewer it wouldnt affect much. For the below query the optimizer does an Clustered index scan to retrieve the records,
1
SELECT * FROM sales.storecontact WHERE contacttypeid=15
Indexes_03 Clustered index seek A seek will retrieve only selective records from a table when compared to scan which will traverse all the records in a table. If the seek operation is done using a clustered index its a clustered index seek. Basically, any seek operation will vertically traverse through the B-Tree and fetch the records. Consider the below query for which I created a clustered index on Customerid field. The optimizer uses the clustered index and performs a clustered index seek.
1
SELECT * FROM sales.storecontact WHERE customerid=322
Indexes_04 The clustered index seek will traverse through the records where the customerid=322 and fetch the output. When compared to table scan which will traverse through all the records, an index seek is very helpful in reading the number of records quickly and is good for performance. Index scan Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table. An index scan is nothing but a scan on the nonclustered index. When index scan happens, all the rows in the leaf level are scanned. Index seek An index seek uses the nonclustered index to seek the records in a table and is considered better for performance if there is high selectivity. For the below query the optimizer does an index seek using the NC index on contacted field. Since the NC covers only the contactid it will not be able to fetch all the records with an index seek alone. So its uses seek to fetch the records which have contactid=322 and then does a key lookup using the clustered index key to fetch the other fields records.
1
SELECT * FROM sales.storecontact WHERE contactid=322
Indexes_05 The key lookup is an expensive operation if there are numerous records. Since key lookup increases as the IO we might have to avoid it in some cases. Index with included columns can help to overcome this situation and cover the entire query and in turn

No comments:

Post a Comment