SQL Server – Clustered vs. Non-clustered Indexes

A question was asked on LinkedIn regarding the difference between clustered and none-clustered indexes. Since LinkedIn limits the characters you can enter, I’m posting my (hastily written) reply in blog form.

shaun-q

shaun-a

I often explain it to SQL newbies using the example of a dictionary vs. a store catalogue.

Let’s say you pick up a copy of the Oxford English Dictionary and you want to find the definition of a word e.g. ‘Database’. Because all the information in the dictionary is sorted in a helpful order, i.e. alphabetically, then you can just scan through the dictionary, in order, and then stop at ‘D’. When you find it, you not only find the word ‘Database’, but also some information associated with the word, such as an explanation of what it means, and maybe a synonym. Your query has been fully satisfied. Happy days!

That is a clustered index in a nutshell. It’s your data, physically sorted based on the index key, which in this case, was the name of the word. In reality, the clustered index key is usually the primary key, which will often be an integer, although certainly doesn’t have to be. Since the clustered index physically sorts our data, we can only have 1 per table, although it could contain multiple columns.

Now let’s imagine you want to buy a birthday present for your better-half. You pop into a store and pick up a thick product catalogue. The store catalogue is sorted by product category (furniture, homeware, electricals etc.) and each product within the category includes the full description, price, pictures and so on. Here, our clustered index is on the product category.

However, you’re only interested in gifts that are priced under $10 (you’re a generous soul!). How would you find these products efficiently? It could take you forever to scan the entire store catalogue and find every product under $10. This search would be very inefficient, and you’d more than likely get bored and give up.

The good news is that the store have included an index at the back of their catalogue. This lists every product, but in this index they’re sorted by price from low to high. This means you can scan the products in price order and very quickly see all the products you’re interested in, never even reading the ones you’re not!

There’s a slight downside to this approach, which is that when you scan the price index, it doesn’t contain all the other information you’re after, such as the pictures of the product and its full description. If they were included as well, the price index would be almost as big as the catalogue itself! Instead, our price index contains the price (our index key), the name of the product (an included column), and crucially, a pointer to the page where the rest of the information (e.g. our pictures) can be physically found – in our clustered index!

That is a simple non-clustered index in a nutshell. It creates an additional index which allows SQL Server to efficiently search for the data pages needed based on the search predicate provided in the query – in our example, the price. Also, because these indexes are separate from our physical table, we can have many non-clustered indexes on one table. Be aware however, that each index will make the overall size of your database bigger (the index has to be stored somewhere after all!) and there is a impact on your DML statements since the index has to be maintained.

This is a somewhat rushed explanation, but I hope that it helps!

Follow Shaun on Twitter for more programming wisdom – and occasional sports commentary!