Column Store Index, which improves performance of data warehouse queries several folds, was first introduced in SQL Server 2012. Unlike regular indexes or heaps, which store data in B-Tree structure (in row-wise fashion), the column store index stores data in columnar fashion and uses compression aggressively to reduce the disk I/O needed to serve the query request. Apart from the benefits it provides, it has several limitations in SQL Server 2012. SQL Server 2014 enhances the columnstore index and overcomes several of the earlier limitations, most significant of which is the fact that they trip their underlying table into read-only mode and the fact that in SQL Server 2014 allows for creation of clustered columnstore indexes.
Just like a normal clustered index, a clustered columnstore index defines how the data is physically stored on the disc. A columnstore backed table is initially organized into segments known as row groups. Each rowgroup holds from 102,400 to 1,048,576 rows. Once a rowgroup is identified it is broken up into column segments, which are then compressed and inserted into the actual columnstore.When dealing with small amounts of data, small being defined as less than a hundred thousand rows, the data is staged in a section known as the deltastore. Once it reaches the minimum size the deltastore can be drained, its data being processed as a new rowgroup. You can see this illustrated in the MSDN diagram below:
A deltastore will be closed while it is being converted. This, however, is not a blocking operation for the table as a whole. An additional deltastore can be created for a given table when the current deltastores are unavailable due to locking. And if the table is partitioned, then each partition gets its own set of deltastores. Unlike the previous version of columnstore indexes, the clustered version must include all columns in the table. This is because there is no other heap or clustered index to fall back on for the rest of the row. In fact, clustered columnstore indexes cannot be combined with other types of indexes at all.
So why would we do this at all? Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.
So lets try test this out on smaller scale. First we will create a database I called my_cs_test
USE master GO CREATE DATABASE [my_cs_test] CONTAINMENT = NONE ON PRIMARY ( NAME = N'my_cs_test', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.GENNADYKSQL2K14\MSSQL\DATA\my_cs_test.mdf' , SIZE = 4096KB , FILEGROWTH = 1024KB ) LOG ON ( NAME = N'my_cs_test_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.GENNADYKSQL2K14\MSSQL\DATA\my_cs_test_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%) GO
Next lets create a table and load some data into it: Lets create sample table called accounts:
USE my_cs_test GO CREATE TABLE Accounts( accountkey int not null, accountdescription nvarchar (50), accounttype nvarchar(50), AccountCodeAlternatekey int)
Now lets create a a regular clustered index on account key as unique column here.
CREATE CLUSTERED INDEX clx_accounts ON [dbo].[Accounts] ( [accountkey] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) GO
Now lets load our table with sample data:, I am picking 100,000 rows:
begin tran declare @i int = 0 while (@i < 100000) begin insert into Accounts values (@i, 'description', 'dummy-accounttype', @i*2) set @i = @i + 1 end commit
Lets confirm , yes we have 100,000 rows here –
Now lets do a typical select , we should be performing quite well here with this query as we are seeking right on clustered index, so we will see a clustered index seek returning 68001 rows very fast, check stats below, note elapsed time is only 371 ms and CPU Time 47 ms:
Select [accountdescription], [accounttype] from Accounts where [accountkey] between 5000 and 73000
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. (68001 row(s) affected) Table 'Accounts'. Scan count 1, logical reads 693, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (1 row(s) affected) SQL Server Execution Times: CPU time = 47 ms, elapsed time = 371 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Ok that was pretty good. Now lets try same with clustered columnstore index. First I will truncate table and drop existing clustered index
USE [my_cs_test] GO TRUNCATE TABLE [Accounts] GO DROP INDEX Accounts.clx_accounts
Lets create clustered columnstore index here and that as I stated above will include all columns on this table:
USE [my_cs_test] GO CREATE CLUSTERED COLUMNSTORE INDEX [clx_Accounts] ON [dbo].[Accounts] WITH (DROP_EXISTING = OFF) GO
Now using above routine I will load same 100,000 rows. Note this isn’t probably something you will do in data warehousing environment really, as you will probably use bcp or SSIS to load data into your warehouse, but for the purpose of our test it will suffice. Now note we are back to the same 100,000 rows:
And lets fire up now the same query. As you remember it was very fast and seek operation to begin with, so I really had doubts that we could do better. Nevertheless my logical reads and CPU time has fallen even further and even execution time got cut:
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. (68001 row(s) affected) Table 'Accounts'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (1 row(s) affected) SQL Server Execution Times: CPU time = 32 ms, elapsed time = 177 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Obviously this isn’t a type of query where Columnstore index would make huge difference, but I used as tutorial to create and test clustered columnstore index. In data warehousing scenarios where large number of rows are being scanned by queries I noted huge difference using this in-memory columnar technology vs. old row based indexes. Oh yes, one more thing, clustered columnstore indexes are available as feature in Enterprise and Developer versions of SQL Server 2014.
For more information see – http://msdn.microsoft.com/en-us/library/gg492088.aspx, http://social.technet.microsoft.com/wiki/contents/articles/3540.sql-server-columnstore-index-faq.aspx, http://www.serioussqlserver.com/sql-2014/sql-2014-columnstore-indexes/, and http://blogs.askcts.com/2014/06/27/sql-server-2014-columnstore-indexes/