Alalia Voce In Mari Confabulationes – Tackling the Controversy of the SQL Server 5-Minute In-Memory Rule

When I thought about blogging on this topic, I knew that I would be opening a rather controversial issue. Through many years of dealing with customer issues and providing guidance on many Microsoft SQL Server based products, I have had to defend the old 5-minute in-memory rule, and for good reasons. First, let's understand what the 5-minute in-memory rule is, why it is controversial with some folks (although in my opinion it definitely shouldn't be), and why it still matters.

The 5-minute rule was first described long ago in a cult classic paper by Jim Gray and Franco Putzolu, which I believe should be part of every computer science degree (at least it always was at Berkeley, MIT and NYU\Polytechnic). The technical report dates to 1986 (the paper was published in 1987) and is available from the Berkeley site at http://db.cs.berkeley.edu/cs286/papers/fiveminute-tr1986.pdf. Essentially it states that you should cache randomly accessed disk pages that are re-used every 5 minutes or less. The original 5-minute rule was derived from the following cost-benefit computation:

BreakEvenIntervalinSeconds = (PagesPerMBofRAM / AccessesPerSecondPerDisk) × (PricePerDiskDrive / PricePerMBofRAM)
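
To make the formula concrete, here is a rough worked example using approximately the figures from the 1997 follow-up paper mentioned below (treat the exact numbers as illustrative, not authoritative): with 8 KB pages there are 128 pages per MB of RAM, a disk drive of that era sustained about 64 random accesses per second and cost about $2,000, and RAM cost about $15 per MB, so:

BreakEvenIntervalinSeconds = (128 / 64) × ($2,000 / $15) ≈ 2 × 133 ≈ 266 seconds, or roughly 5 minutes.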

What does it mean in practice for SQL Server? Well, SQL Server has always included a great basic performance counter at the instance level, part of the Buffer Manager object, called Page Life Expectancy (commonly shortened to PLE in the SQL Server community).

[Screenshot: the Page Life Expectancy counter of the Buffer Manager object]

Like all database servers, Microsoft SQL Server thrives on memory. Each instance of Microsoft SQL Server has its own memory address space. The SQL Server address space (virtual memory allocated by Windows) consists of two main components: executable code and the memory pool.

In our case we are mainly interested in the memory pool and its largest component – the Buffer Cache.

[Diagram: SQL Server memory pool with the Buffer Cache as its largest component]

The memory pool is the unit of memory that an instance of SQL Server uses to create and manage data structures relating to client requests. The Buffer Cache is the pool of memory pages into which data pages are read to satisfy requests. One of the primary design goals of all database software is to minimize disk I/O, because disk reads and writes are among the most resource-intensive operations. SQL Server builds a buffer pool in memory to hold pages read from the database. Since serving a page from the Buffer Cache is always far faster than serving it from disk, this is where the 5-minute rule described above and the PLE metric become paramount for SQL Server performance. The minimum recommended value for PLE is exactly that – 300 seconds (5 minutes). In simple words, the longer your pages stay in the buffer pool, the higher your PLE, and the better your performance, because every time a request comes in there is a good chance it will find its data in the cache instead of having to go to the hard drive to read it. In addition to Performance Monitor, you can also find your PLE value with a SQL Server DMV query:

SELECT [object_name],
[counter_name],
[cntr_value]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Manager%'
AND [counter_name] = 'Page life expectancy'

As you can see, on my legacy SQL Server 2008 instance the result is pretty high, since I have plenty of memory dedicated to the SQL Server buffer pool via max_server_memory and not much traffic at all:

[Screenshot: Page life expectancy query result showing a high value]

So why the controversy? Well, since the 5-minute rule is now close to 30 years old, some revisions to this classic have been published.

In September 1997, a group from Microsoft Research that included Jim Gray published the technical report “The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb” – http://db.cs.berkeley.edu/cs286/papers/fiveminute-tr1997.pdf. This paper actually reinforced the 5-minute rule, stating: “These calculations indicate that the reference interval of Equation (1) is almost unchanged, despite these 10x, 100x, and 1,000x changes. It is still in the 1-minute to 10-minute range. The 5-minute rule still applies to randomly accessed pages.”

However, as time went by, various customers reading various blogs about how PLE is old started questioning the classic >300 PLE rule. Statements from customers included the following:

“We were seeing SCOM alerts below 300 but our environment was running well after verifying…we noticed that we only saw a problem with disk after the PLE went below 100 and specifically around the 80 mark.” Customers also mentioned the well respected Pinal Dave blog, which I love to read as well – http://blog.sqlauthority.com/2014/05/01/sql-server-good-value-for-page-life-expectancy-notes-from-the-field-026/. The issue is that no one really read his blog carefully, where he states:

“A very old and dated recommendation is that this value be 300 or higher which is only 5 minutes.  With the standard servers in place today, this value is too low….”

I agree that he is right: today's servers have much more memory dedicated to the buffer pool than before, so you should in theory see a much higher PLE. But the 5-minute rule is still an absolute critical minimum, with one possible exception. The exception is flash memory. Flash memory is an electronic non-volatile computer storage medium that can be electrically erased and reprogrammed. I have worked with arrays from some of the leading flash vendors, including Violin Memory (http://pull.vmem.com/wp-content/uploads/Violin-Solution-Brief-Windows-Flash-Array-SQL.pdf) and Fusion-IO (http://www.fusionio.com/press-releases/fusion-io-unlocks-performance-for-sql-server-2014), and can attest to the speed and throughput increases flash delivers. Finally, the HP Labs whitepaper titled “The five-minute rule twenty years later, and how flash memory changes the rules”, which can be downloaded at http://db.cs.berkeley.edu/cs286/papers/fiveminute-damon2007.pdf, shows that although the old 5-minute rule is still generally valid, with the rise of flash it can be used to fill the gap between traditional RAM and traditional disk drives in many operating systems, file systems, and database systems.
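
On the sizing point specifically, a frequently cited community adjustment (popularized, as far as I know, by Jonathan Kehayias of SQLSkills) scales the threshold with the amount of buffer pool memory instead of using a flat 300 seconds:

PLE target ≈ (buffer pool memory in GB / 4) × 300

So, as an illustrative example, an instance with 64 GB dedicated to the buffer pool would be expected to sustain a PLE of around (64 / 4) × 300 = 4,800 seconds under a healthy workload, while 300 seconds remains the absolute floor.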

So, controversies aside, the old 5-minute rule is still here. That said, I would recommend augmenting your monitoring of PLE with a couple of related metrics for more accuracy, including Buffer Manager: Buffer cache hit ratio and Buffer Manager: Free pages, to get a better and more well-rounded picture of your buffer memory utilization. For more see here – https://www.sqlskills.com/blogs/jonathan/the-accidental-dba-day-21-of-30-essential-perfmon-counters/
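
As a minimal sketch, all of these counters can be pulled in one pass from the same DMV used earlier. Note that Buffer cache hit ratio has to be divided by its base counter to get a percentage, and Free pages may not show up on newer versions (as far as I recall it went away with the SQL Server 2012 memory manager changes):

SELECT [object_name],
       [counter_name],
       [cntr_value]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Manager%'
  AND [counter_name] IN ('Page life expectancy',
                         'Buffer cache hit ratio',
                         'Buffer cache hit ratio base',  -- divide the hit ratio by this base
                         'Free pages');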


SQL Server Quick Tasks – Setup Extended Event Trace For Monitoring

In my previous post I showed how easy it is to set up a server side trace to monitor for deadlocks. The server side trace is an old vintage DBA solution for all kinds of monitoring on SQL Server. However, let's welcome a new weapon to the arsenal – SQL Server Extended Events, which are intended to replace Profiler tracing. These were introduced in SQL Server 2008, but really came into their own with SQL Server 2012, with hundreds of events across many categories. SQL Server Extended Events are built on the Event Tracing for Windows (ETW) infrastructure that has been taking the Windows troubleshooting world by storm, and they are a much more lightweight alternative to the legacy SQL Server Trace. In SQL Server 2008, Extended Events could only be configured using T-SQL and no user interface was available. SQL Server 2012 adds a user interface for configuring Extended Events. And the evolution continues – the number of Extended Events increased from 625 in SQL Server 2012 (SP1) to 870 in SQL Server 2014 (RTM). That makes 245 new events added between versions.
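
If you are curious how many events your own instance exposes, a quick query against the Extended Events metadata DMV will tell you (the exact number depends on the version and on which packages are registered on the instance):

-- Count the Extended Events exposed by this instance
SELECT COUNT(*) AS available_events
FROM sys.dm_xe_objects
WHERE object_type = 'event';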

A good introduction to SQL Server Extended Events is available in BOL.  There are also excellent posts on Extended Events from Jonathan Kehayias on SQLSkills that go way deeper than this post.

One of my customer's DBA teams was doing pretty much what many good DBA teams do – monitoring the SQL Server backend of a large commercial application using a rolling server side SQL Trace.  However, as the application continued to grow in terms of database traffic and was migrated to SQL Server 2012, it was decided that the time had come to migrate the SQL Trace to an Extended Events capture, for obvious reasons.  That took a bit of research, but wasn't as painful as I initially expected. Below is an example of an Extended Events capture that collects good events for analysis – rpc_completed and sql_batch_completed – yet isn't expensive, and isn't crazy enough to attempt a statement-level capture in a production system with its obvious impact on performance.

The first thing we will do is CREATE EVENT SESSION; this creates the event session, identifies the events and actions we want to capture and the targets we want to write to, and so on. See BOL for the complete command syntax.

CREATE EVENT SESSION xe_Test ON SERVER
   ADD EVENT sqlserver.rpc_completed
   (ACTION
      (sqlserver.sql_text,                 -- statement text
       package0.collect_system_time,       -- time of the event
       sqlserver.username,                 -- login that executed it
       sqlserver.client_app_name,
       sqlserver.client_hostname,
       sqlserver.database_id,
       package0.collect_cpu_cycle_time)    -- CPU cycles consumed
   ),

 

So, as you can see, here I am capturing the rpc_completed event that fires as a stored procedure (RPC) execution completes, and collecting the statement text, the execution time, the user name under which it ran, the application that sent it, the client hostname, and the database id within the SQL Server instance where the execution occurred.

But I also want the sql_batch_completed event for all of the traffic that doesn't come in via RPC and is instead executed as a SQL batch. Let me add this event:

   ADD EVENT sqlserver.sql_batch_completed
   (ACTION
      (sqlserver.sql_text,
       package0.collect_system_time,
       sqlserver.username,
       sqlserver.client_app_name,
       sqlserver.client_hostname,
       sqlserver.database_id,
       package0.collect_cpu_cycle_time)
   )

Now that I have what I want to capture, let me identify the target – where we will write that information to.  There are multiple targets that Extended Events support, as per BOL, including ETW, the ring buffer, and others. In my case I will use a pretty standard way to save my info – a nice local file, just as I would with a SQL Trace running server side.

ADD TARGET package0.asynchronous_file_target
      (set filename = 'test.xel',
      metadatafile = 'test.xem');
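
For completeness, instead of ending the statement right after the target, you can also tack on session-level options; the values below are purely illustrative (a sketch, not required settings):

   WITH (MAX_MEMORY = 4096 KB,                           -- buffer memory for the session
         EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, -- drop an event rather than block
         MAX_DISPATCH_LATENCY = 30 SECONDS,              -- flush to the file at least every 30 seconds
         STARTUP_STATE = OFF);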

Now that I am done building this capture definition, I can manually start the capture:

--Start the event session that we set up above.
ALTER EVENT SESSION  xe_Test on server state = start;

To have the session start automatically every time SQL Server starts, use this command:

ALTER EVENT SESSION  xe_Test  on server with (STARTUP_STATE = ON);
go

Now that everything is working and logging to a file, we need to get the relevant data out of the file. I used a CTE, but you can dump the data into a real SQL table as well. I will use sys.fn_xe_file_target_read_file to read the XML from the file:

;WITH x AS (
   SELECT CAST(event_data AS XML) AS edx
   FROM sys.fn_xe_file_target_read_file('Test*.xel', 'Test*.xem', NULL, NULL)
)
, c AS (
   SELECT
      edx.value('(//action/value)[1]', 'varchar(max)') AS [SQL],
      edx.value('(//action/text)[2]',  'varchar(max)') AS [time],
      edx.value('(//action/value)[3]', 'varchar(max)') AS [Login],
      edx.value('(//action/value)[4]', 'varchar(max)') AS [Application],
      edx.value('(//action/value)[5]', 'varchar(max)') AS HostName,
      edx.value('(//action/value)[6]', 'varchar(max)') AS DatabaseID,
      edx.value('(//action/value)[7]', 'varchar(max)') AS [CPU]
   FROM x
)
-- collect_system_time is Coordinated Universal Time (UTC, Greenwich, England);
-- adjust it to local time, here UTC-7
SELECT dateadd(HH, -7, [time]) AS [Time]
   , [SQL]
   , HostName
   , [Login]
   , [Application]
   , [CPU] AS cpu
   , db_name([DatabaseID]) AS DatabaseName
FROM c;

How do I stop this capture? Simple: I will use the ALTER EVENT SESSION command again. But first let me check whether my capture is running; I can use a DMV to see that:

SELECT * FROM sys.dm_xe_sessions;
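
Or, to check just for our session by name (it only shows up here while it is actually running):

SELECT name, create_time
FROM sys.dm_xe_sessions
WHERE name = 'xe_Test';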

If it is running, let's stop it:

ALTER EVENT SESSION xe_Test
ON SERVER
STATE = stop
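
And if you want to remove the session definition from the server entirely, not just stop it, drop it:

DROP EVENT SESSION xe_Test ON SERVER;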

So, although when I started with XE (Extended Events) I thought they were unnecessarily difficult for your average DBA, they have grown on me and I now use them to troubleshoot SQL Server quite a bit. Hope the above is useful for your troubleshooting and monitoring endeavors.

SQL Server Quick Tasks – Setup deadlock event trace with T-SQL

Some folks may not know that there is an easy way to set up and run a server side trace from T-SQL, although this functionality has been in SQL Server for many years. In SQL Server 2012 and above there is a newer method of monitoring for events known as Extended Events (XEvents) – http://technet.microsoft.com/en-us/library/bb630354(v=SQL.105).aspx. However, very recently I had to set up a little trace with deadlock graph events to analyze occasional deadlock occurrences on a SQL Server 2008 instance, where Extended Events, although they exist, are rather limited.

Therefore I decided to go “old school” and set up a little trace to get deadlock information. Doing it through the SQL Server Profiler GUI is easy, but so is doing it through T-SQL.   In this trace I will only capture deadlock events: Lock:Deadlock, Deadlock Graph, and Lock:Deadlock Chain.

SQL Trace provides a set of stored procedures to create traces on an instance of the SQL Server Database Engine. These system stored procedures can be used from within your own applications to create traces manually, which allows you to write custom applications specific to your needs.

First, let's take a look at the server side traces currently present on my instance:

select * from sys.traces 

As expected, I have the SQL Server default trace running under id 1:

 

[Screenshot: sys.traces output showing the default trace with id 1]

 

Now let's create our own trace:

declare @tracefile nvarchar(500)
set @tracefile = N'c:\temp\deadlock_trace.trc'

declare @trace_id int
declare @maxsize bigint
set @maxsize = 100

-- option 2 = TRACE_FILE_ROLLOVER; max file size is in MB
exec sp_trace_create @trace_id output, 2, @tracefile, @maxsize
go

-- add the events of interest to be traced, and add the result columns of interest
-- Note: look the trace up in sys.traces to find its @trace_id; here the default trace
-- is id 1, so the trace we just created is id 2

declare @trace_id int
set @trace_id = 2

declare @on bit
set @on = 1

declare @current_num int
set @current_num = 1

while (@current_num < 65)
      begin
      -- add events to be traced: id 59 is the Lock:Deadlock Chain event, 148 is the Deadlock Graph event,
      -- 25 is Lock:Deadlock; you can add other events per your own requirements, the event ids can be
      -- found in BOL http://msdn.microsoft.com/en-us/library/ms186265.aspx
      exec sp_trace_setevent @trace_id, 59,  @current_num, @on
      exec sp_trace_setevent @trace_id, 148, @current_num, @on
      exec sp_trace_setevent @trace_id, 25,  @current_num, @on

      set @current_num = @current_num + 1
      end

-- traces are created in a stopped state, so start it now
exec sp_trace_setstatus @trace_id, 1
go

OK, above we created and started a trace with a max size of 100 MB, monitoring just the deadlock events, although we could of course add a number of other events as listed in BOL for sp_trace_setevent – http://msdn.microsoft.com/en-us/library/ms186265.aspx.

Now running a query on sys.traces returns both the default trace and our new trace:

[Screenshot: sys.traces output showing the default trace (id 1) and the new deadlock trace (id 2)]

Well, that's all great, but how do I turn off this trace? Simple. From above we know that our trace id is 2. Therefore the couple of calls below will stop the trace and then delete its definition via the sp_trace_setstatus system stored procedure – http://msdn.microsoft.com/en-us/library/ms176034.aspx

declare @trace_id int
set @trace_id = 2

exec sp_trace_setstatus @trace_id, 0   -- stop the trace
exec sp_trace_setstatus @trace_id, 2   -- close it and delete its definition from the server
go
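
By the way, before you delete the trace definition you will probably want to look at what it caught. The trace file can be read back with fn_trace_gettable; this is just a sketch – check sys.traces for the actual file path SQL Server wrote (it appends the .trc extension itself) and substitute it below if it differs:

-- confirm the path of our trace file
select id, path from sys.traces where id = 2

-- read the captured deadlock events back; TextData holds the deadlock graph XML
select StartTime, EventClass, DatabaseID, SPID, TextData
from sys.fn_trace_gettable('c:\temp\deadlock_trace.trc', default)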

Now we are back to just the default trace running again, as expected.  Hope this little “old school” tip helps someone; I plan to blog deeper on the new XEvents functionality in SQL Server 2012 and 2014 soon.

Memoratus In Aeternum – Clustered Columnstore Indexes feature in SQL Server 2014

The columnstore index, which improves performance of data warehouse queries several fold, was first introduced in SQL Server 2012. Unlike regular indexes or heaps, which store data in a B-Tree structure (in row-wise fashion), the columnstore index stores data in columnar fashion and uses compression aggressively to reduce the disk I/O needed to serve the query request. Apart from the benefits it provides, it had several limitations in SQL Server 2012, the most significant being that it put its underlying table into read-only mode. SQL Server 2014 enhances the columnstore index and overcomes several of those earlier limitations; most notably, SQL Server 2014 allows the creation of updatable clustered columnstore indexes.

Just like a normal clustered index, a clustered columnstore index defines how the data is physically stored on disk. A columnstore backed table is initially organized into segments known as rowgroups. Each rowgroup holds from 102,400 to 1,048,576 rows. Once a rowgroup is identified, it is broken up into column segments, which are then compressed and inserted into the actual columnstore. When dealing with small amounts of data – small being defined as less than roughly a hundred thousand rows – the data is staged in a section known as the deltastore. Once it reaches the minimum size, the deltastore can be drained and its data processed into a new rowgroup. You can see this illustrated in the MSDN diagram below:

[MSDN diagram: clustered columnstore index structure – rowgroups, column segments, and the deltastore]

A deltastore will be closed while it is being converted. This, however, is not a blocking operation for the table as a whole. An additional deltastore can be created for a given table when the current deltastores are unavailable due to locking. And if the table is partitioned, then each partition gets its own set of deltastores. Unlike the previous version of columnstore indexes, the clustered version must include all columns in the table. This is because there is no other heap or clustered index to fall back on for the rest of the row. In fact, clustered columnstore indexes cannot be combined with other types of indexes at all.
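
In SQL Server 2014 you can watch this rowgroup and deltastore behavior yourself through the sys.column_store_row_groups catalog view; a quick sketch:

SELECT object_name(object_id) AS table_name,
       row_group_id,
       state_description,    -- OPEN/CLOSED = deltastore, COMPRESSED = columnstore rowgroup
       total_rows
FROM sys.column_store_row_groups
ORDER BY table_name, row_group_id;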

So why would we do this at all? Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.

So let's try this out on a smaller scale. First we will create a database, which I called my_cs_test:

USE master
GO
CREATE DATABASE [my_cs_test]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'my_cs_test', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.GENNADYKSQL2K14\MSSQL\DATA\my_cs_test.mdf' , SIZE = 4096KB , FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'my_cs_test_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.GENNADYKSQL2K14\MSSQL\DATA\my_cs_test_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)
GO

Next let's create a sample table called Accounts and load some data into it:

USE my_cs_test
GO
CREATE TABLE Accounts(
       accountkey                 int not null,
       accountdescription         nvarchar (50),
       accounttype                nvarchar(50),
       AccountCodeAlternatekey    int)

Now let's create a regular clustered index on accountkey, our unique key column:

CREATE CLUSTERED INDEX clx_accounts ON [dbo].[Accounts]
(
	[accountkey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

GO

Now let's load our table with sample data; I am picking 100,000 rows:

begin tran

declare @i int = 0

while (@i < 100000)
begin
       insert into Accounts values (@i, 'description', 'dummy-accounttype', @i*2)
       set @i = @i + 1
end

commit

Let's confirm – yes, we have 100,000 rows here:
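
(The check itself is nothing fancy – a plain count along these lines:)

SELECT COUNT(*) AS row_count FROM dbo.Accounts;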

[Screenshot: count result confirming 100,000 rows in the Accounts table]

Now let's run a typical select. We should perform quite well here, since the query seeks right on the clustered index, so we will see a clustered index seek returning 68,001 rows very fast. Check the stats below – note the elapsed time is only 371 ms and the CPU time 47 ms.
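
The I/O and timing numbers quoted below come from SET STATISTICS IO and SET STATISTICS TIME output, so turn both on in your session if you want to reproduce them:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

And here is the query itself: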

Select [accountdescription], [accounttype] from Accounts where [accountkey] between 5000 and 73000

[Screenshot: execution plan showing a Clustered Index Seek on Accounts]

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(68001 row(s) affected)
Table 'Accounts'. Scan count 1, logical reads 693, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 371 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

OK, that was pretty good. Now let's try the same with a clustered columnstore index. First I will truncate the table and drop the existing clustered index:

USE [my_cs_test]
GO
TRUNCATE TABLE [Accounts]
GO
DROP INDEX Accounts.clx_accounts

Let's create the clustered columnstore index here; as I stated above, it will include all columns of the table:

USE [my_cs_test]
GO

CREATE CLUSTERED COLUMNSTORE INDEX [clx_Accounts] ON [dbo].[Accounts] WITH (DROP_EXISTING = OFF)
GO

Now, using the routine above, I will load the same 100,000 rows. Note this probably isn't something you would do in a real data warehousing environment – you would more likely use bcp or SSIS to load data into your warehouse – but for the purpose of our test it will suffice. Note we are back to the same 100,000 rows:

[Screenshot: count result showing the same 100,000 rows after the reload]

And now let's fire up the same query. As you remember, it was a very fast seek operation to begin with, so I really had doubts that we could do better. Nevertheless, my logical reads and CPU time have fallen even further, and even the elapsed time got cut:

[Screenshot: execution plan for the same query using the clustered columnstore index]

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(68001 row(s) affected)
Table 'Accounts'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 32 ms,  elapsed time = 177 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Obviously this isn't the type of query where a columnstore index would make a huge difference, but I used it as a tutorial to create and test a clustered columnstore index. In data warehousing scenarios where large numbers of rows are being scanned by queries, I have noted a huge difference using this in-memory columnar technology vs. old row based indexes. Oh yes, one more thing: clustered columnstore indexes are available as a feature in the Enterprise and Developer editions of SQL Server 2014.

For more information see – http://msdn.microsoft.com/en-us/library/gg492088.aspx, http://social.technet.microsoft.com/wiki/contents/articles/3540.sql-server-columnstore-index-faq.aspx, http://www.serioussqlserver.com/sql-2014/sql-2014-columnstore-indexes/, and http://blogs.askcts.com/2014/06/27/sql-server-2014-columnstore-indexes/