Alalia Voce In Mari Confabulationes– Tackling Controversy of SQL Server 5 Minute In-Memory Rule

When I thought about blogging on this topic I knew that I would be opening a rather controversial issue. Through many years dealing with customer issues and with guidance on many Microsoft SQL Server based products I had to defend old 5 minute in-memory rule and for good reasons. First, lets understand what is 5 minute in-memory rule, why is it controversial with some folks, although it definitely shouldn’t be in my opinion, and why it still matters.

The 5 minute rule was first described long ago in a cult paper, which I believe should be part of every computer science degree (at least it always was in Berkeley, MIT and NYU\Polytechnic) written by Jim Gray and Franco Putzolu long ago – in 1985. The paper is available from Berkeley site at – http://db.cs.berkeley.edu/cs286/papers/fiveminute-tr1986.pdf. Essentially what it states is to  cache randomly accessed disk pages that are re-used every 5 minutes or less.The original 5-minute rule was derived from the following cost-benefit computation:

BreakEvenIntervalinSeconds = (PagesPerMBofRAM / AccessesPerSecondPerDisk) × (PricePerDiskDrive / PricePerMBofRAM)

What does it mean in practice for SQL Server? Well, SQL Server always contained great basic performance counter metric on the instance level as part of Buffer Manager object called Page Life Expectancy (commonly shortened to PLE in SQL Server community).

image

Like all database servers, Microsoft SQL Server thrives on memory. Each instance of Microsoft SQL Server has its own memory address space. The SQL Server address space (Virtual memory allocation by Windows) consists of two main components: executable code and memory pool.

In our case we are mainly interested in memory pool and its largest component – Buffer Cache.

buffer2

The memory pool is the unit of memory that an instance of SQL Server uses to create and manage data structures relating to client requests. Buffer Cache is the pool of memory pages into which data pages are read to satisfy requests. One of the primary design goals of all database software is to minimize disk I/O because disk reads and writes are among the most resource-intensive operations. SQL Server builds a buffer pool in memory to hold pages read from the database. As it always a lot more performing to serve page from Buffer Cache than from Disk here where 5 minute rule described above and PLE metrics become paramount for SQL Server performance. The minimum recommended value for PLE is just that – 300 seconds (5 minutes). In simple words, if your page stays longer in the buffer pool (area of the memory cache) your PLE is higher, leading to higher performance as every time request comes there are chances it may find its data in the cache itself instead of going to hard drive to read the data. You can find your PLE value also from SQL Server DMV query in addition to Performance Monitor:

SELECT [object_name],
[counter_name],
[cntr_value]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Manager%'
AND [counter_name] = 'Page life expectancy'

As you can see on my legacy SQL Server 2008 instance result is pretty high as a have plenty of memory dedicated to SQL Server Buffer via max_server_memory and not much traffic at all:

image

So why the controversy? Well, since the 5 minute rule is now close to 30 years old there are some revisions that have been published to this classic.

In September 1997, group from MS Research, including Jim Gray published “The Five-Minute Rule Ten Years Later,and Other Computer Storage Rules of Thumb” technical report – http://db.cs.berkeley.edu/cs286/papers/fiveminute-tr1997.pdf  Actually this paper reinforced 5 minute rule stating – “These calculations indicate that the reference interval of Equation (1) is almost unchanged, despite these 10x, 100x, and 1,000x changes. It is still inthe 1-minute to 10-minute range. The 5-minute rule still applies to randomly accessed pages.”

However, as time went by various customers reading various blogs on how PLE is old started questioning classic >300 PLE rule. Statements also from various customers included following – “

“We were seeing SCOM alerts below 300 but our environment was running well after verifying…we noticed that we only saw a problem with disk after the PLE went below 100 and specifically around the 80 mark.” .  Customers also mentioned well respected Pinal Dave blog, which I love to read as well – http://blog.sqlauthority.com/2014/05/01/sql-server-good-value-for-page-life-expectancy-notes-from-the-field-026/.  Issue is no one really read his blog carefully where he states: “

A very old and dated recommendation is that this value be 300 or higher which is only 5 minutes.  With the standard servers in place today, this value is too low….”

I don’t disagree that he is right,  since todays servers have much more memory than previously dedicated to the buffer and you theoretically should see a lot higher PLE. But 5 minute rule is an absolute critical minimum still with one possible exception. The exception is Flash Memory. Flash memory is an electronic non-volatile computer storage medium that can be electrically erased and reprogrammed.  I worked with arrays from some of the leading flash vendors including Violin Memory (http://pull.vmem.com/wp-content/uploads/Violin-Solution-Brief-Windows-Flash-Array-SQL.pdf) and Fusion-IO (http://www.fusionio.com/press-releases/fusion-io-unlocks-performance-for-sql-server-2014) and can attest to speed and throughput increases using Flash. Finally, HP Labs whitepaper titled – “The five-minute rule twenty years later,and how flash memory changes the rules” , that can be downloaded at http://db.cs.berkeley.edu/cs286/papers/fiveminute-damon2007.pdf shows that although old 5 minute rule is still generally valid  , with ascension of Flash it can be used to fill the gap between traditional RAM memory and traditional disk drives in many operating systems, file systems, and database systems.

So controversies aside, the old 5 minute rule is still here. Now, I would recommend augment your monitoring of PLE with a couple of related metrics for more accuracy, which include BufferManager: Buffer Page Hit Ratio and BufferManager: Free Pages to get a better and all round picture of your buffer memory utilization. For more see here – https://www.sqlskills.com/blogs/jonathan/the-accidental-dba-day-21-of-30-essential-perfmon-counters/

Advertisements

One thought on “Alalia Voce In Mari Confabulationes– Tackling Controversy of SQL Server 5 Minute In-Memory Rule

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s