About half a year ago I was contacted by customer who has implemented SQL Server 2014 In Memory OLTP on VMware VM platform and was seeing some problematic behavior under load in terms of performance. Customer was seeing high CPU condition on SQL Server that cannot be completely explained by load patterns, where CPU to jump from an average of 15-20% to 90-100% . After a bit of performance log analysis for In-Memory OLTP and other counters issue was explained. Here is what, why and how:
In-Memory OLTP tables come in two flavors – durable and non-durable.
- Durable are default type. The schema of the durable table is stored in the system database in primary file group, just like the regular tables. The data are stored in the MEMORY OPTIMIZED DATA file group. Creating a memory optimized durable table is simple. You can use the regular table creation script with MEMORY_OPTIMIZED=ON table option to create a durable table. Once the SQL Server 2014 is restarted, the data in the durable table is loaded to the memory. the online transaction processing is done on the data loaded to the memory. So important to note that durable tables provide for data durability between restarts.
- Non-durable tables. Just like the durable table, the schema of the non-durable table is stored in the system database in the physical disc. But, contrary to the durable table, Non-Durable table’s data are stored in the memory alone. So during a server crash or restart, all the data stored in the non-durable table will be lost. Thus the non-durable table acts like a permanent table with temporary data.
Customer was working with durable tables. What really caused an issue here is extremely slow disk IO subsystem configuration on VMware platform hosting this SQL Server instance, after all why care for disk performance when you are In-Memory? Not so fast.
Enter Data Merge in Durable In-Memory OLTP. The data for memory optimized tables is stored in one or more data/delta file pairs (also referred to as checkpoint file pairs or CFP) with data file(s) storing inserted rows and delta file(s) referencing deleted rows. During the execution of an OLTP workload, as the DML operations update, insert, and delete rows, new data/delta files are created to persist the data. Also, existing delta files are updated to process delete of existing rows. Over time, if the number of active rows, after accounting for deleted rows, in two or more consecutive CFPs falls below a threshold (usually < 50%) such that they can be merged into one CFP of 128 MB, they are merged automatically by a background merge process into a new CFP. Once the merge operation is complete, the older CFPs go through a transition phase and are eventually removed (i.e. garbage collected) from the storage. SQL Server 2014 supports up to 8192 CFPs within a database.
The Merge operation takes one or more closed CFPs, called MERGE SOURCE(s), based on an internally defined merge policy, described later, and produces one resultant CFP, called MERGE TARGET. The entries in each delta file of source CFP(s) are used to filter rows from the corresponding data file to remove the data rows that are not needed anymore. The remaining rows in the source CFPs are consolidated into one target CFP. After the merge is complete, the resultant CFP (i.e. the merge-target) replaces the source CFPs (i.e. the merge sources). The merge-source CFPs go through a transition phase before they are eventually removed from the storage.
After accounting for the storage taken by CFPs in various states, the overall storage taken by durable memory-optimized tables can be much larger than 2x. The DMV sys.dm_db_xtp_checkpoint_files can be queried to list all the CFPs in memory-optimized file group including the phase they are in. Transitioning CFPs from MERGE_SOURCE state to TOMBSTONE and ultimately getting garbage collected can take up to 5 checkpoints and transaction log backup, if database is configured for full or bulk-logged recovery model, steps. You can, of course, manually force the checkpoint followed by log backup to expedite the garbage collection but then this will add 5 empty CFPs (i.e. 5 data/delta file pairs with data file of size 128MB each).
Sure thing looking at performance log – XTP:Storage\Merge Requests Outstanding start appearing right during the spike and trending upwards after the spike:
So I started looking at data and correlating KPIs further, following can be seen:
Merge Requests Outstanding counter tracks number of merge requests outstanding since the server started. Note that previous to CPU spike these were at 0 and then we started climbing. That points that perhaps our Disk IO subsystem doesn’t keep up with Hekaton MERGE activities on the disk, mainly perhaps writes to destination MERGE file.
There are also performance considerations to manage when implementing durable memory-optimized tables. The placement of data files is important for::
- Offline checkpoint performance.
- Performing the merge for the checkpoint file containers.
- Recovery in the case of service failure. This will load the containers, via streaming I/O, which loads the files into memory and recreates the table.
In the steady state case of checkpoint writes, merge, and file clean-up, there will be I/O for:
- Checkpoint writes.
- Reads from source files participating in merge operations.
- Writes to the destination merge file.
To prove or disprove whether you are seeing disk bottleneck on Merge you can recreate condition with manual Merge as explained here – http://blogs.technet.com/b/dataplatforminsider/archive/2014/01/22/merge-operation-in-memory-optimized-tables.aspx
For more information regarding the data and delta files and checkpoints, see “Durability for Memory-Optimized Tables” (http://msdn.microsoft.com/en-us/library/dn553125(v=sql.120).aspx) and “Checkpoint Operations for Memory-Optimized Tables” (http://msdn.microsoft.com/en-us/library/dn553124(v=sql.120).aspx).