Vade indigena: Troubleshooting Native Memory Leaks with GFlags and UMDH

Unmanaged memory leaks in legacy code are notoriously hard to troubleshoot. Unfortunately, most developers become aware of a leak only when the application throws an OOM (Out of Memory) exception, not during development or testing. Tracking down leaks requires relatively specialized testing, including long-running “soak” tests that track the memory footprint over the course of hours and sometimes even days.

As a result, most memory leaks are found not during development or testing but in production. At that point the situation quickly becomes critical, and you have to answer the following questions in production:

  • Which objects are leaking memory?
  • Why are these objects leaking? Perhaps there is a static reference, or they are simply never freed?

It’s somewhat easier in managed code, such as .NET or Java. In .NET, for example, you have the following options:

  • Use a DebugDiag memory leak rule to take dumps at set intervals, then use extensions such as SOS or Tom Christian’s PSSCOR with WinDBG to analyze the memory footprint over time, including roots, GC handles, the finalization queue, etc.
  • Use a profiler such as the free CLRProfiler, SciTech .NET Memory Profiler, or Redgate ANTS to profile memory utilization. This may be a bit heavy for production, but it is possible.
  • Use the PerfView utility, which is based on ETW (Event Tracing for Windows), as a lightweight memory profiler.

It’s a lot different for unmanaged/native code on Windows. There are a few methods available; my favorite has always been the DebugDiag memory leak rule, which injects LeakTrack.dll into the process to track allocations and allocation stacks. However, I love to say that you sometimes learn new methods, and here is one I learned this week. I was so excited that I decided to blog and share it ASAP.

The first thing we have to do is tell the Windows heap manager that we want to track allocations for a specific process. Once again, the magic tool for this is GFlags, which is part of Debugging Tools for Windows. I previously showed how GFlags can be used to troubleshoot the worst unmanaged heap issue of them all: heap corruption. Start GFlags and navigate to the Image File tab, then enter your leaking application name\path (ex. c:\Program Files\mybadapp\mybadapp.exe) into the Image textbox. Next, check the Create User Mode Stack Trace Database checkbox.

By checking “Create user mode stack trace database”, you tell the Windows heap manager to record the call stack for each allocation made in the heap. See http://msdn.microsoft.com/en-us/library/windows/hardware/ff540107(v=vs.85).aspx.

Another way to turn on this setting is through the command line:

gflags /i <application> +ust

This command should produce the following output:

Current Registry Settings for MyLeakingCPP.exe executable are: 00001000

To verify that gflags.exe was used correctly, you can create a dump of the process, open the dump in WinDBG, and run the following:

0:000> !gflag
 Current NtGlobalFlag contents: 0x00001040
 hpc - Enable heap parameter checking
 ust - Create user mode stack trace database
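The NtGlobalFlag value is a bit mask, so 0x00001040 is simply ust (0x1000) combined with hpc (0x40). As a rough sketch of how you could decode such a value yourself, the Python helper below checks a small subset of heap-related flag bits. The function names and the choice of which flags to include are my own for illustration; they are not part of GFlags or WinDBG.

```python
# Subset of NtGlobalFlag bits related to heap debugging.
# Maps each bit to (abbreviation, description). Not an exhaustive list.
KNOWN_FLAGS = {
    0x00000010: ("htc", "Enable heap tail checking"),
    0x00000020: ("hfc", "Enable heap free checking"),
    0x00000040: ("hpc", "Enable heap parameter checking"),
    0x00001000: ("ust", "Create user mode stack trace database"),
    0x02000000: ("hpa", "Enable page heap"),
}


def decode_gflags(value):
    """Return abbreviations of the known flags set in value, lowest bit first."""
    return [abbr for bit, (abbr, _) in sorted(KNOWN_FLAGS.items()) if value & bit]


def has_stack_trace_db(value):
    """True when the ust bit, which UMDH depends on, is set."""
    return bool(value & 0x00001000)


print(decode_gflags(0x00001040))  # prints ['hpc', 'ust']
```

If has_stack_trace_db returns False for the value you see in the dump, the process was most likely started before the flag was set and needs a restart.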

With GFlags set, the next step is learning about another tool that ships with Debugging Tools for Windows: UMDH. UMDH can take a snapshot of the allocation data at a specific time and can also compare two snapshots. So the idea is to start the process, take a snapshot, reproduce the leak while watching Private Bytes for the process in Windows Performance Monitor, and take another snapshot once the process has grown quite a bit. Finally, compare the two snapshots.

So, once GFlags is set up, start the process “clean”. When you want to take a snapshot, open a command prompt. Make sure the environment variable _NT_SYMBOL_PATH is set to one of the following:

  • srv*<some local cache folder>*http://msdl.microsoft.com/Download/Symbols, if your company doesn’t have its own symbol server
  • srv*<some local cache folder>*<your symbol server path>;srv*<some local cache folder>*http://msdl.microsoft.com/Download/Symbols, if your company has its own symbol server or share

Whether you define this environment variable at the system level or only in your command prompt, make sure it holds the right information. For more on the symbol path, see http://msdn.microsoft.com/en-us/library/windows/hardware/ff558829(v=vs.85).aspx
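The two symbol path variants differ only in whether a private store comes first. Here is a small Python sketch that assembles the string for either case. The function name, the cache folder, and the \\buildserver\symbols share are hypothetical placeholders, not values from this post.

```python
# Public Microsoft symbol server, as used in the paths above.
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/Download/Symbols"


def build_symbol_path(cache_dir, private_store=None):
    """Build a value for _NT_SYMBOL_PATH.

    cache_dir: local folder where downloaded symbols are cached.
    private_store: optional company symbol server or share, searched first.
    """
    entries = []
    if private_store:
        entries.append(f"srv*{cache_dir}*{private_store}")
    entries.append(f"srv*{cache_dir}*{MS_SYMBOL_SERVER}")
    return ";".join(entries)


# Hypothetical cache folder and share, for illustration only.
print(build_symbol_path(r"C:\symcache"))
print(build_symbol_path(r"C:\symcache", r"\\buildserver\symbols"))
```

You can paste the printed value after "set _NT_SYMBOL_PATH=" in the same command prompt you will run UMDH from.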

Now let’s take that snapshot. At the command line:

C:\Debugging Tools for Windows>umdh -p:<PID> -f:MySnapshot0.txt

Now reproduce the issue as much as you can, grow those private bytes, and take the next snapshot:

C:\Debugging Tools for Windows>umdh -p:<PID> -f:MySnapshot1.txt

Now let’s compare the two, again at the command line:

C:\Debugging Tools for Windows>umdh -d MySnapshot0.txt MySnapshot1.txt -f:MyResult.txt

MyResult.txt will now contain the allocations that grew between the two snapshots, each with a stack trace showing where the memory was allocated but never subsequently freed.
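If you repeat this snapshot/compare cycle often, it may be worth scripting. The Python sketch below only builds the three UMDH command lines used above. The helper names and the process ID 1234 are made up for illustration, and the run helper assumes umdh.exe is on your PATH.

```python
import subprocess


def snapshot_cmd(pid, out_file):
    """Command line that dumps the allocation data of process pid to out_file."""
    return ["umdh", f"-p:{pid}", f"-f:{out_file}"]


def compare_cmd(before, after, result_file):
    """Command line that diffs two snapshots, with -d for decimal output."""
    return ["umdh", "-d", before, after, f"-f:{result_file}"]


def run(cmd):
    """Run a command and raise if it fails (assumes umdh.exe is on PATH)."""
    subprocess.run(cmd, check=True)


# 1234 is a hypothetical process ID; replace it with your own.
steps = [
    snapshot_cmd(1234, "MySnapshot0.txt"),
    snapshot_cmd(1234, "MySnapshot1.txt"),
    compare_cmd("MySnapshot0.txt", "MySnapshot1.txt", "MyResult.txt"),
]
for step in steps:
    print(" ".join(step))
```

In a real run you would call run() on the first command, reproduce the leak, and only then run the second and third.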

Finally, if the symbols resolve correctly, you should see deltas with allocation stacks like the ones below:

+ 5760144 ( 5760144 -      0)      26 allocs    BackTrace1178AC
+      16 (      6 -      0)    BackTrace1178AC    allocations

 ntdll!RtlAllocateHeap+00000274
 MSVCR100D!_heap_alloc_base+00000053
 MSVCR100D!_heap_alloc_dbg_impl+000001FC
 MSVCR100D!_nh_malloc_dbg_impl+0000001F
 MSVCR100D!_nh_malloc_dbg+0000002C
 MSVCR100D!malloc+0000001B
 MSVCR100D!operator new+00000011
 MyLeakingCPP!operator new[]+0000000E
 MyLeakingCPP!wmain+000000A7
 MyLeakingCPP!__tmainCRTStartup+000001BF
 MyLeakingCPP!wmainCRTStartup+0000000F
 kernel32!BaseThreadInitThunk+0000000E
 ntdll!__RtlUserThreadStart+00000070
 ntdll!_RtlUserThreadStart+0000001B
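On a real application the result file can hold hundreds of these entries, so it helps to rank them by growth. Below is a minimal Python sketch that pulls the "bytes delta (new - old) N allocs BackTraceXXX" header lines out of the text and sorts them, largest first. It assumes decimal output (the -d switch) and the line layout shown above; the names Delta and parse_deltas are mine, and the second entry in the sample is invented purely to show the sorting.

```python
import re
from typing import NamedTuple


class Delta(NamedTuple):
    bytes_delta: int  # signed growth in bytes between the two snapshots
    new_bytes: int    # bytes outstanding in the second snapshot
    old_bytes: int    # bytes outstanding in the first snapshot
    allocs: int       # number of allocations, as printed on the header line
    backtrace: str    # BackTrace identifier of the allocation stack


# Matches lines like "+ 5760144 ( 5760144 - 0) 26 allocs BackTrace1178AC".
# An en dash is accepted too, since copied output sometimes contains one.
HEADER = re.compile(
    r"^\s*([+-])\s*(\d+)\s*\(\s*(\d+)\s*[-\u2013]\s*(\d+)\s*\)"
    r"\s*(\d+)\s+allocs\s+(BackTrace\w+)"
)


def parse_deltas(text):
    """Return the header entries found in text, largest byte delta first."""
    deltas = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            sign, delta, new, old, allocs, trace = m.groups()
            deltas.append(
                Delta(int(sign + delta), int(new), int(old), int(allocs), trace)
            )
    return sorted(deltas, key=lambda d: d.bytes_delta, reverse=True)


# First line is from the output above; the second is a made-up entry.
sample = """\
+    1024 (    2048 -   1024)       2 allocs    BackTraceABC123
+ 5760144 ( 5760144 -      0)      26 allocs    BackTrace1178AC
"""
for d in parse_deltas(sample):
    print(d.backtrace, d.bytes_delta, d.allocs)
```

The stacks at the top of the sorted list are the first places to look, though a large delta can also be legitimate growth such as a cache warming up.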

For more, see http://support.microsoft.com/kb/268343, http://msdn.microsoft.com/en-us/library/windows/hardware/ff560206(v=vs.85).aspx, http://nvharikrishna.wordpress.com/2012/07/11/umdh-a-simple-tool-for-memory-leak-detection-in-windows/ and https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=LeakDetectionInUserMode
