Unmanaged memory leaks in legacy code are notoriously hard to troubleshoot. Unfortunately, the majority of developers become aware of leaks only when the application throws the dreaded OOM (Out of Memory) exception, not during development or testing. Tracking down leaks requires relatively specialized testing, including long-running “soak” tests and tracking the memory footprint over the course of hours and sometimes even days.
So, unfortunately, most memory leaks are found not during development or testing, but in production. At that point the situation quickly becomes critical, and you have to find answers to the following questions in production:
- Which objects are leaking memory?
- Why are these objects leaking? Perhaps there is a static reference, or they are simply never freed?
It's somewhat easier in managed code, such as .NET or Java. In .NET, for example, you have the following options:
- Use the DebugDiag memory leak rule to take dumps at certain intervals, then use extensions such as SOS or Tom Christian’s PSSCOR with WinDBG to analyze the memory footprint over time, including looking at roots, gchandles, the finalization queue, etc.
- Use profiler tools such as the free CLRProfiler, SciTech Memory Profiler, or RedGate ANTS to profile memory utilization. These may be a bit heavy for production, but it is possible.
- Use the PerfView utility, based on ETW (Event Tracing for Windows), as a lightweight memory profiler.
It’s a lot different for unmanaged/native code on Windows. There are a few methods available; my favorite has always been the DebugDiag memory leak rule, with the LeakTrack DLL injected into the process to track allocations and allocation stacks. However, I love to say that you sometimes learn new methods, and here is one I learned this week. I was so excited that I decided to blog and share this method ASAP.
The first thing we have to do is inform the Windows heap service that we want to track allocations for a specific process. Once again, the magic tool for this is GFlags, which is part of Debugging Tools for Windows. I have previously shown how GFlags can be used to troubleshoot the worst unmanaged heap issue of them all: heap corruption. Start it up and navigate to the Image File tab, then enter your leaking application's name or path (e.g. c:\Program Files\mybadapp\mybadapp.exe) into the Image textbox. Next, check the Create User Mode Stack Trace Database checkbox.
By checking “Create user mode stack trace database”, you tell the Windows heap service to record the call stack for each allocation made on the heap. See http://msdn.microsoft.com/en-us/library/windows/hardware/ff540107(v=vs.85).aspx.
Another way to turn on this setting is from the command line:
gflags /i <application> +ust
This command should produce output like:
Current Registry Settings for MyLeakingCPP.exe executable are: 00001000
    ust - Create user mode stack trace database
To verify that gflags.exe was used correctly, you can create a dump of the process, open the dump in WinDBG, and run the following:
0:000> !gflag
Current NtGlobalFlag contents: 0x00001040
    hpc - Enable heap parameter checking
    ust - Create user mode stack trace database
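If you want to sanity-check an NtGlobalFlag value without WinDBG at hand, the bit test is easy to sketch. This is only an illustration, not part of any Microsoft tool: the function names are my own, and the table covers just a handful of the documented flag bits (0x1000 for ust and 0x40 for hpc match the !gflag output above).

```python
# Small subset of documented NtGlobalFlag bits: value -> (abbreviation, description).
# The table is deliberately incomplete; it is only meant to illustrate the check.
KNOWN_FLAGS = {
    0x00000010: ("htc", "Enable heap tail checking"),
    0x00000020: ("hfc", "Enable heap free checking"),
    0x00000040: ("hpc", "Enable heap parameter checking"),
    0x00001000: ("ust", "Create user mode stack trace database"),
    0x02000000: ("hpa", "Enable page heap"),
}

UST_BIT = 0x00001000


def decode_gflags(value):
    """Return (abbreviation, description) for every known bit set in value."""
    return [names for bit, names in sorted(KNOWN_FLAGS.items()) if value & bit]


def has_stack_trace_db(value):
    """True when the 'ust' bit that UMDH depends on is set."""
    return bool(value & UST_BIT)


if __name__ == "__main__":
    # 0x00001040 is the value shown by !gflag in the dump above.
    for abbrev, description in decode_gflags(0x00001040):
        print(f"{abbrev} - {description}")
```

If has_stack_trace_db returns False for the value you see in your dump, the process was most likely started before the GFlags setting was applied, and it needs a restart.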
So now, with gflags set, the next step is learning about another tool that ships with Debugging Tools for Windows: UMDH. UMDH can take a snapshot of the allocation data at a specific time, and can also compare two snapshots. So the idea here is to start the process, take a snapshot, and repro the leak while watching Private Bytes for the process in Windows Performance Monitor. When the process has grown quite a bit, take another snapshot. Finally, compare the two snapshots.
So, once you have set up gflags, start the process “clean”. When you want to take a snapshot, open a command prompt. Make sure that the environment variable _NT_SYMBOL_PATH is set to one of the following:
- srv*<some local cache folder>*http://msdl.microsoft.com/Download/Symbols, if your company doesn’t have its own symbol server
- srv*<some local cache folder>*<your symbol server path>;srv*<some local cache folder>*http://msdl.microsoft.com/Download/Symbols, if your company has its own symbol server or share
Whether you define this environment variable at system level, or only in your command line, make sure it is set with the right information. For more on symbol path see – http://msdn.microsoft.com/en-us/library/windows/hardware/ff558829(v=vs.85).aspx
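The two symbol path variants above differ only in whether your own server is listed first, so they are easy to build programmatically. Here is a minimal sketch, assuming a hypothetical cache folder c:\symcache and a hypothetical company share; build_symbol_path is my own helper name, not something that ships with the debugging tools.

```python
import os

# Public Microsoft symbol server URL, as used in the list above.
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/Download/Symbols"


def build_symbol_path(cache_dir, company_server=None):
    """Build an _NT_SYMBOL_PATH value in the srv*<cache>*<server> form.

    When company_server is given, it is listed first, followed by the
    public Microsoft server, separated by ';'.
    """
    entries = []
    if company_server:
        entries.append(f"srv*{cache_dir}*{company_server}")
    entries.append(f"srv*{cache_dir}*{MS_SYMBOL_SERVER}")
    return ";".join(entries)


if __name__ == "__main__":
    # c:\symcache is a hypothetical local cache folder.
    path = build_symbol_path(r"c:\symcache")
    # Setting os.environ only affects this process and the children it starts.
    os.environ["_NT_SYMBOL_PATH"] = path
    print(path)
```

Setting the variable this way only helps if umdh is then launched from the same script; for an interactive command prompt you would still use set or the system environment settings.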
Now let's take that snapshot, passing the process ID of the leaking process after -p:. At the command line:
C:\Debugging Tools for Windows>umdh -p:<PID> -f:MySnapshot0.txt
Now reproduce the issue as much as you can, grow those private bytes, and take the next snapshot:
C:\Debugging Tools for Windows>umdh -p:<PID> -f:MySnapshot1.txt
Now let's compare both, again at the command line:
C:\Debugging Tools for Windows>umdh -d MySnapshot0.txt MySnapshot1.txt -f:MyResult.txt
MyResult.txt will now list the allocations that grew between the two snapshots, each with a stack trace showing where the memory was allocated but never subsequently freed.
Finally, if the symbols are resolved correctly, you should see deltas with allocation stacks like the one below:
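If you end up repeating the snapshot, repro, snapshot, compare cycle often, it can be scripted. The sketch below only assembles the same three umdh command lines used above and runs them in order. The helper names and the wait_for_repro callback are my own invention, and it assumes umdh is on the PATH and that you already know the process ID.

```python
import subprocess


def snapshot_command(pid, out_file, umdh="umdh"):
    """Command line for one UMDH snapshot of the process with the given PID."""
    return [umdh, f"-p:{pid}", f"-f:{out_file}"]


def compare_command(before, after, result_file, umdh="umdh"):
    """Command line for comparing two UMDH snapshots."""
    return [umdh, "-d", before, after, f"-f:{result_file}"]


def take_and_compare(pid, wait_for_repro, umdh="umdh"):
    """Snapshot, wait while the leak is reproduced, snapshot again, compare.

    wait_for_repro is any callable that returns once private bytes have
    grown enough, for example: lambda: input("Press Enter after the repro")
    """
    subprocess.run(snapshot_command(pid, "MySnapshot0.txt", umdh), check=True)
    wait_for_repro()
    subprocess.run(snapshot_command(pid, "MySnapshot1.txt", umdh), check=True)
    subprocess.run(
        compare_command("MySnapshot0.txt", "MySnapshot1.txt", "MyResult.txt", umdh),
        check=True,
    )
```

A call might look like take_and_compare(1234, lambda: input("Press Enter after the repro")), where 1234 is a made-up process ID. Remember that _NT_SYMBOL_PATH still has to be set in the environment the script runs in.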
+ 5760144 ( 5760144 - 0) 26 allocs BackTrace1178AC
+ 16 ( 6 - 0) BackTrace1178AC allocations
    ntdll!RtlAllocateHeap+00000274
    MSVCR100D!_heap_alloc_base+00000053
    MSVCR100D!_heap_alloc_dbg_impl+000001FC
    MSVCR100D!_nh_malloc_dbg_impl+0000001F
    MSVCR100D!_nh_malloc_dbg+0000002C
    MSVCR100D!malloc+0000001B
    MSVCR100D!operator new+00000011
    MyLeakingCPP new+0000000E
    MyLeakingCPP!wmain+000000A7
    MyLeakingCPP!__tmainCRTStartup+000001BF
    MyLeakingCPP!wmainCRTStartup+0000000F
    kernel32!BaseThreadInitThunk+0000000E
    ntdll!__RtlUserThreadStart+00000070
    ntdll!_RtlUserThreadStart+0000001B
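On a real application the comparison file can contain many of these entries, so it helps to rank them by how many bytes they grew. Here is a rough sketch that pulls the "+ delta ( new - old) N allocs BackTraceXXXX" header lines out of the text and sorts them. The regular expression is inferred from the sample above rather than from any formal specification of the UMDH output, so treat it as an assumption and adjust it to what your version prints.

```python
import re

# Matches header lines such as:
#   + 5760144 ( 5760144 - 0) 26 allocs BackTrace1178AC
# The pattern is an assumption based on the sample output in this post.
ENTRY = re.compile(
    r"^\s*([+-])\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)\s*(\d+)\s+allocs\s+(BackTrace\w+)"
)


def parse_deltas(text):
    """Return the byte deltas found in UMDH comparison text, largest growth first."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # stack frames and other lines are skipped
        sign, delta, new_bytes, old_bytes, allocs, trace = match.groups()
        entries.append({
            "trace": trace,
            "bytes_delta": int(delta) if sign == "+" else -int(delta),
            "new_bytes": int(new_bytes),
            "old_bytes": int(old_bytes),
            "allocs": int(allocs),
        })
    return sorted(entries, key=lambda entry: entry["bytes_delta"], reverse=True)


if __name__ == "__main__":
    sample = "+ 5760144 ( 5760144 - 0) 26 allocs BackTrace1178AC\n"
    for entry in parse_deltas(sample):
        print(entry["trace"], entry["bytes_delta"], "bytes in", entry["allocs"], "allocs")
```

To use it on the real result, read MyResult.txt into a string and pass it to parse_deltas; the top few traces are usually where to start looking.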
For more see – http://support.microsoft.com/kb/268343, http://msdn.microsoft.com/en-us/library/windows/hardware/ff560206(v=vs.85).aspx, http://nvharikrishna.wordpress.com/2012/07/11/umdh-a-simple-tool-for-memory-leak-detection-in-windows/ and https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=LeakDetectionInUserMode