In my previous post I described my favorite way of troubleshooting unmanaged heap corruption with AppVerifier, DebugDiag and Windows Debugger. However, in a couple of incidents with a few customers over the last few months, that way of finding the culprit simply didn’t work for one reason or another, usually because of overhead from the AppVerifier rules. In those cases we needed to set up the full debug heap option via GFlags. In this post I will quickly show you this classic old method.
First, let me remind you what heap corruption is and why it’s so difficult to troubleshoot.
Heap corruption is an undesired change in the data allocated by your program. Its symptoms include:
- System errors, such as access violations.
- Unexpected data in program output.
- Unexpected paths of program execution.
Your program may show a symptom of heap corruption immediately or may delay it indefinitely, depending on the execution path through the program. It is important to note again that the crashing stack may be just a victim here: not the code that actually corrupted the heap, but code that simply “touched” the corrupted heap via a heap operation of some sort (an allocation via malloc, for example).
In general, the heap in Windows looks like this: [Figure: Windows heap layout]
The heap is used for allocating and freeing objects dynamically for use by the program. Heap operations are called for when:
- The number and size of objects needed by the program are not known ahead of time.
- An object is too large to fit into a stack allocator.
Every process in Windows has one heap called the default heap. Processes can also have as many other dynamic heaps as they wish, simply by creating and destroying them on the fly. The system uses the default heap for all global and local memory management functions, and the C run-time library uses the default heap for supporting malloc functions. The heap memory functions, which indicate a specific heap by its handle, use dynamic heaps.
To debug heap corruption, you must identify both the code that allocated the memory involved and the code that deleted, released, or overwrote it. If the symptom appears immediately, you can often diagnose the problem by examining code near where the error occurred. Often, however, the symptom is delayed, sometimes for hours. In such cases, you must force a symptom to appear at a time and place where you can derive useful information from it. A common way to do this is to instruct the operating system to insert a special suffix pattern into a small segment of extra memory after each allocation and to check that pattern when the memory is freed. Another way is for the operating system to allocate extra memory after each allocation and mark it as protected, so that the system generates an access violation as soon as that memory is accessed.
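To make the suffix pattern idea concrete, here is a minimal sketch in portable C. It is not how the Windows heap implements it; the names guarded_alloc, guarded_check and guarded_free, the 8-byte guard size and the 0xAB fill byte are all made up for illustration. The wrapper simply asks malloc for a few extra bytes, fills them with a known pattern, and verifies the pattern when the block is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8     /* bytes of suffix pattern (arbitrary choice) */
#define GUARD_BYTE 0xAB  /* fill value (arbitrary choice) */

/* Hypothetical header kept in front of every block so that the check
   routine knows where the suffix pattern starts. The max_align_t member
   keeps the pointer handed to the caller suitably aligned. */
typedef union {
    size_t      user_size;
    max_align_t align;
} guard_header;

/* Allocates `size` usable bytes followed by GUARD_SIZE pattern bytes. */
void *guarded_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(guard_header) - GUARD_SIZE)
        return NULL;

    guard_header *h = malloc(sizeof *h + size + GUARD_SIZE);
    if (h == NULL)
        return NULL;

    h->user_size = size;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + size, GUARD_BYTE, GUARD_SIZE);
    return user;
}

/* Returns 1 if the suffix pattern is intact, 0 if it was overwritten. */
int guarded_check(const void *ptr)
{
    assert(ptr != NULL);
    const guard_header *h = (const guard_header *)ptr - 1;
    const unsigned char *guard = (const unsigned char *)ptr + h->user_size;

    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (guard[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Checks the pattern, then frees the block.
   Returns 1 if the block was clean, 0 if an overrun was detected. */
int guarded_free(void *ptr)
{
    if (ptr == NULL)
        return 1;

    int intact = guarded_check(ptr);
    free((guard_header *)ptr - 1);
    return intact;
}
```

If a caller requests 4 bytes and then writes a fifth, the write lands in the pattern and guarded_free reports it. Note that the corruption is only noticed at free time, which is exactly why this lighter scheme can still point at a victim rather than the culprit, and why the protected-memory approach is more precise.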
So the first tool we will set up is GFlags, which controls the heap debugging options in Windows. Using GFlags, you can establish standard, /full, or /dlls heap options that force the operating system to generate access violations and corruption errors when your program overwrites heap memory. GFlags is installed with the full Debugging Tools for Windows package from the Windows SDK, which can be downloaded here: http://msdn.microsoft.com/en-us/windows/desktop/hh852363.aspx. We will use the designated debugger option in GFlags so that a debugger is on hand when an access violation is encountered, and we will set up Windows Debugger (WinDbg) as that debugger. To do that from the command line, run:
GFlags /p /enable MyBadProgram.exe /full /debug WinDbg.exe
Substitute your application for MyBadProgram.exe and run it until the error occurs.
The error might be an Access Violation (most likely), a Memory Check Error, or any other error type severe enough to force the operating system to attach the debugger to the process and bring it up at a breakpoint. Then you can either analyze the error online, as in any other debug session, or evaluate it offline. To evaluate it offline, create a dump on the fly via:
.dump /ma MyDump.dmp
Click Stop Debugging in the WinDbg toolbar. This will halt the program and empty the debug window.
Turn off heap checking:
GFlags /p /disable MyBadProgram.exe
Now open the resulting dump in Windows Debugger and analyze it. Here is an example; note that I am scrubbing the stack to “protect the innocent”, changing the real DLL name to BadDLL:
0:028> kpn
 # ChildEBP RetAddr
00 00a7de34 7763f659 ntdll!RtlReportCriticalFailure(long StatusCode = 0n-1073740940, void * FailureInfo = 0x77674270)+0x57
01 00a7de44 7763f739 ntdll!RtlpReportHeapFailure(long ErrorLevel = 0n2)+0x21
02 00a7de78 775ee045 ntdll!RtlpLogHeapFailure(_HEAP_FAILURE_TYPE FailureType = heap_failure_invalid_argument (0n9), void * HeapAddress = 0x003c0000, void * Address = 0x00000001, void * Param1 = 0x00000000, void * Param2 = 0x00000000, void * Param3 = 0x00000000)+0xa1
03 00a7dea8 76826e6a ntdll!RtlFreeHeap(void * HeapHandle = 0x003c0000, unsigned long Flags = 0, void * BaseAddress = 0x00000001)+0x64
04 00a7debc 76826f54 ole32!CRetailMalloc_Free(struct IMalloc * pThis = 0x769166bc, void * pv = 0x00000001)+0x1c
05 00a7decc 4470189f ole32!CoTaskMemFree(void * pv = 0x00000001)+0x13
06 00a7ded4 44405f9e BadDLL!ssmem_free+0xf
07 00a7def0 4441ade5 BadDLL+0x5f9e
08 00a7df0c 20e79c9d BadDLL!Init+0x255
09 00a7df84 20e638f6 BadDLL!DllUnregisterServer+0x39bd
0a 00a7dfe4 77639473 BadDLL+0x338f6
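One detail in this stack is worth decoding: the debugger prints StatusCode as a signed decimal (the 0n prefix), while NTSTATUS values are usually looked up in hex. The small C sketch below does the conversion; the function names are mine and purely illustrative. Reinterpreted as an unsigned 32-bit value, 0n-1073740940 is 0xC0000374, which is STATUS_HEAP_CORRUPTION, and the top two bits give the severity (3 means error).

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterprets the signed decimal printed by the debugger (0n...) as the
   unsigned 32-bit NTSTATUS value. */
uint32_t ntstatus_from_signed(int32_t code)
{
    return (uint32_t)code;
}

/* Extracts the severity field, the top two bits of an NTSTATUS:
   0 = success, 1 = informational, 2 = warning, 3 = error. */
unsigned ntstatus_severity(int32_t code)
{
    return (unsigned)(ntstatus_from_signed(code) >> 30);
}

/* Writes the status as "0xXXXXXXXX" into buf (needs at least 11 bytes). */
void format_ntstatus(int32_t code, char *buf, size_t len)
{
    assert(buf != NULL && len >= 11);
    snprintf(buf, len, "0x%08" PRIX32, ntstatus_from_signed(code));
}
```

Calling format_ntstatus(-1073740940, buf, sizeof buf) with a 16-byte buffer yields the string 0xC0000374. Combined with BaseAddress = 0x00000001 in the RtlFreeHeap frame and heap_failure_invalid_argument in the frame above it, the picture is of BadDLL handing the heap a pointer that was never a valid allocation.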
Hope this was helpful and until next time. For more on Gflags see – http://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx and http://blogs.msdn.com/b/webdav_101/archive/2010/06/22/detecting-heap-corruption-using-gflags-and-dumps.aspx