Save Our Souls – Troubleshooting MSDTC

Although a lot of folks would rather forget about MSDTC existence this is one of those pesky, but vital Windows components that has been with us for a long time (since my days of heavy COM development) and will stay with us for longer time, although probably with minimal or no Microsoft investment.
MSDTC is an acronym for Microsoft Distributed Transaction Coordinator. As the name says, MSDTC is a Windows service providing transaction infrastructure for distributed systems. In this case, a transaction means a general way of structuring the interactions between autonomous agents in a distributed system. Each transaction is a state transformation with four key properties – the ACID properties: Atomic (all or nothing), Consistent (legal), Isolated (independent of concurrent transactions) and Durable (once it happens, it cannot be abrogated). There are different techniques that implement the ACID properties but the most known one is two-phase commit. So if you are like my customers using distributed transactions with SQL Server whether its from .NET, legacy code or Java most likely you are using MSDTC on Windows.  I don’t want to go deeper into DTC, but Denny Cherry has a little general snippet on it – http://itknowledgeexchange.techtarget.com/sql-server/what-exactly-is-msdtc-any-when-do-i-need-it/ , plus there is TechNet docs – https://technet.microsoft.com/en-us/library/dd337629(v=ws.10).aspx

For the most part DTC is a “black box” for DBAs and developers. But once in a while you get a situation where you start seeing issues or errors (most common are “Failure to Enlist in Transaction”) that require you to do a bit of troubleshooting.

First thing first. Basic idea should be to see if there is connectability and communication open from DTC between servers. Tool called DTCPing becomes vital here. You can download the tool from Microsoft here – http://www.microsoft.com/en-us/download/details.aspx?id=2868. Use following cadence to run the tool:

  • Download and launch tool on both servers involved in communication via DTC
  • As you launch the tool you will see following interface

image

  • From one of the Servers, in the Remote Server Name type the host name of the client. You can get the hostname by opening a command prompt and typing the command “hostname” without the  commas. Press the Ping button.
  • Then from the other machine, in the Remote Server Name give the hostname of the first server and press the Ping button.
  • This shows the reverse bind too between the machines. Also, you can analyze the text files present in the folder of DTCPing which will contain the results of the Ping.

Typical success in a log (dtcping.log) will look like:

Platform:Windows 2003
++++++++++++ipconfig info++++++++++

++++++++++++lmhosts.sam++++++++++++

++++++++++++hosts      ++++++++++++
127.0.0.1       localhost
199.199.3.2	MachineB
199.199.2.1	firewall

++++++++++++++++++++++++++++++++++++++++++++++
     DTCping 1.8 Report for MachineA
++++++++++++++++++++++++++++++++++++++++++++++
Firewall Port Settings:
	Port:5000-5004
	Port:4343
RPC server is ready
09-11, 14:10:40.187-->RPC server:MachineA received following information:
	Network Name: MachineA
	Source  Port: 5004
	Partner LOG: MachineB528.log
	Partner CID: C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578
++++++++++++Start Reverse Bind Test+++++++++++++
Received Bind call from MachineB
	Network Name: MachineA
	Source  Port: 5004
	Hosting Machine:MachineA
09-11, 14:10:40.671-->Trying to Reverse Bind to MachineB...
	Test Guid:C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578
Name Resolution:
	MachineB-->199.199.3.2-->MachineB
++++++++++++Validating Remote Computer Name++++++++++++
09-11, 14:10:44.937-->Start DTC connection test
Name Resolution:
	MachineB-->199.199.3.2-->MachineB
Reverse Binding success: TOMMYROT-->PACHYDERM
++++++++++++Reverse Bind Test ENDED++++++++++
09-11, 14:10:45.046-->Start RPC test (MachineA-->MachineB)
09-11, 14:10:45.453-->Called POKE from Partner:MachineB
	Network Name: MachineA
	Source  Port: 5004
	Hosting Machine:MachineA
RPC test is successful
	Partner's CID:C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578
++++++++++++RPC test completed+++++++++++++++
++++++++++++Start DTC Binding Test +++++++++++++
Trying Bind to MachineB
09-11, 14:10:47.875-->MachineA Initiating DTC Binding Test....
	Test Guid:89D254FC-B695-4331-B1C1-907B803E491C
Received reverse bind call from MachineB
	Network Name: MachineA
	Source  Port: 5004
	Hosting Machine:MachineA
Binding success: MachineA-->MachineB
++++++++++++DTC Binding Test END+++++++++++++

Whereas typical failure:

Platform:Windows 2003
++++++IP Configure Information+++++

++++++++++++lmhosts.sam++++++++++++

++++++++++++hosts      ++++++++++++
127.0.0.1       localhost
199.199.3.2	MachineB
199.199.2.1	firewall

++++++++++++++++++++++++++++++++++++++++++++++
     DTCping 1.8 Report for MachineA
++++++++++++++++++++++++++++++++++++++++++++++
Firewall Port Settings:
	Port:5000-5004
	Port:4343
RPC server is ready
++++++++++++Validating Remote Computer Name++++++++++++
09-11, 16:54:58.890-->Start DTC connection test
Name Resolution:
	MachineB-->199.199.3.2-->MachineB
09-11, 16:54:58.890-->Start RPC test (MachineA-->MachineB)
09-11, 16:55:01.687-->RPC server:MachineA received following information:
	Network Name: MachineA
	Source  Port: 5004
	Partner LOG: MachineB520.log
	Partner CID: C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578
++++++++++++Start Reverse Bind Test+++++++++++++
Received Bind call from MachineB
	Network Name: MachineA
	Source  Port: 5004
	Hosting Machine:MachineA
09-11, 16:55:02.000-->Trying to Reverse Bind to MachineB...
	Test Guid:C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578
Name Resolution:
	MachineB-->199.199.3.2-->MachineB
Problem:fail to invoke remote RPC method
Error(0x6BA) at dtcping.cpp @289
-->RPC pinging exception
-->1722(The RPC server is unavailable.)
RPC test failed

There is a very detailed blog from MS Distributed Systems Support on DTCPing log exception analysis with some of very common failures – http://blogs.msdn.com/b/distributedservices/archive/2008/11/12/troubleshooting-msdtc-issues-with-the-dtcping-tool.aspx

So if DTCPing tool is all about connectivity, ports and firewalls opened what are some of the other ways to trace DTC transactions? Well a fairly obscure and hidden DTC Transaction Tracing can come to the rescue. The transaction tracing becomes a powerful tool when the transactions are either taking too long to run or they are getting aborted intermittently and nothing inside your application is giving you any hint around why the transaction is getting aborted. By default, only the long-lived transactions are traced. You can turn ON tracing for all transactions or aborted transactions based on the kind of issue you are troubleshooting. You just have to select the relevant checkboxes in the trace settings and tracing will take effect (no restart of MSDTC service is required to toggle these settings)  The next important thing becomes collecting the trace in the right manner. It is always better to collect a trace which just contains information about the problem that you are troubleshooting because on a busy server the Transaction trace can contain a lot of transactions and it might become very difficult to find out which transaction outcome you are interested in

Steps to enable DTC tracing:

  • Start –> run –>dcomcnfg
  • Expand My Computer –> Distributed Transaction Coordinator –> Local DTC –>  Right Click Properties and the go to the tracing tab.
  • By Default, the Trace all transactions won’t be enabled while Trace Aborted Transactions and Trace Long Lived Transactions will be enabled as shown below

  • To collect the traces correctly, reproduce the issue under new session, therefore  we need to first click on Stop Session Button and then the New Session Button as shown below before reproducing the issue:

  • This tracing log will be found under C:\Windows\System32\MSDTC\Trace Folder. Each log can be max of 25 MB and when it exceeds it will create a new log.

You can also trace via registry as explained here – https://support.microsoft.com/en-us/kb/926099

At this point navigate to C:\Windows\System32\MSDTC\Trace folder and we will need to parse that trace file  using MSDTCVTR batch file in that folder:

The MSDTC trace can contain a lot of other events as well and I am pasting here some of the most common events that we see while parsing trace files from real time customer scenarios.

ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED
The Transaction is timing out because of the configured timeout for a transaction in DTC settings.

RECEIVED_ABORT_REQUEST_FROM_BEGINNER
The application which started the transaction issued an abort request to MSDTC to ABORT the transaction. In case of a transaction originating from a COM+ component, a call to ContextUtil.SetAbort will end up sending this event to MSDTC.

TRANSACTION_PROPOGATION_FAILED_CONNECTION_DOWN_FROM_REMOTE_TM
This error is typically due to networking issues. This happens when MSDTC on one machine loses connection with MSDTC on the other machine. For more information on this event, refer this article.

TRANSACTION_PROPAGATION_FAILED_TRANSACTION_NOT_FOUND
This event happens when you try to commit or abort a transaction that has been already committed or aborted.

RM_ENLIST_FAILED_TOO_MANY_ENLISTS
DTC already has the maximum number of resources enlisted in the transaction. The maximum number of resources that can be enlisted in a transaction is 32

Hope this helps. For more on MSDTC Troubleshooting and setup see – http://www.sqlwebpedia.com/content/msdtc-troubleshooting, https://support.microsoft.com/en-us/kb/926099, https://msdn.microsoft.com/en-us/library/ms678891(v=vs.85).aspx, https://msdn.microsoft.com/en-us/library/ms678917(v=vs.85).aspx

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s