Although a lot of folks would rather forget about MSDTC existence this is one of those pesky, but vital Windows components that has been with us for a long time (since my days of heavy COM development) and will stay with us for longer time, although probably with minimal or no Microsoft investment.
MSDTC is an acronym for Microsoft Distributed Transaction Coordinator. As the name says, MSDTC is a Windows service providing transaction infrastructure for distributed systems. In this case, a transaction means a general way of structuring the interactions between autonomous agents in a distributed system. Each transaction is a state transformation with four key properties – the ACID properties: Atomic (all or nothing), Consistent (legal), Isolated (independent of concurrent transactions) and Durable (once it happens, it cannot be abrogated). There are different techniques that implement the ACID properties but the most known one is two-phase commit. So if you are like my customers using distributed transactions with SQL Server whether its from .NET, legacy code or Java most likely you are using MSDTC on Windows. I don’t want to go deeper into DTC, but Denny Cherry has a little general snippet on it – http://itknowledgeexchange.techtarget.com/sql-server/what-exactly-is-msdtc-any-when-do-i-need-it/ , plus there is TechNet docs – https://technet.microsoft.com/en-us/library/dd337629(v=ws.10).aspx
For the most part DTC is a “black box” for DBAs and developers. But once in a while you get a situation where you start seeing issues or errors (most common are “Failure to Enlist in Transaction”) that require you to do a bit of troubleshooting.
First thing first. Basic idea should be to see if there is connectability and communication open from DTC between servers. Tool called DTCPing becomes vital here. You can download the tool from Microsoft here – http://www.microsoft.com/en-us/download/details.aspx?id=2868. Use following cadence to run the tool:
- Download and launch tool on both servers involved in communication via DTC
- As you launch the tool you will see following interface
- From one of the Servers, in the Remote Server Name type the host name of the client. You can get the hostname by opening a command prompt and typing the command “hostname” without the commas. Press the Ping button.
- Then from the other machine, in the Remote Server Name give the hostname of the first server and press the Ping button.
- This shows the reverse bind too between the machines. Also, you can analyze the text files present in the folder of DTCPing which will contain the results of the Ping.
Typical success in a log (dtcping.log) will look like:
Platform:Windows 2003 ++++++++++++ipconfig info++++++++++ ++++++++++++lmhosts.sam++++++++++++ ++++++++++++hosts ++++++++++++ 127.0.0.1 localhost 18.104.22.168 MachineB 22.214.171.124 firewall ++++++++++++++++++++++++++++++++++++++++++++++ DTCping 1.8 Report for MachineA ++++++++++++++++++++++++++++++++++++++++++++++ Firewall Port Settings: Port:5000-5004 Port:4343 RPC server is ready 09-11, 14:10:40.187-->RPC server:MachineA received following information: Network Name: MachineA Source Port: 5004 Partner LOG: MachineB528.log Partner CID: C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578 ++++++++++++Start Reverse Bind Test+++++++++++++ Received Bind call from MachineB Network Name: MachineA Source Port: 5004 Hosting Machine:MachineA 09-11, 14:10:40.671-->Trying to Reverse Bind to MachineB... Test Guid:C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578 Name Resolution: MachineB-->126.96.36.199-->MachineB ++++++++++++Validating Remote Computer Name++++++++++++ 09-11, 14:10:44.937-->Start DTC connection test Name Resolution: MachineB-->188.8.131.52-->MachineB Reverse Binding success: TOMMYROT-->PACHYDERM ++++++++++++Reverse Bind Test ENDED++++++++++ 09-11, 14:10:45.046-->Start RPC test (MachineA-->MachineB) 09-11, 14:10:45.453-->Called POKE from Partner:MachineB Network Name: MachineA Source Port: 5004 Hosting Machine:MachineA RPC test is successful Partner's CID:C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578 ++++++++++++RPC test completed+++++++++++++++ ++++++++++++Start DTC Binding Test +++++++++++++ Trying Bind to MachineB 09-11, 14:10:47.875-->MachineA Initiating DTC Binding Test.... Test Guid:89D254FC-B695-4331-B1C1-907B803E491C Received reverse bind call from MachineB Network Name: MachineA Source Port: 5004 Hosting Machine:MachineA Binding success: MachineA-->MachineB ++++++++++++DTC Binding Test END+++++++++++++
Whereas typical failure:
Platform:Windows 2003 ++++++IP Configure Information+++++ ++++++++++++lmhosts.sam++++++++++++ ++++++++++++hosts ++++++++++++ 127.0.0.1 localhost 184.108.40.206 MachineB 220.127.116.11 firewall ++++++++++++++++++++++++++++++++++++++++++++++ DTCping 1.8 Report for MachineA ++++++++++++++++++++++++++++++++++++++++++++++ Firewall Port Settings: Port:5000-5004 Port:4343 RPC server is ready ++++++++++++Validating Remote Computer Name++++++++++++ 09-11, 16:54:58.890-->Start DTC connection test Name Resolution: MachineB-->18.104.22.168-->MachineB 09-11, 16:54:58.890-->Start RPC test (MachineA-->MachineB) 09-11, 16:55:01.687-->RPC server:MachineA received following information: Network Name: MachineA Source Port: 5004 Partner LOG: MachineB520.log Partner CID: C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578 ++++++++++++Start Reverse Bind Test+++++++++++++ Received Bind call from MachineB Network Name: MachineA Source Port: 5004 Hosting Machine:MachineA 09-11, 16:55:02.000-->Trying to Reverse Bind to MachineB... Test Guid:C2FFD8D2-F9E4-40EF-AFAA-B9A11156A578 Name Resolution: MachineB-->22.214.171.124-->MachineB Problem:fail to invoke remote RPC method Error(0x6BA) at dtcping.cpp @289 -->RPC pinging exception -->1722(The RPC server is unavailable.) RPC test failed
There is a very detailed blog from MS Distributed Systems Support on DTCPing log exception analysis with some of very common failures – http://blogs.msdn.com/b/distributedservices/archive/2008/11/12/troubleshooting-msdtc-issues-with-the-dtcping-tool.aspx
So if DTCPing tool is all about connectivity, ports and firewalls opened what are some of the other ways to trace DTC transactions? Well a fairly obscure and hidden DTC Transaction Tracing can come to the rescue. The transaction tracing becomes a powerful tool when the transactions are either taking too long to run or they are getting aborted intermittently and nothing inside your application is giving you any hint around why the transaction is getting aborted. By default, only the long-lived transactions are traced. You can turn ON tracing for all transactions or aborted transactions based on the kind of issue you are troubleshooting. You just have to select the relevant checkboxes in the trace settings and tracing will take effect (no restart of MSDTC service is required to toggle these settings) The next important thing becomes collecting the trace in the right manner. It is always better to collect a trace which just contains information about the problem that you are troubleshooting because on a busy server the Transaction trace can contain a lot of transactions and it might become very difficult to find out which transaction outcome you are interested in
Steps to enable DTC tracing:
- Start –> run –>dcomcnfg
- Expand My Computer –> Distributed Transaction Coordinator –> Local DTC –> Right Click Properties and the go to the tracing tab.
- By Default, the Trace all transactions won’t be enabled while Trace Aborted Transactions and Trace Long Lived Transactions will be enabled as shown below
- To collect the traces correctly, reproduce the issue under new session, therefore we need to first click on Stop Session Button and then the New Session Button as shown below before reproducing the issue:
- This tracing log will be found under C:\Windows\System32\MSDTC\Trace Folder. Each log can be max of 25 MB and when it exceeds it will create a new log.
You can also trace via registry as explained here – https://support.microsoft.com/en-us/kb/926099
At this point navigate to C:\Windows\System32\MSDTC\Trace folder and we will need to parse that trace file using MSDTCVTR batch file in that folder:
The MSDTC trace can contain a lot of other events as well and I am pasting here some of the most common events that we see while parsing trace files from real time customer scenarios.
The Transaction is timing out because of the configured timeout for a transaction in DTC settings.
The application which started the transaction issued an abort request to MSDTC to ABORT the transaction. In case of a transaction originating from a COM+ component, a call to ContextUtil.SetAbort will end up sending this event to MSDTC.
This error is typically due to networking issues. This happens when MSDTC on one machine loses connection with MSDTC on the other machine. For more information on this event, refer this article.
This event happens when you try to commit or abort a transaction that has been already committed or aborted.
DTC already has the maximum number of resources enlisted in the transaction. The maximum number of resources that can be enlisted in a transaction is 32
Hope this helps. For more on MSDTC Troubleshooting and setup see – http://www.sqlwebpedia.com/content/msdtc-troubleshooting, https://support.microsoft.com/en-us/kb/926099, https://msdn.microsoft.com/en-us/library/ms678891(v=vs.85).aspx, https://msdn.microsoft.com/en-us/library/ms678917(v=vs.85).aspx