Recently I had to help some customers troubleshoot periodic performance degradation and timeouts in Windows AppFabric. Example errors these customers would see were:
ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.
ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure.
In an earlier post I talked about troubleshooting and monitoring an AppFabric Cache cluster. Here I will add an obscure client configuration setting that may also be useful in resolving these timeouts.
Client-to-server network contention: this is a fairly well-known possibility, and the netstat utility can be very useful here. We can start with:
netstat -a -n
Knowing that the default cache port is 22233, you can narrow the output down with further filters, like:
netstat -a -n | find "127.0.0.1:22233" | find /C "TIME_WAIT"
That should give you the number of connections on host port 22233 that are in the TIME_WAIT state. If the number is large, it points to port or network contention: the client is trying to establish too many connections, yet something is blocking it from establishing them. To fix client-to-server connection contention you can modify the client configuration, specifically that obscure maxConnectionsToServer parameter, which is 1 by default.
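If you want to watch this count over time rather than run the command by hand, the same filter is easy to script. The following is a minimal sketch, not an official tool: it assumes the usual four-column TCP rows that `netstat -a -n` prints on Windows, the function name `count_connections` is my own, and the sample text is made up purely for illustration.

```python
def count_connections(netstat_output, port=22233, state="TIME_WAIT"):
    """Count TCP rows that involve the given port and are in the given state."""
    suffix = f":{port}"
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: Proto, Local Address, Foreign Address, State
        if len(fields) < 4 or fields[0] != "TCP":
            continue
        local, foreign, conn_state = fields[1], fields[2], fields[3]
        # Match the port on either side, so this works on client and server.
        if conn_state == state and (local.endswith(suffix) or foreign.endswith(suffix)):
            count += 1
    return count


# Made-up sample output, for illustration only.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    127.0.0.1:22233        0.0.0.0:0              LISTENING
  TCP    127.0.0.1:50112        127.0.0.1:22233        TIME_WAIT
  TCP    127.0.0.1:50113        127.0.0.1:22233        TIME_WAIT
  TCP    127.0.0.1:50114        127.0.0.1:22233        ESTABLISHED
  TCP    127.0.0.1:50115        127.0.0.1:80           TIME_WAIT
"""

print(count_connections(SAMPLE))
```

To run it against a live machine, you could feed it the output of `subprocess.run(["netstat", "-a", "-n"], capture_output=True, text=True).stdout` instead of the sample text.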
<dataCacheClient requestTimeout="15000" channelOpenTimeout="3000" maxConnectionsToServer="5"…>
If you are looking at a high-throughput scenario, then increasing this value beyond 1 is recommended. Be aware, though, that the connection count multiplies: with 5 cache servers in the cluster, 3 DataCacheFactory instances in the application, and maxConnectionsToServer=3, each client machine opens 9 outbound TCP connections to each cache server, 45 in total across all cache servers. Based on that you may wish to increase the value, but do so carefully, since every increase raises the number of TCP connections and therefore the overhead as well. In general, with a singleton DataCacheFactory (as we recommend), I have seen pretty good results from a modest increase.
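The arithmetic above is simple enough to sketch in a few lines, which is handy when you want to try different values before touching the configuration. This is only an illustration of the multiplication described in the paragraph; the function and parameter names are my own and are not part of any AppFabric API.

```python
def outbound_connections(cache_servers, factories, max_connections_to_server):
    """Return (connections per cache server, total connections) for one client machine."""
    # Each DataCacheFactory opens up to maxConnectionsToServer channels per server.
    per_server = factories * max_connections_to_server
    total = per_server * cache_servers
    return per_server, total


# The example from the text: 5 cache servers, 3 factories, maxConnectionsToServer=3.
per_server, total = outbound_connections(5, 3, 3)
print(per_server, total)  # 9 45

# The recommended singleton factory with a modest increase to 5.
print(outbound_connections(5, 1, 5))  # (5, 25)
```

Note how the singleton factory keeps the total well below the three-factory case even with a higher maxConnectionsToServer value.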
Hope this helps.