Trouble in Distributed Cache Land – Windows AppFabric Cache Timeouts

Recently I had to help some customers troubleshoot periodic performance degradation and timeouts in Windows AppFabric. Typical errors these customers saw were:

ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.

ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure.

In my earlier post I talked about troubleshooting and monitoring an AppFabric Cache cluster. Here I want to add a rather obscure client configuration setting that may be useful in resolving these timeouts.


Client-to-server network contention: this is a fairly well-known possibility, and the netstat utility can be very useful for checking it. We can start with:

netstat -a -n

Knowing that the default cache port is 22233, you can add further filters, for example to count the connections on that port that are in the TIME_WAIT state:

netstat -a -n | find "127.0.0.1:22233 " | find /C "TIME_WAIT"
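If you would rather see every connection state at once instead of one count at a time, the same check can be scripted. Below is a minimal sketch in Python, not part of AppFabric or its tooling. The function name `count_states` and the sample rows are made up for illustration, and it assumes the usual four-column layout (Proto, Local Address, Foreign Address, State) that `netstat -a -n` prints for TCP rows.

```python
from collections import Counter

CACHE_PORT = 22233  # default AppFabric cache port mentioned in the text


def count_states(netstat_output, port=CACHE_PORT):
    """Count TCP connection states for rows where either endpoint uses `port`."""
    suffix = ":" + str(port)
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed TCP row shape: Proto, Local Address, Foreign Address, State
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue  # skips headers, blank lines and UDP rows
        local, foreign, state = fields[1], fields[2], fields[3]
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states


# Made-up sample rows in the layout of `netstat -a -n` on Windows.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    127.0.0.1:22233        0.0.0.0:0              LISTENING
  TCP    127.0.0.1:50112        127.0.0.1:22233        TIME_WAIT
  TCP    127.0.0.1:50113        127.0.0.1:22233        TIME_WAIT
  TCP    127.0.0.1:50114        127.0.0.1:22233        ESTABLISHED
  TCP    127.0.0.1:50200        127.0.0.1:80           TIME_WAIT
"""

if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(state, n)
```

To run it against a live machine, you could feed it the text captured from `subprocess.run(["netstat", "-a", "-n"], capture_output=True, text=True).stdout` instead of the sample string.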

The netstat command above prints the number of connections on port 22233 that are in the TIME_WAIT state. A large number there points to port or network contention: the client is trying to establish too many connections, yet something is blocking it from establishing them. To relieve client-to-server connection contention you can modify the client configuration, specifically that obscure maxConnectionsToServer parameter, which is 1 by default.

<dataCacheClient requestTimeout="15000" channelOpenTimeout="3000" maxConnectionsToServer="5"…>

If you are looking at a high-throughput scenario, increasing this value beyond 1 is recommended. Be aware, though, that the connections multiply: with 5 cache servers in the cluster, 3 DataCacheFactory instances in the application and maxConnectionsToServer=3, each client machine opens 9 outbound TCP connections to each cache server, 45 in total across all cache servers. So increase the value carefully, because every extra connection adds TCP overhead. In general, with a singleton DataCacheFactory (as we recommend), I have seen pretty good results from a modest increase.
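The connection arithmetic is easy to check before changing the setting. Here is a rough sketch in Python. The helper name `outbound_connections` is hypothetical, and it assumes the worst case described above, where every DataCacheFactory opens its full maxConnectionsToServer quota to every cache server.

```python
def outbound_connections(cache_servers, factories, max_connections_to_server):
    """Return (connections per cache server, total) for one client machine.

    Worst-case assumption: each factory opens the full number of
    connections allowed by maxConnectionsToServer to every cache server.
    """
    per_server = factories * max_connections_to_server
    return per_server, per_server * cache_servers


if __name__ == "__main__":
    # The example from the text: 5 servers, 3 factories, maxConnectionsToServer=3.
    print(outbound_connections(5, 3, 3))  # (9, 45)
    # The same setting with a singleton factory.
    print(outbound_connections(5, 1, 3))  # (3, 15)
```

With a singleton factory the same setting yields a third of the connections, which is one reason a modest increase tends to be safe in that setup.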

Hope this helps.
