Running To Stand Still – Adventures in .NET Parallelism

Last winter I was fortunate enough to be involved in performance tuning engagement on one of customer sites . While testing and tuning customers Web ASP.NET based application we noted that running certain functions\tasks synchronously surprisingly was actually somewhat faster and more stable than asyncronizing these tasks via Task Parallel Library (TPL).

We saw no CPU contention or extensive context switching. CPU averages 25-35% and maxes around 55-60%. Moreover async change does not cause noticeably higher CPU , we see 2 to 4% increase on average only. The low CPU doesn’t surprise me here since the threads if I understand correctly are simply blocked  while waiting for network call operation to complete) for a while. So no high CPU, no thread switching, no easy to see contention – but yet asynchronous methods that should be running faster are not. Moreover we observed a pretty bad “jitteriness” in application response times with significant latency spikes.

Looking at ANTS Profiler picture is easy to understand (time is spent waiting on async tasks to complete), but really doesn’t explain that question as to why: time is spent in ThreadPool here:

image

Grabbing couple of hang dumps and looking into these actually started explaining things and uncovered a rather interesting picture.  I will scrub stacks to “protect an innocent ” here, but you will get an idea nevertheless:

Looking for managed thread pool contention you will not notice any:

0:032> !threadpool
CPU utilization: 17%
Worker Thread: Total: 22 Running: 5 Idle: 17 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 9 Free: 8 MaxFree: 8 CurrentLimit: 9 MaxLimit: 1000 MinLimit: 4

But in addition to clearly showing no thread contention here, i.e. I am not running out of worker threads, something else becomes really apparent.As we can see, there are 17 idle threads with 5 doing some work. The thread pool removes an idle thread after it’s been idle for 10 seconds so this means the number of threads we are using is bursty. The way that Tasks are being used isn’t really the best way to use them. There are two reasons for using tasks.

  • First, if you have a CPU intensive operation you can put that work on another thread so that you can either return (e.g. in a GUI app to keep UI responsive) or do something else on the current thread.
  • Second, as a way to execute an asynchronous operation such as network or disk io and easily queue up the work that needs to be done after the asynchronous operation has completed.

In addition looking at the stacks in the dump majority are in following stacks:

OS Thread Id: 0x40 (109)
Child SP               IP Call Site
000000bbdf8bd9f8 000007fd958b315b [GCFrame: 000000bbdf8bd9f8] 
000000bbdf8bdac8 000007fd958b315b [HelperMethodFrame_1OBJ: 000000bbdf8bdac8] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
000000bbdf8bdbe0 000007fd8a8f0d6c System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) 
000000bbdf8bdca0 000007fd8a8f078b System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) 
000000bbdf8bdd30 000007fd8b1bb9da System.Threading.Tasks.Task.InternalWait(Int32, System.Threading.CancellationToken) 
000000bbdf8bde30 000007fd8b1c14f6 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].GetResultCore(Boolean)
000000bbdf8bde70 000007fd2db17606 CustomerApp.SubmitRequest[[System.__Canon, mscorlib]](System.__Canon, System.Net.Http.HttpClient, System.Uri, System.Net.Http.Formatting.MediaTypeFormatter)

Looking at dump stacks here we are synchronously waiting on a task to complete which is occupying a thread. We have off loaded the network call to a Task which is running on another thread but aren’t gaining the benefits of doing so as you are just waiting for it to come back. Because of the large (compared with running) number of idle threads, the suspicion is that our off box calls are made using synchronous APIs which would mean another thread blocking doing nothing.

The jitter in the latency times is most likely caused by resource contention with needing to have a new thread created for all these tasks we are dispatching which don’t have a thread to work on. The Worker Thread MinLimit is 4, and as our number of threads is greater than that, the ThreadPool will only give us a new thread once every 500ms. Once you’ve created enough threads, the next request should run quicker, until you leave the threads idle long enough that they are cleaned up.

The basic operation manner of the thread pool is actually pretty easy to explain. The thread pool starts from 0 threads, it immediately creates new threads to serve work requests until the number of running threads reaches a configurable minimum (the default minimum is set to the number of CPUs on the machine).

While the number of running threads is equal or bigger than that minimum – the thread pool will create new threads at the rate of 1 thread per 0.5 second. Which means that if your application is running on a dual core machine, and 16 work requests that spans for 10 seconds are scheduled together, assuming the the thread pool is empty, the first two request will be served immediately, the third after 0.5 second, the forth after 1 second and the 16th after 7 seconds. In addition, to avoid unnecessary starvation a demons thread is running in the background and periodically monitors the CPU – in case of low CPU utilization it creates new threads as appropriate.

The thread pool will not create new threads after it reaches a configurable maximum. The default maximum is set to 250 * number of CPUs, which is 1o times more than it was in the 1.0 version of the .NET Framework. The default maximum was increased in order to reduce the chance for possible dead lock that can occur when callbacks rely on each other in order to complete.

After a thread finish its work it is not being destroyed immediately, rather it stays in the pool waiting for another work to arrive. Once new work arrive it is being served immediately by one of the waiting threads. The waiting threads are being destroyed only after spending 10 seconds (was 40 seconds) on the pool doing nothing.

clip_image002

So what is the lesson here? If you are using TPL make sure that you are using TPL and asynchronous code through your complete call. If you are calling Redis for example – call Redis asynchronous client, etc. Otherwise sometimes you may get more consistent and stable results just being synchronous in your applications that are using ASP.NET thread pool.

Hope this helps.

For more see – http://blogs.msdn.com/b/tmarq/archive/2010/04/14/performing-asynchronous-work-or-tasks-in-asp-net-applications.aspx, http://blog.stephencleary.com/2013/11/taskrun-etiquette-examples-dont-use.html,

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s