WCF — Improve performance with greater concurrency

As good and innovative as WCF is, it also introduced a lot of new complexities and whilst it is easy to get something up and running quickly it takes much more understanding to make your service perform as well as it could.

There are many things you need to consider, such as binding type, serialization type, DataTable or POCO, and so on, and any of these choices can have a telling effect on the overall performance of your WCF service. Scott Weinstein wrote a very good article on how to create a high performance WCF service in 6 simple steps (see the References section).

Without going into the subjects of binding selection and data normalization, I want to focus on how you can achieve greater concurrency. For services hosted on the web you won’t be able to use NetTcpBinding, and data normalization is almost irrelevant because you won’t (or at least shouldn’t!) be sending large amounts of data back and forth.

Increase the throttling limits

Generally speaking, the most common way to improve the performance of a WCF service is to encourage greater concurrency. If you have used WCF before, chances are you’ve had to change the default throttling behaviour configuration, because the defaults are too low for any real-world application.

These defaults are set to protect your service from DoS attacks, but unfortunately they also mean your service runs in lock-down mode by default. They have since been raised to more sensible numbers in the WCF 4 release:

               MaxConcurrentSessions    MaxConcurrentCalls      MaxConcurrentInstances
WCF 3.5 SP1    10                       16                      26
WCF 4          100 * Processor Count    16 * Processor Count    116 * Processor Count

The new defaults in WCF 4 should provide a good guideline for you when configuring the ServiceThrottlingBehavior of your service (assuming you’re not using WCF 4 already).
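As a rough sketch, on WCF 3.5 you could apply the WCF 4 style limits in code before opening the host. The ThrottlingSetup class and its method names are made up for illustration; ServiceThrottlingBehavior and its three properties are the real WCF types. Treat the multipliers as a starting point to measure against, not as recommended values for your service.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical helper (not part of WCF) that mirrors the WCF 4 defaults quoted above.
public static class ThrottlingSetup
{
    public static ServiceThrottlingBehavior CreateWcf4StyleThrottle(int processorCount)
    {
        return new ServiceThrottlingBehavior
        {
            MaxConcurrentSessions = 100 * processorCount,
            MaxConcurrentCalls = 16 * processorCount,
            MaxConcurrentInstances = 116 * processorCount
        };
    }

    // Call this before host.Open(); the description is not meant to be
    // changed once the host has been opened.
    public static void Apply(ServiceHostBase host)
    {
        var wanted = CreateWcf4StyleThrottle(Environment.ProcessorCount);
        var existing = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();

        if (existing == null)
        {
            // No throttle configured yet, so add ours.
            host.Description.Behaviors.Add(wanted);
            return;
        }

        // A throttle was already configured (e.g. in app.config); overwrite its limits.
        existing.MaxConcurrentSessions = wanted.MaxConcurrentSessions;
        existing.MaxConcurrentCalls = wanted.MaxConcurrentCalls;
        existing.MaxConcurrentInstances = wanted.MaxConcurrentInstances;
    }
}
```

With a self-hosted service this would be called as ThrottlingSetup.Apply(host) between constructing the ServiceHost and calling host.Open().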

Use the PerCall or PerSession instance context mode

The InstanceContextMode also plays a significant role in the overall performance of your service. Of the three options available to you (PerCall, PerSession and Single, i.e. a singleton), you should consider PerCall or PerSession for a highly scalable WCF service.

Whilst the PerCall instance context mode is generally regarded as the most scalable option, it does carry with it the need to create an instance of your class for each request, so you need to ensure that 1) you have a parameterless constructor, and 2) this constructor does as little as possible. If there are any significant steps that need to be performed, such as loading some reference data, you should avoid doing these in the parameterless constructor so they aren’t performed for each request:

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyService : IMyService
{
    public MyService()
    {
        ReferenceData = LoadReferenceData(); // will be called for EACH request…
    }

    public MyReferenceData ReferenceData { get; private set; }
    …
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyService2 : IMyService
{
    // loaded once, when the type is initialized, and shared across all instances
    private static readonly MyReferenceData ReferenceData = LoadReferenceData();

    public MyService2()
    {
        // ideal constructor which does nothing
    }

    …
}
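A static field initializer like the one in MyService2 runs when the type is initialized. If you would rather defer the load until the data is actually needed, one option is sketched below. It assumes .NET 4, where Lazy<T> was introduced, so it is not available on 3.5. ReferenceDataCache, LoadCount and the stub MyReferenceData class are placeholders for illustration only.

```csharp
using System;
using System.Threading;

// Stand-in for the real reference data type.
public sealed class MyReferenceData
{
    public MyReferenceData()
    {
        LoadedAtUtc = DateTime.UtcNow;
    }

    public DateTime LoadedAtUtc { get; private set; }
}

// Hypothetical cache: the data is loaded on first access and then shared.
public static class ReferenceDataCache
{
    private static int _loadCount;

    // Lazy<T> with a value factory defaults to ExecutionAndPublication,
    // so concurrent first callers trigger exactly one load.
    private static readonly Lazy<MyReferenceData> Cached =
        new Lazy<MyReferenceData>(Load);

    // Exposed only so the single load can be observed.
    public static int LoadCount
    {
        get { return _loadCount; }
    }

    public static MyReferenceData Current
    {
        get { return Cached.Value; }
    }

    private static MyReferenceData Load()
    {
        Interlocked.Increment(ref _loadCount);
        return new MyReferenceData(); // the expensive load would go here
    }
}
```

A PerCall service can then read ReferenceDataCache.Current from any operation and keep its constructor empty.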

In cases where the initialization steps are lengthy and unavoidable, or your class requires a number of parameters in the constructor (for instance, when you programmatically host a service retrieved from an IoC container), the parameterless constructor can become a problem. To get around this, you could create a wrapper for your class and expose the wrapper as the service instead, but hold a static instance of the underlying service which all the requests are passed on to. Bear in mind that every request then shares that one instance, so the underlying service needs to be thread-safe:

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyServiceWrapper : IMyServiceWrapper
{
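    // get the underlying service from the container once, when the type is initialized
    // (_container is assumed to be a static field set up elsewhere)
    private static readonly IMyService MyService = _container.Resolve<IMyService>();

    public MyServiceWrapper()
    {
        // parameterless constructor which does nothing, so it is cheap to construct
    }
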
    public void DoSomething()
    {
        MyService.DoSomething();
    }
}

// dummy interface to ensure the wrapper has the same methods as the underlying service
// but helps to avoid confusion
public interface IMyServiceWrapper : IMyService
{
}

For a sessionful service, the PerSession instance context mode gives you all the benefits of the PerCall instance context mode and at the same time reduces the overhead you pay for that extra concurrency, because a new instance of your class is created for each session rather than for each request.

If your service is session dependent then you should definitely go with PerSession, but beware: if the channel does not create a session, your service will behave as if it were a PerCall service.
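One way to guard against that silent fallback is to state on the contract that a session is required, in which case opening the host over a binding that cannot provide one fails with an InvalidOperationException instead of quietly behaving like PerCall. The sketch below uses a made-up ICounterService contract purely for illustration; SessionMode.Required and InstanceContextMode.PerSession are the real WCF settings.

```csharp
using System.ServiceModel;

// Hypothetical contract: SessionMode.Required tells WCF the binding must support sessions.
[ServiceContract(SessionMode = SessionMode.Required)]
public interface ICounterService
{
    [OperationContract]
    int Increment();
}

// One instance per session, so _count is per-client state that survives across calls.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerSession)]
public class CounterService : ICounterService
{
    private int _count;

    public int Increment()
    {
        // With the default ConcurrencyMode.Single, calls on the same instance
        // are processed one at a time, so no extra locking is used here.
        _count++;
        return _count;
    }
}
```

Hosted over a session-capable binding such as NetTcpBinding or WSHttpBinding, two calls from the same proxy return 1 and then 2, while a second proxy starts again from 1.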

Increase the number of idle IO threads in the thread pool

This is perhaps the most overlooked aspect when it comes to increasing the concurrency of a WCF service. If you’ve set your service to use the PerCall or PerSession instance context mode and upped the throttling settings, but are still not getting the response times you’re looking for, then it’s worth investigating whether the calls are being queued because there are not enough IO threads in the ThreadPool to handle the requests.

You can establish whether or not the requests are actually taking longer to process under load (as opposed to being queued at the service level) either by profiling locally or by using some form of run-time logging (I wrote a LogExecutionTime attribute which might come in handy). If the calls aren’t taking longer to process and you’re not seeing very high CPU utilisation, then the increase in response time is likely a result of requests being queued whilst WCF waits for a new IO thread to become available.
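If you don’t have an attribute-based logger to hand, a plain Stopwatch wrapper gives you the same basic measurement. This is a simplified stand-in and not the LogExecutionTime attribute mentioned above; ExecutionTimer, Measure and the log callback are hypothetical names.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times a piece of work and reports the elapsed time
// through a caller-supplied logging callback.
public static class ExecutionTimer
{
    public static T Measure<T>(string operationName, Func<T> operation, Action<string, TimeSpan> log)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return operation();
        }
        finally
        {
            // Runs even if the operation throws, so failed calls are timed too.
            stopwatch.Stop();
            log(operationName, stopwatch.Elapsed);
        }
    }
}
```

Inside a service operation you might wrap the body as ExecutionTimer.Measure("DoSomething", () => DoTheWork(), WriteToLog), where DoTheWork and WriteToLog are your own methods. If the times logged inside the service stay flat under load while the client sees response times climb, the extra time is being spent before your code runs, which points to queuing.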

WCF uses the IO threads from the ThreadPool to handle requests and by default, the ThreadPool keeps one IO thread around for each CPU. So on a single core machine that means you only have ONE available IO thread to start with, and when more IO threads are needed they’re created by the ThreadPool with a delay:

“The thread pool maintains a minimum number of idle threads. For worker threads, the default value of this minimum is the number of processors. The GetMinThreads method obtains the minimum numbers of idle worker and I/O completion threads.

When all thread pool threads have been assigned to tasks, the thread pool does not immediately begin creating new idle threads. To avoid unnecessarily allocating stack space for threads, it creates new idle threads at intervals. The interval is currently half a second, although it could change in future versions of the .NET Framework.

If an application is subject to bursts of activity in which large numbers of thread pool tasks are queued, use the SetMinThreads method to increase the minimum number of idle threads. Otherwise, the built-in delay in creating new idle threads could cause a bottleneck.”

However, as WenLong Dong pointed out in his blog (see the References section), raising the MinIOThreads setting in the ThreadPool doesn’t work as you’d expect in .NET 3.5 because of a known issue with the ThreadPool, which has since been fixed in .NET 4. So if you’re still running .NET 3.5 like most of us, you will need to grab the hotfix from here:

http://support.microsoft.com/kb/976898
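With the hotfix applied (or on .NET 4), raising the minimum number of IO threads could look something like the sketch below, called once at service start-up. ThreadPoolTuning and EnsureMinIoThreads are hypothetical names; GetMinThreads, GetMaxThreads and SetMinThreads are the real ThreadPool methods. The right target value depends on your workload, so treat any number you pass in as something to measure rather than a recommendation.

```csharp
using System;
using System.Threading;

// Hypothetical start-up helper for raising the minimum number of idle IO threads.
public static class ThreadPoolTuning
{
    // Returns true if the minimum is already high enough or was raised successfully.
    public static bool EnsureMinIoThreads(int desiredMinIoThreads)
    {
        int minWorker, minIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);

        if (minIo >= desiredMinIoThreads)
        {
            return true; // never lower an existing setting
        }

        int maxWorker, maxIo;
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);

        // The minimum cannot exceed the maximum, so clamp the request.
        int target = Math.Min(desiredMinIoThreads, maxIo);

        // Leave the worker thread minimum untouched; only the IO minimum is raised.
        return ThreadPool.SetMinThreads(minWorker, target);
    }
}
```

SetMinThreads returns false rather than throwing when it rejects the values, which is why the helper passes that result back to the caller.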

Parting thoughts:

Performance tuning isn’t an exact science, and you have to make a case-by-case judgement on how best to approach your performance issues. Encouraging greater concurrency is just one of the ways you can improve performance, but it’s by no means a silver bullet! In fact, if you go too far down the concurrency route you could find yourself facing a number of problems:

  • 100% CPU – an excessive number of concurrent threads can max out your CPU, and a lot of CPU time can be wasted on context switching whilst your service becomes less responsive.
  • DoS attack – same as above, but intended by a would-be attacker.
  • OutOfMemoryException – if your service is returning a large set of data from the database, or committing a large amount of database writes within a transaction, it’s not unthinkable that you might run into the dreaded OutOfMemoryException given that: 1) in practice you only have a per-process cap of around 1.2 ~ 1.5GB of memory (for a 32-bit process) and each active thread is allocated 1MB of stack space (regardless of how much it actually uses); 2) each concurrent call performs a large number of object creations, which take up available memory until they’re garbage collected; 3) writing to the database within a transaction adds to the transaction log, which also eats into the available memory.
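To put the thread figures from the last point into perspective, here is a small back-of-the-envelope calculation. It takes the numbers quoted above at face value (1MB per active thread, and the lower end of the per-process cap); StackBudget and ShareOfCap are hypothetical names.

```csharp
using System;

// Hypothetical helper: how much of the per-process cap thread stacks alone would claim.
public static class StackBudget
{
    // Figure quoted in the text: 1MB set aside per active thread.
    public const int StackPerThreadMb = 1;

    public static double ShareOfCap(int activeThreads, int processCapMb)
    {
        return (double)(activeThreads * StackPerThreadMb) / processCapMb;
    }
}
```

For example, StackBudget.ShareOfCap(300, 1200) returns 0.25: with 300 active threads and a cap of roughly 1.2GB, a quarter of the budget is gone before a single object has been allocated for the requests themselves.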

References:

WenLong Dong’s post on WCF responses being slow and SetMinThreads does not work

WenLong Dong’s post on WCF request throttling and Server Scalability

WenLong Dong’s post on WCF becoming slow after being idle for 15 seconds

Scott Weinstein’s post on creating high performance WCF services

Dan Rigsby’s post on throttling WCF service and maintaining scalability