As good and innovative as WCF is, it also introduced a lot of new complexities. Whilst it is easy to get something up and running quickly, it takes much more understanding to make your service perform as well as it could.

There are many things you need to consider, such as binding type, serialization type, DataTable or POCO, and so on, and any of these choices can have a telling effect on the overall performance of your WCF service. Scott Weinstein wrote a very good article on how to create a high performance WCF service in 6 simple steps (see the references section).

Without going into binding selection and data normalization, I want to focus on how you can achieve greater concurrency. For services hosted on the web you won’t be able to use NetTcpBinding, and data normalization is almost irrelevant because you won’t (or at least shouldn’t!) be sending large amounts of data back and forth.

Increase the throttling limits

Generally speaking, the most common way to improve the performance of a WCF service is to encourage greater concurrency. If you have used WCF before, chances are you’ve had to change the default throttling behaviour configuration because the defaults are too low to be useful for any real-world application.

These defaults are set to keep your service safe from DOS attacks, but unfortunately they also mean your service runs in lock-down mode by default. They have since been raised to more sensible numbers in the WCF 4 release:

              MaxConcurrentSessions    MaxConcurrentCalls      MaxConcurrentInstances
WCF 3.5 SP1   10                       16                      26
WCF 4         100 * Processor Count    16 * Processor Count    116 * Processor Count

The new defaults in WCF 4 should provide a good guideline when configuring the ServiceThrottlingBehavior of your service (assuming you’re not using WCF 4 already).
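As a rough illustration, the WCF 4 style limits from the table can be applied in code to a programmatically hosted service. This is only a sketch: the ThrottlingDefaults class and its method names are made up for this example, and the multipliers are simply the ones quoted above rather than tuned values.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical helper, not part of WCF: mirrors the WCF 4 defaults on WCF 3.5.
public static class ThrottlingDefaults
{
    // Builds a throttling behavior scaled by the given processor count.
    public static ServiceThrottlingBehavior ForProcessorCount(int processorCount)
    {
        return new ServiceThrottlingBehavior
        {
            MaxConcurrentSessions = 100 * processorCount,
            MaxConcurrentCalls = 16 * processorCount,
            MaxConcurrentInstances = 116 * processorCount
        };
    }

    // Replaces any existing throttling behavior on the host.
    // Must be called before host.Open().
    public static void ApplyTo(ServiceHostBase host)
    {
        var existing = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (existing != null)
        {
            host.Description.Behaviors.Remove(existing);
        }

        host.Description.Behaviors.Add(ForProcessorCount(Environment.ProcessorCount));
    }
}
```

With a self-hosted service you would call ThrottlingDefaults.ApplyTo(host) right after constructing the ServiceHost and before opening it. The same three limits can also be set in configuration through the serviceThrottling behavior element.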

Use the PerCall or PerSession instance context mode

The InstanceContextMode also plays a significant role in the overall performance of your service. Of the three options available to you (PerCall, PerSession and Single), you should consider PerCall or PerSession for a highly scalable WCF service.

Whilst the PerCall instance context mode is generally regarded as the most scalable option, it does carry with it the need to create an instance of your class for each request, so you need to ensure that 1) you have a parameterless constructor, and 2) this constructor does as little as possible. If there are any significant steps that need to be performed, such as loading some reference data, you should avoid doing these in the parameterless constructor so they aren’t performed for each request:

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyService : IMyService
{
    public MyService()
    {
        ReferenceData = LoadReferenceData(); // will be called for EACH request…
    }

    public MyReferenceData ReferenceData { get; private set; }
    …
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyService2 : IMyService
{
    // initialized once, before the reference data is first used, and shared across all instances
    private static readonly MyReferenceData ReferenceData = LoadReferenceData();

    public MyService2()
    {
        // ideal constructor which does nothing
    }

    …
}

In cases where the initialization steps are lengthy and unavoidable, or your class requires a number of parameters in the constructor (for instance, when you programmatically host a service retrieved from an IoC container), the parameterless constructor can become a problem. To get around this, you could create a wrapper for your class and expose the wrapper as the service instead, but hold a static instance of the underlying service which all the requests are passed on to:

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyServiceWrapper : IMyServiceWrapper
{
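    // get the underlying service from the container once; _container is assumed
    // to be a static, already configured IoC container defined elsewhere
    private static readonly IMyService MyService = _container.Resolve<IMyService>();

    public MyServiceWrapper()
    {
        // parameterless constructor which does nothing, so it is cheap to construct
    }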
    public void DoSomething()
    {
        MyService.DoSomething();
    }
}

// dummy interface to ensure the wrapper has the same methods as the underlying service
// but helps to avoid confusion
public interface IMyServiceWrapper : IMyService
{
}

For a sessionful service, the PerSession instance context mode gives you all the benefits of the PerCall instance context mode and at the same time reduces the overhead you pay for that extra concurrency, because a new instance of your class is created for each session rather than for each request.

If your service is session-dependent then you should definitely go with PerSession, but beware: if the channel does not create a session, your service will behave as if it were a PerCall service.
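One way to guard against that silent fall-back is to declare on the contract that a session is required, so the host refuses to open with a binding that cannot provide one. The sketch below assumes a made-up ICounterService contract and CounterService class purely for illustration:

```csharp
using System.ServiceModel;

// Hypothetical contract: SessionMode.Required makes the host fail on Open()
// if the endpoint's binding does not support sessions.
[ServiceContract(SessionMode = SessionMode.Required)]
public interface ICounterService
{
    [OperationContract]
    int Increment();
}

// One instance is created per session, so _count is per-client state.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerSession)]
public class CounterService : ICounterService
{
    private int _count;

    public int Increment()
    {
        // Safe without locking under the default ConcurrencyMode.Single.
        return ++_count;
    }
}
```

Without SessionMode.Required the contract defaults to SessionMode.Allowed, which is what lets a sessionless binding quietly turn every call into a fresh instance.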

Increase the number of idle IO threads in the thread pool

This is perhaps the most overlooked aspect when it comes to increasing the concurrency of a WCF service. If you’ve set your service to use the PerCall or PerSession instance context mode and upped the throttling settings, but are still not getting the response times you’re looking for, then it’s worth investigating whether the calls are being queued because there are not enough IO threads in the ThreadPool to handle the requests.

You can establish whether or not the requests are actually taking longer to process under load (as opposed to being queued at a service level) either by profiling locally or by using some form of run-time logging (I wrote a LogExecutionTime attribute which might come in handy). If the calls aren’t taking longer to process and you’re not seeing very high CPU utilisation, then the increase in response time is likely a result of the request being queued whilst WCF waits for a new IO thread to be made available to handle the request.
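If you don’t have a profiler or a logging attribute to hand, a small Stopwatch wrapper is enough to get the in-service execution time. The sketch below is not the LogExecutionTime attribute mentioned above; ExecutionTimer and its parameters are names invented for this example:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times an operation and reports the elapsed time
// through a caller-supplied logging callback.
public static class ExecutionTimer
{
    public static T Measure<T>(string operationName, Func<T> operation, Action<string, TimeSpan> log)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return operation();
        }
        finally
        {
            // Runs even if the operation throws, so failures are timed too.
            stopwatch.Stop();
            log(operationName, stopwatch.Elapsed);
        }
    }
}
```

Wrapping the body of a service operation in ExecutionTimer.Measure and comparing the logged times with the response times seen by the client gives a rough idea of how long requests spend waiting before your code even starts running.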

WCF uses the IO threads from the ThreadPool to handle requests and, by default, the ThreadPool keeps one IO thread around for each CPU. So on a single core machine you only have ONE available IO thread to start with, and when more IO threads are needed they’re created by the ThreadPool with a delay:

“The thread pool maintains a minimum number of idle threads. For worker threads, the default value of this minimum is the number of processors. The GetMinThreads method obtains the minimum numbers of idle worker and I/O completion threads.

When all thread pool threads have been assigned to tasks, the thread pool does not immediately begin creating new idle threads. To avoid unnecessarily allocating stack space for threads, it creates new idle threads at intervals. The interval is currently half a second, although it could change in future versions of the .NET Framework.

If an application is subject to bursts of activity in which large numbers of thread pool tasks are queued, use the SetMinThreads method to increase the minimum number of idle threads. Otherwise, the built-in delay in creating new idle threads could cause a bottleneck.”
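A minimal sketch of raising the IO thread minimum at start-up might look like this. ThreadPoolTuning and EnsureMinIoThreads are hypothetical names, and the right minimum for your service is something you would have to find by measuring:

```csharp
using System;
using System.Threading;

// Hypothetical helper: raises the minimum number of idle I/O completion
// threads while leaving the worker thread minimum untouched.
public static class ThreadPoolTuning
{
    public static bool EnsureMinIoThreads(int desiredIoThreads)
    {
        int minWorkerThreads, minIoThreads;
        ThreadPool.GetMinThreads(out minWorkerThreads, out minIoThreads);

        if (minIoThreads >= desiredIoThreads)
        {
            return true; // already at or above the requested minimum
        }

        // SetMinThreads returns false if the request is rejected, for example
        // when it exceeds the pool's maximum number of threads.
        return ThreadPool.SetMinThreads(minWorkerThreads, desiredIoThreads);
    }
}
```

Call it once before the service host is opened (for example from Main or Application_Start), and check the return value rather than assuming the change took effect.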

However, as WenLong Dong pointed out in his blog (see the references section), raising the MinIOThreads setting in the ThreadPool doesn’t work as you’d expect in .NET 3.5 because of a known issue with the ThreadPool, which has since been fixed in .NET 4. So if you’re still running .NET 3.5 like most of us, you will need to grab the hotfix from here:

http://support.microsoft.com/kb/976898

Parting thoughts:

Performance tuning isn’t an exact science, and you have to make a case-by-case judgement on how best to approach your performance issues. Encouraging greater concurrency is just one of the ways you can improve performance, but it’s by no means a silver bullet! In fact, if you go too far down the concurrency route you could find yourself facing a number of problems:

  • 100% CPU: an excessive number of concurrent threads can max out your CPU, and a lot of CPU time can be wasted on context switching whilst your service becomes less responsive.
  • DOS attack: same as above, but intended by a would-be attacker.
  • OutOfMemoryException: if your service is returning a large set of data from the database, or committing a large number of database writes within a transaction, it’s not unthinkable that you might run into the dreaded OutOfMemoryException, given that: 1) in practice you only have a per-process cap of around 1.2 ~ 1.5GB of memory, and each active thread is allocated 1MB of memory space (regardless of how much it actually uses); 2) each concurrent call performs a large number of object creations, which take up available memory until they’re garbage collected; 3) writing to the database within a transaction adds to the transaction log, which also eats into the available memory.

References:

WenLong Dong’s post on WCF responses being slow and SetMinThreads does not work

WenLong Dong’s post on WCF request throttling and Server Scalability

WenLong Dong’s post on WCF becoming slow after being idle for 15 seconds

Scott Weinstein’s post on creating high performance WCF services

Dan Rigsby’s post on throttling WCF service and maintaining scalability
