WCF - Improve performance with greater concurrency

Yan Cui

As good and innovative as WCF is, it also introduced a lot of new complexity. Whilst it is easy to get something up and running quickly, it takes much more understanding to make your service perform as well as it could.

There are many things you need to consider, such as binding type, serialization type, DataTable or POCO, and so on, and any of these choices can have a telling effect on the overall performance of your WCF service. Scott Weinstein wrote a very good article on how to create a high performance WCF service in 6 simple steps (see references section).

Without going into binding selection and data normalization, I want to focus on how you can achieve greater concurrency. For services hosted on the web you won’t be able to use NetTcpBinding, and data normalization is almost irrelevant because you won’t (or at least shouldn’t!) be sending large amounts of data back and forth.

Increase the throttling limits

Generally speaking, the most common way to improve the performance of a WCF service is to encourage greater concurrency. If you have used WCF before, chances are you’ve had to change the default throttling behaviour configuration because the defaults are too low for most real-world applications.

These defaults are set to keep your service safe from DoS attacks, but unfortunately they also mean your service runs in lock-down mode by default. They have since been raised to more sensible numbers in the new WCF 4 release:

Version        MaxConcurrentSessions    MaxConcurrentCalls      MaxConcurrentInstances
WCF 3.5 SP1    10                       16                      26
WCF 4          100 * Processor Count    16 * Processor Count    116 * Processor Count

The new defaults in WCF 4 should provide a good guideline for you when configuring the ServiceThrottlingBehavior of your service (assuming you’re not using WCF 4 already).
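
If you prefer to set the throttles in code rather than in configuration, something along these lines should work. This is only a sketch: ThrottlingDefaults and its two methods are names I made up for illustration, and the numbers simply mirror the WCF 4 defaults from the table above, so treat them as a starting point to tune rather than a recommendation.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical helper, not part of WCF itself
public static class ThrottlingDefaults
{
    // Builds a throttling behaviour that mirrors the WCF 4 defaults in the table above
    public static ServiceThrottlingBehavior Create(int processorCount)
    {
        int sessions = 100 * processorCount;
        int calls = 16 * processorCount;

        return new ServiceThrottlingBehavior
        {
            MaxConcurrentSessions = sessions,
            MaxConcurrentCalls = calls,
            // WCF 4 uses the sum of the other two limits as its default
            MaxConcurrentInstances = sessions + calls
        };
    }

    // Replaces any existing throttling behaviour on the host; call this before host.Open()
    public static void ApplyTo(ServiceHostBase host)
    {
        var existing = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (existing != null)
        {
            host.Description.Behaviors.Remove(existing);
        }

        host.Description.Behaviors.Add(Create(Environment.ProcessorCount));
    }
}
```

With a self-hosted service you would call ThrottlingDefaults.ApplyTo(host) after constructing the ServiceHost and before opening it, since behaviours cannot be changed once the host is open.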

Use the PerCall or PerSession instance context mode

The InstanceContextMode also plays a significant role in the overall performance of your service, and of the three options available to you – PerCall, PerSession and Singleton – you should consider PerCall or PerSession for a highly scalable WCF service.

Whilst the PerCall instance context mode is generally regarded as the most scalable option, it does carry with it the need to create an instance of your class for each request. You need to ensure that 1) you have a parameterless constructor, and 2) this constructor does as little as possible. If there are any significant steps that need to be performed, such as loading some reference data, you should avoid doing these in the parameterless constructor so they aren’t performed for each request:

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyService : IMyService
{
    public MyService()
    {
        ReferenceData = LoadReferenceData(); // will be called for EACH request…
    }

    public MyReferenceData ReferenceData { get; private set; }
    …
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyService2 : IMyService
{
    // initialized once, when the type is first initialized, and shared across all instances
    // (LoadReferenceData must be a static method for this to compile)
    private static readonly MyReferenceData ReferenceData = LoadReferenceData();

    public MyService2()
    {
        // ideal constructor which does nothing
    }

    …
}

In cases where the initialization steps are lengthy and unavoidable, or your class requires a number of parameters in the constructor (for instance, when you programmatically host a service retrieved from an IoC container), the parameterless constructor can become a problem. To get around this, you could create a wrapper for your class and expose the wrapper as the service instead, but hold a static instance of the underlying service to which all requests are passed on (which means the underlying service must be thread-safe):

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class MyServiceWrapper : IMyServiceWrapper
{
    // get the underlying service from the container
    private static IMyService MyService = _container.Resolve<IMyService>();

    public MyServiceWrapper()
    {
        // parameterless constructor which does nothing, so it is cheap to construct
    }

    public void DoSomething()
    {
        MyService.DoSomething();
    }
}

// dummy interface to ensure the wrapper has the same methods as the underlying service
// but helps to avoid confusion
public interface IMyServiceWrapper : IMyService
{
}

For a sessionful service, the PerSession instance context mode gives you all the benefits of the PerCall instance context mode whilst reducing the overhead you pay for that extra concurrency, because a new instance of your class is created for each session rather than for each request.

If your service is session dependent then you should definitely go with PerSession, but beware: if the channel does not create a session, your service will behave as if it were a PerCall service.
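
One way to guard against silently falling back to PerCall behaviour is to declare on the contract that a session is required. The sketch below assumes a made-up IShoppingCart contract (not something from this post). With SessionMode.Required, a binding that cannot provide a session should cause the host to fail when it opens, rather than quietly running without one.

```csharp
using System.ServiceModel;

// Hypothetical contract: SessionMode.Required tells WCF this contract needs a sessionful binding
[ServiceContract(SessionMode = SessionMode.Required)]
public interface IShoppingCart
{
    [OperationContract]
    void AddItem(string sku);

    [OperationContract]
    int GetItemCount();
}

// One instance per session, so the field below lives for the duration of a client's session
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerSession)]
public class ShoppingCart : IShoppingCart
{
    private int _itemCount; // per-session state

    public void AddItem(string sku)
    {
        _itemCount++;
    }

    public int GetItemCount()
    {
        return _itemCount;
    }
}
```

Failing fast at startup is usually easier to diagnose than a service that loses its per-session state between calls.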

Increase the number of idle IO threads in the thread pool

This is perhaps the most overlooked aspect of increasing the concurrency of a WCF service. If you’ve set your service to use PerCall or PerSession instance context mode and upped the throttling settings, but are still not getting the response times you’re looking for, then it’s worth investigating whether the calls are being queued because there are not enough IO threads in the ThreadPool to handle the requests.

You can establish whether or not the requests are actually taking longer to process under load (as opposed to being queued at a service level) either by profiling locally or using some form of run-time logging (I wrote a LogExecutionTime attribute which might come in handy). If the calls aren’t taking longer to process and you’re not seeing very high CPU utilisation then the increase in response time is likely a result of the request being queued whilst WCF waits for a new IO thread to be made available to handle the request.
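
The LogExecutionTime attribute isn’t reproduced here, but as a rough stand-in for run-time logging you could wrap the body of a service operation with a Stopwatch. ExecutionTimer and Measure are hypothetical names for this sketch, and the report callback is left to the caller so you can plug in whatever logging you already use.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for timing the work done inside a service operation
public static class ExecutionTimer
{
    // Runs the operation and reports how long the service code itself took,
    // even if the operation throws
    public static T Measure<T>(string operationName, Func<T> operation, Action<string, TimeSpan> report)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return operation();
        }
        finally
        {
            stopwatch.Stop();
            report(operationName, stopwatch.Elapsed);
        }
    }
}
```

If the times reported from inside the operation stay flat under load whilst the response times seen by clients keep growing, the extra time is being spent before your code runs, which points towards queuing.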

WCF uses the IO threads from the ThreadPool to handle requests and by default, the ThreadPool keeps one IO thread around for each CPU. So on a single core machine that means you only have ONE available IO thread to start with, and when more IO threads are needed they’re created by the ThreadPool with a delay:

“The thread pool maintains a minimum number of idle threads. For worker threads, the default value of this minimum is the number of processors. The GetMinThreads method obtains the minimum numbers of idle worker and I/O completion threads.

When all thread pool threads have been assigned to tasks, the thread pool does not immediately begin creating new idle threads. To avoid unnecessarily allocating stack space for threads, it creates new idle threads at intervals. The interval is currently half a second, although it could change in future versions of the .NET Framework.

If an application is subject to bursts of activity in which large numbers of thread pool tasks are queued, use the SetMinThreads method to increase the minimum number of idle threads. Otherwise, the built-in delay in creating new idle threads could cause a bottleneck.”

However, as WenLong Dong pointed out in his blog (see references section), raising the MinIOThreads setting in the ThreadPool doesn’t work as you’d expect in .NET 3.5 because of a known issue with the ThreadPool, which has since been fixed in .NET 4. So if you’re still running .NET 3.5 like most of us, you will need to grab the hotfix from here:

http://support.microsoft.com/kb/976898
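
Raising the minimum itself is done with the SetMinThreads method mentioned in the quoted documentation. A minimal sketch, assuming you only want to raise the IO thread minimum and leave the worker thread minimum alone (ThreadPoolTuning and EnsureMinIoThreads are made-up names, and the right number for your service is something you will need to find by load testing):

```csharp
using System;
using System.Threading;

// Hypothetical helper; call it once at startup, before the service host is opened
public static class ThreadPoolTuning
{
    // Raises the minimum number of IO completion threads and keeps the worker minimum as it is.
    // Returns false if the ThreadPool rejects the requested values.
    public static bool EnsureMinIoThreads(int desiredIoThreads)
    {
        int minWorkerThreads, minIoThreads;
        ThreadPool.GetMinThreads(out minWorkerThreads, out minIoThreads);

        if (minIoThreads >= desiredIoThreads)
        {
            return true; // already high enough, nothing to do
        }

        return ThreadPool.SetMinThreads(minWorkerThreads, desiredIoThreads);
    }
}
```

SetMinThreads returns false rather than throwing when it cannot apply the values (for example, when the minimum would exceed the ThreadPool maximum), so it is worth checking the return value instead of assuming the change took effect.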

Parting thoughts:

Performance tuning isn’t an exact science, and you have to make a case by case judgement on how best to approach your performance issues. Encouraging greater concurrency is just one of the ways you can improve performance, but it’s by no means a silver bullet! In fact, if you go too far down the concurrency route you could find yourself facing a number of problems:

100% CPU – an excessive number of concurrent threads can max out your CPU, and a lot of CPU time can be wasted on context switching whilst your service becomes less responsive.
DoS attack – same as above, but intended by a would-be attacker.
OutOfMemoryException – if your service is returning a large set of data from the database, or committing a large number of database writes within a transaction, it’s not unthinkable that you might run into the dreaded OutOfMemoryException given that: 1) in practice a 32-bit process only has a cap of around 1.2 ~ 1.5GB of memory, and each active thread is allocated 1MB of stack space (regardless of how much it actually uses); 2) each concurrent call creates a large number of objects, which take up available memory until they’re garbage collected; 3) writing to the database within a transaction adds to the transaction log, which also eats into the available memory.

References:

WenLong Dong’s post on WCF responses being slow and SetMinThreads does not work

WenLong Dong’s post on WCF request throttling and Server Scalability

WenLong Dong’s post on WCF becoming slow after being idle for 15 seconds

Scott Weinstein’s post on creating high performance WCF services

Dan Rigsby’s post on throttling WCF service and maintaining scalability

3 thoughts on “WCF – Improve performance with greater concurrency”

Peter
January 23, 2017 at 8:21 am

Isn’t ConcurrencyMode=ConcurrencyMode.Multiple + InstanceContextMode=InstanceContextMode.Single + UseSynchronizationContext=false
the most scalable?