Threading – thread-safe size capped queue

Here is a queue class based on the implementation Marc Gravell provided in this StackOverflow answer:

/// <summary>
/// A thread-safe fixed sized queue implementation
/// See: http://stackoverflow.com/questions/530211/creating-a-blocking-queuet-in-net/530228#530228
/// </summary>
public sealed class SizeQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly int _maxSize;
    private readonly object _syncRoot = new object();
    public SizeQueue(int maxSize)
    {
        _maxSize = maxSize;
    }
    public int Count
    {
        get
        {
            lock (_syncRoot)
            {
                return _queue.Count;
            }
        }
    }
    public object SyncRoot
    {
        get
        {
            return _syncRoot;
        }
    }
    /// <summary>
    /// Puts an item onto the queue
    /// </summary>
    public void Enqueue(T item)
    {
        lock (_syncRoot)
        {
            // don't enqueue new item if the max size has been met
            while (_queue.Count >= _maxSize)
            {
                Monitor.Wait(_syncRoot);
            }
            _queue.Enqueue(item);
            // wake up any blocked dequeue
            Monitor.PulseAll(_syncRoot);
        }
    }
    /// <summary>
    /// Returns the first item from the queue
    /// </summary>
    public T Dequeue()
    {
        return Dequeue(1).FirstOrDefault();
    }
    /// <summary>
    /// Returns the requested number of items from the head of the queue
    /// </summary>
    public IEnumerable<T> Dequeue(int count)
    {
        lock (_syncRoot)
        {
            // wait until there're items on the queue
            while (_queue.Count == 0)
            {
                Monitor.Wait(_syncRoot);
            }
            
            // read as many items off the queue as required (and possible)
            var items = new List<T>();
            while (count > 0 && _queue.Count > 0)
            {
                items.Add(_queue.Dequeue());
                count--;
            }
            // wake up any enqueue blocked on a full queue now that there's room
            Monitor.PulseAll(_syncRoot);
            return items;
        }
    }
}
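To see the queue in action, here's a hypothetical usage sketch with one producer and one consumer thread (the cap of 100 is arbitrary; with a smaller cap the producer would block on Enqueue whenever the queue is full):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// hypothetical usage of the SizeQueue<T> above: one producer, one consumer
var queue = new SizeQueue<int>(100);
var consumed = new List<int>();

var producer = new Thread(() =>
{
    for (int i = 0; i < 20; i++)
    {
        queue.Enqueue(i); // would block if the queue were at its cap
    }
});

var consumer = new Thread(() =>
{
    for (int i = 0; i < 20; i++)
    {
        consumed.Add(queue.Dequeue()); // blocks while the queue is empty
    }
});

producer.Start();
consumer.Start();
producer.Join();
consumer.Join();

Console.WriteLine(consumed.Sum()); // 0 + 1 + ... + 19 = 190
```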

Threading – introducing SmartThreadPool

As I’ve mentioned in my previous post, the biggest problem with using the .Net ThreadPool is that there’s a limited number of threads in the pool and you’re sharing them with other .Net framework classes, so you need to be on your best behaviour and not hog the pool with long-running or blocking jobs.

But what if you need to run lots of concurrent IO jobs that block, and want your main thread to wait until all of them are done? Normally in these cases you would create your own threads, start them, and then join all the spawned threads with your main thread.

This approach of course carries the overhead of creating and destroying threads, and if you’re doing the same thing in lots of different places simultaneously it can also push your CPU to 100%. For example: you’ve got multiple threads running, and each spawns many more threads to do its concurrent IO jobs at the same time.

In situations like this, what you really want is a thread pool for these jobs that is separate from the .Net ThreadPool; that way you avoid the overhead of creating your own threads and can curb CPU usage, because you’re not creating new threads unnecessarily. But creating your own implementation that’s anywhere near as good as the .Net ThreadPool is no small undertaking, which is why I was so glad when I found out about SmartThreadPool.

Here are some of the features which I found really useful:

SmartThreadPool objects are instantiable

This means you can create different thread pools for different types of jobs, each with an appropriate number of threads. This way each type of job has its own dedicated pool of threads and won’t eat into the others’ quota (or that of the .Net framework classes!).

Work items can have a return value, and exceptions are passed back to the caller

Getting return values from thread pool threads has always been a pain, as has catching any exceptions thrown on those threads, and with SmartThreadPool you can now do both!

// create new SmartThreadPool
var threadPool = new SmartThreadPool();
// queue up work items
var result = threadPool.QueueWorkItem(
    new Amib.Threading.Func<object, bool>(o => true), new object());
var exception = threadPool.QueueWorkItem(
    new Amib.Threading.Func<object, bool>(o => { throw new Exception(); }), new object());

// wait till the items are done
if (SmartThreadPool.WaitAll(new[] { result, exception }))
{
    Console.WriteLine(result.Result); // prints true
    try
    {
        Console.WriteLine(exception.Result); // throws exception
    }
    catch (Exception)
    {
        Console.WriteLine("Exception");
    }
}

Work items can have priority

The SmartThreadPool allows you to specify the priority of the threads in the pool, so it’s possible to have a thread pool for critical jobs with higher priority and a separate thread pool for non-essential jobs which have a lower priority:

// create a STPStartInfo object and change the default values
var stpStartInfo = new STPStartInfo
    {
        DisposeOfStateObjects = true,
        MinWorkerThreads = 0,
        MaxWorkerThreads = 10,
        ThreadPriority = ThreadPriority.Lowest
    };
// create the SmartThreadPool instance
var threadPool = new SmartThreadPool(stpStartInfo);

References:

SmartThreadPool’s CodePlex homepage

MSDN article on managing the thread pool

Threading – using the ThreadPool vs. creating your own threads

There are a lot of discussions on the pros and cons of using the ThreadPool and creating your own threads. Having spent a bit of time reading what others have to say, here’s a summary of the things I’ve picked up on.

The problem with creating your own threads

Creating and destroying threads carries a high CPU cost, so when you need to perform lots of small, simple tasks concurrently the overhead of creating your own threads can take up a significant portion of the CPU cycles and severely affect the final response time. This is especially true under stress, where running many threads can push the CPU to 100% and most of the time is wasted on context switching (swapping threads in and out of the processor along with their memory).

Using the Thread Pool

This is where the .Net Thread Pool comes in: a number of threads are created ahead of time and kept around to pick up any work items you give them, without the overhead associated with creating your own threads.
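For instance, handing a small work item to the pool looks like this (a minimal sketch using QueueUserWorkItem; the event is only there so we can wait for the result):

```csharp
using System;
using System.Threading;

// minimal use of the .Net Thread Pool: hand a small work item to an
// existing pool thread instead of creating a thread of your own
using (var done = new ManualResetEvent(false))
{
    int result = 0;
    ThreadPool.QueueUserWorkItem(state =>
    {
        result = 21 * 2;   // some small, short-lived piece of work
        done.Set();        // signal the main thread that we're finished
    });
    done.WaitOne();        // wait for the work item to complete
    Console.WriteLine(result); // 42
}
```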

When not to use the Thread Pool

In an ideal world you would always use the Thread Pool, but there are some real-world limitations. Most importantly (and the reason most experts tell you not to use the Thread Pool except for brief jobs): there is a limited number of threads in the .Net Thread Pool (250 per CPU by default), and they are shared with many of the .Net framework classes (e.g. timer events are fired on thread pool threads), so you wouldn’t want your application to hog the thread pool.

There are also a number of situations where you shouldn’t use the thread pool:

  • You require a foreground thread; all the thread pool threads are background threads.
  • You require a thread to have a particular priority.
  • You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
  • You need to place threads into a single-threaded apartment; all ThreadPool threads are in the multithreaded apartment.
  • You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
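In those cases a dedicated thread is the way to go. A minimal sketch (the work item here is just a stand-in):

```csharp
using System;
using System.Threading;

// sketch: a dedicated thread covers the cases listed above; it is a
// foreground thread by default, and you control its priority directly
int result = 0;
var worker = new Thread(() => { result = 42; }) // stand-in for a long-running job
{
    Priority = ThreadPriority.BelowNormal
};
// worker.SetApartmentState(ApartmentState.STA); // Windows-only; must precede Start
worker.Start();
worker.Join();
Console.WriteLine(worker.IsBackground); // False: unlike thread pool threads
```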

Exceptions in Thread Pool threads

Unhandled exceptions on thread pool threads terminate the process, with three exceptions:

  • A ThreadAbortException is thrown in a thread pool thread, because Abort was called.
  • An AppDomainUnloadedException is thrown in a thread pool thread, because the application domain is being unloaded.
  • The common language runtime (CLR) or a host process terminates the thread.

When to create your own threads

As I’ve mentioned already, creating your own threads is bad when you have lots of small, simple tasks, because the relatively large overhead of thread creation and context switching eats into your CPU time; and the Thread Pool is bad for long running, or blocking tasks. Which leads to the natural conclusion :-P – create your own threads for long running, or blocking tasks!

Parting thoughts…

When working with the Thread Pool there are some useful methods at your disposal, including:

  • GetAvailableThreads returns the number of thread pool threads currently available to you
  • GetMinThreads returns the number of idle threads the thread pool maintains in anticipation of new requests
  • GetMaxThreads returns the maximum number of thread pool threads that can be active concurrently
  • SetMinThreads sets the number of idle threads the thread pool maintains in anticipation of new requests
  • SetMaxThreads sets the maximum number of thread pool threads that can be active concurrently
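For example (a quick sketch; the actual numbers will vary by machine and runtime version):

```csharp
using System;
using System.Threading;

int workers, completionPorts;

ThreadPool.GetMaxThreads(out workers, out completionPorts);
Console.WriteLine("max worker threads: " + workers);

ThreadPool.GetMinThreads(out workers, out completionPorts);
Console.WriteLine("min worker threads: " + workers);

ThreadPool.GetAvailableThreads(out workers, out completionPorts);
Console.WriteLine("available worker threads: " + workers);

// the setters return false if the runtime rejects the values
bool accepted = ThreadPool.SetMinThreads(2, 2);
Console.WriteLine("SetMinThreads accepted: " + accepted);
```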

If you’re interested in how the ThreadPool class dynamically manages the size of the thread pool under the hood (despite giving you the option to set min and max threads) you should have a read of Pedram Rezaei’s blog post in the reference section.

And before you go: I mentioned earlier that all Thread Pool threads are background threads, so how do they differ from foreground threads? Foreground and background threads are identical, with one exception: a background thread does not keep the managed execution environment running. Once all foreground threads have stopped in a managed process (where the .exe file is a managed assembly), the system stops all background threads and shuts down.
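Here’s a tiny sketch of that difference: the background thread below never finishes, yet the process still exits when the main (foreground) thread returns.

```csharp
using System;
using System.Threading;

// a background thread does not keep the process alive
var bg = new Thread(() => Thread.Sleep(Timeout.Infinite))
{
    IsBackground = true  // thread pool threads are background threads like this one
};
bg.Start();

Console.WriteLine("main thread done");
// when the main (foreground) thread returns here the process shuts down,
// even though bg is still sleeping; with IsBackground = false it would hang
```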

References:

StackOverflow question on Thread Pool vs Thread Spawning

Another StackOverflow question on Thread Pool vs Thread Spawning

StackOverflow question on when to use the Thread Pool in C#

StackOverflow question on managing the size of the Thread Pool in C#

StackOverflow question with detail on the throttling behaviour of the ThreadPool

MSDN article on The Managed Thread Pool

MSDN C# Programming Guide : how to use a ThreadPool

MSDN article on why we need a thread pool

Jon Skeet’s introductory article on Multi-threading in C#

Pedram Rezaei’s blog post on dedicated threads vs threadpool threads

Smart Thread Pool project on CodePlex

Threading – Producer-Consumer Pattern

Having run into a bit of a deadlocking issue while working on Bingo.Net, I spent a bit of time reading into the Producer-Consumer pattern, and here’s what I learnt, which I hope you’ll find useful too.

AutoResetEvent and ManualResetEvent

To start off, MSDN has an introductory article on how to synchronize a producer and a consumer thread. It’s a decent starting point; however, as some of the comments pointed out, the sample code is buggy and allows a race condition to happen when the AutoResetEvent is set in quick succession whilst the consumer thread is still processing the previous signal.

The problem with the AutoResetEvent is that setting an event that is already set has no effect:

(Diagram: the producer sets the event twice in quick succession; the second Set is lost, so the consumer wakes only once.)

And just like that, the consumer has missed one of the items! However, you can work around this easily enough by locking onto the queue’s sync root object when the consumer wakes up and dequeuing as many items as there are on the queue. This might not always be applicable to your requirements though.
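Here’s a sketch of that workaround (the class and method names are mine): the consumer locks the shared queue on every wake-up and drains it completely, so two Sets in quick succession can’t translate into a missed item.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// sketch of the drain-on-wake workaround described above
public class DrainingConsumer
{
    private readonly AutoResetEvent _signal = new AutoResetEvent(false);
    private readonly Queue<int> _queue = new Queue<int>();
    private readonly object _sync = new object();

    public void Produce(int item)
    {
        lock (_sync) { _queue.Enqueue(item); }
        _signal.Set(); // a second Set before the consumer wakes is lost, but harmless
    }

    public List<int> ConsumeBatch()
    {
        _signal.WaitOne();
        var items = new List<int>();
        lock (_sync)
        {
            while (_queue.Count > 0)   // drain everything, not just one item
            {
                items.Add(_queue.Dequeue());
            }
        }
        return items;
    }
}
```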

A much better choice would be to use the Monitor class instead.

Using Monitor.Pulse and Monitor.PulseAll

As Jon Skeet pointed out in his Threading article (see reference) the Auto/ManualResetEvent and Mutex classes are significantly slower than Monitor.

Using the Monitor.Wait, Monitor.Pulse and Monitor.PulseAll methods, you can allow multiple threads to communicate with each other in a way similar to the reset events, only more safely. I have found quite a number of different implementations of the Producer-Consumer pattern using these methods; see the references section for details.

The implementations differ: depending on the situation you might want multi-producer/consumer support, or you might want to ensure there can only be one producer and one consumer. Whatever your requirements, the general idea is that producer(s) and consumer(s) both have to acquire a lock on the same sync object before they can add or remove items from the queue. Depending on who’s holding the lock:

  • producer(s) acquire the lock, add an item to the queue, and then pulse before releasing the lock, which wakes up other waiting threads (the consumers)
  • consumer(s) acquire the lock, then in a continuous loop wait for a pulse; Monitor.Wait gives up the lock whilst waiting (allowing other producers/consumers to get in and acquire it), and when the consumer is woken up it reacquires the lock and can then safely take new items from the queue
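The two bullet points translate into surprisingly little code. A minimal sketch (unbounded queue; multiple producers and consumers supported via PulseAll):

```csharp
using System.Collections.Generic;
using System.Threading;

// minimal sketch of the handshake described in the bullets above
public class BlockingBuffer<T>
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly object _sync = new object();

    public void Add(T item)
    {
        lock (_sync)
        {
            _queue.Enqueue(item);
            Monitor.PulseAll(_sync); // wake up any waiting consumers
        }
    }

    public T Take()
    {
        lock (_sync)
        {
            while (_queue.Count == 0)  // re-check the condition after every wake
            {
                Monitor.Wait(_sync);   // releases the lock while waiting
            }
            return _queue.Dequeue();
        }
    }
}
```

Note that the consumer waits inside a while loop rather than an if: a woken thread must re-check the condition, because another consumer may have grabbed the item first.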

There are a number of other considerations you should take into account, such as the exit conditions of the consumer’s continuous loop. Have a look at the different implementations I have included in the reference section to get a feel for what you need to do to adapt the Producer-Consumer pattern to your needs.

Parting thoughts…

Another thing which you should be aware of when implementing any sort of Producer-Consumer model is the need to make the queue thread-safe seeing as multiple threads can be reading/writing to it concurrently. As you might know already, there is a Queue.Synchronized method which you can use to get a synchronized (in other words, thread-safe) version of the queue object.

But as discussed on the BCL Team Blog (see reference), the synchronized wrapper pattern gave developers a false sense of security and led some Microsoft developers to write unsafe code like this:

if (syncQueue.Count > 0) {
    Object obj = null;
    try {
        // count could have changed by this point, invalidating the if check
        obj = syncQueue.Dequeue();
    }
    catch (Exception) { } // this swallows any indication we have a race condition
    // use obj here, dealing with null.
}

Which is why they decided it’s better to force developers to use an explicit lock statement around the whole operation.
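The explicit-lock version of the snippet above would look something like this (a sketch; syncQueue is replaced by a plain Queue plus a shared lock object):

```csharp
using System;
using System.Collections;

var queue = new Queue();     // a plain, unsynchronized queue
var sync = new object();     // one lock object shared by all threads
queue.Enqueue("hello");

object obj = null;
lock (sync)                  // the Count check and the Dequeue are now atomic
{
    if (queue.Count > 0)
    {
        obj = queue.Dequeue();
    }
}
if (obj != null)
{
    Console.WriteLine(obj);  // prints hello
}
```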

Another way to think about the problem is to picture the queue returned by Queue.Synchronized as something like a volatile reference type: each individual operation (a single Enqueue or Dequeue) is atomic and sees the latest state of the queue, but nothing stops instructions from multiple threads interleaving between operations.

References:

MSDN article on how to synchronize a producer and a consumer thread

Jon Skeet’s article on WaitHandles

Jon Skeet’s article on Deadlocks and an implementation of the Producer-Consumer pattern using Monitor

Mike Kempf’s implementation of the Producer-Consumer pattern using Monitor

BCL Team Blog post on why there is no Queue<T>.Synchronized

StackOverflow question on why there’s no Queue<T>.Synchronized

Threading – when to use Monitor.Pulse and Monitor.PulseAll

Ever wondered when you should use Monitor.Pulse and when you should use Monitor.PulseAll, given that only one thread will be able to acquire the lock even if you wake up multiple threads? I did, and that’s when I stumbled across a similar question on StackOverflow where, as ever, Jon Skeet came up with a really good analogy for when each should be used:

Imagine you’ve got a single printer. Only one person can use it at a time, so if you’ve got a lot of people waiting, you send them all to sleep – but you only wake one person up when the printer becomes free. This mirrors the use of Pulse.

Now imagine you run a shop. While you’re closed, customers wait outside the shop. When you open the shop, you don’t just want to wake up one customer – they can all come in now. This mirrors the use of PulseAll.
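The analogy can be sketched in code (a contrived but deterministic example; the dance around the ready counter is only there so the pulses can’t fire before all three threads are waiting):

```csharp
using System;
using System.Threading;

var sync = new object();
int ready = 0, woken = 0;
var customers = new Thread[3];

for (int i = 0; i < 3; i++)
{
    customers[i] = new Thread(() =>
    {
        lock (sync)
        {
            ready++;
            Monitor.Wait(sync);   // releases the lock; all three queue up here
            woken++;
        }
    });
    customers[i].Start();
}

// each thread holds the lock from ready++ until Wait releases it, so once we
// can take the lock and see ready == 3, all three are blocked in Wait
while (true) { lock (sync) { if (ready == 3) break; } }

lock (sync) { Monitor.Pulse(sync); }     // printer free: exactly one thread wakes

lock (sync) { Monitor.PulseAll(sync); }  // shop opens: all remaining threads wake

foreach (var t in customers) t.Join();
Console.WriteLine(woken); // 3
```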

References:

StackOverflow question on the difference between Pulse and PulseAll