Over the last couple of years, there have been many discussions/debates on DataSet vs Collections, and there was a very good article in MSDN magazine on just that:
To add to the Dark Sides of DataSet, there is a little-known feature/bug/annoyance in the DataTable.Select() method: every time you call Select() it implicitly creates a new index that you have no control over, and that index is not cleared until you call DataTable.AcceptChanges().
If your application has to deal with a large amount of data and has to call the Select() method repeatedly without calling AcceptChanges(), then you might have a problem! Why? Consider these two factors:
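As a rough sketch of the call pattern in question, the snippet below runs Select() in a loop and never calls AcceptChanges(). The `Orders` table, its columns and the helper names are hypothetical, invented purely for illustration. The code only shows the shape of the calls; it does not measure the indexes themselves.

```csharp
using System;
using System.Data;

public static class SelectPatternSketch
{
    // Builds a hypothetical "Orders" table with rowCount rows,
    // spread evenly across 100 customers.
    public static DataTable BuildOrders(int rowCount)
    {
        var table = new DataTable("Orders");
        table.Columns.Add("OrderId", typeof(int));
        table.Columns.Add("CustomerId", typeof(int));
        for (var i = 0; i < rowCount; i++)
        {
            table.Rows.Add(i, i % 100);
        }
        return table;
    }

    // Calls Select() once per customer and never calls AcceptChanges().
    // This is the repeated-Select pattern the post warns about.
    public static int CountRowsViaSelect(DataTable table, int customerCount)
    {
        var total = 0;
        for (var customerId = 0; customerId < customerCount; customerId++)
        {
            DataRow[] rows = table.Select("CustomerId = " + customerId);
            total += rows.Length;
        }
        return total;
    }
}
```

The bigger the table and the more distinct filters you pass in, the more this pattern matters.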
1. The bigger the DataTable, the bigger the index. If the index object is 85,000 bytes or larger, it is allocated on the Large Object Heap, which is only collected as part of generation 2 collections and is not compacted by default, so its memory takes much longer to reclaim than that of small objects.
2. On a 32-bit Windows system there's a 2GB virtual address space limit for each process, and in practice you will usually get an OutOfMemoryException when your process has used around 1.2GB – 1.5GB of RAM.
Combine the two and it's not hard to imagine a scenario where your process runs out of memory and crashes before it completes its task! (Believe me, it was a hard-learned lesson from personal experience!)
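If you want to check both factors on your own machine, here is a minimal sketch. It assumes the default Large Object Heap threshold of 85,000 bytes (which runtime configuration can raise), and the class and method names are made up for this example.

```csharp
using System;

public static class LargeObjectHeapSketch
{
    // Default LOH threshold in bytes. This is an assumption: it is the
    // runtime default, but it can be raised through configuration.
    public const int DefaultLohThresholdBytes = 85000;

    // Allocates a byte array and reports which GC generation it starts in.
    // Objects on the Large Object Heap are reported as generation 2.
    public static int GenerationOfNewArray(int lengthInBytes)
    {
        var buffer = new byte[lengthInBytes];
        return GC.GetGeneration(buffer);
    }

    // True when the process is 32-bit and therefore subject to the
    // 2GB virtual address space limit mentioned above.
    public static bool IsAddressSpaceConstrained()
    {
        return !Environment.Is64BitProcess;
    }
}
```

With default settings, a freshly allocated small array should report generation 0, whereas an array of `DefaultLohThresholdBytes` or more should report generation 2 straight away, because it never lives in the small object heap.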
There are two ways around this:
1. Unless you actually need some of the features DataSet offers, such as the ability to keep multiple versions of the same row (Original, Current, etc.), you might be better off using POCOs (plain old CLR objects) instead. They are simple and lightweight, and you can use LINQ to Objects with i4o to get some impressive performance improvements. After I implemented this change, my application went from crashing with an OutOfMemoryException to peaking at 70MB throughout its lifetime, and it finished in about 15% of the time it would have taken using DataSet.
2. If getting rid of DataSet altogether takes more time and effort than you can afford, there's a quick workaround: use a DataView and dynamically change its RowFilter string every time you would otherwise call the Select() method.
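Both options can be sketched roughly as follows. The `Order` class, the `CustomerId` column and the method names are hypothetical. For the POCO route the sketch uses the standard LINQ `ToLookup` as a stand-in for an indexed collection rather than i4o's own API, so treat it as an illustration of the idea, not of that library.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical POCO standing in for a DataRow.
public sealed class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
}

public static class SelectAlternativesSketch
{
    // Option 1: keep plain objects and group them once, so each later
    // query is a key lookup instead of a DataTable.Select() call.
    public static ILookup<int, Order> IndexByCustomer(IEnumerable<Order> orders)
    {
        return orders.ToLookup(o => o.CustomerId);
    }

    // Option 2: keep the DataTable, but reuse a single DataView and
    // change its RowFilter where Select() would have been called.
    public static int CountRowsViaDataView(DataTable table, int customerCount)
    {
        var view = new DataView(table);
        var total = 0;
        for (var customerId = 0; customerId < customerCount; customerId++)
        {
            view.RowFilter = "CustomerId = " + customerId;
            total += view.Count; // number of rows matching the current filter
        }
        return total;
    }
}
```

The first option trades the DataSet features for a much smaller memory footprint. The second keeps your existing DataTable code largely intact.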
If you wish to learn more about Garbage Collection in general, you should read Maoni's WebLog, which covers all things CLR Garbage Collector! She also wrote a nice article focused on the Large Object Heap back in June 2008, which is well worth a read: