The Stack, The Heap And The Memory Pitfalls

Over the last couple of days I have spent some time reading Karl Seguin’s excellent and free-to-download ebook, Foundations of Programming, which covers many topics, from dependency injection to best practices for dealing with exceptions.

The topic that took my fancy was the Back to Basics: Memory section; here’s a summary I put together, with some additional examples.

In C#, variables are stored on either the stack or the heap based on their type. As a first approximation:

  • Value types go on the stack
  • Reference types go on the heap

Remember, a struct in C# is a value type, as is an enum, so both go on the stack. This is why it’s generally recommended (for better performance) that you prefer a struct to a class for small objects that are mainly used for storing data.

Value types that are fields of a reference type, however, live on the heap along with the instance that contains them.
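
To make the difference concrete, here is a minimal sketch of how assignment behaves for the two kinds of type. PointStruct, PointClass and CopyDemo are hypothetical names made up for this illustration; they don't come from the ebook.

```csharp
using System;

// Hypothetical types for illustration only.
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class CopyDemo
{
    public static int MutateStructCopy()
    {
        var a = new PointStruct { X = 1 };
        var b = a;   // value type: b is an independent copy of a
        b.X = 2;
        return a.X;  // a is unaffected
    }

    public static int MutateClassAlias()
    {
        var a = new PointClass { X = 1 };
        var b = a;   // reference type: b points at the same heap object as a
        b.X = 2;
        return a.X;  // a sees the change
    }

    public static void Main()
    {
        Console.WriteLine(MutateStructCopy()); // 1
        Console.WriteLine(MutateClassAlias()); // 2
    }
}
```

Copying a struct copies its data, while copying a class variable only copies the reference, which is why small data-only structs can be cheaper but large ones can become expensive to pass around.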

In Java, everything apart from a handful of primitive types is a reference type, so nearly all objects live on the heap, which makes the size of the heap one of the most important factors in the performance of a Java application. The C# creators saw this as inefficient and unnecessary, which is why we have user-defined value types in C# today :-)

The Stack

Values on the stack are managed automatically, even without garbage collection, because items are added to and removed from the stack in LIFO fashion every time you enter or exit a scope (be it a method or a statement). This matches the scoping rule that variables defined within a for loop or if statement aren’t available outside that scope.

You will receive a StackOverflowException when you’ve used up all the available space on the stack. It’s almost certainly the symptom of infinite recursion (bug!) or a poorly designed system that involves near-endless recursive calls.
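
As a rough sketch of why deep recursion is the usual culprit, compare a recursive sum with an iterative one. RecursionDemo and its methods are names invented for this example, and the recursive version assumes a non-negative argument.

```csharp
using System;

public static class RecursionDemo
{
    // Each call adds a frame to the stack, so a very large n risks
    // exhausting it. Assumes n >= 0; a negative n would never terminate.
    public static long SumRecursive(int n) => n == 0 ? 0 : n + SumRecursive(n - 1);

    // Same result with a loop: stack usage stays constant regardless of n.
    public static long SumIterative(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumRecursive(1000));       // 500500
        Console.WriteLine(SumIterative(10_000_000)); // 50000005000000
    }
}
```

The recursive call is only exercised with a small depth here on purpose: a StackOverflowException generally can't be caught and will take the process down.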

The Heap

Most heap-based memory allocations occur when we create a new object, at which point the runtime works out how much memory is needed, allocates an appropriate amount of space and returns a pointer to the allocated memory.

Unlike values on the stack, objects on the heap aren’t local to a given scope. Instead, most are deeply nested references of other referenced objects. In unmanaged languages like C, it’s the programmer’s responsibility to free any allocated memory, a manual process which has inevitably led to many memory leaks down the years!

In managed languages, the runtime takes care of cleaning up resources. The .Net framework uses a generational garbage collector, which groups objects into generations based on their age and collects the youngest generation more often.
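
If you want to see the generations for yourself, something like the following sketch should do. GenerationDemo is a made-up name; GC.GetGeneration, GC.MaxGeneration, GC.Collect and GC.KeepAlive are standard .Net APIs. The exact numbers printed depend on the runtime and its GC settings, so no expected output is shown.

```csharp
using System;

public static class GenerationDemo
{
    // Thin wrapper so the generation of any object can be inspected.
    public static int GenerationOf(object o) => GC.GetGeneration(o);

    public static void Main()
    {
        var survivor = new object();
        Console.WriteLine($"Generations available: 0..{GC.MaxGeneration}");
        Console.WriteLine($"Before collection: gen {GenerationOf(survivor)}");

        // Forcing a collection is for demonstration only; production code
        // should normally leave the timing to the runtime.
        GC.Collect();
        Console.WriteLine($"After one collection: gen {GenerationOf(survivor)}");

        // Keeps the object reachable up to this point.
        GC.KeepAlive(survivor);
    }
}
```

Objects that survive a collection are typically promoted to an older generation, which is collected less often.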

How they work together

As mentioned earlier, every time you create a new object some memory gets allocated, and what you assign to your variable is actually a reference (a pointer) to the start of that block of memory. The reference is essentially a memory address, usually displayed as a hexadecimal number, and like an integer it resides on the stack unless it is a field of a reference type.

So, for example, the following code will result in two values on the stack, one of which is a pointer to the string on the heap:

int intValue = 1;
string stringValue = "Hello World";

When these two variables go out of scope, the values are popped off the stack, but the memory allocated on the heap is not cleared. Whilst this would result in a memory leak in C/C++, the garbage collector (GC) will free up the allocated memory for you in a managed language like C# or Java.

Pitfalls in C#

Despite having the GC to do all the dirty work for you, there are still a number of pitfalls which might sting you:

Boxing & Unboxing

Boxing occurs when a value type is ‘boxed’ into a reference type (when you put a value type into an ArrayList, for example). Unboxing occurs when the boxed value is converted back into a value type (when you cast an item from the ArrayList back to its original type, for example).

The generics feature introduced in .Net 2.0 increases type-safety and also addresses the performance hit resulting from boxing and unboxing.
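
Here is a small sketch of both operations, plus the generic alternative. BoxingDemo and its method names are invented for this example; ArrayList and List<T> are the standard collection types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static int RoundTrip(int value)
    {
        var list = new ArrayList();
        list.Add(value);      // boxing: the int is copied into a new heap object
        return (int)list[0];  // unboxing: the value is copied back out
    }

    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 42; // a boxed int
        try
        {
            long wrong = (long)boxed; // unboxing must use the exact original type
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(7));              // 7
        Console.WriteLine(UnboxToWrongTypeThrows());  // True

        var typed = new List<int> { 7 }; // generic list: stores ints directly, no boxing
        Console.WriteLine(typed[0]);     // 7
    }
}
```

Besides the extra allocation, the non-generic version only discovers a wrong cast at runtime, whereas List<int> catches it at compile time.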

ByRef

Most developers understand the implication of passing a value type by reference, but few understand why you’d want to pass a reference type by reference. When you pass a reference type by value you are actually passing a copy of the reference pointer, but when you pass a reference type by reference (ref) you’re passing the reference pointer itself.

The only reason to pass a reference type by reference is if you want to modify the pointer itself, as in where it points to. However, this can lead to some nasty bugs:

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        List<string> list = new List<string> { "Hello", "World" };

        // pass a copy of the reference pointer
        NoBug(list);
        // no error here, the caller's reference is untouched
        Console.WriteLine(list.Count);

        // pass the actual reference pointer
        BadBug(ref list);
        // the caller's reference is now null, this throws NullReferenceException!
        Console.WriteLine(list.Count);
    }

    static void BadBug(ref List<string> list)
    {
        list = null; // this changes the original reference pointer
    }

    static void NoBug(List<string> list)
    {
        list = null; // this only changes the local copy of the reference pointer
    }
}

In almost all cases, you should use an out parameter or a simple assignment of the return value instead (whichever expresses your intention more clearly).

Whilst I’m on the topic, do you know the difference between out and ref? When you pass an argument to a method using the out keyword, the method must assign it before returning; when you pass an argument using the ref keyword, it must be assigned before it’s passed to the method.
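
The two rules can be sketched side by side as follows. OutRefDemo, TryHalve and Increment are hypothetical helpers written just for this illustration.

```csharp
using System;

public static class OutRefDemo
{
    // out: the method must assign 'result' on every path before it returns.
    public static bool TryHalve(int input, out int result)
    {
        if (input % 2 != 0)
        {
            result = 0;
            return false;
        }
        result = input / 2;
        return true;
    }

    // ref: the caller must initialise the variable first;
    // the method may both read and change it.
    public static void Increment(ref int counter) => counter++;

    public static void Main()
    {
        int half;                          // no initial value needed for out
        bool ok = TryHalve(10, out half);
        Console.WriteLine($"{ok} {half}"); // True 5

        int count = 0;                     // must be assigned before being passed by ref
        Increment(ref count);
        Console.WriteLine(count);          // 1
    }
}
```

Removing either assignment to result, or the initial value of count, produces a compile error rather than a runtime bug.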

Managed Memory Leaks

Yes, memory leaks are still possible in a managed language! Typically, this type of leak happens when you hold on to a reference indefinitely. Most of the time this has no noticeable impact on your application, but it can sting you rather unexpectedly as the system matures and starts to handle greater loads of data. For example, I ran into a platform bug with ADO.NET a little while back and it took the best part of a week to figure out and fix!

There are memory profilers out there that can help hunt down memory leaks in a .Net application, the best ones being dotTrace and ANTS Profiler. For memory profiling I prefer ANTS Profiler, which allows you to easily compare two snapshots of your memory usage.

One specific situation worth mentioning as a common cause of memory leaks is events. If your class registers for an event, the event source holds a reference to your object. Unless you de-register from the event, your object’s lifetime will ultimately be determined by the event source. Two solutions exist:

1. de-register from events when you’re done (the IDisposable pattern is ideal here)

2. use the WeakEvent pattern or a simplified version.
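
The first option might look something like this minimal sketch. Publisher, Subscriber and EventDemo are hypothetical classes for illustration, and SubscriberCount exists only so the effect can be observed.

```csharp
using System;

public class Publisher
{
    public event EventHandler Changed;

    // Exposed for the demo only: how many handlers are currently attached.
    public int SubscriberCount => Changed?.GetInvocationList().Length ?? 0;

    public void Raise() => Changed?.Invoke(this, EventArgs.Empty);
}

public sealed class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public int Notifications { get; private set; }

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        _publisher.Changed += OnChanged; // the publisher now holds a reference to this object
    }

    private void OnChanged(object sender, EventArgs e) => Notifications++;

    // De-registering removes the publisher's reference, so this object
    // can be collected independently of the publisher.
    public void Dispose() => _publisher.Changed -= OnChanged;
}

public static class EventDemo
{
    public static void Main()
    {
        var publisher = new Publisher();
        using (var subscriber = new Subscriber(publisher))
        {
            publisher.Raise();
            Console.WriteLine(publisher.SubscriberCount); // 1
        }
        Console.WriteLine(publisher.SubscriberCount);     // 0
    }
}
```

Without the Dispose call, a long-lived publisher would keep every subscriber it has ever had alive.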

Another potential source of memory leaks is a caching mechanism with no expiration policy, in which case your cache is likely to keep growing until it takes up all available memory and triggers an OutOfMemoryException.
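
One of the simplest possible policies is a hard cap on the number of entries, evicting the oldest one first. The sketch below assumes that policy; BoundedCache is a made-up class, it isn't thread-safe, and a real application would more likely want time-based or least-recently-used expiration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical size-capped cache: evicts the oldest inserted key when full.
public class BoundedCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();
    private readonly Queue<TKey> _insertionOrder = new Queue<TKey>();

    public BoundedCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public int Count => _items.Count;

    public void Set(TKey key, TValue value)
    {
        if (!_items.ContainsKey(key))
        {
            if (_items.Count == _capacity)
            {
                // Make room by dropping the entry that was added first.
                _items.Remove(_insertionOrder.Dequeue());
            }
            _insertionOrder.Enqueue(key);
        }
        _items[key] = value; // add new entry or overwrite existing one
    }

    public bool TryGet(TKey key, out TValue value) => _items.TryGetValue(key, out value);
}

public static class CacheDemo
{
    public static void Main()
    {
        var cache = new BoundedCache<string, int>(2);
        cache.Set("a", 1);
        cache.Set("b", 2);
        cache.Set("c", 3); // evicts "a"

        Console.WriteLine(cache.Count);                // 2
        Console.WriteLine(cache.TryGet("a", out _));   // False
    }
}
```

However crude, a cap like this turns unbounded growth into a fixed memory budget.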

Fragmentation

As your program runs its course, the heap becomes increasingly fragmented, and you could end up with a lot of unusable memory spread out between usable chunks of memory.

Usually the GC will take care of this by compacting the heap, and the .Net framework will update the references accordingly. There are times, however, when the .Net framework can’t move an object: when the object is pinned to a specific memory location.

Pinning

Pinned memory occurs when an object is locked to a specific address on the heap. This is usually a result of interaction with unmanaged code: the GC updates object references in managed code when it compacts the heap, but it has no way of updating references held by unmanaged code, so objects must be pinned in memory before they are passed across the interop boundary.
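
The interop layer often pins for you, but you can also pin explicitly with GCHandle, as in this rough sketch. PinningDemo and PinAndRelease are names invented here; GCHandle.Alloc, GCHandleType.Pinned, AddrOfPinnedObject and Free are standard System.Runtime.InteropServices APIs.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    public static bool PinAndRelease(byte[] buffer)
    {
        // While the handle is allocated, the GC will not move 'buffer'.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Unmanaged code could safely be given 'address' at this point.
            return address != IntPtr.Zero;
        }
        finally
        {
            handle.Free(); // unpin as soon as possible to limit fragmentation
        }
    }

    public static void Main()
    {
        Console.WriteLine(PinAndRelease(new byte[256])); // True
    }
}
```

Keeping the pinned window as short as possible gives the GC the best chance of compacting around the object later.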

A common way to reduce the impact is to allocate a few large objects rather than many small ones, since they don’t cause as much fragmentation. Large objects (85,000 bytes or more) are placed in a special heap called the Large Object Heap (LOH), which isn’t compacted by default. For more information on pinning, here’s a good article on pinning and asynchronous sockets.

Another reason why an object might be pinned is if you compile your assembly with the unsafe option, which then allows you to pin an object via the fixed statement. The fixed statement can greatly improve performance by allowing objects to be manipulated directly with pointer arithmetic, which isn’t possible if the object isn’t pinned because the GC might relocate your object.

Under normal circumstances, however, you should never mark your assembly as unsafe and use the fixed statement!

Garbage Spewers

Already discussed here.

Setting things to null

You don’t need to set your reference-type variables to null after you’re done with them, because once a variable falls out of scope the reference is popped off the stack anyway and the object becomes eligible for collection.

Deterministic Finalization

Even in a managed environment, developers still need to manage some of their resources, such as file handles or database connections, because these resources are limited and should therefore be freed as soon as possible. This is where deterministic finalization and the Dispose pattern come into play, because deterministic finalization releases resources, not memory.

If you don’t call Dispose on an object which implements IDisposable, the GC will eventually run its finalizer (if it has one), which normally releases the same resources. To release precious resources such as DB connections in a timely fashion, however, you should use the using statement wherever possible.
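
As a minimal sketch of what the using statement buys you, the example below shows Dispose being called even when the block throws. Connection is a hypothetical stand-in for a scarce resource, not a real ADO.NET type.

```csharp
using System;

// Hypothetical resource wrapper for illustration only.
public sealed class Connection : IDisposable
{
    public bool IsOpen { get; private set; } = true;

    // A real implementation would release the underlying handle here.
    public void Dispose() => IsOpen = false;
}

public static class UsingDemo
{
    public static bool DisposedDespiteException()
    {
        Connection captured = null;
        try
        {
            using (var connection = new Connection())
            {
                captured = connection;
                throw new InvalidOperationException("simulated failure");
            } // Dispose runs here, before the exception leaves the block
        }
        catch (InvalidOperationException)
        {
            // swallowed for the purposes of the demo
        }
        return !captured.IsOpen;
    }

    public static void Main()
    {
        Console.WriteLine(DisposedDespiteException()); // True
    }
}
```

The using statement is shorthand for a try/finally that calls Dispose, so the resource is released at a known point rather than whenever the GC gets round to it.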