Extension methods for compressing/decompressing strings

Serialization Overhead

When you serialize/deserialize objects for transport over the wire, the serialized message will most likely carry some overhead. How much depends on the data interchange format used: XML is very verbose, whereas JSON is much more lightweight:

public class MyClass
{
    public int MyProperty { get; set; }
}

var myClass = new MyClass { MyProperty = 10 };

XML representation:

<MyClass>
    <MyProperty>10</MyProperty>
</MyClass>

JSON representation:

{"MyProperty":10}

As you can see, a simple 4-byte object (MyProperty is a 32-bit integer) can take up over 11 times the space after it’s serialized into XML format (sizes below exclude indentation and line breaks):

Binary     XML         JSON
4 bytes    46 bytes    17 bytes
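
If you want to check these numbers yourself, here is a minimal sketch that counts the UTF-8 bytes of each representation. The SizeDemo class and Utf8Size helper are names made up for this illustration, and the strings are the representations above with the whitespace removed.

```csharp
using System;
using System.Text;

public static class SizeDemo
{
    // Number of bytes a string occupies when encoded as UTF-8.
    public static int Utf8Size(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        // The two representations shown above, without indentation or line breaks.
        string xml = "<MyClass><MyProperty>10</MyProperty></MyClass>";
        string json = "{\"MyProperty\":10}";

        Console.WriteLine("Binary: {0} bytes", sizeof(int));    // 4
        Console.WriteLine("XML:    {0} bytes", Utf8Size(xml));  // 46
        Console.WriteLine("JSON:   {0} bytes", Utf8Size(json)); // 17
    }
}
```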

This overhead costs you in both bandwidth and performance, and if persistence is involved then there’s also the storage cost. For example, if you persist objects to an Amazon S3 bucket, you pay not only for the extra storage space taken up by the serialization overhead but also for the additional bandwidth needed to get the serialized data in and out of S3, not to mention the performance penalty of transferring more data.

Using Compression

An easy way to cut down on these costs is to introduce compression into the equation. The serialized XML/JSON message is text, which usually compresses well (often down to 10–15% of its original size for larger messages), so there’s a compelling case to do it!

There are a number of third-party compression libraries out there to help you do this, for instance:

  • SharpZipLib – a widely used library with support for Zip, GZip, Tar and BZip2 formats.
  • SevenZipSharp – a CodePlex project which provides a wrapper for the native 7-Zip library to provide data (self-)extraction and compression in all 7-Zip formats.
  • UnRAR.dll – native library from the developer of WinRAR to help you work with the RAR format.

The .NET Framework also provides two classes for you to use – DeflateStream and GZipStream – both of which use the Deflate algorithm to provide lossless compression and decompression (GZipStream wraps the Deflate data in the gzip format, which adds a header and a CRC checksum). Please note that in versions of the framework before .NET 4 you can’t use these classes to compress data larger than 4 GB.
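
To see how the two classes relate, here is a small sketch that compresses the same input with each and compares the results. The StreamComparison class and its helper methods are hypothetical names used only for this illustration, and the exact sizes you get will depend on the framework version.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StreamComparison
{
    // Compresses the input with whichever compression stream the factory creates.
    public static byte[] Compress(byte[] input, Func<Stream, Stream> createCompressor)
    {
        using (var output = new MemoryStream())
        {
            // Disposing the compression stream flushes the remaining compressed data.
            using (var compressor = createCompressor(output))
            {
                compressor.Write(input, 0, input.Length);
            }
            return output.ToArray();
        }
    }

    public static byte[] Deflate(byte[] input)
    {
        return Compress(input, s => new DeflateStream(s, CompressionMode.Compress, true));
    }

    public static byte[] GZip(byte[] input)
    {
        return Compress(input, s => new GZipStream(s, CompressionMode.Compress, true));
    }

    public static void Main()
    {
        byte[] input = Encoding.UTF8.GetBytes(new string('a', 1000));
        byte[] deflated = Deflate(input);
        byte[] gzipped = GZip(input);

        Console.WriteLine("Deflate: {0} bytes", deflated.Length);
        Console.WriteLine("GZip:    {0} bytes", gzipped.Length);
        // A gzip stream always starts with the magic bytes 1F 8B.
        Console.WriteLine("GZip header: {0:X2} {1:X2}", gzipped[0], gzipped[1]);
    }
}
```

The GZip output comes out a little larger because of the extra header and checksum, so if you control both ends and don't need the gzip format, DeflateStream saves you a few bytes per message.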

Here are three extension methods to help you compress/decompress a string using the framework’s DeflateStream class:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressionExtensions
{
    /// <summary>
    /// Returns the byte array of a compressed string
    /// </summary>
    public static byte[] ToCompressedByteArray(this string source)
    {
        // convert the source string into a memory stream; UTF-8 is used rather than
        // ASCII so that non-ASCII characters survive the round trip
        using (
            MemoryStream inMemStream = new MemoryStream(Encoding.UTF8.GetBytes(source)),
            outMemStream = new MemoryStream())
        {
            // create a compression stream with the output stream
            using (var zipStream = new DeflateStream(outMemStream, CompressionMode.Compress, true))
                // copy the source string into the compression stream
                inMemStream.WriteTo(zipStream);

            // return the compressed bytes in the output stream
            return outMemStream.ToArray();
        }
    }

    /// <summary>
    /// Returns the base64 encoded string for the compressed byte array of the source string
    /// </summary>
    public static string ToCompressedBase64String(this string source)
    {
        return Convert.ToBase64String(source.ToCompressedByteArray());
    }

    /// <summary>
    /// Returns the original string for a compressed base64 encoded string
    /// </summary>
    public static string ToUncompressedString(this string source)
    {
        // get the byte array representation for the compressed string
        var compressedBytes = Convert.FromBase64String(source);

        // load the byte array into a memory stream
        using (var inMemStream = new MemoryStream(compressedBytes))
        // and decompress the memory stream into the original string
        using (var decompressionStream = new DeflateStream(inMemStream, CompressionMode.Decompress))
        // decode the decompressed bytes as UTF-8 text
        using (var streamReader = new StreamReader(decompressionStream, Encoding.UTF8))
        {
            return streamReader.ReadToEnd();
        }
    }
}

Please note that for very short strings the compressed output can be longer than the original, and base64 encoding adds roughly another third on top of that. Compression also requires additional CPU cycles for the compression/decompression steps, so as always you should judge whether it is worthwhile in your situation.

The good news is that serialized messages tend to blow up fairly quickly (especially when there are arrays involved), so in almost all cases you should see a significant saving on the size of the serialized message, and therefore on storage and bandwidth costs as well!
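
To get a feel for both ends of that trade-off, here is a rough sketch that compares sizes before and after compression for a single small object and for an array of 1000 of them. The CompressionSizeCheck class and its helpers are stand-ins written for this illustration (they work on raw bytes, without the base64 step), and the numbers you see will vary with the framework version.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

public static class CompressionSizeCheck
{
    // Compresses the UTF-8 bytes of the text with DeflateStream.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress and decodes the result as UTF-8 text.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(deflate, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Prints the size of the text before and after compression.
    public static void Report(string label, string text)
    {
        int before = Encoding.UTF8.GetByteCount(text);
        int after = Compress(text).Length;
        Console.WriteLine("{0}: {1} -> {2} bytes", label, before, after);
    }

    public static void Main()
    {
        string single = "{\"MyProperty\":10}";
        string array = "[" + string.Join(",", Enumerable.Repeat(single, 1000).ToArray()) + "]";

        Report("single object", single); // too short to benefit
        Report("array of 1000", array);  // highly repetitive, shrinks dramatically

        // Sanity check: the round trip gives back the original text.
        Console.WriteLine(Decompress(Compress(array)) == array);
    }
}
```

If your messages vary a lot in size, one option is to measure like this and only compress those above a threshold you've found to pay off.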