ascii

Memory efficient strings in .NET

I’m sure all of you know the basic fundamentals about strings in .NET: They’re immutable, the equality operator compares value instead of reference, etc. But did you know each character in a string takes up 2 bytes of memory? That’s right. Every standard string in .NET is a unicode string in memory, meaning the string “abc” would look like: 61 00 62 00 63 00 in memory. This might not sound like a big deal, but think about a string that is, say, 32768 characters long would take up 65536 bytes in memory (if we exclude metadata stuff). So why is implemented like this? Most likely because of compatibility and uniform of code. For example, a lot of WinAPIs have ugly suffixes on their names such as “MessageBoxW/MessageBoxA” or “LoadLibraryW/LoadLibraryA” depending on if they take a wide unicode string or a standard ASCII string. If every string is always assumed to be unicode, this isn’t an issue.

But what if we have a string only containing characters included in the ASCII table (0-127), for example: The quick brown fox jumps over the lazy dog”. It feels redundant to add an extra 00 byte after each character, doesn’t it?

So I thought I could create a class that stores strings in memory with ASCII encoding rather than unicode, meaning we could essentially half the memory usage for each string object. But after some googling I found out Jon Skeet had a great article about this very subject. I’m gonna base my struct on his example, but extend it a bit to make it a bit easier to use in your code.

View the struct here: http://pastebin.com/tLhzTacj

Keep in mind it’s not finished, and there’s just very basic safety checks which needs to be improved. But the struct should you give you an idea on how it could be done. If you wish to take the challenge on completing the struct to mimic the .NET string better, you could look at the .NET String implementation and work from there. Also, there’s some neat implicit operator overloads that allow you to do this:

 static void Main(string[] args)
 {
     AsciiString str1 = "hello world"; // about half memory of str2
     string str2 = "hello world";      // about double memory of str1

     Console.WriteLine(str1);
     Console.WriteLine(str2);

     Console.ReadLine();
 }

But enough talk. Let’s take a look at the memory usage of these two strings:

String of size 100:

ss (2014-05-27 at 12.17.53)

String of size 1000:

ss (2014-05-27 at 12.18.35)

You have to keep in mind though that since there’s no official support for Ascii strings in .NET the AsciiString object will be converted back to a full size unicode string when used in methods accepting a .NET string. Let’s just hope that in the future the .NET developers might consider adding an additional type and native support for this. 🙂

Advertisements