Memory efficient strings in .NET

I’m sure all of you know the basic fundamentals about strings in .NET: They’re immutable, the equality operator compares value instead of reference, etc. But did you know each character in a string takes up 2 bytes of memory? That’s right. Every standard string in .NET is a unicode string in memory, meaning the string “abc” would look like: 61 00 62 00 63 00 in memory. This might not sound like a big deal, but think about a string that is, say, 32768 characters long would take up 65536 bytes in memory (if we exclude metadata stuff). So why is implemented like this? Most likely because of compatibility and uniform of code. For example, a lot of WinAPIs have ugly suffixes on their names such as “MessageBoxW/MessageBoxA” or “LoadLibraryW/LoadLibraryA” depending on if they take a wide unicode string or a standard ASCII string. If every string is always assumed to be unicode, this isn’t an issue.

But what if we have a string only containing characters included in the ASCII table (0-127), for example: The quick brown fox jumps over the lazy dog”. It feels redundant to add an extra 00 byte after each character, doesn’t it?

So I thought I could create a class that stores strings in memory with ASCII encoding rather than unicode, meaning we could essentially half the memory usage for each string object. But after some googling I found out Jon Skeet had a great article about this very subject. I’m gonna base my struct on his example, but extend it a bit to make it a bit easier to use in your code.

View the struct here:

Keep in mind it’s not finished, and there’s just very basic safety checks which needs to be improved. But the struct should you give you an idea on how it could be done. If you wish to take the challenge on completing the struct to mimic the .NET string better, you could look at the .NET String implementation and work from there. Also, there’s some neat implicit operator overloads that allow you to do this:

 static void Main(string[] args)
     AsciiString str1 = "hello world"; // about half memory of str2
     string str2 = "hello world";      // about double memory of str1



But enough talk. Let’s take a look at the memory usage of these two strings:

String of size 100:

ss (2014-05-27 at 12.17.53)

String of size 1000:

ss (2014-05-27 at 12.18.35)

You have to keep in mind though that since there’s no official support for Ascii strings in .NET the AsciiString object will be converted back to a full size unicode string when used in methods accepting a .NET string. Let’s just hope that in the future the .NET developers might consider adding an additional type and native support for this. 🙂

How does SJITHook work?

In my last post I released a project called SJITHook, which is basically just a small class to easily create a hook to the .NET just-in-time compiler in order to be able to modify or analyze methods at runtime. I’m sure a lot of you guys know how it works, but for those who don’t I’ll try to explain it right now. But before anything, you should read about the .NET just-in-time compiler to keep up.

There are 2 DLL’s used in every standard .NET application, either Mscorjit.dll or Clrjit.dll depending on what .NET version the assembly is targetting. Mscorjit targets 2.0 and below while Clrjit.dll target 4.0 and above. They have two things in common though, which makes it a lot easier for us to write hooks for them. They both have an export called “getJit” which returns the address of the V(irtual function)Table in a class called CILJit. If you’re unaware of what a VTable is then read: Virtual method table. They also both have the same signature for the function we’re gonna hook, called “compileMethod”. This method takes your raw IL code along with a LOT of metadata crap and other stuff and compiles it to native assembly code. So each time a function is ran for the first time, it’s ran through compileMethod, turned in to assembly code, and then if it’s called again it jumps straight to the assembly code. A lot of obfuscators and protections take advantage of this process. They do this by encrypting the IL method bodies in an assembly, so they are unreadable to a static decompiler and then decrypts the code in a compileMethod hook and then passes it on to the original compileMethod in clrjit/mscorjit.

I created a quick diagram to hopefully help visualize what this means:


Let’s take an example of the code in SJITHook where I retrieve the pointer to the VTable:

ss (2014-05-12 at 07.37.51)


The returned pointer is the one referred to as “pVTable” in the diagram above too. So in order to get a pointer to the VTable (or more exactly, the first entry in the VTable) we dereference pVTable:

ss (2014-05-12 at 07.47.06)

It’s a quite straight forward process from here on. It basically goes:

  1. Save original address of compileMethod. (See this)
  2. Overwrite the first entry in the VTable that points to original compileMethod, with our own function. It needs to have the exact same signature and calling convention to not mess up stack. (See this)
  3. In our hooked compileMethod function we can do whatever we want, but we have to call original compileMethod at the end. (See this)
  4. When we’re finished, we can unhook by rewriting address of original compileMethod into first entry in VTable. (See this)

If we take a look at the hook from a physical memory point of view, this is what the VTable looks like (first line is compileMethod pointer):

before hook: ss (2014-05-12 at 08.15.04)


after hook:   ss (2014-05-12 at 08.18.40)

There are however a few things we have to keep in mind when doing this. Most importantly, we have to recognize that our own functions will trigger the JIT and in turn our hook which creates an infinite loop and crashes the application. This is why we have to ‘prepare’ our own defined methods before creating the hook. Using RuntimeHelpers.PrepareMethod/Delegate forces the method to be ran through the JIT even though it’s not executed:

ss (2014-05-12 at 08.01.33)


Something to note is that there are numerous ways of hooking a function, this is just one of them. The reason I use this method is because it’s easy (automatic even) to make it support both x86 and x64 bit application with the usage of .NET’s IntPtr type. Read more about hooks here:

I hope this gave you some insight on what SJITHook does and how it works internally. And if there’s still things that are unclear, feel free to leave a comment below and I’ll do my best to answer it. 🙂

SJITHook (Simple JIT hook) up on GitHub

Just a small little project I decided to release. It allows you to easily create a JIT hook in a .NET application. It can be used to do stuff like dynamic method decryption at runtime (commonly used in obfuscation). You can find the project here: SJITHook.

If you’re unsure how to use it, please read the ReadMe.

Example of what it can do:

ss (2014-05-11 at 08.15.09)

I’ll soon release a blog entry here explaining exactly what this project does, and how it works more in depth. 🙂

Updating method at runtime (native)

I thought I’d show you something fun to do in C#; changing method bodies at runtime, AFTER they’ve been JIT’d. This is a fairly simple process actually. The first thing we have to do is load our assembly using reflection:

 Assembly asm = Assembly.LoadFile(Path.GetFullPath("test.exe"));
 Type mainType = asm.EntryPoint.DeclaringType;
 MethodInfo method = mainType.GetMethod("Foo",
 BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Static);

The method we’re gonna use looks like this:

Let’s invoke it with 1000 as parameter. Expected output would be 1000*1000:

Console.WriteLine("Original value: " + method.Invoke(null, new object[] { 1000 }));

And indeed it shows 1000*1000:

Now let’s change this method to show something else! First of all we would need to force the method to be JIT’d in order for the assembly code to even be present in memory. But since we already invoked the method, it has already been JIT’d, but if we didn’t we could use:


Okay now to the interesting part. Before I start with this you should read .NET Framework Internals: MethodDesc to understand why this works. First thing we need to do is retrieve a pointer to the native method:

IntPtr pBody = method.MethodHandle.GetFunctionPointer();

Since we know our target int is 0x3e8 (1000), we iterate the body until we find it, we then write our new 0x539 (1337) in place of it:

   var ptr = (byte*) pBody.ToPointer();
   for (var i = 0; i < 100; i++)
      // Assuming our 0xE8 byte is the first one
      if ((*(ptr+i)) == 0xe8)
           (*(int*)(ptr+i)) = 0x539;

That’s it! Let’s call the method the same way we did in the beginning:

Console.WriteLine("New value: " + method.Invoke(null, new object[] { 1000 }));

It should just return 1000*1000 as before, right? Of course not, silly. We replaced the 1000 with 1337:

I’m not sure what this could be used for in practice. Perhaps some sort of obfuscation? But then you’d probably be better off using MethodRental.SwapMethodBody to work with the IL code rather than assembly. Hope you found this little example interesting. 🙂

Download files used:

Manually deobfuscating DotNetPatcher 3.1


I’ve seen a lot of people lately having issues deobfuscating this protection, mostly because it can’t be done automatically be drag&dropping the file onto de4dot. So I’m gonna show you how to do it manually with WinDbg and De4dot in this paper.

Read it here: Manually deobfuscating DotNetPatcher 3.1

DnSpy – More powerful .NET decompiler

I’m sure you all know of the standard decompilers such as ILSpy or Reflector. They both rely on Mono.Cecil to read assemblies, which is quite a fragile library when it comes to loading obfuscated files. Because of this you should use DnSpy, a ILSpy mod by 0xd4d which uses dnlib to read files instead of Cecil. It’ll load obfuscated files and allow you to browse them (IL->C#/VB conversion still might crash).

Check the source out: DnSpy 

or download compiled binaries: