Obfuscate It

Thwart Reverse Engineering of Your Visual Basic .NET or C# Code

Gabriel Torok and Bill Leach

This article assumes you're familiar with .NET and C#

SUMMARY

One of the advantages of the .NET architecture is that assemblies built with it contain lots of useful information that can be recovered using ILDASM, the intermediate language disassembler. A side effect, though, is that someone with access to your binaries can recover a good approximation of the original source code. Here the authors present program obfuscation as a way to deter reverse engineering. In addition, they discuss the different types of obfuscation technologies available and demonstrate the new obfuscation tool that is included in Visual Studio .NET 2003.

Contents

Disassembly
Decompilation
Obfuscation in Depth
Renaming Metadata
Removing Nonessential Metadata
Additional Techniques
Using Dotfuscator Community Edition
Examining the Map File
Obfuscator Pitfalls
Conclusion

By now you are probably familiar with all of the benefits that the metadata-rich Microsoft® .NET Framework architecture brings to the table, from easing the burdens of deployment and versioning to the rich IDE functionality enabled by self-describing binaries. You may not know that the easy availability of all this metadata has introduced a problem that until now was not a concern for most developers. Programs written for the common language runtime (CLR) are easier to reverse engineer. This is not in any way a fault in the design of the .NET Framework; it is simply a reality of modern, intermediate-compiled languages (Java-language applications display the same characteristics). Both Java and the .NET Framework use rich metadata embedded inside the executable code: bytecode in the case of Java, Microsoft Intermediate Language (MSIL) in .NET. Being much higher level than binary machine code, the executable files are laden with information that can be easily deciphered.

With the help of tools like ILDASM (the MSIL disassembler that ships with the .NET Framework SDK) or decompilers such as Anakrino and Reflector for .NET, anyone can easily look at your assemblies and reverse engineer them back into readable source code. Hackers can search for security flaws to exploit, steal unique ideas, and crack programs. This should be enough to give you pause.
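
To get a feel for how much information an assembly carries, here is a minimal sketch that uses System.Reflection to list the types and methods recorded in an assembly's metadata. The MetadataDump class and its Describe helper are illustrative names, not part of any tool mentioned in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class MetadataDump
{
    // Returns one line per type and per declared method found in the metadata.
    public static List<string> Describe(Assembly assembly)
    {
        var lines = new List<string>();
        const BindingFlags all = BindingFlags.Public | BindingFlags.NonPublic |
                                 BindingFlags.Instance | BindingFlags.Static |
                                 BindingFlags.DeclaredOnly;
        foreach (Type type in assembly.GetTypes())
        {
            lines.Add("type " + type.FullName);
            foreach (MethodInfo method in type.GetMethods(all))
            {
                lines.Add("  method " + method.Name);
            }
        }
        return lines;
    }

    public static void Main(string[] args)
    {
        // With no argument, inspect this program itself.
        Assembly target = args.Length > 0
            ? Assembly.LoadFrom(args[0])
            : typeof(MetadataDump).Assembly;
        foreach (string line in Describe(target))
        {
            Console.WriteLine(line);
        }
    }
}
```

Run against an unobfuscated assembly, a listing like this exposes every private type and method name, which is exactly the information renaming is meant to take away.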

Don't worry, though. There's a solution—obfuscation—that will help you thwart reverse engineering. Obfuscation is a technique that provides for seamless renaming of symbols in assemblies as well as other tricks to foil decompilers. When it is properly applied, obfuscation can increase the protection against decompilation by many orders of magnitude, while leaving the application intact. Obfuscation is commonly used in Java environments and for years has been helping companies protect the intellectual property in their Java-based products.

Several third parties have answered the call by creating obfuscators for .NET code. Microsoft includes the Dotfuscator Community Edition with Visual Studio® .NET 2003 in partnership with our company, PreEmptive Solutions, which ships a number of obfuscator packages.

Using the Dotfuscator Community Edition, this article will teach you all about obfuscation (and a little about decompilation), the types of obfuscation commonly available, and some of the issues you will need to address when working with an obfuscator.

To demonstrate decompilation and obfuscation, we are going to use an open-source implementation of the classic Vexed game. Vexed.NET was written by Roey Ben-amotz and is available at https://vexeddotnet.sourceforge.net/. It's a puzzle game in which your goal is to move similar blocks together, which causes them to disappear. Below is a simple method from the source code of Vexed.NET:

public void undo()
{
    if (numOfMoves > 0)
    {
        numOfMoves--;
        if (_UserMoves.Length >= 2)
            _UserMoves = _UserMoves.Substring(0, _UserMoves.Length - 2);
        this.loadBoard(this.moveHistory[numOfMoves - (numOfMoves / 50) * 50]);
        this.drawBoard(this.gr);
    }
}

Disassembly

The .NET Framework SDK ships with a disassembler utility called ILDASM, which allows you to disassemble .NET Framework assemblies into IL Assembly Language statements. To start ILDASM, make sure that the .NET Framework SDK is installed, then type ILDASM on the command line followed by the name of the program that you want to disassemble. In our case, we will type "ILDASM vexed.net.exe". This will launch the ILDASM UI, which can be used to browse the structure of any .NET Framework-based application. Figure 1 shows the undo method disassembled.

Figure 1 The Undo Method Disassembled

Decompilation

If you're now thinking that only a small circle of folks who actually know IL Assembly Language will see and understand your source code, remember that the decompilation doesn't stop there. We can recreate the actual source code by using a decompiler. These utilities can decompile a .NET assembly directly back to a high-level language like C#, Visual Basic® .NET, or C++. Let's look at the undo method generated by the Anakrino decompiler:

public void undo()
{
    if (this.numOfMoves > 0)
    {
        this.numOfMoves = this.numOfMoves - 1;
        if (this._UserMoves.Length >= 2)
            this._UserMoves = this._UserMoves.Substring(0, this._UserMoves.Length - 2);
        this.loadBoard(this.moveHistory[this.numOfMoves - this.numOfMoves / 50 * 50]);
        this.drawBoard(this.gr);
    }
}

As you can see, the results are almost identical to the original code. Later, we will revisit this to see the results after obfuscation.

Obfuscation in Depth

Obfuscation is accomplished using a set of related technologies. Its goal is to hide the intent of a program without changing its runtime behavior. It's not encryption, but in the context of .NET code, it might be better. You could encrypt .NET assemblies to make them completely unreadable. However, this methodology suffers from a classic dilemma—since the runtime must execute unencrypted code, the decryption key must be kept with the encrypted program. Therefore, an automated utility could be created to recover the key, decrypt the code, and then write out the IL to disk in its original form. Once that happens, the program is fully exposed to decompilation.

To give an analogy, encryption is like locking a six-course meal into a lockbox. Only the intended diner (in this case, the CLR) has the key and we don't want anyone else to know what he or she is going to eat. Unfortunately, at mealtime the food will be in plain view to all observers. Obfuscation works more like putting the six-course meal into a blender and sending it to the diner in a plastic bag. Sure, everyone can see the food in transit, but besides a lucky pea or some beef-colored goop, they don't know what the original meal is. The diner still gets the intended delivery and the meal still provides the same nutritional value as it did before (luckily, the CLR isn't picky about taste). The trick of an obfuscator is to confuse observers, while still delivering the same product to the CLR.

Of course, obfuscation (or encryption) is not a hundred percent foolproof. Even compiled C++ can be disassembled. If a hacker is persistent enough, she can reproduce your code.

Figure 2 Obfuscation Process

Obfuscation is a process that is applied to compiled .NET assemblies, not source code. An obfuscator never reads or alters your source code. Figure 2 shows the flow of the obfuscation process. The output of the obfuscator is another set of assemblies, functionally equivalent to the input assemblies, yet transformed in ways that hinder reverse engineering. We will now consider two essential techniques that Dotfuscator Community Edition uses to achieve that goal: renaming and removing nonessential metadata.

Renaming Metadata

The first line of defense in obfuscation is to rename meaningful names with non-meaningful ones. As you know, there is a lot of value in well-chosen names. They help make your code self-documenting and serve as valuable clues that reveal the purpose of the item they represent. The CLR doesn't care how descriptive a name is, so obfuscators are free to change them, typically to one-character names like "a".

Obviously there are constraints on the amount of renaming an obfuscator will be able to perform on a particular application. Generally speaking, there are three common renaming scenarios.

If your application consists of one or more standalone assemblies (that is, no unobfuscated code depends on any of them), then the obfuscator is free to rename types and members regardless of their visibility, so long as the names and the references to them stay consistent across the set of assemblies. A Windows® Forms application is a good example of this. At the opposite extreme, if your application is designed to be used by unobfuscated code, the obfuscator cannot change the names of types or members visible to those clients. Examples of this type of application are shared class libraries, reusable components, and the like. Somewhere in between are applications that are meant to plug into existing unobfuscated frameworks. In this case, the obfuscator can rename anything not accessed by the environment in which it is running, regardless of visibility. ASP.NET applications are good examples of this type of application.

Dotfuscator Community Edition uses a patented renaming technique called overload induction that adds a twist to renaming. Method identifiers are maximally overloaded after an exhaustive scope analysis. Instead of substituting one new name for each old name, the overload induction technique renames as many methods as possible to the same name, confusing anyone trying to understand the decompiled code.
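
As a rough illustration of the idea, the sketch below shows what a small class might look like once several unrelated methods have been given the same name. The class, its original names (shown in comments), and the A_0 parameter names are hypothetical. A real obfuscator works on the compiled assembly and can go further than C# source allows, for example by reusing a name for methods that differ only by return type.

```csharp
using System;

// Hypothetical class after overload induction. Original names are in comments.
public class h                      // was: Board
{
    private int p;                  // was: numOfMoves

    public void a()                 // was: Reset()
    {
        p = 0;
    }

    public void a(int A_0)          // was: AddMoves(int count)
    {
        p += A_0;
    }

    public string a(string A_0)     // was: Describe(string prefix)
    {
        return A_0 + p;
    }
}

public static class OverloadInductionDemo
{
    public static void Main()
    {
        h board = new h();
        board.a(3);                           // resolves to a(int)
        Console.WriteLine(board.a("moves=")); // resolves to a(string), prints moves=3
        board.a();                            // resolves to a()
        Console.WriteLine(board.a("moves=")); // prints moves=0
    }
}
```

The compiler and the CLR tell the three methods apart by signature, so behavior is unchanged, but a human reader now has to track the argument types at every call site to know which "a" is meant.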

In addition, as a nice side effect, the size of the application decreases due to the smaller size of the string heap contained in the assembly. For example, if you have a name that is 20 characters long, renaming it to "a" saves 19 characters. In addition, continually reusing names saves space by conserving string heap entries. Renaming everything to "a" means that "a" is stored only once, and each method or field renamed to "a" can point to it. Overload induction enhances this effect because the shortest identifiers are continually reused. Typically, an overload-induced project will have up to 35 percent of the methods renamed to "a".

To see the impact of renaming on decompiled code, take a look at the undo method after the renaming process:

public void c()
{
    if (this.p > 0)
    {
        this.p = this.p - 1;
        if (this.r.Length >= 2)
            this.r = this.r.Substring(0, this.r.Length - 2);
        this.a(this.q[this.p - this.p / 50 * 50]);
        this.a(this.e);
    }
}

You can see that without any other kinds of obfuscation, this method is already much more difficult to understand.

Removing Nonessential Metadata

Not all of the metadata in a compiled .NET-based application is used by the runtime. Some of it is there to be consumed by other tools such as designers, IDEs, and debuggers. For example, if you define a property called "Size" on a type in C#, the compiler will emit metadata for the property name "Size" and associate that name with the methods that implement the get and set operations (which it names "get_Size" and "set_Size", respectively). When you write code that sets the Size property, the compiler will always generate a call to the method "set_Size" itself and will never reference the property by its name. In fact, the name of the property is there for the IDE and developers who are using your code; it is never accessed by the CLR.
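
The relationship between a property and its accessor methods can be observed with reflection. The following is a small sketch; the Widget class is a made-up example, not code from Vexed.NET.

```csharp
using System;
using System.Reflection;

public class Widget
{
    private int size;

    public int Size
    {
        get { return size; }
        set { size = value; }
    }
}

public static class PropertyMetadataDemo
{
    public static void Main()
    {
        PropertyInfo property = typeof(Widget).GetProperty("Size");
        Console.WriteLine(property.GetGetMethod().Name); // get_Size
        Console.WriteLine(property.GetSetMethod().Name); // set_Size

        // The accessor can be invoked directly, without going through the property name.
        Widget widget = new Widget();
        typeof(Widget).GetMethod("set_Size").Invoke(widget, new object[] { 42 });
        Console.WriteLine(widget.Size); // 42
    }
}
```

Note that this sketch itself looks the property up by name through reflection, which is precisely the kind of tool-side usage that stops working once the property metadata has been removed.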

If your application is meant to be used by just the runtime and not by other tools, it's safe for an obfuscator to remove this type of metadata. In addition to property names, event names and method parameter names fall into this category. Dotfuscator Community Edition removes all these types of metadata when it deems that it is safe to do so.

Additional Techniques

Dotfuscator Community Edition provides good obfuscation using the techniques we've just described, but you should be aware of additional obfuscation techniques that provide even stronger protection and may foil reverse engineering altogether. Dotfuscator Professional Edition implements many additional techniques, including control flow obfuscation, string encryption, incremental obfuscation, and size reduction.

Control flow obfuscation is a powerful technique, the goal of which is to hide the intent of a sequence of instructions without changing the logic. More importantly, it is used to remove the clues that decompilers look for in order to faithfully reproduce high-level source code statements, such as if-then-else statements and loops. In fact, this technique tends to break decompilers.

To see this effect in action, look at the decompiled undo method again, after applying renaming and control flow obfuscation (see Figure 3). You can see that instead of the original nested if statements, the decompiler has produced an if statement, two nested while loops, and some gotos to tie it all together. The label i1 is referenced but it is not generated by the decompiler (this is a decompiler bug, we presume).

Figure 3 Undo Method After Renaming and Control Flow Obfuscation

public void c()
{
    if (this.o > 0)
    {
        goto i0;
        do
        {
            while (true)
            {
                this.a(this.p[this.o - this.o / 50 * 50]);
                this.a(this.d);
                goto i1;
                i2: this.q = this.q.Substring(0, this.q.Length - 2);
            }
            i0: this.o = this.o - 1;
        }
        while (this.q.Length < 2);
        goto i2;
    }
}
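
To make the principle concrete without relying on decompiler output, here is a hand-written sketch of the same transformation at source level: two methods that compute the same result, one with an ordinary loop and one with the loop structure replaced by labels and jumps. This is only an analogy under our own assumptions. A real obfuscator rewrites the IL, and can produce jump patterns that have no direct C# equivalent at all.

```csharp
using System;

public static class ControlFlowDemo
{
    // Straightforward version: sum of 1..n.
    public static int Clear(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += i;
        }
        return total;
    }

    // Same logic, with the loop replaced by labels and jumps.
    public static int Scrambled(int n)
    {
        int total = 0;
        int i = 1;
        goto check;
    body:
        total += i;
        i++;
    check:
        if (i <= n) goto body;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Clear(10));     // 55
        Console.WriteLine(Scrambled(10)); // 55
    }
}
```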

String encryption is a technique that applies a simple encryption algorithm to string literals embedded in your application. As mentioned before, any encryption (or specifically, decryption) that's performed at run time is inherently insecure. That is, a smart hacker can eventually break it, but for strings in application code, it is worthwhile. Let's face it, if hackers want to get into your code, they don't blindly start searching renamed types. They probably do searches for "Invalid License Key" which point them right to the code where license handling is performed. Searching on strings is incredibly easy; string encryption raises the bar because only the encrypted version is present in the compiled code.
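
The following sketch shows the general shape of the idea using a trivial XOR-and-Base64 scheme. It is not the algorithm Dotfuscator uses, and the key value and class name are our own placeholders. The point is only that the literal no longer appears in readable form, while the program can still recover it at run time.

```csharp
using System;
using System.Text;

public static class StringHidingDemo
{
    private const byte Key = 0x5A; // hypothetical single-byte key

    // Done ahead of time by the tool: turn the literal into an unreadable form.
    public static string Encode(string plain)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(plain);
        for (int i = 0; i < bytes.Length; i++) bytes[i] ^= Key;
        return Convert.ToBase64String(bytes);
    }

    // Done at run time by the application: recover the original literal.
    public static string Decode(string encoded)
    {
        byte[] bytes = Convert.FromBase64String(encoded);
        for (int i = 0; i < bytes.Length; i++) bytes[i] ^= Key;
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        string stored = Encode("Invalid License Key");
        Console.WriteLine(stored);         // what a string search would see
        Console.WriteLine(Decode(stored)); // Invalid License Key
    }
}
```

Because the key and the decoding routine ship with the program, a determined attacker can still recover the strings. The scheme only defeats the quick text search described above.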

Incremental obfuscation helps with the challenge of issuing a patch to fix a customer's problems in the face of obfuscation. Fixing bugs in code often creates or deletes classes, methods, or fields. Changing code (for example, adding or deleting a method) may cause subsequent obfuscation runs to rename things slightly differently. What was previously called "a" might now be called "b". Unfortunately, how and what was renamed differently is a mystery.

Incremental obfuscation can combat this problem. Dotfuscator creates a map file to tell you how it performed the renaming. That same map file, however, can be used as input to Dotfuscator on subsequent runs to dictate that renames used previously should be used again wherever possible. If you release your product and then patch a few classes, Dotfuscator can be run in such a way as to mimic its previous renaming scheme. That way, you can issue just the patched classes to your customers.

Size reduction does not strictly impede reverse engineering, but it is worth mentioning because obfuscators almost always have to perform a dependency analysis on the set of input assemblies. Thus the obfuscator is in a good position to do more than obfuscate, and some of the better ones will use their knowledge of your application to remove code that your program is not using. It seems odd that unused code removal can actually do anything—who writes code they don't use? Well, the answer is all of us. What's more, we all use libraries and types written by other people that were written to be reusable.

Reusable code implies there is contingent code that handles many cases; however, in any given application, you typically only use one or two of those many cases. An advanced obfuscator can determine this and strip out all the unused code (again, from the compiled assembly, not the source). The result is that the output contains precisely the types and methods your application needs—nothing more. A smaller application has the benefits of conserving computing resources and reducing load times. This can be especially important for apps running on the .NET Compact Framework or distributed applications.

Using Dotfuscator Community Edition

Now let's use Dotfuscator Community Edition to obfuscate the Vexed application. Dotfuscator Community Edition uses a configuration file that specifies the obfuscation settings for a particular application. It has a GUI to help you easily create and maintain the configuration file as well as run the obfuscator and examine the output. In addition, the Dotfuscator Community Edition's command-line interface allows you to easily integrate obfuscation into your automated build process. You can launch the GUI right from the tools menu of Visual Studio .NET 2003.

To configure Vexed for obfuscation, you need to specify three items in the Dotfuscator Community Edition GUI: the input assembly, the map file location, and the output directory. The input assemblies (Dotfuscator calls these "trigger assemblies") are specified on the Trigger tab. You can add as many here as you want, but you only need one for the Vexed application.

You specify the map file location on the Rename | Options tab (see Figure 4). The map file is an essential piece of information that contains the unambiguous name mappings between the original and obfuscated names. It is very important to keep this file after you obfuscate your application; without it, you will not be able to easily troubleshoot the obfuscated app. Due to its importance, Dotfuscator will not overwrite an existing map file unless you explicitly check the "Overwrite Map file" box.

Figure 4 Rename | Options Tab

Finally, the Build tab allows you to specify the directory where the obfuscated application will be placed. Once you have done that, you are ready to obfuscate the application. You can save your configuration file for later use, then either press the "Build" button on the Build tab or use the "Play" button on the toolbar. While building, Dotfuscator displays progress information in the GUI's output pane. You can control the amount of information that is displayed here by choosing Quiet or Verbose on the Options tab.

Once the build is complete, you can visually explore the results on the Output tab, shown in Figure 5. As you can see, Dotfuscator displays a graphical view of the application similar to an object browser. The new names are immediately below the original names in the view. In the figure, you can see that the class named "board" was renamed to "h", and two methods with different signatures (init and ToImage) were both renamed "a".

Figure 5 Output Browser

Examining the Map File

The map file that Dotfuscator produces is an XML-formatted file, and in addition to the already mentioned name mappings, it contains some statistics that give a sense of how effective the renaming process was. Figure 6 summarizes the statistics for types and methods after obfuscating the Vexed application.

Figure 6 Renaming Statistics

          Total   Number Renamed   % Renamed   % Methods Renamed to 'a'
Classes   20      20               100%        -
Methods   164     144              88%         39%

Map files are also used to perform incremental obfuscation. This process allows you to import names from a previous run, which tells the obfuscator to perform renaming in the same way as it was performed previously. If you are releasing a patch (or a new plug-in) for an already obfuscated application, you can obfuscate the updates using the same name set as the original version. This is of particular interest to enterprise development teams maintaining multiple interdependent applications.

Obfuscator Pitfalls

Obfuscation—especially renaming—can be tricky on complex applications and is highly sensitive to correct configuration. If you aren't careful, the obfuscator can break your application. In this section, we'll discuss some of the more common issues that can arise when using an obfuscator.

First, you need to do a little more work when your application includes a strongly named assembly. Strongly named assemblies are digitally signed, allowing the runtime to determine if an assembly has been altered after signing. The signature is an SHA1 hash signed with the private key of an RSA public/private key pair. Both the signature and the public key are embedded in the assembly's metadata. Since an obfuscator modifies the assembly, it is essential that signing occur after obfuscation. You should delay-sign the assembly during development and before obfuscation, then complete the signing process afterward. See the .NET Framework documentation for more details about delay-signed assemblies, and remember to turn off strong name validation while testing your delay-signed assemblies.
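
As a sketch of how this can look in practice, delay signing can be requested with assembly-level attributes, and the Strong Name tool (sn.exe) from the .NET Framework SDK can handle verification skipping and the final signing. The file names below are placeholders of our own.

```csharp
using System.Reflection;

// Delay signing: embed the public key now and reserve space for the signature.
// "PublicKeyOnly.snk" is a placeholder for a file containing only the public key.
[assembly: AssemblyKeyFile("PublicKeyOnly.snk")]
[assembly: AssemblyDelaySign(true)]

// After obfuscation, sn.exe completes the process (placeholder file names):
//   sn -Vr MyApp.exe               register the assembly for verification skipping while testing
//   sn -R  MyApp.exe KeyPair.snk   re-sign the obfuscated assembly with the full key pair
//   sn -Vu MyApp.exe               unregister it so verification is enforced again
```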

The use of the Reflection API and dynamic class loading will also complicate the obfuscation process. Since these facilities are dynamic, they tend to defeat the static analysis techniques used by most obfuscators. Consider the following C# code snippet that gets a type by name and dynamically instantiates it, returning the type cast to an interface:

public MyInterface GetNewType()
{
    Type type = Type.GetType(GetUserInputString(), true);
    object newInstance = Activator.CreateInstance(type);
    return newInstance as MyInterface;
}

The name of the type is coming from another method. GetUserInputString may be asking the user to enter a string, or perhaps it retrieves it from a database. Either way, the type name is not present in the code for a static analysis to recover, so there is no way of knowing which types in the input assemblies may be instantiated in this manner. The solution in this case is to prevent renaming of all potentially loadable types that implement MyInterface (note that method and field renaming can still be performed). This is where manual configuration and some knowledge of the application being obfuscated play an important role. Dotfuscator Community Edition gives you the tools to prevent the renaming of select types, methods, or fields. You can pick and choose individual names; alternatively, you can write exclusion rules using regular expressions and other criteria, such as visibility or scope. For example, you could exclude all public methods from renaming.
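
The failure mode can be simulated without an obfuscator. In the sketch below, a lookup by the name "Board" succeeds because a class with that name exists, while a lookup by a name that no longer matches any type fails. That second case is what would happen to a stored type name after the class had been renamed. The Board class and the Create helper are hypothetical.

```csharp
using System;

public interface MyInterface { }

public class Board : MyInterface { }

public static class ReflectionPitfallDemo
{
    // Looks a type up by its string name, as in the snippet above,
    // but returns null instead of throwing when the name is unknown.
    public static MyInterface Create(string typeName)
    {
        Type type = Type.GetType(typeName, false);
        if (type == null) return null;
        return Activator.CreateInstance(type) as MyInterface;
    }

    public static void Main()
    {
        // Works while the class is still called "Board".
        Console.WriteLine(Create("Board") != null);      // True

        // Stands in for a stored name that no longer matches after renaming.
        Console.WriteLine(Create("NoSuchType") != null); // False
    }
}
```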

Another issue with using an obfuscator occurs after you have deployed an obfuscated application and you are trying to support it. Say your application is throwing an exception (which happens even to the best of us) and a customer sends you a stack dump that looks something like this:

System.Exception: A serious error has occurred
   at cv.a()
   at cv..ctor(Hashtable A_0)
   at ar.a(di A_0)
   at ae.a(String[] A_0)

Obviously this is a lot less informative than a stack dump from the unobfuscated program. The good news is that you can use the map file generated during obfuscation to decode the stack trace back to the original. The bad news is that there is sometimes not enough information in the stack trace to unambiguously retrieve the original symbols from the map file. For example, notice in the dump that the method return types are omitted. In applications obfuscated with an enhanced overload induction renaming algorithm, methods that differ only by return type may be renamed to the same name. So the stack trace can be ambiguous. Most of the time, you can narrow the possibilities enough to find the original names to a high degree of certainty. To help, Dotfuscator Professional provides a tool to automatically translate the stack trace back to the original offending method.
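
The decoding step, and the ambiguity it can run into, can be sketched as a simple lookup. The map contents and original names below are invented for illustration. A real map file is XML and carries more detail, and this is not how Dotfuscator's own tool is implemented.

```csharp
using System;
using System.Collections.Generic;

public static class StackTraceDecoder
{
    // Hypothetical map: obfuscated "type.method" -> possible original names.
    private static readonly Dictionary<string, string[]> Map =
        new Dictionary<string, string[]>
        {
            { "cv.a", new[] { "LicenseChecker.Validate", "LicenseChecker.Load" } },
            { "ar.a", new[] { "Startup.Configure" } },
        };

    // Returns every original name that could correspond to one stack frame.
    public static string[] Decode(string frame)
    {
        // Reduce "at cv.a()" to the lookup key "cv.a".
        string text = frame.Trim();
        if (text.StartsWith("at ")) text = text.Substring(3);
        int paren = text.IndexOf('(');
        string key = paren >= 0 ? text.Substring(0, paren) : text;

        string[] candidates;
        return Map.TryGetValue(key, out candidates) ? candidates : new string[0];
    }

    public static void Main()
    {
        // Two candidates: the frame alone cannot tell them apart.
        Console.WriteLine(string.Join(" | ", Decode("at cv.a()")));
        // One candidate: unambiguous.
        Console.WriteLine(string.Join(" | ", Decode("at ar.a(di A_0)")));
    }
}
```

When a frame maps to several candidates, the parameter types printed in the trace and the surrounding frames are usually what narrows the choice.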

Conclusion

You don't need to let hackers use the handy ILDASM utility on your app for questionable purposes. You can protect your code with a good obfuscator. Obfuscation raises the reverse engineering bar. In the Visual Studio .NET 2003 box, Dotfuscator Community Edition makes good obfuscation just a few clicks away.

Gabriel Torok is President of PreEmptive Solutions. He is a coauthor of JavaScript Primer Plus and of Java Primer Plus, both published by Macmillan. Gabriel has given talks and tutorials at software development conferences around the world.

Bill Leach is the Chief Technical Officer at PreEmptive Solutions. He serves as architect and technical lead for the Dotfuscator product line. Bill has served as a technical reviewer for software development books and articles.