Writing a Hex-Rays Plugin: VMX Intrinsics

I’ve been very excited to work with the new Hex-Rays Decompiler microcode API, and I’ve finally had the chance to sit down and build a useful plugin. This post describes the development process: the things I tried that didn’t work and the weird hacks that ultimately did.

The plugin (C++ code) is available on GitHub at https://github.com/dougallj/dj_ida_plugins/blob/master/dj_vmx_intrinsics/dj_vmx_intrinsics.cpp

Example output

Without the plugin:


With the plugin:


Original source for reference:

 uint64_t update_rip(uint64_t new_rip) {
   uint64_t old_rip;
   __vmx_vmread(GUEST_RIP, &old_rip);
   if (__vmx_vmwrite(GUEST_RIP, new_rip) != 0)
     return 0;
   return old_rip;
 }

Hex-Rays API

The Hex-Rays API consists of a single header file, hexrays.hpp, which is the primary source of documentation, as well as a collection of examples, which are very useful.

Adding new intrinsics is similar to example 8 (hexrays_udc.cpp, not published online), but that example assumes the call takes its arguments in fixed registers, whereas intrinsics generally take their arguments from the instruction’s operands. This presented a significant challenge, as I could not rely on the udc_filter_t infrastructure to generate helper calls.

Dumping microcode

The hexrays_sample9.cpp example shows how to dump microcode. By dumping microcode at different maturity, you can see how the decompiler is generating, optimizing and simplifying code. This was my most useful tool. I found examples of intrinsics already being generated, dumped the generated microcode, and wrote code to produce roughly the same microcode for other instructions.

There is one big problem with this, though. The code to display the microcode as a string is not public, so I had a hard time understanding how the human-readable representation mapped to the underlying C++ structures I had to generate. I modified the example code with a lot of one-time print debugging to dump out the structures alongside the string representations, so I could see what was going on.

Defining instructions

Unhandled instructions show up as __asm blocks in the decompiler output, but are represented by the ext opcode in the microcode. I originally tried writing an optinsn_t handler to translate ext instructions to helper calls (based on the Hex Blog post Deobfuscating xor’ed strings). This worked, but the _RDX style register names were still present and were not propagated into the call arguments, which undermined a lot of the purpose. I assume this information is tracked separately.

Instead I used a microcode_filter_t. During the apply(codegen_t &cdg) callback, I append new microcode instructions to the codegen_t’s mblock_t *mb member, which allows generating arbitrary microcode for instructions.
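The match-then-apply shape of such a filter can be sketched without the SDK. This is a toy model only: ToyInsn, ToyBlock, ToyCodegen, ToyFilter, ToyVmwriteFilter and generate are hypothetical stand-ins invented for illustration, not the Hex-Rays types, and the strings merely imitate microcode text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins, NOT the Hex-Rays types: a decoded instruction and the
// block of microcode being generated for it.
struct ToyInsn { std::string mnemonic; };
struct ToyBlock { std::vector<std::string> microcode; };
struct ToyCodegen { ToyInsn insn; ToyBlock mb; };

// Two-step shape of a filter: match() claims an instruction,
// apply() emits replacement microcode into the current block.
struct ToyFilter {
    virtual ~ToyFilter() = default;
    virtual bool match(const ToyCodegen &cdg) const = 0;
    virtual void apply(ToyCodegen &cdg) const = 0;
};

struct ToyVmwriteFilter : ToyFilter {
    bool match(const ToyCodegen &cdg) const override {
        return cdg.insn.mnemonic == "vmwrite";
    }
    void apply(ToyCodegen &cdg) const override {
        cdg.mb.microcode.push_back("call !__vmx_vmwrite");
    }
};

// Hypothetical driver: use the filter when it matches, otherwise emit the
// opaque placeholder that an unhandled instruction would get.
inline void generate(ToyCodegen &cdg, const ToyFilter &filter) {
    if (filter.match(cdg))
        filter.apply(cdg);
    else
        cdg.mb.microcode.push_back("ext");
}
```

In this model a vmwrite instruction ends up as a helper call in the block, while anything the filter does not claim still falls through to the ext placeholder.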

Temporary registers

The __vmx_vmwrite intrinsic (like most of the others) returns 0, 1, or 2, representing the combination of flags that indicates the status of the operation (see __vmx_vmwrite on MSDN for more information). As such, I wanted to move the result to a temporary register, and set the flags based on that value:

mov call !__vmx_vmwrite<fast:"size_t" r9.8,"size_t" rax.8>.1, tt.1
setz tt.1, #1.1, zf.1
setz tt.1, #2.1, cf.1
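The relationship those two setz instructions express can be written out in plain C++, independent of the SDK. This is a sketch under stated assumptions: VmxFlags and flags_from_status are hypothetical names, and the pairing of 1 with zf and 2 with cf is taken directly from the microcode listing above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two flag bits the microcode sets.
struct VmxFlags {
    bool zf;
    bool cf;
};

// Mirrors the two setz instructions: zf = (status == 1), cf = (status == 2).
inline VmxFlags flags_from_status(std::uint8_t status) {
    assert(status <= 2 && "the intrinsic is expected to return 0, 1, or 2");
    return VmxFlags{status == 1, status == 2};
}
```

A status of 0 leaves both flags clear, which is the case the decompiled code tests for with a comparison against zero.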

The tt register is a temporary used by the Hex-Rays x86-64 microcode emitter, but I couldn’t find a way to access it in the API. I ended up using the mop_t::dstr function to get the string representation of every register, then hardcoded the resulting value:

const mreg_t mr_tt = mreg_t(0xC0);

This is presumably a very fragile approach – let me know if you can recommend a better alternative.


I originally had no problems with my intrinsic call being optimized away, but as soon as I moved it into the tt register, the pre-optimizer pass (and I think other passes) started deleting calls if the result was unused (even though the FCI_NOSIDE flag was not set). To work around this I chose to make my intrinsics appear to spoil memory.

Again, I couldn’t figure out how to spoil GLBLOW and GLBHIGH, so I ended up hardcoding GLBLOW based on dumping values from another instruction, which the debug-printer showed as spoiling it.

There are quite a few functions defined in the hexrays.hpp header that do not link (although I expect this will be fixed soon), and mlist_t::add(const ivl_t &) was one of them, so I had to add it to the mlist_t’s underlying ivlset_t directly:

ivl_t glblow(0, 0x100000);

Taking the address of registers

The __vmx_vmread intrinsic takes an output pointer as the second parameter. But the instruction can write its output to a register. Hex-Rays provides an operand type (mop_a/mop_addr_t) which should work for this but I had aliasing problems which I couldn’t figure out. (Writing to “rax” then using “al” would appear as two different variables for some reason.)

Instead, I chose to generate a __vmread which writes the result value to the destination register or memory, and to simply undefine the cf and zf flags (using m_und). This means that if the return value is used, references to undefined variables will show up in Hex-Rays with comments (at the top of the function) showing them to map to cf and zf. But in the common case where the return value is ignored the resulting pseudocode is nicer (and correct). 

Accessing operands

x86-64 operands can be fairly complex. The Hex-Rays API provides an abstraction for this in the form of the function mreg_t codegen_t::load_operand(int opnum), which can load an operand to a microcode register from an assembly register. However, it cannot store an operand, nor can it get the address of an operand.

To avoid re-implementing full operand decoding, I call load_operand, then mutate the output either to store instead of load, or to get the address instead of accessing memory. This is not a solid choice, and there may be some edge cases where it causes real problems, but due to the way temporary registers are used by the microcode generator it should work for the expected operand types.
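The load-then-mutate idea can be illustrated with a toy model. Everything here is hypothetical and invented for illustration (ToyOp, ToyMinsn, toy_load_operand, toy_store_operand, the register name "t1"); it does not use the Hex-Rays structures and is not the plugin's actual code, only the general shape of the trick.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy microinstruction, NOT the Hex-Rays minsn_t.
enum class ToyOp { Load, Store };
struct ToyMinsn {
    ToyOp op;
    std::string address;  // register holding the effective address
    std::string value;    // register loaded into (Load) or stored from (Store)
};

// Hypothetical load helper: appends a load of the memory operand into a
// temporary and returns the name of that temporary.
inline std::string toy_load_operand(std::vector<ToyMinsn> &block,
                                    const std::string &address) {
    block.push_back({ToyOp::Load, address, "t1"});
    return "t1";
}

// Reuse the helper's address decoding, then rewrite the trailing load
// into a store of `source` to the same address.
inline bool toy_store_operand(std::vector<ToyMinsn> &block,
                              const std::string &address,
                              const std::string &source) {
    toy_load_operand(block, address);
    ToyMinsn &last = block.back();
    if (last.op != ToyOp::Load)
        return false;  // unexpected shape: give up rather than guess
    last.op = ToyOp::Store;
    last.value = source;
    return true;
}
```

The fragility mentioned above shows up in the check on the last instruction: the rewrite only makes sense while the helper keeps emitting the shape the caller expects.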

Future work

Currently, when we decompile “return __vmx_vmwrite(…);” we end up with the confusing (but correct):

v1 = __vmx_vmwrite(...);
return (v1 == 2) + (v1 == 2) + (v1 == 1);

Fixing this should be possible, since we know “v1” is either 0, 1, or 2, but it’s not clear how best to do it. Possibly this is a good case for an optinsn_t callback.
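That the emitted expression is just v1 on the range 0 to 2 can be checked mechanically. The sketch below is standalone C++ with hypothetical function names (decompiled_expr, is_identity_on_status_range); it only restates the pseudocode above.

```cpp
#include <cassert>

// The expression the decompiler currently emits for the return value.
constexpr int decompiled_expr(int v1) {
    return (v1 == 2) + (v1 == 2) + (v1 == 1);
}

// Compile-time checks: for each possible status the expression equals v1.
static_assert(decompiled_expr(0) == 0, "status 0 maps to 0");
static_assert(decompiled_expr(1) == 1, "status 1 maps to 1");
static_assert(decompiled_expr(2) == 2, "status 2 maps to 2");

// Runtime version of the same check over the whole status range.
inline bool is_identity_on_status_range() {
    for (int v1 = 0; v1 <= 2; ++v1)
        if (decompiled_expr(v1) != v1)
            return false;
    return true;
}
```

Outside that range the expression is 0, so a simplification to plain v1 is only valid if the optimizer can rely on the value being 0, 1, or 2.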

There are a few TODOs in the code. A few of the types aren’t currently correct, but not in ways that should cause problems, and it would be good to have an option to generate the correct __vmx_vmread (as discussed in “taking the address of registers” above).

I also hope to make the plugin more solid in the future, if I can find safer ways to do some of the things described above.

Final notes

A huge thanks to everyone at Hex-Rays for releasing this. I love being able to understand the internals of the decompiler and to develop plugins to make reverse engineering easier in various situations.

All in all, this seems to work, but given the hacks described above it isn’t likely to be 100% solid. Hopefully future revisions of the Hex-Rays API will improve the documentation, add more examples, and possibly provide higher-level interfaces to make plugins like this easier.

The same technique can be used to change the code generation for a wide variety of instructions, and to help with all sorts of problems. You can add new intrinsics for almost anything, or modify existing ones, such as rewriting __readgsqword(0x188) calls to KeGetCurrentThread(). There are a lot of useful possibilities. I hope my example code is helpful, and I hope that others writing Hex-Rays plugins will take the time to release example code.

The code can be found at https://github.com/dougallj/dj_ida_plugins/blob/master/dj_vmx_intrinsics/dj_vmx_intrinsics.cpp and you can find me on Twitter at @dougallj.
