Detecting debuggers by abusing a bad assumption within Windows

This blog post will go over an assumption made over a decade ago by Microsoft when dealing with software breakpoints that can be used to reveal the presence of most (all publicly available?) usermode and kernelmode debuggers.

The x86 architecture can potentially encode a particular assembly instruction in multiple ways. For example, adding two registers, eax and ebx, and storing the result in eax takes the following mnemonic form: add eax, ebx. This can be encoded as the byte sequence 0x03 0xC3 or 0x01 0xD8. Fundamentally, the machine code represents the same assembly operation.

If you're just interested in the anti-debug trick (without any context on why it works the way it does), scroll to the bottom of this post. For the rest of you brave enough to read this article in its entirety... buckle up. 

The "long form" of int 3

An int 3 can be encoded as either a single-byte 0xCC or via the more unconventional way as the multi-byte sequence 0xCD 0x03:

From the Intel Instruction Set Reference (Volume 2, Chapter 3, Section 3.2).

So, what happens when Windows encounters a multi-byte int 3? We create a simple C++ program to find out:

After you run this application, you should see output similar to this:


A single-byte int 3 (0xCC) works as expected. The start of the stub is located at 0x000001BE94B90000. When the stub is executed, the exception handler fires and we see that both the _EXCEPTION_RECORD.ExceptionAddress and _CONTEXT.Rip are located at 0x000001BE94B90000. This is the start of the int 3 instruction. Excellent!

The multi-byte int 3 (0xCD 0x03) is located at address 0x000001BE94B90002. When this stub executes, the exception handler proclaims that the _EXCEPTION_RECORD.ExceptionAddress and _CONTEXT.Rip are located at 0x000001BE94B90003. This is in the middle of the int 3 instruction. Why? What went wrong?

The assumption

NOTE: From this point on, all disassembly and pseudo-source is reconstructed from system files that are provided with Windows x64 10.0.15063 (Creator's Update). If you'd like to follow along, make sure you use the same version I'm using!

Microsoft assumes that all int 3's result from the single-byte variant.

This assumption occurs very early during interrupt processing. Namely, when any interrupt occurs, such as when an int 3 is executed by the processor, control is redirected by the CPU to a handler registered in the appropriate position of the IDT (Interrupt Descriptor Table). In Windows, the handler for software breakpoints can be found at the symbol nt!KiBreakpointTrap:

The first thing nt!KiBreakpointTrap does is generate a trap frame (_KTRAP_FRAME) on the stack that it passes to subsequent routines. A definition of this structure can be found below:

Parts of this structure are automatically filled by the CPU when the interrupt fires, in particular, the range from +0x160 (_KTRAP_FRAME.ErrorCode) to +0x188 (_KTRAP_FRAME.SegSs):

From the Intel Instruction Set Reference (Volume 3, Chapter 6, Section 6.12).

The _KTRAP_FRAME is essentially an extension of the elements saved on the stack by the CPU. It's purpose is to provide a place to store volatile registers which can be clobbered when calling into functions that are compiled in C.

A very important thing to note is that the instruction pointer (EIP) saved by the CPU on the stack (_KTRAP_FRAME.Rip) will be set to the instruction immediately following the one that caused entry into the handler. In our scenario, this means that the _KTRAP_FRAME.Rip member will be the instruction following our int 3, which will be ret (0xC3) in the example code above.

After the volatile registers have been saved off, nt!KiBreakpointTrap performs a quick check to see whether the interrupt fired from usermode (ring3) or kernelmode code (ring0). If execution is coming from ring3, a swapgs needs to occur as well as some other bookkeeping with debug registers.

Eventually, control flow will reconvene and the volatile floating point registers will also be stored off into the _KTRAP_FRAME. Before entering into more exception handling logic, the instruction pointer will be extracted from _KTRAP_FRAME.Rip (saved by the CPU upon entering nt!KiBreakpointTrap), decremented by one, and passed as an argument to nt!KiExceptionDispatch. Additionally, the exception code, EXCEPTION_BREAKPOINT (0x80000003), will also be passed in. The prototype for nt!KiExceptionDispatch:

It's important to note that nt!KiExceptionDispatch (like nt!KiBreakpointTrap) is written in hand-ASM. It assumes that ecx contains the exception code, edx is the number of exception parameters (up to 3), r8 contains the address of the exception, r9 is the first exception parameter (if one exists), r10 is the second exception parameter (if one exists), r11 is the third exception parameter (if one exists), and rbp points to a segment in the _KTRAP_FRAME structure (at offset +0x80).

Upon entry of nt!KiExceptionDispatch, the first thing that occurs is the generation of a _KEXCEPTION_FRAME. Whereas the _KTRAP_FRAME was used to store volatile registers, the _KEXCEPTION_FRAME provides a place to save all nonvolatile registers:

nt!KiExceptionDispatch also creates an _EXCEPTION_RECORD structure on the stack. If you've done any error handling in Windows (in either usermode or kernelmode), you'll be familiar with this data structure as it is contained as a child within the _EXCEPTION_POINTERS data structure. We use both of these structures in our example above.

Furthermore, this explains the first part of our mystery, namely, why the _EXCEPTION_RECORD.ExceptionAddress is incorrect. Recall that the _EXCEPTION_RECORD.ExceptionAddress is populated by the 3rd (r8) argument to nt!KiExceptionDispatch. This was passed in from nt!KiBreakpointTrap. This argument is a copy of the  _KTRAP_FRAME.Rip member decremented by one.

To figure out where the _CONTEXT.Rip member is populated, we need to go deeper down the rabbit hole.

nt!KiExceptionDispatch will call into nt!KiDispatchException (yes, the ordering of the words are intentionally flipped) passing in the recently created _EXCEPTION_RECORD and _KEXCEPTION_FRAME:

This function will build a _CONTEXT out of the _KTRAP_FRAME and _KEXCEPTION_FRAME by invoking the helper routine KeContextFromKFrame. After the _CONTEXT is created, a check is made against the _EXCEPTION_RECORD.ExceptionCode (received as an argument from nt!KiExceptionDispatch) for STATUS_BREAKPOINT (0x80000003). If it's true, the _CONTEXT.Rip member will be decremented:

This solves the last part of the mystery and causes the value in _CONTEXT.Rip to be tainted.

The anti-debug trick

Knowing what we know about how Windows handles the different types of int 3s, is it possible to leverage this discrepancy in a useful way? The answer is yes. 

Debuggers display the state of the program at the time of an exception. Since Windows will incorrectly assume that our int 3 exception was generated from the single-byte variant, it is possible to confuse the debugger into reading "extra" memory. We leverage this inconsistency to trip a "guard page" of sorts. 


As we saw in our first example (at the start of the article), when a multi-byte int 3 occurs, the _EXCEPTION_RECORD.ExceptionAddress and _CONTEXT.Rip values will lie in the middle of our multi-byte instruction instead of at the start. This means that the debugger will incorrectly determine that the instruction which threw the software breakpoint begins with the opcode 0x03. Referring to the trusty Intel manual, we can see that this opcode represents a 2-byte add instruction:

From the Intel Instruction Set Reference (Volume 2, Chapter 3, Section 3.2).

What would happen if we positioned our multi-byte int 3 near the end of a page of memory?

When the operating system notifies our attached debugger of the breakpoint exception, the instruction pointer will point to memory that will be misinterpreted as the start of an add (0x03) instruction. This will cause the debugger to disassemble data on the adjacent page (since this instruction is 2 bytes long), and effectively read one byte past our "valid" memory range.

Our trick relies on the fact that Windows, as an optimization, will not commit virtual memory to physical RAM unless it absolutely needs it. That is to say that most memory, especially in usermode, is paged. When memory needs to be made available for use that is not currently in physical RAM, a page fault occurs. To learn more about memory management, check out the following articles on our site: Introduction to IA-32e hardware paging and Exploring Windows virtual memory management.

So, we can detect the memory read on this adjacent page by inspecting the corresponding PTE (Page Table Entry) using the QueryWorkingSetEx API. If the page is resident in our process' working set (e.g. mapped into our process by the debugger), the Valid bit in the _PSAPI_WORKING_SET_EX_BLOCK will be set.

PoC||GTFO

A full example can be found below:

As always, if you have any questions or comments, please feel free to send us a message below. Happy hacking 😎.

Comments

  1. Nice. I especially liked the ' PoC||GTFO' reference.
    Does it do the same behavior with 0xCD 0x01 / int1? 0xF1 is the single byte int1 instruction, and 0xCD 0x01 being the multibyte int1. I've also heard it called 'icebp'.

    ReplyDelete
    Replies
    1. In the case of a single step exception (int 1), IIRC the _CONTEXT.Rip and _EXCEPTION_RECORD.ExceptionAddress is not adjusted at all. This means that it should point to the next instruction in the stream.

      So, I'm guessing, while it won't cause an instruction misalignment, a debugger will probably still read the IP memory and cause a page fault.

      I can check tomorrow to confirm!

      Delete
  2. This has now been fixed in x64dbg: https://i.imgur.com/0tE62wK.gif

    ReplyDelete
    Replies
    1. Oh sure - way to spoil the fun, Duncan! Great response time, though.

      Delete

Post a Comment

Popular posts from this blog

Breaking backwards compatibility: a 5 year old bug deep within Windows

Exploring Windows virtual memory management