
<< Patricians VS Arriviste >> Not the very obvious in Computer science.

Sunday, June 21, 2009

Implementing programmable invocations of GDB awatch rwatch & watch


Problem:

The OS xyz uses dlmalloc, Doug Lea's malloc implementation.
A heap abstraction is built on top of dlmalloc.


An 8-byte allocation takes place from the heap, resulting in the following memory layout:


+-----------------------+
|DL-mchunk 8bytes
+-----------------------+<<---malloc returned address
|Usable 8bytes
+-----------------------+
|Heap Poison 0x5a5a5a5a <<---Corruption.
+-----------------------+
|Heap Integrity check
|struct
+-----------------------+
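
To make the layout concrete, here is a minimal sketch of how a heap wrapper might place and check the poison word. The names heap_alloc_sketch, heap_poison_ok and HEAP_POISON, and the use of plain malloc underneath, are assumptions for illustration only; the real heap code of OS xyz is not shown in this post, and the integrity check struct is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEAP_POISON 0x5a5a5a5aU  /* poison value from the layout above */

/* Hypothetical wrapper: allocate `size` usable bytes followed by a
 * 4-byte poison word. The integrity check struct is omitted. */
static void *heap_alloc_sketch(size_t size)
{
    uint32_t poison = HEAP_POISON;
    unsigned char *p = malloc(size + sizeof poison);

    if (p == NULL)
        return NULL;
    /* The poison sits right after the usable bytes. */
    memcpy(p + size, &poison, sizeof poison);
    return p;
}

/* Returns 1 while the poison word after the usable bytes is intact. */
static int heap_poison_ok(const void *ptr, size_t size)
{
    uint32_t poison;

    assert(ptr != NULL);
    memcpy(&poison, (const unsigned char *)ptr + size, sizeof poison);
    return poison == HEAP_POISON;
}
```

With an 8-byte allocation, a 12-byte memset over the block (the simulated corruption used at the end of this post) makes heap_poison_ok(ptr, 8) return 0.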


The allocated 8 bytes go on the ride of their lifetime: through FS code, buffer cache code and SCSI code, and then,
in the interrupt callback path, the heap poison (0x5a5a5a5a) is found to be corrupted. We work on this corruption because our ass is on the stack of the PSOD.


Constraints:
-The OS for some reason doesn't like you setting hardware breakpoints. Essentially the gdb watch, rwatch and awatch
operations are undefined.
-It's a showstopper bug with a very low turnaround time, so you don't have all the time in the world to read all of the
code and check in diffs. It's already been 7 working days with 4 heads working on it.
-It's a heisenbug: attaching a debugger changes the timing enough that the bug isn't reproducible under the debugger.



Solution:
---------
A standard solution would have been to place a breakpoint at malloc and set a gdb watchpoint on the poison
bytes to catch the culprit. But .... the great OS has a problem: as soon as you set the watchpoint and continue
execution, the debug server code itself asserts and crashes. And without setting the watchpoint, just letting the
code run under the debugger, the bug is not reproducible.


So, as seen, the debugger here is apparently broken, along with the code. If you have a good understanding of debuggers, you'd know that everything the debugger does from the command prompt can be done programmatically, from within the
program being debugged; for example, breakpoints can be set using good old int 3. So my best bet is to set a watchpoint programmatically, like this. It avoids the timing issue involved with setting a breakpoint and doing things by hand, and it sidesteps the broken debugger that won't let us set a watchpoint.

ptr = malloc(8);
watch((uint64_t)((char *)ptr + 8), //watch the poison address
      4,                           //4 bytes
      WRITE_ACCESS);               //for write access



So how do we implement the watch(address, nr_bytes, accesstype) function?

Implementing x64 watch, awatch, rwatch commands
-----------------------------------------------
The debug registers DR0 to DR3 let you set 4 hardware breakpoints; this is how awatch is implemented (read the wiki article in the references).

So the algo for watch(address, nr_bytes, accesstype) is:
1. Load address into DR0 using ..


2. DR7 controls which of the DR0-DR3 breakpoints are enabled.
Bit 0 of DR7 is set to 1 to enable the breakpoint at the address placed in DR0 in step 1. = 0x1
Bits 16-17 control the access type. 01b is for WRITE_ACCESS. = 01b
Bits 18-19 control the number of bytes to be monitored. 11b is for 4 bytes. = 11b
Taken together, bits 16-19 are 1101b = 0xD.

So the control code to be placed in DR7 to enable our breakpoint is 0x00000000000D0001
(all those 0's are there because x64 expanded the debug registers to 64 bits, and the upper 32 bits of DR7 must be written as 0, else you'd get a GPF.)


3. Load control code into DR7 using.


The C code would look like the snippets at the end of this post.
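
As a rough sketch, the three steps above could be wrapped into watch() roughly as follows. The enum, RW_ACCESS and the helper dr7_for_dr0 are names made up for this sketch (only WRITE_ACCESS and the watch signature appear in this post); the field encodings are the ones described in step 2 plus the other length codes from the Intel manuals. The register writes only work in ring 0 and assume GCC-style inline assembly on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* DR7 R/W field encodings; the enum itself is hypothetical. */
enum watch_access {
    WRITE_ACCESS = 0x1,  /* break on data writes */
    RW_ACCESS    = 0x3   /* break on data reads or writes */
};

/* Build the DR7 value that arms only DR0: enable bit 0,
 * R/W0 in bits 16-17, LEN0 in bits 18-19.
 * Returns 0 (nothing armed) for an unsupported length. */
static uint64_t dr7_for_dr0(unsigned nr_bytes, enum watch_access access)
{
    uint64_t len;

    switch (nr_bytes) {
    case 1: len = 0x0; break;
    case 2: len = 0x1; break;
    case 8: len = 0x2; break;
    case 4: len = 0x3; break;
    default: return 0;
    }
    return 1ULL | ((uint64_t)access << 16) | (len << 18);
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Ring 0 only: executing this in user mode faults. */
static inline void watch(uint64_t address, unsigned nr_bytes,
                         enum watch_access access)
{
    /* The watched address is expected to be aligned to the watched length. */
    assert((address & (nr_bytes - 1)) == 0);
    __asm__ __volatile__("movq %0, %%dr0" : : "r"(address));
    __asm__ __volatile__("movq %0, %%dr7"
                         : : "r"(dr7_for_dr0(nr_bytes, access)));
}
#endif
```

For 4 bytes and WRITE_ACCESS the helper produces 0x00000000000D0001, the control code worked out by hand in step 2.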

Replacing int 1 with handler of int 3
------------------------------------

So I ran the code with my programmatically placed mousetraps.

As soon as the culprit tries to write to my monitored bytes (the poison bytes), "INT 1" (vector 1, the debug exception) is invoked. I stress "INT 1" because I expected int 3 to be invoked. INT 1 may or may not be implemented for interactive debugging in your OS. If it is, cool: the OS will freeze and wait to be broken into by the debugger as soon as the write access takes place. Just attach the debugger and get the backtrace showing which code path tried to eat the 0x5a5a5a5a poisoned cheese.

But ... if INT 1 is not implemented, you can still wiggle around that by copying the IDT entry for INT 3 into the entry for INT 1. (I know, I'm sometimes quite awesome with tips.)
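
A minimal sketch of that IDT trick, assuming the standard 16-byte x86-64 gate descriptor layout. The struct and the helper idt_copy_gate are hypothetical names; how you get hold of the kernel's IDT base (and whether you may patch it while it is live) is OS-specific, so here the table is simply passed in as a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* x86-64 IDT gate descriptor: 16 bytes per vector. */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0-15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* interrupt stack table index (low 3 bits) */
    uint8_t  type_attr;    /* gate type, DPL, present bit */
    uint16_t offset_mid;   /* handler address bits 16-31 */
    uint32_t offset_high;  /* handler address bits 32-63 */
    uint32_t reserved;
};

_Static_assert(sizeof(struct idt_gate) == 16, "gate must be 16 bytes");

/* Make vector `to` use the same gate (handler) as vector `from`. */
static void idt_copy_gate(struct idt_gate *idt, unsigned from, unsigned to)
{
    assert(from < 256 && to < 256);
    memcpy(&idt[to], &idt[from], sizeof idt[to]);
}
```

For the case in this post the call would be idt_copy_gate(idt, 3, 1), so that the debug exception lands in whatever the OS already does for int 3.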


Even if int 3 is not implemented, just write an ISR for int 1 that spins in a loop. MAKE SURE INTERRUPTS ARE ENABLED while you spin in int 1, because the serial debugger cannot break in with interrupts disabled ;)

.int1
        sti             ; re-enable interrupts so the serial debugger can break in
label:  nop
        jmp label

Yes, so we got the culprit on the stack :) Once again computer science saved the day!

Caveats:
--------
For the keen observer who understands allocations and x64, it's obvious that many OTHER steps were required to nail the bug. Not everything can be explained in a blog post, BUT the essence and CRUX of the solution is explained here.
For example, it's a race, so many allocations succeed, and so do many frees; you must not break in when a valid access takes place, i.e. when Heap_Free is called on allocations that are not corrupt (it's easy, just some more assembly).
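
One possible way to handle that caveat, purely as a sketch: disarm the watchpoint before the legitimate free path touches the poison. This is not the author's actual filter (that was done in assembly and is not shown here); dr7_disarm and unwatch are hypothetical names, and the register access again assumes ring 0 and GCC-style inline assembly on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the local and global enable bits (bits 2*slot and 2*slot+1)
 * for one of DR0-DR3, leaving the rest of DR7 untouched. */
static uint64_t dr7_disarm(uint64_t dr7, unsigned slot)
{
    assert(slot < 4);
    return dr7 & ~(0x3ULL << (2 * slot));
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Ring 0 only: stop the DR0 watchpoint from firing. */
static inline void unwatch(void)
{
    uint64_t dr7;

    __asm__ __volatile__("movq %%dr7, %0" : "=r"(dr7));
    __asm__ __volatile__("movq %0, %%dr7" : : "r"(dr7_disarm(dr7, 0)));
}
#endif
```

Calling something like unwatch() at the top of the free path for a block that is still intact would keep a valid free from tripping the mousetrap.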


Some bare-bones code snippets and references.
---------------------------------------------
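#include <stdint.h>

/* malloc() and ASSERT() come from the OS's own headers. */

/* Write the DR7 debug control register (ring 0 only). */
static __inline void
load_dr7(uint64_t dr7)
{
        __asm __volatile("movq %0,%%dr7" : : "r" (dr7));
}


/* Write the DR0 debug address register (ring 0 only). */
static __inline void
load_dr0(uint64_t dr0)
{
        __asm __volatile("movq %0,%%dr0" : : "r" (dr0));
}


void
corruption_fn(void)
{
        char *ptr = NULL;

        ptr = malloc(8);
        if (!ptr) {
                // do whatever
        }
        ASSERT(ptr);

        load_dr0((uint64_t)(ptr + 8));           // load the address to be monitored (the poison)
        load_dr7((uint64_t)0x00000000000d0001);  // load the DR7 control code

        // simulate corruption :)
        //memset(ptr, 0, 12);
}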





-http://en.wikipedia.org/wiki/Debug_register
-GNU GDB 6.5 gdb/i386-nat.c
-http://msdn.microsoft.com/en-us/magazine/dd252945.aspx
-Google for the blog "Under The Hood - Matt Pietrek", post "X64 Article Questions & Clarifications", and look at the question asked by Nilesh Padribi - http://blogs.msdn.com/matt_pietrek/archive/2006/04/27/585218.aspx (the URL keeps changing)
-Intel manuals (though not required for this exercise)
