
<< Patricians VS Arriviste >> Not the very obvious in Computer science.

Sunday, June 21, 2009

Implementing programmable invocations of GDB awatch rwatch & watch


Problem:

The OS (call it xyz) uses Doug Lea's malloc implementation, dlmalloc.
A heap abstraction is built on top of dlmalloc.


An 8-byte allocation takes place from the heap, resulting in the following memory layout.


+-----------------------+
|DL-mchunk 8bytes
+-----------------------+<<---malloc returned address
|Usable 8bytes
+-----------------------+
|Heap Poison 0x5a5a5a5a <<---Corruption.
+-----------------------+
|Heap Integrity check
|struct
+-----------------------+
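
To make the layout concrete, here is a minimal user-space sketch of it. The names heap_alloc8 and heap_poison_intact are made up for illustration (the real heap wrapper in the OS is not shown in this post), and the dlmalloc chunk header and the integrity-check struct are left out. It only shows why the poison word lives at ptr + 8.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define USABLE_BYTES 8u
#define HEAP_POISON  0x5a5a5a5au

/* Hypothetical wrapper: 8 usable bytes followed by a 4-byte poison word. */
static void *heap_alloc8(void)
{
    unsigned char *p = malloc(USABLE_BYTES + sizeof(uint32_t));
    uint32_t poison = HEAP_POISON;

    if (p == NULL)
        return NULL;
    /* The poison sits right after the usable bytes, i.e. at ptr + 8. */
    memcpy(p + USABLE_BYTES, &poison, sizeof poison);
    return p;
}

/* Returns 1 while the poison word after the usable bytes is untouched. */
static int heap_poison_intact(const void *ptr)
{
    uint32_t poison;

    memcpy(&poison, (const unsigned char *)ptr + USABLE_BYTES, sizeof poison);
    return poison == HEAP_POISON;
}
```

Anything that writes more than 8 bytes through the returned pointer, for example memset(ptr, 0, 12), flips heap_poison_intact to 0. That overrun is what the watchpoint is meant to catch in the act.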


The allocated 8 bytes go for the ride of their lifetime through the FS code, the buffer cache code and the SCSI code, and then
in the interrupt callback path it's seen that the heap poison (0x5a5a5a5a) is corrupted. We work on this corruption because our ass is on the stack of the PSOD.


Constraints:
-The OS, for some reason, doesn't like you setting hardware breakpoints. Essentially the gdb watch, rwatch and awatch
operations are undefined.
-It's a showstopper bug with a very short turnaround time, so you don't have all the time in the world to read all of the
code and check in diffs. It's already been 7 working days with 4 heads working on it.
-It's a heisenbug: attaching a debugger changes the timing enough that the bug isn't reproducible under the debugger.



Solution:

---------
A standard solution would have been placing a breakpoint at malloc and setting a gdb watchpoint on the poison
bytes to catch the culprit. But .... the great OS has a problem: as soon as you set the breakpoint and continue
code execution, the debug server code itself asserts and crashes. Without setting the watchpoint and just letting the
code run, the bug is not reproducible.


So, as seen, the debugger here is apparently broken, along with the code. If you have a good understanding of debuggers, you'd know that everything the debugger does from the command prompt can be done programmatically from within the program being
debugged; for example, breakpoints can be set using good old int 3. So my best bet is to set a watchpoint programmatically, like this. It avoids the timing issue involved with setting a breakpoint and setting things up by hand, and it sidesteps the broken debugger that won't let us set a watchpoint.

ptr = malloc(8);
watch((uint64_t)((char *)ptr + 8), //watch the poison address
4, //4 bytes
WRITE_ACCESS); //for write access



So how do we implement the watch(address,nr_bytes,accesstype) function?

implementing x64 watch,awatch,rwatch commands

---------------------------------------------
DR0 to DR3 allow you to set 4 hardware breakpoints, i.e. this is how awatch is implemented. (Read the wiki article in the references.)

So the algo for watch(address,nr_bytes,accesstype) is
1. Load the address into DR0 using load_dr0() (see the snippets at the end of this post).


2. DR7 controls which of the DR0-DR3 breakpoints are enabled.
Bit 0 of DR7 is set to 1 to enable the breakpoint at the address placed in DR0 in step 1. = 0x1
Bits 16-17 control the access type to memory. 01b is for WRITE_ACCESS
Bits 18-19 control the number of bytes to be monitored. 11b is for 4 bytes
Bits 19-16 taken together are therefore 1101b = 0xD

So the control code to enable our breakpoint, to be placed in DR7, is 0x00000000000D0001
(All those 0's are required because x64 expanded the debug registers to 64 bits, and the upper 32 bits of DR7 must be 0, else you'd get a GPF.)


3. Load the control code into DR7 using load_dr7() (see the snippets at the end of this post).


The C code would look like the bare-bones snippets at the end of this post.
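
As a sketch of step 2, the DR7 value can be computed instead of assembled by hand. The helper dr7_control and the two enums are hypothetical names for illustration; the bit positions are the ones described in the debug register article in the references. It only builds the number, so it runs anywhere. Actually loading it into DR7 still needs ring 0.

```c
#include <stdint.h>

/* R/W field encodings in DR7. x86 has no read-only encoding. */
enum watch_access {
    WATCH_EXEC  = 0x0, /* break on instruction execution */
    WATCH_WRITE = 0x1, /* break on data writes */
    WATCH_RW    = 0x3  /* break on data reads or writes */
};

/* LEN field encodings in DR7. */
enum watch_len {
    WATCH_LEN_1 = 0x0,
    WATCH_LEN_2 = 0x1,
    WATCH_LEN_8 = 0x2,
    WATCH_LEN_4 = 0x3
};

/*
 * Build the DR7 bits for one breakpoint slot (0..3, i.e. DR0..DR3).
 * Returns 0 for an invalid slot.
 */
static uint64_t dr7_control(unsigned slot, enum watch_access rw,
                            enum watch_len len)
{
    uint64_t dr7 = 0;

    if (slot > 3)
        return 0;
    dr7 |= 1ull << (slot * 2);               /* local enable bit for the slot */
    dr7 |= (uint64_t)rw  << (16 + slot * 4); /* access type field */
    dr7 |= (uint64_t)len << (18 + slot * 4); /* length field */
    return dr7;
}
```

dr7_control(0, WATCH_WRITE, WATCH_LEN_4) gives 0xD0001, the control code worked out above. Keep in mind that the watched address has to be aligned to the watched length.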

Replacing int 1 with handler of int 3
------------------------------------

So I ran the code with my programmatically placed mouse traps.

As soon as the culprit tries to write to my monitored bytes (the poison bytes), "INT 1" is invoked. I stress "INT 1" because I expected int 3 to be invoked. INT 1 may or may not be implemented for interactive debugging on your OS. If it is, cool: the OS will wait/freeze to be broken into by the debugger as soon as the write access takes place. Just attach the debugger and get the backtrace showing which code path tried to eat up the 0x5a5a5a5a poisoned cheese.

But ... if INT 1 is not implemented, you can still wriggle around that by copying the IDT entry for INT 3 into the entry for INT 1. (I know I'm sometimes quite awesome with tips.)
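
The IDT copy can be sketched like this, assuming the standard 16-byte x64 gate descriptor layout. struct idt_gate64 and the helper names are hypothetical (your OS will have its own IDT types), and on a live system you would probably want interrupts off while the entry is being rewritten.

```c
#include <stdint.h>

/* 64-bit mode IDT gate descriptor: 16 bytes, handler address split in three. */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* interrupt stack table index */
    uint8_t  type_attr;    /* gate type, DPL, present bit */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
};

/* Reassemble the handler address stored in a gate. */
static uint64_t gate_handler(const struct idt_gate64 *g)
{
    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_mid << 16) |
           ((uint64_t)g->offset_high << 32);
}

/* Split a handler address across the three offset fields. */
static void gate_set_handler(struct idt_gate64 *g, uint64_t addr)
{
    g->offset_low  = (uint16_t)(addr & 0xffff);
    g->offset_mid  = (uint16_t)((addr >> 16) & 0xffff);
    g->offset_high = (uint32_t)(addr >> 32);
}

/* Make vector 1 (INT 1) use whatever gate vector 3 (INT 3) has. */
static void route_int1_to_int3(struct idt_gate64 *idt)
{
    idt[1] = idt[3];
}
```

The whole descriptor is copied, so INT 1 also inherits the selector and attributes of the INT 3 gate, not just the handler address.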


Even if int 3 is not implemented, just write an ISR for int 1 with a loop. MAKE SURE INTERRUPTS ARE ENABLED while you spin in int 1, because the serial debugger cannot break in with interrupts disabled ;)

.int1
sti ;;interrupts back on, so the serial debugger can break in
label: nop
jmp label

Yes, so we got the culprit on the stack :) Once again computer science saved the day!

Caveats:
--------
For the keen observer who understands allocations and x64, it's obvious that many OTHER steps were required to nail the bug. Not everything can be explained in a blog, BUT the essence and CRUX of the solution is explained here.
For example, it's a race, so many allocations succeed and so do many frees; you must not break in when a valid access takes place, such as when Heap_Free is called on allocations that are not corrupt (it's easy, just some more assembly).


Some bare-bones code snippets and references.

---------------------------------------------
static __inline void
load_dr7(uint64 dr7)
{
__asm __volatile("movq %0,%%dr7" : : "r" (dr7));
}


static __inline void
load_dr0(uint64 dr0)
{
__asm __volatile("movq %0,%%dr0" : : "r" (dr0));
}


void
corruption_fn(void) {
char *ptr=NULL;

ptr=malloc(8);
if(!ptr) {
//do whatever
}
ASSERT(ptr);

/* mov to a debug register is privileged, so this must run in ring 0 */
load_dr0((uint64)(ptr+8)); /* Load address to be monitored */
load_dr7((uint64)0x00000000000d0001); /* Load DR7 control code */

// simulate corruption :)
//memset(ptr,0,12);
}





-http://en.wikipedia.org/wiki/Debug_register
-GNU GDB 6.5 gdb/i386-nat.c
-http://msdn.microsoft.com/en-us/magazine/dd252945.aspx
-Google for blog "Under The Hood - Matt Pietrek" - X64 Article Questions & Clarifications, and look at question asked by Nilesh Padribi - http://blogs.msdn.com/matt_pietrek/archive/2006/04/27/585218.aspx (keeps on changing)
-Intel manuals (though not required for this exercise)

Thursday, April 23, 2009

"Code Templating" C installers for Interrupt Service Routines (ISRs)

Any OS would like to install ISR handlers for at least a few interrupts. Skipping all the i386 jargon, let's get down to coding it. The rest is arranged as
1. the required functionality,
2. the problem,
3. the solution and notes.
4. Also attached are links to the actual C files implementing the functionality in my OS.

Needed Functionality
typedef void(*ISR_FNPTR)(int ISRNR,void *isrCtx);
KRET_STATUS install_isr(int ISRNR,ISR_FNPTR fnPtr,void *isrCtx);

So on interrupt number ISRNR I want to invoke the C function pointed to by fnPtr. I want the context isrCtx and the interrupt number to be passed to this function.

The Problem.
Just initializing the IDT entry with fnPtr will not do, because the i386 won't push the ISR number and the context isrCtx on the stack before calling your C handler. The quick, dirty and often resorted-to fix (by you know whom) is to have an assembly wrapper for _every_ ISR, which trampolines into your 'C' code after setting up the stack correctly. Not sweet, not sweet.

something like
ISR1.S
.extern isr1
push isrCtx1 ;;you c the point here that this is not Cish.
push 1 ;;cdecl pushes arguments right to left, so the ISR number goes last
call isr1
iRet

ISR1.C
void isr1(int isrNr,void *isrCtx) {
}

The problem statement is to avoid the need for writing assembly wrappers for every ISR; that is what ISR code templating does.

Solution.
1. The solution is to have ISR template code, as listed in ISR_TEMPLATE.S.
2. Next, to install an ISR, one actually mem copies the template code block into a kernel-allocated data buffer.
3. You patch the copied binary code to generate the instruction equivalent of
push yourCntx
push yourIsrNr
call yourC_FN
4. Below is the rough algo, with C and assembly pseudocode. It almost would work :D
5. Refer to the attached files for the actual implementation.

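hypothetical ISR_TEMPLATE.S for the isr template
.export start_isr_template
.export isrNRPatchOffset
.export callPatchOffset
.export iretoffset
.export template_end

start_isr_template:
push 0xCAFECAFE ;;placeholder, patched with the address of isrCtx
isrNRPatchOffset:
push 0xC001D00D ;;placeholder, patched with the ISR number
callPatchOffset:
call fn ;;operand patched so the call lands in the C handler
iretoffset:
IRET
template_end:

end


.getIdt
SIDT;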

hypothetical isr_handler_install.c
#define DBG_REL_ASSERT(cond) do { \
ASSERT((cond)); \
if( !(cond) ) \
return FAILURE; \
}while(0)

/* Labels exported by the template. Declared as arrays so each name is the label's address */
extern char start_isr_template[],isrNRPatchOffset[],callPatchOffset[],iretoffset[],template_end[];

/* Installs the ISR in the IDT after alloc and binary patch */
KRET_STATUS install_isr(int ISRNR,ISR_FNPTR fnPtr,void *isrCtx) {
IDT_ENTRY *idt;
char *isrTrampoline;
char *callRetAddr;


DBG_REL_ASSERT(ISRNR >= 0 && ISRNR < 256);
DBG_REL_ASSERT(fnPtr);

//- Step1 : alloc memory for the code block -//
isrTrampoline = alloc(template_end - start_isr_template);
if( NULL == isrTrampoline )
return NO_MEM;

//- Step2: Copy the code template into the allocated memory -//
memcpy(isrTrampoline,start_isr_template,template_end - start_isr_template);

//- Step3: patch the address of isrCtx -//
//- sizeof_push_Instr and sizeof_call_instr skip the opcode byte to reach the 32-bit operand -//
ASSERT(*((uint32 *)(isrTrampoline + sizeof_push_Instr))==0xCAFECAFE);
*((uint32 *)(isrTrampoline + sizeof_push_Instr)) = (uint32)isrCtx;

//- Step4: patch the isrNR -//
ASSERT(*((uint32 *)(isrTrampoline + (isrNRPatchOffset - start_isr_template) + sizeof_push_Instr))==0xC001D00D);
*((uint32 *)(isrTrampoline + (isrNRPatchOffset - start_isr_template) + sizeof_push_Instr)) = (uint32)ISRNR;

//- Step5: patch the operand of the call as per an i386 near call -//
//- operand = target - return address, a signed displacement, so no absolute value -//
callRetAddr = isrTrampoline + (iretoffset - start_isr_template);
*((uint32 *)(isrTrampoline+(callPatchOffset-start_isr_template)+sizeof_call_instr)) =
(uint32)((char *) fnPtr - callRetAddr);


//- Step6: Install the newly patched code block in the IDT -//
idt = getIDT();
idt[ISRNR].fnAddr = isrTrampoline;
return SUCCESS;
}



Here is what install_isr(..) does.
Step1.
Allocates a code block for the ISR trampoline (from here the code calls into the intended C fnPtr). It's an assembly code chunk templated from the assembly file ISR_TEMPLATE.S.

Step2.
Memcpy copies the templated trampoline code into the allocated trampoline code block. The address of the template code and the offsets within the template are made available to the C code via the externs.

Step3 & Step4.
This template code is then patched at 2 places.
Given that we have offsets into the templated code via the externs, we essentially fill in the following blanks.
push .... <- address of the context to be passed to the C ISR handler
push .... <- isrNR to be passed to the C ISR handler

Step5.
This took me a while to get correct (3 days, to be precise). You want to patch the call instruction such that the patched ISR trampoline now calls your C function.
It can be done in many ways, and
CALL [yourhexfnaddresshere] is not one of them.

Technically it can be done, though not with 'as' and 'gcc' as is. The binary the assembler generates for the call instruction is a near relative call. A near relative call is special in the sense that the address it takes is not absolute. As the name suggests, it is a call to an intra-segment address. The operand is encoded as the signed difference of the call's target function address minus the call's return address (the address of the instruction after the call). Get this correct and the rest is cake-walk C code, as explicated in step 5.
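
Steps 3 to 5 can be tried out in user space on a byte image of the template. This is only a sketch: it assumes the assembler encodes the template as 68 imm32 / 68 imm32 / E8 rel32 / CF (check with objdump), it uses plain 32-bit numbers in place of real kernel addresses, and the names isr_template, patch_trampoline and call_rel32 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed encoding of the template; verify against your assembler's output. */
static const uint8_t isr_template[] = {
    0x68, 0xFE, 0xCA, 0xFE, 0xCA, /* push 0xCAFECAFE (context placeholder) */
    0x68, 0x0D, 0xD0, 0x01, 0xC0, /* push 0xC001D00D (ISR number placeholder) */
    0xE8, 0x00, 0x00, 0x00, 0x00, /* call rel32 */
    0xCF                          /* iret */
};

/* Offsets of the patchable operands, each one byte past its opcode. */
enum { CTX_IMM_OFF = 1, NR_IMM_OFF = 6, CALL_IMM_OFF = 11, IRET_OFF = 15 };

/* Store a 32-bit value at a possibly unaligned position. */
static void put_u32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/*
 * Operand of a near relative call: target minus the address of the
 * instruction after the call (here, the iret). Unsigned wraparound
 * yields the two's complement value for backward calls.
 */
static uint32_t call_rel32(uint32_t tramp_addr, uint32_t target_addr)
{
    return target_addr - (tramp_addr + IRET_OFF);
}

/* Copy the template into tramp and fill in the three blanks. */
static void patch_trampoline(uint8_t *tramp, uint32_t tramp_addr,
                             uint32_t isr_nr, uint32_t ctx_addr,
                             uint32_t fn_addr)
{
    memcpy(tramp, isr_template, sizeof isr_template);
    put_u32(tramp + CTX_IMM_OFF, ctx_addr);
    put_u32(tramp + NR_IMM_OFF, isr_nr);
    put_u32(tramp + CALL_IMM_OFF, call_rel32(tramp_addr, fn_addr));
}
```

Because the subtraction is done on unsigned 32-bit values, a target that lies below the trampoline simply wraps around to the negative displacement the CPU expects, so no "which one is bigger" test is needed.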

Step6.
With the trampoline (the allocated, patched code block) ready, you set its address into the IDT. From here it calls into your C code.


Notes on patching the intra-segment near call.
It's exactly what it says it is, i.e. a near call within the same code segment. So the memory for your code block has to come from KERNEL_CS. The operand of this instruction is coded as the diff of the target call address and the return address. This makes the call just an addition/subtraction on the eip. Recheck what opcode your assembler generates for the call instruction against the Intel manual. Objdump and gcc -g are your friends.

Let 'your' head be with you ( and not somebody else's Open Source, including mine )

www.geocities.com/faraz_irulz/i386lib.zip
Don't count on the site staying around for long. You never know what happens!
