Powered By Blogger

Operating in Systems space.

<< Patricians VS Arriviste >> Not the very obvious in Computer science.

Sunday, June 21, 2009

Implementing programmable invocations of GDB awatch rwatch & watch


Problem:

The OS xyz uses, Doug Lea implementation malloc dlmalloc.
A heap abstraction is built on top of dlMalloc.


An 8 byte allocation takes place from the heap, resulting in the following memory layout.


+-----------------------+
|DL-mchunk 8bytes
+-----------------------+<<---malloc returned address
|Usable 8bytes
+-----------------------+
|Heap Poison 0x5a5a5a5a <<---Corruption.
+-----------------------+
|Heap Integrity check
|struct
+-----------------------+


The allocated 8 bytes takes a ride of its life time, goes through Fs code, Buffercache code, sCSI code and then
in the interrupt callback path its seen that the heap poison (0x5a5a5a5a) is corrupted. We work on this corruption because our ass is on the stack of the PSOD.


Constraints:
-The OS is for some reason doesn't like you setting hardware breakpoints. Essentially gdb watch,rwatch and awatch
operations are undefined.
-Its a showstopper bug with very low turn around time. So you dont have all the time in the world to read all of the
code and checkin diffs. its already 7 working days with 4 heads working on it.
-Its a hiesen bug, attaching a debugger changes the timing somehow that the bug isn't reproducible under the debugger.



Solution:

---------
A standard solution would have been placing a breakpoint at malloc, and setting at gdb watchpoint at the poison
bytes to catch the culprit. But .... the great OS has problems that as soon as set the breakpoint and continue
code execution the debug server code itself asserts and crashes. Without setting the watchpoint and just letting the
code run, the bug is no reproducible.


So as seen the debugger here is apparently broken, along with the code. So if you have good understanding of debuggers, you'd know that all that the debugger does from the command prompt can be done programatically, within the being
debugged program, for example breakpoints can be set using good old int 3. So my best bet is to programatically set a watchpoint like. This would avoid the timing issue involved with setting a breakpoint and setting things by hand, and also the debugger is broken and is not allowing to set a watchpoint.

ptr = malloc(8);
watch((uint64_t)(char*)prt + 8, //watch the poison address
4, //4 bytes
WRITE_ACCESS); //for write access



So how to implement the watch(address,nr_bytes,accesstype) ? function.

implementing x64 watch,awatch,rwatch commands

---------------------------------------------
DR0 to DR3 allow you set 4 hardware breakpoints, i.e how awatch is implemented. (read wiki article in reference)

So algo for wathc(address,nr_bytes,accesstype) is
1. Load address into DR0 using ..


2. DR7 controls which of DR0-DR3 breakpoint are enabled.
Bit 0 of DR7 is set to 1, to enable breakpoint at address placed in DR0 in step 1. = 0x1
Bit 16-17 control access-type to memory. 01b is for WRITE_ACCESS 0001b
Bit 18-19 control number of bytes to be monitored 11b is for 4 bytes 0 1 0 1 b = 0xD

So the control code to enable our breakpoints to be placed in DR7 is 0x00000000000D0001
(all those 0's are required because x64 expanded the debug registers to 64bit, and all unused bits must be 0, else you'd get a GPF.)


3. Load control code into DR7 using.


C'code would look like.

Replacing int 1 with handler of int 3
------------------------------------

So I ran the code with my programatically placed mouse traps.

As soon as the culprit tries to writes my monitored bytes (poison bytes), "INT 1" is invoked. I stress the point because "INT 1" because I expected int 3 to be invoked. INT 1 will be or may not be implemented for interactive debugging for your OS. If yes then cool, the OS will wait/freeze to be broken into by the debugger as soon the write access take place. Just attach the debugger and get the backtrace as to which code path tried to eat up the 0x5a5a5a5a poisoned cheese.

But ... If INT1 is not implemented, you can still swiggle around that stuff, by copying the IDT entry for INT 3 into the entry for INT1. (I know I'm sometimes quite awesome with tips)


Even if int 3 is not implemented just write an ISR for int 1, with a loop, MAKE SURE INTERRUPTS ARE ENABLED while you spin in int 1, because the serial debugger cannot break in with the interrupts disabled ;)

.int1
label: nop
jmp label

Yes so we got the culprit on the stack :) Once again computer science saved the day !

Caveats:
--------
For the keen obeserver, who understand allocations and x64 its obvious that many OThER steps were required to nail the bug. But all cannot be explained in a blog. BUT the essense and CRUX of the solution is explained in this blog.
For example, it a race, so many allocations are successful, and so are free, you must not breakin when and valid access takes place when Heap_Free is called at instances which are not corrupt (its easy some more assembly).


Some bare-bones code snippets and references.

---------------------------------------------
static __inline void
load_dr7(uint64 dr7)
{
__asm __volatile("movq %0,%%dr7" : : "r" (dr7));
}


static __inline void
load_dr0(uint64 dr0)
{
__asm __volatile("movq %0,%%dr0" : : "r" (dr0));
}


corrpution_fn() {
char *ptr=NULL;

ptr=malloc(8);
if(!prt)
//do whatever
ASSERT(ptr);

load_dr0((uint64)(ptr+8)); <- Load address to be monitored
load_dr7((uint64)0x00000000000d0001); <- Load DR7 control code

// simualte corruption :)
//memset(ptr,0,12)
}





-http://en.wikipedia.org/wiki/Debug_register
-GNU GDB 6.5 gdb/i386-nat.c
-http://msdn.microsoft.com/en-us/magazine/dd252945.aspx
-Google for blog "Under The Hood - Matt Pietrek" - X64 Article Questions & Clarifications, and look at question asked by Nilesh Padribi - http://blogs.msdn.com/matt_pietrek/archive/2006/04/27/585218.aspx (keeps on changing)
-Intel manuals (though not required for this exercise)

Thursday, April 23, 2009

"Code Templating" C installers for Interupt Service Routines (ISR's)

Any OS would like to install ISR handlers for atleast as few interupts. Skipping all the i386 jargon lets get down to coding it. Rest is arranged as
1. required functionality,
2. the problem
3. the solution and notes.
4. Also attached are links to the actual C files implementing the functionality in my OS.

Needed Functionality
typdef void(*ISR_FNPTR)(int ISRNR,void *isrCtx);
KRET_STATUS install_isr(int ISRNR,ISR_FNPTR fnPtr,void *isrCtx);

So on interrupt number ISRNR I want to invoke, the C function pointed by fnPtr. To this function I want the context isrCtx and the interupt number to be passed.

The Problem.
Just initializing the IDT entry with the fnPtr will not do, because the i386 won't push the ISR number and the context isrCtx on the stack before calling your C handler. The quick dirty and often resorted fix by (you know whom) is to have assembly wrappers for _every_ ISR, and then trampolines into your 'C' code after setting up the stack correctly, not sweet not sweet.

something like
.extern isr1
ISR1.S
push 1
push isrCtx1 ;;you c the point here that this is not Cish.
call isr1
iRet

ISR1.C
void isr1(int isrNr,void *isrCtx) {
}

The problem statement is to avoid the need for writing assembly wrappers for every ISR, that is what ISR code templating does.

Solution.
1. Solution is to have an ISR template code as listed in ISR_TEMPLATE.S
2. Next to install ISR, one actually mem copies the template code block into a kernel allocated data buffer.
3. You patch the copied binary assembly code, to generate assembly instruction equivalent to
push yourIsrNr
push yourCntx
call yourC_FN
4. Below is the rough algo and C and assembly psuedocode. It almost would work :D
5. Refer to attached files to the actual implementation.

hypothetical ISR_TEMPLATE.S for the isr template
.export start
.export end
.export iretoffset
.export start_isr_template

start_isr_template:
push 0xCAFECAFE
isrNRPatchOffset:
push 0xC001D00D
callPatchOffset:
call fn
iretoffset:
IRET
template_end:

end


.getIdt
SIDT;

hypothetical isr_handler_install.c
#define DBG_REL_ASSERT(cond) do { \
ASSERT((cond)); \
if( !(cond) ) \
return FAILURE; \
}while(0)

/* Get the offsets into the template code to be patched */
extern char *start_isr_template,*isrNRPatchOffset,*callPatchOffset,*iretoffset,*template_end ;

/* Installs the ISR in the IDT after, alloc and binary patch */
KRET_STATUS install_isr(int ISRNR,ISR_FNPTR fnPtr,void *isrCtx) {
IDT_ENTRY *idt;
char *isrTrampoline;


DBG_REL_ASSERT(isrNR);

//- Step1 : alloc memory for the code block -//
isrTrampoline = alloc(end-start);
if( NULL != isrTrampoline )
return NO_MEM;

//- Step2: Copy in the code template into the allocated memory -//
memcpy(isrTrampoline,start,end-start)

//- Step3: patch the address of isrCtx -//
ASSERT(*((uint32 *)(isrTrampoline + sizeof_push_Instr))==0xCAFECAFE)
*((uint32 *)(isrTrampoline + sizeof_push_Instr)) = isrCtx;

//- Step4: patch the isrNR -//
ASSERT(*((uint32 *)(isrTrampoline + (isrNRPatchOffset - start_isr_template) + sizeof_push_Instr))==0xC001D00D)
*((uint32 *)(isrTrampoline + (isrNRPatchOffset - start_isr_template) + sizeof_push_Instr)) = isrNR;

//- Step5: patch the address of the call as per an i386 near call -//
*((uint32 *)(isrTrampoline+(callPatchOffset-start_isr_template)+sizeof_call_instr)) =
((isrTrampoline + (iretoffset - start_isr_template)) > (char *) fnptr) ?
((isrTrampoline + (iretoffset - start_isr_template)) - (char *) fnptr):
((char *) fnptr - (isrTrampoline + (iretoffset - start_isr_template)));


//- Step6: Install the newly patched code block in the ISR -//
idt = getIDT();
idt[ISRNR].fnAddr = isrTrampoline;
return SUCCESS;
}



Here is what it install_isr(..) does.
Step1.
Allocates code block for the ISR trampoline (from here the code calls into the indented C fnPtr). Its an assembly code chunk templated from the assembly file ISR_TEMPLATE.S

Step2.
Memcpy copies the templated trampoline code into the allocated trampoline code block. The address of the template code and offsets with the template are made available to the C code via the externs.

Step3 & Step 4.
This template code then is pathced at 2 places,
Given we have offsets into the templated code via externs, we essentailly do fill in the following blanks.
push .... <- address of the context to be passed to the C ISR handler push .... <- isrNR to be passed to the C ISR handler. Step5.
This took me while to get it correct, (3 days to be precise). You want to patch the call instruction such that the patched ISR trampoline now calls your C function.
It can be done in many ways and
CALL [yourhexfnaddresshere] is not one of them.

Technically it can be done, though not in the case of 'as' and 'gcc' as is. The assembler generated binary for the call instruction generates a short call instruction. A short call instruction is special in the sense that the address that is takes is not absolute. A short call instruction as the name suggests is a call to an intra segment address. The way this address is encoded is the difference between the 'calls return address' and 'call's target function address'. You get this correct rest is cake walk C code, as explicated in step 5.

Step6.
With the trampoline (the allocate-pathced code block) ready you set its address into the IDT. From here is calls into your patched C code.


Notes on patching Intra segment near call.
Its exactly what it says it is i.e. a short call instruction with the same code segment. So your memory for the code block has to come from KERNEL_CS. The address for this instruction is coded as the diff of the target call address and the return address. This makes the return from the called function just a addition/subtraction from the eip. Recheck what opcode your assembler generates for the call instruction and the intel manual. Objdump and gcc -g are your friend.

Let 'your' head be with you ( and not somebody else's Open Source, including mine )

www.geocities.com/faraz_irulz/i386lib.zip
don't site stay around my webpage for long. You never know what happens !

Sunday, May 04, 2008

Brent Welch's Talk "My code is better than yours" .. ..

I took up Advanced Storage systems at CMU. Yes the one that is taught by Garth and Greg. The class concluded with a talk by Brent Welch from Panasas. The talk was centered on his experiences around writing Quality good code. I am so ashamed to have slept through half of the class (because I was preparing for a demo to Rob Ross another cool guy from ANL late night). Friends say that I make it quite obvious when I sleep in class (head down and doze, the only saving grace is I do not snore). Hmm to the point, So Brent did talk about a lot of things which I probably did not register as I was asleep, but here are some of the bits and pieces that I registered and I felt were pretty good.

Brent made it quite obvious he was a serious programmer. I mean dead serious. I just could smell it from his talk. I think he is a dude programmer. I have met only a few dude programmers that are real dudes (YOUVRAAJ KELKAR, KIRAN JOSHI and BORISLAV MARINOV). I mean let’s face it everybody else is just about average tapping of theirs ways to glory polluting the already polluted universe of spaghetti code. Btw there is a difference in being a big mouth jibroni ass and talk about everything from Turing machine to WSDL, but a few actually do write code.

Here is what Brent said, which I quite believe in.

1. If you cannot set the tab to (4)spaces (or whatever the convention is for a project)in your code, he would think what kind of dumb ass you are. This is what I personally feel when some dumb ass tries to make a ruckus about HIS programming style and doesn't want to change it in the greater interest of the project.

2. This one is something very new very nice absolutely amazing.

Brent once asked an XYZ programmer who is fond of rewriting code (quite like me) - What is the time required to understand somebody else's code. The programmers answer was, "Time required to rewrite the code?" That is so true, given a smart programmers versified code it’s quite a task to understand his frame of mind that resulted in the code. Quite like an artist’s painting only he fully understands it.

3. Writing quality code is underestimated!

Again, so true every TOM DICK AND HARRY wants to code/pollute. Pollute Pollute Pollute and pollute more. Half of them are just not that passionate - they are just in for a good time. There are basically only 2 types of programmers ones that are passionate and the others that are not. There is not much grading criteria for you if you are not passionate. Hiring people who are not passionate can ruin a team as they bring down the morale of a great team down in more than 1 way. At times when I did have a chance to knock of people who are not passionate about the project - I personally did take delicate care to see that they are out!!!!

4. He believes that "His good is better than everybody else's code"

I buy his argument completely. I usually think the same that my piece of code is better than the rest of the shit floating in SVN/CVS land. How can 2 programmers live in harmony with such an attitude? The point is not that he/me find appreciating somebody else's code too condescending. What it means that the programmer has a lot a respect and responsibility for his code. He would take it personally if somebody passes a unfavorable comment about his code. I think this sense of responsibility and the urge to maintain once repute is what makes quality code deliverable. Its a standard you set for yourself increasing it every time when you learn the best practices of the trade.

Hmm from unfavorable comments I remember how I lambasted one of my colleagues for a frivolous comment he made. Back in old days I did work in Windows User land for Unitrends and this very cool friend of mine worked on Xiotech a kernel mode kick ass project (WTF we all know it was good). He said "Kya be tu kya chavanni ka code likhta user mode me ?" (What code worth 25 cents do you write in user land?). OK …………………… THATS IT what followed was a 1/2 hour tirade as to how dumb fcuck shit piece he, his life and his code was. The comment hit me bad, I know it when I feel it it’s that suffocating feeling that I get in my neck. He could not understand what just happened and was just dazed. ... I wasn't. If I get an opportunity again it will fcuk the shit out of him again for his comment. I mean nobody comments on what I do for a living literally. May be I have grown up now and I won't react the way i did back them, but i'm sure i would react in some way.

My rule is quite simple, your code my friend is the best …. just don't compare it with my code or comment on my code. If you are hell bent on a comparison I simply have 2 word for you "CODE IT !!!" or else talk to my hand. And for the ones who I feel are really better than me (I know its dumb to create a false illusion of being better when you know you are not) I make their praise pretty obvious !o!

P.S no one is a born dude. There is no short cut to being a dude. Except Nikola Tesla everybody follows the power law when acquiring skills.

Saturday, March 15, 2008

Debug This !!

:D this is a strange one. May be a the freakiest ones. Every once in a while I have a deluge of work to be completed in the as usual minuscules of time :(. This time it was and FSCK program for the EXT2 file system. Cool ha !! . Its a dumb programs that tries for fix a corrupted file systems nothing much to say about it. Good experience to write one,nothing great about it not as great as my new idea !

In these high wrought and worked up days I'm often visited by my old dear friend ACIDITY. Its business as usual for me when it comes to severe acidity. Wake up, know the exact place below the ribs where it pains,drink water , and ... throw up. Well something really worse happened today where i did threw up again again but none alleviated the pain :(. So i slept in between these 4 hours 4 times and with FSCK on my mind these are the thoughts I get in my dreams.


1. "May be my liver has gone in an infinite loop producing Bile, i need to just fix that variable and all will be fine". hmm this one is very sad, just shows my attachment to the debugger windbag or gdb doesn't matter.

2. "My intestines are a big disk and the part that is paining is the corrupted block of the volume/file system may be i should attach my self to the debugger and see the values of the corrupted block/intestine and do an clear bl".

The thoughts about FSCK and the pain were so badly amalgamated that I clearly remember waking up at regular intervals and explaining it to myself
Faraz-> "Ext2 fSck and acidity are two different problems. Both have no relation to each other and you cannot insert your body in a debugger and diagnose it. Get over it and try to sleep.!!


Very sad !! This is what life gets like at CMU !! No girls, No flying and no dropping even in your dreams :P

Well to think about wouldn't it be great if we can make a debugger for our doctors ;). (There are were sweet doctors at UPITT here)
->Just attach the patient and do a
->bt (backtrace) for everything on the stack.
No more baap ka naam ?, maa ka naam?, kal kya khaya tha ? kyu khaya tha ?.
->print heartbeat; //to print the hearbeat
->break bile>1000 // hmm hmm not cool i know, this a demo na Baba. Real implementation won't stop the running patient
fix the thing
->step
->step n
->continue.
-> WOW!! i am a doctor now !!!

I think these things do exist but they are just not seen as a debugger. I think its doable in a lifetime and another would be needed for testing it :D

LETS PUT THESE DOC'S OUT OF BUSINESS, I will debug myself using my debugger.

Well after 4 hours of an ordeal and hundreds of call to/fro back home, the day was saved by Pudin Hara. From now on, something which i will try to implement is not stay awake for more than 2 days on a stretch (I have kind of developed it over time and the MAX is 5 days without a 'second' of sleep). On the 5th day the experience is quite worth sharing. Some other time. But the best part of the 5 day run is the experience of hearing/understanding what you yourself say with a lag of some time, and crying/singing voices being poured into your ears out of no where. Missing frames , almost like the MATRiX stuff. (Do not do it !)

Thursday, January 10, 2008

Can we conclude about writing perfect software ? With no bugs !


Lets answer the question "Why cannot we write perfect software ?". We all are periled by this notion of generating digital junk in the form of software code which is always riddled with bugs of one form or the other. I think its important to appreciate the fact that "humans cannot write perfect code". And lets not take my word for it and provide a proof for it

Theorem 1: "Humans cannot write always perfect code"
To prove this ->
  • a. I will explain the "Emperical model of probability".
  • b. Derive some "Results from the emperical model".
  • c. Explain "Converse of -> Results from the emperical model" (converse of conclusion drawn from step b).
  • d. Proof Theorem 1 form step a,b,c. (mostly c)
  • e. Answers to critics for the above reasoning.


a. Emperical model of probability.

Ever wondered why is probability studied ? Is probability an study and analysis of a random processes/experiments ? Study of anything that is random is by defination - futile. For example if you are randomly going to be picked up for security screening at an airport then, any study about ways to avoid the security screening is quite dumb. Also another iniutivie example is "knowing that the probability of a getting a heads in a coin toss, does not help u any in winning any coin toss".

Well probability existed as field of mathematics and is quite successful at it. So something is flawed in the conclusions drawn in above paragraph. To understand the flaw we need to understand the emperical model of probability. Well to conclude that "probabilty is a study of random experiments" it was totally wrong.


Probability is the study of "long term stability of outcomes of an random experiments". And it turns out that this long term stability is quite deterministic and not random. What is long term stability of results ? It means taking a 1000000gazillion experiment of coin tosses and then measuring how many of them resulted in heads will tend close to the result 1/2. I use the words "tends close to the result 1/2 or 0.5". As it turns outs results tend more and more closer to .5 as the number of experiments increases from 100000gazzillion to 100000trazillions so on and so forth.

Good ! so now we know what is probability all about and for the record - it does not help u win any lottery tickets or coin tosses. Sadly but true most of what i was taught about probability had all about winning lottery tickets, card decks and tosses.
( Just proves that explaination of any mathematical model is worthless if its not backed up by an emperical model).


b. Results form the empirical model.
------------------------------------------
The take home point from the above paragraph is.

b.1. Probability is not analysing something random.
b.2. For probability to make "MORE" sense you need and infinitely countable runs of the random experiment.
b.2.1 The conditions in the experiments have to be deterministic and same on all run. (otherwise it makes no sense).

I will use the converse of b.2 and b.2.1 to prove our theorem 1.


c. Converse or Results from the emperical model.
---------------------------------------------------------
b.2. For probability to make "MORE" sense you need and infinitely countable runs of the random experiment.
b.2.1 The conditions in the experiments have to be deterministic and same on all run. (otherwise it makes no sense).

Converse of 2.1:

c.1 If you are getting predicatable (converging results) from the gazzilion runs of random experiments. It means that the conditions under which the experiment are stable and deterministic.

(Well think about it-> its quite inutitive dont bring anything complex into mind). This ends the backdrop required for the proof.


d. Proof Theorem 1 form step a,b,c.
-----------------------------------------
Now lets try to model the emperical model of software development.


Software D.E.V.E.L.O.P.M.E.N.T.
The "development" part in software development is a random experiment. You for example cannot control many aspects of the development process like.

d.1. The algorithmic understanding level of the programmer.
d.2. The understanding of the programming model used to implement an algorithm.
d.3. Sometime solutions are based on random algorithms.
I could go on and on and on

So it means that even gazillion runs of a software development process cannot provide help for a probabilistic measure of software quality. (This is based on C.1. We fail C.1 so we fail B.2 and B.2.1). So we cannot have probablistic measures of software quality because the outcomes are based on random experiments with random experiment settings. So atleast we cannot conclude about the probability of writing bug free code.

Hence proved.




e. Answer to critics for the above reasoning.
-------------------------------------------- ------
Always nice to BLAST AWAY SIMPLE critiques against a proof.

Q. Are u dumb to write such a proof ?
A. I cannot conclude on me being dumb. So i cannot answer :). Its all about what u think. Unfortunately interpretation of all sciences become quite subjective after graduation ;).

Q. Well my "hello world" runs perfectly and so hell with your proof ?
A. You for one reason don't appreciate the number of lines of code you executed to get u r moronic "Hello world" up and running.


Starting from
1. Helloworld.cpp
2. Compiler.cpp Linker.cpp Assembler.cpp
3. Loader.cpp
4. XYZ.cpp
5. Microcode.vhdl
6. Solarflares.god

So u see your helloworld is just a simple pimple on the arse of the universe. Most of the components involved in getting your helloworld up and running have one essential property. The property of having infinately countable test runs starting from compiler.cpp to solarflares.god. So its not quite an arguement.

Please read "Reflections of trusting trust - Ken Thompson (its a 3 page blaster)" To see how you helloworld can fail. That with a perspective on security though.

Saturday, October 20, 2007

Simplicity !! in Software Engineering.

I have always learnt more than what is taught at academic institutes. This is mostly due to the fact I analyse whats being taught in extreme detail. This excruciating analysis usually means that I do not complete the entire syllabus, but that's the way it is. My analysis usually stops when I have understood the very simple fact on which the academic conclusion/lesson was based. As an example I took a sentence from a software engineering class which states
"Software should be simple"
Is it end of the lesson? Has everything been learnt ? No I don't think so!!!
The key questions left to be answered are
1. Whats the definition of simple ?
2. How do you measure simplicity ?
3. What are the parameters to the measuring function ?
3. How do you apply the concept in real life ?
4. Are there any patterns associated with the application of the concept ? (Patterns, patterns, pattern how much do I love to identify them)
Hmm, now if I pose these questions to people who claim to understand "simplicity" I will get 100s and 100s of variations in answers. Which one is correct ?. Thus I tend to conclude that real knowledge is tested when there are no correct answers and right answers are based on circumstances in which the concepts are applied.
OK, enough of the ramble and scramble. Here is what I learnt today at Carnegie Mellon that is worth sharing. Its a very sweet and simple definition of "Simplicity" itself
Simplicity: Its the art of maximizing the amount of work not to be done to do the "thing right".
Now can we answer the questions about simplicity ? I am quite sure that the above definition puts the practice of keeping thing simple in the right perspective ;). The questions posed about simplicity are now well answered, and this is what is truly learnt because, this will eventually helps us in application of the concept.
Cheers,
Faraz.
P.S: I'm dumb at times. I risk stating the very obvious for the smart people. But, hey this surely helps other dumb people like me. Me being dumb helps me constantly evolve into a smarter being.
Food for thought (Courtesy Wikipedia:Simplicity):
"Simplicity means the achievement of maximum effect with minimum means." — Koichi Kawana, architect of botanical gardens
"Things should be made as simple as possible, but not simpler." — Albert Einstein (1879–1955) "Simple things should be simple. Complex things should be possible." — Alan Kay
"You can always recognize truth by its beauty and simplicity." — Richard Feynman (1918–1988) "
Our lives are frittered away by detail; simplify, simplify." — Henry David Thoreau (1817–1862) "Simplicity is the ultimate sophistication." — Leonardo da Vinci (1452–1519)
"If you can't describe it simply, you can't use it simply." — Anon
"Simplicity of character is the natural result of profound thought." — William Hazlitt

Wednesday, May 16, 2007

StrStr as an Interview question.

StrStr happens to be the favorite question for some interviewers. I have a senior colleague who is especially fond of this question. He had a hard time getting people from X country answer that correctly. So he decided, to gauge the difficulty of the strstr question by using my answer as a benchmark (Don’t know how wise this decision was?).

He started out by explaining the question to me. And then i started thinking about this. I knew about this question but never had the patience to find an answer. So i decided to think :), n2 algorithms are BAD so I did not even discuss the search the needle in the haystack solution.

Eventually after he putting almost all the words into my mouth I/we got the variant of the KMP strstr algorithm. And then to test if I really understood the solution (this is imp now because I did not design it :)) he just asked me to build another failure table as used in the KMP algorithm which takes the number of matches before a character failure and gives the skip count. And then I got that while waiting for my bus.

Conclusions are.

1. "Efficient" StrStr is not a simple algo to design.

2. There is much to think about it.

3. Once you get it, it easy. BUT... it can always be further optimized.

4. Its not a good interview question for a telephonic interviews. The reason being on the extremely smarts can do w/o hints, i think the person being interviewed should be given hints. Hints are not conveyed effectively over the phone. :)

5. It will fairly gauge the thinking abilities of the person being interviewed.

Tips for the "being interviewed".

1. If you know the answer be honest say that u already know the answer.

2. Don’t get overwhelmed, the direction to start working is not that intuitive. The starting is important.

3. As for all thinking questions you have to convey to the interviewer the fact that u r actually thinking. So think aloud getting the answer doesn't matter as long as you are able to think.

Followers