CLE266 Reverse Engineering

For those interested, here are some details on the steps I took to reverse engineer the CLE266 source code (http://www.ivor.it/cle266) in just a week.

0. Tools

You'll need a decent disassembler to work with. I use IDA-Pro (http://www.datarescue.com), which is the best disassembler I've ever seen. If you want to do any reverse engineering or code analysis, I strongly recommend you get a copy, or at least get an eval and try it out.

1. Familiarisation

Firstly I needed to get a feel for the code, and work out what was involved. This was a simple matter of disassembling the library and browsing through it a few times.

Doing so showed the number and length of the functions in the library, and made it clear that the library is a fairly thin API layer over the underlying hardware. There were no particularly complicated functions, and most appeared simply to move memory from one place to another.

2. Relocation

This stage is no longer necessary with later versions of IDA Pro.

OK, this was the least rewarding part of the process. There may be a better way to do this, but I didn't know it at the time. If you know what it is, tell me! :-)

The library is a shared object (.so) file which contains address relocation information. In the assembly there are jumps and addresses that contain null values, which result in disassembly such as:

                push    ds:0
loc_2D70:                               ; Call Procedure
                call    loc_2D70+1
                mov     ds:0, 0

Calling objdump -R libddmpeg.so shows us the relocation table for the library in which we can see:

00002d50 R_386_PC32        setbuffer
00002d71 R_386_PC32        fclose
00002d97 R_386_32          gBuf

So we can select the "call loc_2D70+1" and label that "call fclose".

The same applies to the data values "ds:0", again objdump -R libddmpeg.so shows us:

00002d6c R_386_32          gFlog
00002d77 R_386_32          gFlog

So we can now fix the assembly to be:

                push    gFlog
                call    fclose
                mov     gFlog, 0

And the operation becomes clear...
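
The matching can also be done mechanically. The following is a minimal sketch in C, not the method used above: it holds the relocation entries quoted from objdump -R in a table and returns the symbol whose relocation lands inside a given instruction. The struct, the function name symbol_for_insn and the instruction lengths passed in (5 bytes for the call at loc_2D70, 10 bytes for the mov that follows it) are assumptions based on standard x86 encoding, not something taken from the library. It also hints at why the disassembler printed "call loc_2D70+1": an unresolved R_386_PC32 call normally stores -4 in its operand, which points one byte past the start of the instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of "objdump -R" output. */
struct reloc {
    uint32_t    offset;  /* address of the operand to be patched */
    const char *type;    /* relocation type as printed by objdump */
    const char *symbol;  /* symbol the operand refers to */
};

/* Entries copied from the objdump -R listings quoted above. */
static const struct reloc relocs[] = {
    { 0x2d50, "R_386_PC32", "setbuffer" },
    { 0x2d6c, "R_386_32",   "gFlog"     },
    { 0x2d71, "R_386_PC32", "fclose"    },
    { 0x2d77, "R_386_32",   "gFlog"     },
    { 0x2d97, "R_386_32",   "gBuf"      },
};

/*
 * Hypothetical helper: return the symbol whose relocation offset falls
 * inside the bytes [addr, addr + len) of one instruction, or NULL if the
 * instruction has no relocated operand.
 */
const char *symbol_for_insn(uint32_t addr, uint32_t len)
{
    size_t i;

    for (i = 0; i < sizeof relocs / sizeof relocs[0]; i++) {
        if (relocs[i].offset >= addr && relocs[i].offset < addr + len)
            return relocs[i].symbol;
    }
    return NULL;
}
```

For example, symbol_for_insn(0x2d70, 5) yields "fclose", which is the label applied by hand above.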

3. Rewriting

The next stage is to recode the assembly into C. Firstly function calls are identified in the assembly and a pseudo C file is written. So let's take a look at one of the nice simple functions:

                public MPGCloseDebugFile
MPGCloseDebugFile proc near
                push    ebp
                mov     ebp, esp
                sub     esp, 8          ; Integer Subtraction
                cmp     gFlog, 0        ; Compare Two Operands
                jz      short loc_2D7F  ; Jump if Zero (ZF=1)
                sub     esp, 0Ch        ; Integer Subtraction
                push    gFlog

loc_2D70:                               ; Call Procedure
                call    fclose
                mov     gFlog, 0

loc_2D7F:                               ; CODE XREF: MPGCloseDebugFile+D↑j
                mov     esp, ebp
                pop     ebp
                retn                    ; Return Near from Procedure
MPGCloseDebugFile endp 

Which becomes:

MPGCloseDebugFile()
{
    cmp     gFlog, 0        ; Compare Two Operands
    jz      short loc_2D7F  ; Jump if Zero (ZF=1)
    sub     esp, 0Ch        ; Integer Subtraction
    push    gFlog
    call    fclose
    mov     gFlog, 0
loc_2D7F:

}

Now we can look at the function and easily see what the intention is, and turn it into:

void MPGCloseDebugFile()
{
    if (gFlog)
    {
        fclose(gFlog);
        gFlog = 0;
    }
}

The remainder of the code is then processed in the same way, which gives us something half decent to work with. For the moment, though, complex branches have been left as they are in the C source, so we still have stuff like:

if (ebx <= 0)
    goto loc_36fd;
goto loc_373b;

Working through the code I add in "// CHECK" comments wherever I've got any doubts about the code.

4. Identify parameters

Luckily the library from VIA comes with pretty comprehensive header files with plenty of structures defined. The next step was to get the function parameters defined correctly and then, by tracing how those parameters are used, to determine the data types of all the internal variables.

Also some of the logic could be determined by decoding binary flag fields. For example:

push    gVIAGraphicInfo
push    805476C3h
push    fVideo
call    ioctl

Can now be decoded into the rather more readable:-

ioctl(fVideo,
      _IOR('v',              /* 118 */
           192+3,
           VIAGRAPHICINFO),  /* 0x805476C3 */
      &gVIAGraphicInfo);
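
The decoding itself is just bit slicing. Here is a small sketch, assuming the generic Linux request layout used on x86 (8-bit command number, 8-bit type character, 14-bit argument size, 2-bit direction where 2 means read). The struct and function names are made up for illustration and are not part of the VIA library or the kernel headers.

```c
#include <stdint.h>

/* The four fields packed into a Linux ioctl request number (x86 layout). */
struct ioctl_fields {
    unsigned dir;   /* bits 30-31: 1 = write, 2 = read, 3 = both */
    unsigned size;  /* bits 16-29: sizeof the argument structure */
    unsigned type;  /* bits  8-15: driver "magic" character */
    unsigned nr;    /* bits  0-7 : command number */
};

/* Hypothetical helper: split a raw request value into its fields. */
struct ioctl_fields decode_ioctl(uint32_t request)
{
    struct ioctl_fields f;

    f.nr   = request & 0xFFu;
    f.type = (request >> 8) & 0xFFu;
    f.size = (request >> 16) & 0x3FFFu;
    f.dir  = (request >> 30) & 0x3u;
    return f;
}
```

Feeding it 0x805476C3 gives direction 2 (read), type 'v' (118), number 195 (192+3) and size 84, so the size field doubles as a check that the structure chosen from the headers is 84 bytes long.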

5. Analyse flow

Now's the time to start knocking the code into shape. With IDA-Pro you can generate flow charts for the more complicated routines that need structuring. I find it easiest to print out the flows on A3 (or multiple A3) pages, lay them out on the floor, then follow them through.

Flow diagram

Here's an example: this is the chart for VIADisplayControl. As you can see, I've been doodling in the bottom right corner what I think the C nesting is.
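
To give an idea of what the restructuring looks like in code, here is a toy sketch. It is not taken from the library: the function names, labels and the clamping logic are invented purely to show a goto-style transliteration next to the structured C it collapses into once the flow is understood.

```c
/* Direct transliteration of a branchy listing (labels are hypothetical). */
int clamp_goto(int ebx)
{
    int eax;

    if (ebx <= 0)
        goto loc_a;
    goto loc_b;

loc_a:
    eax = 0;
    goto done;

loc_b:
    eax = ebx;
    if (eax <= 255)
        goto done;
    eax = 255;

done:
    return eax;
}

/* The same logic after working out the nesting from the flow chart. */
int clamp_structured(int value)
{
    if (value <= 0)
        return 0;
    if (value > 255)
        return 255;
    return value;
}
```

Keeping both versions around for a while makes it easy to check that the tidy one still behaves like the transliterated one.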

 

6. Compile

OK, one of the easiest steps now. Start compiling, and correct or check the code wherever variables need defining or scope needs reorganising. I started by "#if 0"-ing the entire code and then adding each procedure back in, one at a time, to work on.
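
As a rough sketch of that arrangement (the layout of the real source file is an assumption here, and the body of the disabled function is placeholder pseudo-C, not the library's code), a function that has been checked sits outside the "#if 0" block while everything still awaiting attention stays inside it:

```c
#include <stdio.h>

FILE *gFlog;  /* debug log handle, named as in the library */

/* Already worked on: compiled into the build. */
void MPGCloseDebugFile(void)
{
    if (gFlog) {
        fclose(gFlog);
        gFlog = NULL;
    }
}

#if 0   /* Not yet worked on: the compiler skips everything up to #endif. */
void VIADisplayControl(void)  /* real parameters not shown here */
{
    /* Placeholder for half-translated pseudo-C, e.g.: */
    if (ebx <= 0)
        goto loc_36fd;
    goto loc_373b;
}
#endif
```

Moving a function above the "#if 0" line is then all it takes to bring it into the build and see what the compiler complains about.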

7. Verify

The next step is verifying the code function by function, for example to make sure the pointer arithmetic looks sensible. Where possible I compare the compiled assembly for each function with the disassembled library code, re-ordering 'if' statements as necessary. When the two match, I know that my code is very close to the original source code.

8. Check

Check the code makes sense now, and that the right variables are defined static. At this point some of the pointer arithmetic got fixed and tidied up. For example:

mov     eax, lpMPGDevice
mov     eax, [eax+84h]
shr     eax, 3          ; Shift Logical Right
mov     edx, lpMPEGMMIO
mov     [edx+2Ch], eax

which in step 3 was temporarily coded to:

lpMPEGMMIO->0x2C = lpMPGDevice->0x84 >> 3; //CHECK

then in step 6 to:

*(unsigned long*)(lpMPEGMMIO+0x2c) = *(unsigned long*)(lpMPGDevice+0x84)>>3;

now becomes:

*(unsigned long*)(lpMPEGMMIO+0x2c) = lpMPGDevice->dwMPEGYPhysicalAddr[1] >> 3; 

Here the casting was tidied and the PhysicalAddr array subscript filled in.
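
One way to double-check a substitution like this is to ask the compiler where the field actually lives. The sketch below uses a HYPOTHETICAL structure: the 0x80 bytes of padding and the two-element array are assumptions chosen only so that element [1] lands at offset 0x84, and uint32_t stands in for the 32-bit unsigned long of the original i386 build. The real layout comes from VIA's header files.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL stand-in for the device structure; not VIA's definition. */
struct mpg_device_sketch {
    uint8_t  reserved[0x80];          /* assumed: fields before the array */
    uint32_t dwMPEGYPhysicalAddr[2];  /* assumed: array starts at 0x80 */
};

/* Byte offset of dwMPEGYPhysicalAddr[index] within the structure. */
size_t y_addr_offset(size_t index)
{
    return offsetof(struct mpg_device_sketch, dwMPEGYPhysicalAddr)
         + index * sizeof(uint32_t);
}
```

With this layout y_addr_offset(1) is 0x84, matching the [eax+84h] in the disassembly. If the real header gave a different number, the field name would be the wrong guess.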

9. And fly

Finally, start running the code.
Boom, machine crash.
Fix bugs.
Boom.
A few bugs were spotted and fixed....
and then a garbled picture...
a few more bugs spotted...
then a picture but didn't update until the mouse moved... a few more bugs... then....

PING. Moving video!

10. Thanks

Many thanks to Hugo Mills and Michiel van Noort for spotting my typos in the source code.

 

Regards,

Ivor.


[www.ivor.it | www.ivor.org | www.difo.com ]