Loading [MathJax]/extensions/tex2jax.js

Monday, April 7, 2025

Disassembly, bitmasks and boolean logic

A long time ago, I read an article by Z0mbie about disassembling and bit masks. Since many instruction encodings have regularities, instead of looking them up in a table, they can be replaced with a simple logical formula. For example, before disassembling the instruction code, we need to check the segment register prefixes (this refers to 32-bit code). Their codes are 26, 2E, 36, 3E, 64, 65. Z0mbie pointed out that six comparisons can be replaced with two comparisons using a bit mask:

    and     al, 11111110    ; 64/65
    cmp     al, 01100100
    je      __prefix_seg
    ...
    and     al, 11100111b   ; 26/2E/36/3E
    cmp     al, 00100110b
    je      __prefix_seg

Is it possible to completely get rid of the tables and replace them with a logical formula?

First, let's generate a truth table for the flags, with eight inputs and one output, and run it through the Espresso optimization program:

#include <stdio.h>

int main(int argc, char **argv)
{
        printf(".i 8\n.o 1\n");
        for (unsigned i = 0; i < 256; i++) {
                for (int j = 7; j >= 0; j--)
                        printf("%d", ((i >> j) & 1) ? 1 : 0);
                printf("|%d\n", (i == 0x26 || i == 0x2e || i == 0x36 ||
                        i == 0x3e || i == 0x64 || i == 0x65) ? 1 : 0);
        }
}
/*
$ gcc m0.c -o m0
$ ./m0 | ./espresso
.i 8
.o 1
.p 2
0110010- 1
001--110 1
.e
*/

Let's convert the resulting masks into hex

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        char s[32];
        int f = 0;
        while (gets(s)) {
                if (s[0] != '-' && s[0] != '1' && s[0] != '0')
                        continue;
                if (f == 0)
                        f = 1;
                else
                        printf(" ||\n");
                unsigned m = 0, v = 0;
                for (int i = 0; i < 8; i++) {
                        m <<= 1;
                        v <<= 1;
                        if (s[i] != '-') {
                                m |= 1;
                                v |= (s[i] - '0') & 1;
                        }
                }
                printf("(i & 0x%02x) == 0x%02x", m, v);
        }
        puts("");
}
/*
./m0 | ./espresso | ./m2
(i & 0xfe) == 0x64 ||
(i & 0xe7) == 0x26
*/

Now, let's return to replacing the tables with formulas and do the same for the MODRM flag. I took the table from my LDE YAD.


/* MAIN TABLE */
opcode_t main_table[512] = {
/* 00 */
{ "ADD",        Eb,Gb,__ },
{ "ADD",        Ev,Gv,__ },
{ "ADD",        Gb,Eb,__ },
{ "ADD",        Gv,Ev,__ },
{ "ADD",        AL,Ib,__ },
// ...
int arg(uint32_t type)
{
        // ...
        switch (a) {
                case OP_M:

                case OP_R: case OP_PR: case OP_VR:
                case OP_C: case OP_D: case OP_G:
                case OP_P: case OP_S: case OP_V:.
                case OP_E: case OP_Q: case OP_W:
                        flags |= C_MODRM; aa++;
                        break;
        // ...

int main(int argc, char **argv)
{
        printf(".i 8\n.o 1\n");
        for (int i = 0; i < 256; i++) {
                unsigned f = arg(main_table[i].op1) |
                        arg(main_table[i].op2) | arg(main_table[i].op3);
                for (int j = 7; j >= 0; j--)
                        printf("%d", ((i >> j) & 1) ? 1 : 0);
                printf("|%d\n", f & C_MODRM ? 1 : 0);
        }
}
$ ./m1 | ./espresso | ./m2
(i & 0xba) == 0x80 ||
(i & 0xf8) == 0xd8 ||
(i & 0xbc) == 0x84 ||
(i & 0xbd) == 0x29 ||
(i & 0xbe) == 0x22 ||
(i & 0xc4) == 0x00 ||
(i & 0xf1) == 0x80 ||
(i & 0xf2) == 0x80 ||
(i & 0xf4) == 0xd0 ||
(i & 0x74) == 0x00

And that's it, what were you expecting?

No comments:

Post a Comment