Friday, February 20, 2026

Not so perfect hash

The API hashing technique is used so frequently in malware that it has already received its own MITRE number. Meanwhile, all we need to do is find a known subset of strings within a known set. Ideally, the stored data—unlike the familiar four-byte hashes—should look “decent,” meaning they should be small numbers.

The first idea that came to mind was to take the list of all glibc exports and build a minimal perfect hash - that is, a hash function that maps n values to the numbers from 0 to n-1. One of the simplest and efficient approaches is CHD (Compress-Hash-Distribute). First, we compress the keys into buckets using one hash $h_1$, and then we choose a function $h_2$ and an offset $d$ such that $(h_2 + d) \mod n$ produces unique values, roughly like this: