The API hashing technique is used in malware so often that it has received its own MITRE technique ID. Yet all we actually need is to recognize a known subset of strings within a known set. Ideally, the stored data should look "decent", unlike the familiar four-byte hashes, meaning the values should be small numbers.
The first idea that came to mind was to take the list of all glibc exports and build a minimal perfect hash, that is, a hash function that maps $n$ values to the numbers from $0$ to $n-1$. One of the simplest and most efficient approaches is CHD (Compress, Hash, and Displace). First, the keys are distributed into buckets using one hash $h_1$; then, for each bucket, we pick a displacement $d$ such that $(h_2(x) + d) \bmod n$ yields a unique value for every key $x$ in the bucket, roughly like this:
qsort(buckets, m, sizeof(struct bucket), cmp_bucket);  /* place the hardest (largest) buckets first */

uint32_t temp_used[32];   /* slots tentatively claimed by the current bucket */
int placed;

for (uint32_t i = 0; i < m; i++) {
    struct bucket *bk = &buckets[i];
    if (bk->size < 1)
        continue;
    placed = 0;
    /* Try displacements until every key in the bucket lands in a free slot. */
    for (uint32_t d = 0; d < n; d++) {
        int ok = 1;
        for (uint32_t j = 0; j < bk->size; j++) {
            uint32_t k = bk->keys[j];
            uint32_t h = (hashes[k] + d) % n;
            if (used[h] != 0) {          /* slot taken by an earlier bucket */
                ok = 0;
                break;
            }
            /* Also reject collisions within the current bucket itself. */
            for (uint32_t t = 0; t < j; t++)
                if (temp_used[t] == h) {
                    ok = 0;
                    break;
                }
            if (!ok)
                break;
            temp_used[j] = h;
        }
        if (ok) {
            /* Commit: record the displacement and mark the slots as occupied. */
            disp[bk->index] = d;
            for (uint32_t j = 0; j < bk->size; j++)
                used[temp_used[j]]++;
            placed = 1;
            break;
        }
    }
    if (placed == 0)
        break;  /* no displacement fits: restart with different hash seeds */
}
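Once every bucket has been placed, a lookup costs just two hashes and one table read. A minimal sketch, assuming the same disp, m, and n as in the search above:

#include <stdint.h>

/* CHD lookup: h1 selects the bucket's displacement, which shifts h2
   into the slot that the search above guaranteed to be unique. */
static uint32_t chd_lookup(uint32_t h1, uint32_t h2,
                           const uint32_t *disp, uint32_t m, uint32_t n)
{
    return (h2 + disp[h1 % m]) % n;
}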
The resulting values can be efficiently compressed using Elias-Fano encoding. But we don't need hashes for the entire dictionary, only for a small subset of it, so we can store only those buckets that contain the values we care about. The first hash then essentially becomes a Bloom filter (with k = 1), and since filtering greatly reduces the number of keys, we can brute-force an $h_2$ that is perfect on its own, without any displacements (for the entire dictionary such a brute force is infeasible, which is why CHD needs them). Like this:
$ gcc nspf.c -O3 -o nspf    # source on github
$ ./nspf
Read 2727 keys, 681 buckets, 16 needed
97: open, __libc_ns_makecanon, closelog,
151: fork, __assert_fail, __nss_configure_lookup, __res_context_query, pkey_get,
216: mmap, __cyg_profile_func_enter,
256: read, _nss_files_parse_servent, pthread_rwlock_tryrdlock, stderr,
287: wait, __libc_dn_expand, __pthread_mutex_lock, _obstack_newchunk, _thread_db_pthread_nextevent, svc_run,
291: free, _thread_db_pthread_cancelhandling, pthread_attr_getaffinity_np,
318: accept, _nss_files_gethostbyname_r, authunix_create, gettimeofday, pthread_attr_getschedpolicy, sgetsgent_r,
374: connect, __isoc99_vfscanf, _nss_files_gethostbyname3_r, getcwd,
424: bind, sched_setaffinity,
441: socket, __call_tls_dtors, __fxstat, _IO_feof, _nss_files_setsgent, pthread_mutex_clocklock,
457: dup2, __ns_name_ntop, _IO_do_write, argz_add, lgetxattr, mq_open, strfromf32,
566: write, __res_nquerydomain, isastream, wcstof32,
576: listen, timer_getoverrun, vscanf, wcsstr,
585: malloc, __isascii_l, __res_mailok,
615: close, __wcsncat_chk, locs, xdr_int16_t,
626: execve, __getdelim, endrpcent, fstatvfs64, iscntrl_l, ispunct_l, moncontrol, setpwent,
71 keys in 16 buckets
Found mod = 218 after 717 attempts
Filter: 97, 54, 65, 40, 31, 4, 27, 56, 50, 17, 16, 109, 10, 9, 30, 11,
Unpacking... 97, 151, 216, 256, 287, 291, 318, 374, 424, 441, 457, 566, 576, 585, 615, 626,
Checking...
129 accept
152 bind
37 close
15 connect
32 dup2
155 execve
93 fork
121 free
94 listen
159 malloc
172 mmap
117 open
9 read
160 socket
62 wait
26 write
h1 = 726f3963, h2 = 042e7e3f
16 bytes of auxiliary data plus 16 values for the functions we need, each stored as a single byte, with an acceptable probability of collisions. And the hash functions themselves, along with their values, are random, so nothing about the stored bytes looks suspicious.
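For illustration, the runtime side might look like the sketch below. The gap array and seeds are taken from the run above; the hash itself is assumed here to be a seeded FNV-1a, a stand-in for whatever nspf.c actually uses:

#include <stdint.h>

/* Assumption: seeded FNV-1a as both h1 and h2; the real nspf.c hashes
   may differ, only the stored constants below come from the run above. */
static uint32_t hash(const char *s, uint32_t seed)
{
    uint32_t h = seed;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 0x01000193u;
    }
    return h;
}

/* 16 bytes of auxiliary data: gap-encoded ids of the buckets we kept. */
static const uint8_t gaps[16] = {
    97, 54, 65, 40, 31, 4, 27, 56, 50, 17, 16, 109, 10, 9, 30, 11
};

/* Returns the name's slot in [0, 218), or -1 if the bucket
   filter rejects it (false positives are possible). */
static int lookup(const char *name)
{
    uint32_t b = hash(name, 0x726f3963u) % 681;   /* h1: bucket id */
    uint32_t id = 0;
    for (int i = 0; i < 16; i++) {
        id += gaps[i];                 /* unpack the delta-encoded list */
        if (id == b)
            return hash(name, 0x042e7e3fu) % 218; /* h2: final slot */
    }
    return -1;
}

A name outside the 16 kept buckets is rejected immediately; a name that merely shares a bucket with one of the needed functions is a Bloom-filter false positive and maps to some slot anyway, which is the acceptable collision probability mentioned above.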