Background / Explanation of What I'm Trying to Accomplish
I'm currently working on a little malware analysis project and am trying to implement a string decryptor that I wrote using Unicorn. In order to condense things and make the code easier to review, I made a smaller example below from my larger codebase.
What I'm doing is extracting snippets of x86 that represent small string decryption routines. Each one is a series of mov instructions whose values are eventually xor'd, resulting in a plaintext string. The comments note which strings each snippet should produce. In the following example, the uncommented X86_CODE64 instructions are emulated but only yield hpe.com when I read from the stack address. (Hint: to view the output, run strings on asdf.txt.) I would expect to see both apple.com and hpe.com.
Question
Based on the code below, is there something I'm doing incorrectly, or not doing at all, that would cause these code snippets to not decrypt the strings properly?
Disclaimer: This is my first time using Unicorn, so if I'm not articulating clearly or having some trouble explaining, I apologize in advance!
#!/usr/bin/python
from __future__ import print_function
from unicorn import *
from unicorn.x86_const import *
# code to be emulated
# Strings should include apple.com and hpe.com
X86_CODE64 = b'\xc7D$<\xa9GY\x01\xc7D$@\xa2XQ/\x8bD$<\x8aD$8\x84\xc0u\x19H\x8b\xcb\x8bD\x8c<5\xc17</\x89D\x8c<H\xff\xc1H\x83\xf9\x02r\xeaE3\xc0H\x8dT$<H\x8b\xcf\xe8<\xd2\xfe\xff\x88]\xa4\xc7E\xa8\x86/\x00v\xc7E\xac\x82q\x13u\xc7E\xb0\x8a_p\x1a\x8bE\xa8\x8aE\xa4\x84\xc0u\x19H\x8b\xcb\x8bD\x8d\xa85\xe7_p\x1a'
# Strings should be svchost.exe
#X86_CODE64 = b"\xba\xe7_p\x1a\xc7D$|\x94)\x13r\xc7E\x80\x88,\x044\xc7E\x84\x82'\x15:\x89U\x88\x8bD$|\x8aD$x\x84\xc0u\x16H\x8b\xcf\x8bD\x8c|3\xc2"
# Strings should be apple.com
#X86_CODE64 = b'\xc7E\xa8\x86/\x00v\xc7E\xac\x82q\x13u\xc7E\xb0\x8a_p\x1a\x8bE\xa8\x8aE\xa4\x84\xc0u\x19H\x8b\xcb\x8bD\x8d\xa85\xe7_p\x1a'
# Set up Unicorn
ADDRESS = 0x10000000
STACK_ADDRESS = 0x90000
mu = Uc(UC_ARCH_X86, UC_MODE_64)
mu.mem_map(ADDRESS, 4 * 1024 * 1024)
mu.mem_map(STACK_ADDRESS, 4096*10)
# Write code to memory
mu.mem_write(ADDRESS, X86_CODE64)
# Initialize Stack for functions
mu.reg_write(UC_X86_REG_ESP, STACK_ADDRESS + 4096)
mu.reg_write(UC_X86_REG_EDX, 0x0000)
# Run the code
try:
    mu.emu_start(ADDRESS, ADDRESS + len(X86_CODE64), timeout=10000)
except UcError as e:
    pass
#a = mu.mem_read(ADDRESS, 4 * 1024 * 1024)
#print(a)
b = mu.mem_read(STACK_ADDRESS, 4096*10)
with open('asdf.txt', 'ab') as fp:
    fp.write(b)
Answer
There are a few problems with this code.
First of all, you almost never want to swallow all exceptions the way you do by writing pass in your except block, at least not at the top level. It would be good to at least print them to the console so you know when something unexpected happened. If you did that, you would notice that unicorn throws an Invalid memory fetch (UC_ERR_FETCH_UNMAPPED) during execution of the code.
If you analyze the bytes, you will notice a strange call in the middle of the first snippet:
40: e8 3c d2 fe ff call 0xfffffffffffed281
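To see why this call faults under the memory layout from the question, the target can be worked out by hand. The sketch below is plain Python and does not use unicorn; `rel32_call_target` and `MAPPED_SIZE` are names made up for this illustration, while the call bytes, the 0x40 offset, the base address and the 4 MiB mapping size are taken from the question.

```python
import struct

def rel32_call_target(insn: bytes, insn_address: int) -> int:
    """Target of a 5-byte E8 rel32 call located at insn_address.

    The displacement is a signed 32-bit value relative to the address
    of the *next* instruction; the result wraps to 64 bits.
    """
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    (disp,) = struct.unpack("<i", insn[1:])
    return (insn_address + len(insn) + disp) & 0xFFFFFFFFFFFFFFFF

CALL = bytes.fromhex("e83cd2feff")   # the call at offset 0x40
ADDRESS = 0x10000000                 # load address used in the question
MAPPED_SIZE = 4 * 1024 * 1024        # size passed to mem_map in the question

# Disassembled at base 0, as in the listing above.
print(hex(rel32_call_target(CALL, 0x40)))

# Emulated at ADDRESS: the target lands below the mapped code region.
target = rel32_call_target(CALL, ADDRESS + 0x40)
print(hex(target), ADDRESS <= target < ADDRESS + MAPPED_SIZE)
```

The target falls just below ADDRESS, in memory that was never mapped, which is consistent with the unmapped fetch error.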
This call comes right after the hpe.com decryption; unicorn stops executing there and never reaches the second part of the code. There's probably a better way to handle this in unicorn, but for now let's just nop the call (replace its 5 bytes with 5x \x90). This still would not produce the expected apple.com string, as the code has more problems. The second part (after the call) addresses its data through RBP rather than RSP, and you are not setting RBP in your code.
So we need to add that:
mu.reg_write(UC_X86_REG_EBP, STACK_ADDRESS + 4096)
And here's another problem. You are setting unicorn up for 64-bit mode, yet you initialize the 32-bit registers ESP and EDX. Is this on purpose? In your case it's probably not a problem, but you should probably initialize the 64-bit registers (RSP, RDX) instead.
After setting RBP to some stack address, you still won't see the second string, because the code is cut off too early. The last instructions are a read and an xor
6a: 8b 44 8d a8 mov eax,DWORD PTR [rbp+rcx*4-0x58]
6e: 35 e7 5f 70 1a xor eax,0x1a705fe7
but there's no store, no increment to the next part and no loop.
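What the complete loop is meant to do can be modelled in plain Python, without unicorn. This is only a sketch: `xor_loop` and `c_string` are hypothetical helper names, and the dword constants and keys are the immediates I decoded by hand from the mov and xor instructions in the question's bytes.

```python
import struct

def xor_loop(dwords, key):
    """Model of the load / xor / store / inc / compare-and-loop sequence."""
    buf = list(dwords)
    rcx = 0
    while rcx < len(buf):      # cmp rcx, N ; jb loop
        eax = buf[rcx]         # mov eax, [base + rcx*4 + disp]
        eax ^= key             # xor eax, key
        buf[rcx] = eax         # mov [base + rcx*4 + disp], eax (the missing store)
        rcx += 1               # inc rcx
    # The stack holds the dwords in little-endian byte order.
    return struct.pack("<%dI" % len(buf), *buf)

def c_string(raw: bytes) -> bytes:
    """Cut the buffer at the first NUL byte, like a C string."""
    return raw.split(b"\x00", 1)[0]

# Immediates from the second half of the first snippet (RBP-relative).
apple = c_string(xor_loop([0x76002F86, 0x75137182, 0x1A705F8A], 0x1A705FE7))
# Immediates from the first half of the first snippet (RSP-relative).
hpe = c_string(xor_loop([0x015947A9, 0x2F5158A2], 0x2F3C37C1))

print(apple, hpe)
```

Without the store, the xor result only ever lives in eax, so nothing decrypted reaches the stack buffer that is later dumped to asdf.txt.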
Maybe you copied too few bytes. The missing bytes are 89448da8 for the store (mov DWORD PTR [rbp+rcx*4-0x58],eax), 48ffc1 for inc rcx, 4883f903 for cmp rcx, 0x3, and lastly 72ea for jb -0x16.
So in total, your first snippet is missing the following bytes: 89448da848ffc14883f90372ea. With those appended and the call replaced by nops,
❯ python3 program.py
❯ strings asdf.txt
apple.com
hpe.com
you get what's expected.
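The two byte-level changes described above (nop out the call, append the missing tail) can be sketched as follows. This only builds the patched buffer and does not run unicorn; `nop_out`, `CALL_OFFSET`, `CALL_LENGTH` and `MISSING_TAIL` are names invented for this sketch, and the original bytes are copied from the question.

```python
# First snippet from the question, unchanged.
X86_CODE64 = b'\xc7D$<\xa9GY\x01\xc7D$@\xa2XQ/\x8bD$<\x8aD$8\x84\xc0u\x19H\x8b\xcb\x8bD\x8c<5\xc17</\x89D\x8c<H\xff\xc1H\x83\xf9\x02r\xeaE3\xc0H\x8dT$<H\x8b\xcf\xe8<\xd2\xfe\xff\x88]\xa4\xc7E\xa8\x86/\x00v\xc7E\xac\x82q\x13u\xc7E\xb0\x8a_p\x1a\x8bE\xa8\x8aE\xa4\x84\xc0u\x19H\x8b\xcb\x8bD\x8d\xa85\xe7_p\x1a'

CALL_OFFSET = 0x40   # offset of the e8 call in the disassembly
CALL_LENGTH = 5      # e8 plus a 4-byte displacement
# store, inc rcx, cmp rcx, 0x3, jb -0x16
MISSING_TAIL = bytes.fromhex("89448da8" "48ffc1" "4883f903" "72ea")

def nop_out(code: bytes, offset: int, length: int) -> bytes:
    """Return a copy of code with `length` bytes at `offset` replaced by 0x90."""
    return code[:offset] + b"\x90" * length + code[offset + length:]

# Guard against patching the wrong place if the bytes ever change.
if X86_CODE64[CALL_OFFSET] != 0xE8:
    raise ValueError("no call opcode at the expected offset")

patched = nop_out(X86_CODE64, CALL_OFFSET, CALL_LENGTH) + MISSING_TAIL
print(len(X86_CODE64), len(patched))
```

Passing `patched` to mem_write and emu_start in place of X86_CODE64, together with the RBP initialization, should correspond to the run shown above.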
I briefly checked the 2nd and 3rd snippets: neither appears to contain a call, but both are missing the store, inc and loop part too.