How are control flow graphs built in cases where the jump destination is based on a dynamic environment?

In studying reverse engineering, it's frequently occurred to me that since I can pass any location (that I have permission to access) as the argument, a jump instruction with a non-hardcoded or "non-deterministic" target (that is, one the program doesn't clearly define beforehand) could aim anywhere. So if I load EAX with a value based on, say, the OS version string and execute JMP EAX, it seems like any tool attempting to generate a control flow graph would have no idea where the target is (it could base it on your current environment, but that might lead to a broken pathway through the program).

Am I missing something? Because if I understood this correctly, it seems like every piece of malware I've ever opened in IDA would do this (based on some condition they know about their target environment), but I don't see broken control flow graphs like this. Then again, I'm pretty new to reverse engineering.

CodePudding user response：

You are correct in your observation. There are two main ways to graph control flow for indirect jumps.

First, static analysis can be used. For example, if the jump target is found to be selected from a jump table of limited length, the decompiler can list the entries of the jump table as possible targets. Another common case is that the jump target is taken from a variable that is set elsewhere in the program, but always to the same value. The decompiler can also analyse the possible values of that variable and deduce the possible jump targets.
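As a rough illustration of the jump-table case, here is a minimal sketch that lists the possible targets once the table's address and length are known. It assumes a 32-bit little-endian image whose table holds absolute addresses; the function name, the toy image and all addresses are made up for the example and do not come from any real tool.

```python
import struct

def jump_table_targets(image: bytes, image_base: int,
                       table_addr: int, entry_count: int) -> list[int]:
    """Read entry_count little-endian 32-bit absolute addresses at table_addr."""
    offset = table_addr - image_base
    end = offset + 4 * entry_count
    if offset < 0 or end > len(image):
        raise ValueError("jump table lies outside the image")
    return list(struct.unpack_from(f"<{entry_count}I", image, offset))

# Toy image (hypothetical): 16 bytes of padding followed by a 3-entry table.
IMAGE_BASE = 0x401000
image = bytes(16) + struct.pack("<3I", 0x401100, 0x401120, 0x401140)

targets = jump_table_targets(image, IMAGE_BASE, IMAGE_BASE + 16, 3)
print([hex(t) for t in targets])
```

Each address returned becomes one outgoing edge of the basic block that ends in the indirect jump.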

Another option is to build the control flow graph not from potential behaviour of the program, but from actual behaviour observed from a simulated or actual run of the code. While this is likely to miss some possible control flow, it usually gives you a pretty decent picture of where jumps (including indirect jumps) usually go and allows for an explanation of the program's behaviour.
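A minimal sketch of this trace-based approach, assuming each run has already been reduced to the sequence of basic-block start addresses it executed (how you record that is up to your emulator or debugger; the function name and addresses below are hypothetical):

```python
from collections import defaultdict

def cfg_from_traces(traces):
    """Merge block-level execution traces into a map: block -> observed successors."""
    edges = defaultdict(set)
    for trace in traces:
        # Every consecutive pair of executed blocks is one observed edge.
        for src, dst in zip(trace, trace[1:]):
            edges[src].add(dst)
    return edges

# Two hypothetical runs that take different targets at the indirect jump in 0x1010.
run_a = [0x1000, 0x1010, 0x1040, 0x1080]
run_b = [0x1000, 0x1010, 0x1060, 0x1080]

cfg = cfg_from_traces([run_a, run_b])
for src in sorted(cfg):
    print(hex(src), "->", sorted(hex(d) for d in cfg[src]))
```

Only targets that were actually reached show up, so the graph grows as more runs (with different inputs or environments) are merged in.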

CodePudding user response：

Indeed, as you already guessed, an instruction of the form JMP EAX could possibly jump anywhere and would therefore break the CFG of the program you are trying to reverse engineer (that is, you would not have exiting arcs to known pieces of code from the current basic block).

However, compilers rarely emit absolute indirect jumps. When they do, it is usually for jump tables generated from switch statements. If we are talking about indirect calls, then we also have function pointers, which give you instructions of the form CALL EAX (very common in C, and in C++ for virtual calls through vtables). For a switch statement, for example, in order to correctly handle all possible values of its input variable or expression (e.g. an int in EAX), there is usually a short piece of code that makes sure the value is in a given range, followed by an absolute indirect jump through a register.

The idioms used and the machine code that compilers generate for these situations are usually well known, so disassemblers and decompilers can detect them and work out the input constraints plus the location of the jump table, and therefore the correct destinations (which is what IDA/Hex-Rays does). Sometimes, however, this is not possible or simply too difficult for your disassembler/decompiler to figure out (e.g. the compiler used an unfamiliar code pattern, or the programmer purposely tried to make reverse engineering harder).
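To give a feel for that kind of idiom detection, here is a toy sketch that looks for the sequence "compare index against a bound, jump to default if above, jump through a table". The instruction representation (`Insn`, the `jmp_indexed` pseudo-mnemonic) is invented for this example; real disassemblers work on much richer data and recognise many more variants.

```python
from typing import NamedTuple, Optional

class Insn(NamedTuple):
    mnemonic: str
    operands: tuple

class SwitchInfo(NamedTuple):
    index_reg: str
    entry_count: int
    table_addr: int
    default_target: int

def find_switch(insns: list[Insn]) -> Optional[SwitchInfo]:
    """Match: cmp reg, bound ; ja default ; jmp [table + reg*4]."""
    for cmp_, ja, jmp in zip(insns, insns[1:], insns[2:]):
        if (cmp_.mnemonic, ja.mnemonic, jmp.mnemonic) != ("cmp", "ja", "jmp_indexed"):
            continue
        reg, bound = cmp_.operands
        (default_target,) = ja.operands
        jmp_reg, table_addr = jmp.operands
        if reg == jmp_reg:
            # An unsigned "above" check against bound leaves indices 0..bound.
            return SwitchInfo(reg, bound + 1, table_addr, default_target)
    return None

# Hypothetical basic block ending in a switch dispatch.
block = [
    Insn("mov", ("eax", "[ebp+8]")),
    Insn("cmp", ("eax", 3)),
    Insn("ja", (0x401200,)),
    Insn("jmp_indexed", ("eax", 0x402000)),
]
info = find_switch(block)
print(info)
```

When the match fails (for instance because the bounds check was removed or obfuscated), the function returns None, which corresponds to the case where the tool cannot resolve the jump on its own.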