In an x86 assembly routine, what happens if the code contains a jump whose target is an address that falls between the start addresses of two instructions? Here is an artificial example:
0x0001: mov ...
0x0005: add ...
0x0009: jmp 0x0003
Also, how can I experiment with something like this on a local machine or online? I tried an online x86 assembler, https://defuse.ca/online-x86-assembler.htm#disassembly, but it does not let me specify instruction addresses such as "0x0001".
CodePudding user response:
There is no such thing as a “valid” or an “invalid” address. Any address can be a jump target, and if the corresponding page is mapped and executable, the bytes there are executed.
So what happens when you jump “between” instructions? Well, the processor does not know where you intend instructions to begin and end. It just executes the bytes it sees. The resulting instructions will differ from what you intended, because the CPU decodes bytes from the middle of another instruction as if they were the start of a new one.
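To make this concrete, here is a minimal sketch in Python of a toy decoder that understands only three 32-bit-mode opcodes (`B8` = `mov eax, imm32`, `90` = `nop`, `C3` = `ret`). The `decode` function and the byte string are my own illustration, not the asker's code. The same five bytes decode as one instruction from offset 0 and as four instructions from offset 1.

```python
import struct

def decode(code, start):
    """Decode a tiny, hypothetical subset of 32-bit x86 beginning at `start`."""
    out, pc = [], start
    while pc < len(code):
        op = code[pc]
        if op == 0xB8:                      # mov eax, imm32 (5 bytes)
            (imm,) = struct.unpack_from("<I", code, pc + 1)
            out.append(f"{pc:#06x}: mov eax, {imm:#010x}")
            pc += 5
        elif op == 0x90:                    # nop (1 byte)
            out.append(f"{pc:#06x}: nop")
            pc += 1
        elif op == 0xC3:                    # ret (1 byte); stop decoding here
            out.append(f"{pc:#06x}: ret")
            break
        else:                               # anything else is outside this sketch
            out.append(f"{pc:#06x}: (not handled by this sketch)")
            break
    return out

CODE = bytes.fromhex("b8909090c3")

from_start = decode(CODE, 0)    # the "intended" instruction boundary
from_middle = decode(CODE, 1)   # a jump landing one byte into the mov

print("\n".join(from_start))
print("\n".join(from_middle))
```

Decoding from offset 0 yields a single `mov`, while decoding from offset 1 treats the immediate operand's bytes as `nop; nop; nop; ret`.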
Your specific example is not sufficiently specified for me to say what instructions result. Perhaps you can provide a completely specified example (including the machine code) so I can give a better explanation.
CodePudding user response:
The CPU will start decoding instructions at the target address.
The instruction stream you are looking at in a disassembly (when using a tool like objdump) is merely one interpretation of the executable bytes of the program, assuming a given start point.
As it happens, "jumping into the middle of an instruction" is an obfuscation technique sometimes used by malware to hide program semantics from linear sweep disassemblers (like objdump). More sophisticated disassemblers will attempt to follow these "misaligned" jumps, but that may not be possible, depending on what can and cannot be determined statically or dynamically.
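As a rough illustration of the difference, the following Python sketch decodes the same seven bytes twice: once as a linear sweep (always continuing at the next byte after each instruction) and once following the unconditional jump. The toy `disassemble` function and the byte string are assumptions made up for this example; real disassemblers are far more involved.

```python
import struct

def disassemble(code, follow_jumps):
    """Toy 32-bit decoder: linear sweep, or follow `jmp short` targets."""
    out, pc, seen = [], 0, set()
    while 0 <= pc < len(code) and pc not in seen:
        seen.add(pc)                          # guard against jump loops
        op = code[pc]
        if op == 0xEB:                        # jmp short rel8 (2 bytes)
            (rel,) = struct.unpack_from("<b", code, pc + 1)
            target = pc + 2 + rel             # relative to the next instruction
            out.append(f"jmp {target:#x}")
            pc = target if follow_jumps else pc + 2
        elif op == 0xB8:                      # mov eax, imm32 (5 bytes)
            (imm,) = struct.unpack_from("<I", code, pc + 1)
            out.append(f"mov eax, {imm:#010x}")
            pc += 5
        elif op == 0x90:                      # nop
            out.append("nop")
            pc += 1
        elif op == 0xC3:                      # ret
            out.append("ret")
            if follow_jumps:
                break                         # control flow ends here
            pc += 1                           # a linear sweep just keeps going
        else:
            out.append("(not handled by this sketch)")
            break
    return out

# jmp +1 skips the B8 byte and lands inside what a sweep sees as a mov.
CODE = bytes.fromhex("eb01" "b8909090c3")

linear = disassemble(CODE, follow_jumps=False)
followed = disassemble(CODE, follow_jumps=True)
print(linear)
print(followed)
```

The linear sweep reports a `mov eax, imm32` that never executes, while the jump-following decode shows the `nop; nop; nop; ret` that does.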
The paper "Obfuscation of executable code to improve resistance to static disassembly" by Linn and Debray talks about this in more detail.
See Section 3.2 "Junk Insertion". The scenario you describe is what they refer to as "partially or fully overlapping instructions", i.e. different interpretations of the byte stream can give different assembly instructions for overlapping address ranges.
CodePudding user response:
I recently added a trick about skipping instructions to Code Golf's "Tips for golfing in x86/x64 machine code". Those are an intentional application of jumping into part of a prior instruction, and they are not only useful for obfuscation. Here's the text of that answer in full:
Skipping instructions
Skipping instructions are opcode fragments that combine with one or more subsequent opcodes. The subsequent opcodes can be used with a different entrypoint than the prepended skipping instruction. Using a skipping instruction instead of an unconditional short jump can save code space, be faster, and set up incidental state such as NC (No Carry).
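A small sketch of the idea, written in Python for illustration rather than taken from the original answer: a toy 16-bit decoder runs a routine with two entry points. The byte `A9` (`test ax, imm16`) swallows the two bytes of the second `mov al`, so the first entry point falls through to the shared `ret` without a jump. The `run` function and both byte strings are hypothetical.

```python
def run(code, entry):
    """Decode a tiny, hypothetical subset of 16-bit x86 and track AL."""
    listing, al, pc = [], None, entry
    while pc < len(code):
        op = code[pc]
        if op == 0xB0:                        # mov al, imm8 (2 bytes)
            al = code[pc + 1]
            listing.append(f"mov al, {al:#04x}")
            pc += 2
        elif op == 0xA9:                      # test ax, imm16 (3 bytes), flags only
            imm = int.from_bytes(code[pc + 1:pc + 3], "little")
            listing.append(f"test ax, {imm:#06x}")
            pc += 3
        elif op == 0xC3:                      # ret ends the routine
            listing.append("ret")
            break
        else:
            listing.append("(not handled by this sketch)")
            break
    return listing, al

# entry 0: mov al, 1 / db 0A9h (skips mov al, 2) / entry 3: mov al, 2 / ret
WITH_SKIP = bytes.fromhex("b001" "a9" "b002" "c3")
# The same routine using jmp short over the second mov, for size comparison.
WITH_JMP = bytes.fromhex("b001" "eb02" "b002" "c3")

first = run(WITH_SKIP, 0)
second = run(WITH_SKIP, 3)
print(first, second, len(WITH_SKIP), len(WITH_JMP))
```

In this sketch the skipping variant is one byte shorter than the `jmp short` variant, at the cost of the `test` modifying the flags.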
My examples are all for 16-bit Real/Virtual 86 Mode, but a lot of these techniques can be used similarly in 16-bit Protected Mode, or 32- or 64-bit modes.
Quoting from my ACEGALS guide:
11: Skipping instructions
The constants __TEST_IMM8, __TEST_IMM16, and __TEST_OFS16_IMM8 are defined to the respective byte strings for these instructions. They can be used to skip subsequent instructions that fit into the following 1, 2, or 3 bytes. However, note that they modify the flags register, including always setting NC. The 16-bit offset plus 16-bit immediate test instruction is not included for these purposes because it might access a word at offset 0FFFFh in a segment. Also, the __TEST_OFS16_IMM8 as provided should only be used in 86M, to avoid accessing data beyond a segment limit. After the db instruction using one of these constants, a parenthetical remark should list which instructions are skipped.
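To illustrate how many bytes each constant swallows, here is a hedged Python sketch. It assumes the constants expand to the standard 16-bit-mode encodings of `test al, imm8` (`A8`), `test ax, imm16` (`A9`), and `test byte [ofs16], imm8` (`F6 06`). The byte values are my assumption about the guide's definitions, and the helpers `emit_skip` and `touched_offset` are hypothetical names, not part of lmacros.

```python
# Assumed byte strings, keyed by the constant names from the quoted guide.
SKIP_PREFIXES = {
    "__TEST_IMM8": bytes([0xA8]),              # test al, imm8
    "__TEST_IMM16": bytes([0xA9]),             # test ax, imm16
    "__TEST_OFS16_IMM8": bytes([0xF6, 0x06]),  # test byte [ofs16], imm8
}
# Operand bytes each prefix consumes, i.e. how many following bytes it skips.
SKIPPED_LENGTH = {"__TEST_IMM8": 1, "__TEST_IMM16": 2, "__TEST_OFS16_IMM8": 3}

def emit_skip(name, skipped):
    """Return prefix + skipped code, checking that the lengths match exactly."""
    if len(skipped) != SKIPPED_LENGTH[name]:
        raise ValueError(f"{name} skips exactly {SKIPPED_LENGTH[name]} byte(s)")
    return SKIP_PREFIXES[name] + skipped

def touched_offset(skipped):
    """Offset that the ofs16 form reads: the first two skipped bytes."""
    return int.from_bytes(skipped[:2], "little")

one = emit_skip("__TEST_IMM8", bytes([0xF9]))                    # skips stc
two = emit_skip("__TEST_IMM16", bytes([0xB0, 0x02]))             # skips mov al, 2
three = emit_skip("__TEST_OFS16_IMM8", bytes([0xB8, 0x34, 0x12]))  # skips mov ax, 1234h
print(one.hex(), two.hex(), three.hex(), hex(touched_offset(bytes([0xB8, 0x34, 0x12]))))
```

The last value shows why the memory-operand form needs care: the skipped instruction's first two bytes become an offset that the `test` actually reads.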
The 86 Mode defines in lmacros1.mac 323cc150061e (2021-08-29 21:45:54 0200):