I wanted to test an idea of mine about function pointers. I wrote a test, but it doesn't work (it causes a segmentation fault). Here is my test, simplified:
    #include<stdlib.h>
    void a(){}
    void b(){}
    int main(){
        size_t s=0;
        char* p1=(char*)&a;
        char* p2=(char*)&b;
        while(*p1++==*p2++) s++;
        void (*o)();
        char* c=malloc(s);
        while(s--) *(c+s)=*((char*)&a+s);
        o=(void(*)())c;
        o();
        return 0;
    }
The code should copy the machine code of an empty function (here it finds the intersection of the data at the addresses of two empty functions and copies that) and run the copy.
Thanks for reaching out.
CodePudding user response:
Just a few of the problems:
- Converting between object pointers and function pointers invokes undefined behavior so you have no guarantees that correct machine code will be generated for your program.
- You can't make any assumptions about where these functions are allocated in memory.
- Using `char` for raw binary is always wrong, since it has implementation-defined signedness. Use `unsigned char` or `uint8_t`.
- Read-accessing memory reserved for execution might cause a hardware exception from the MMU on most high-end CPUs. And if not from the MMU, then an exception from the OS. So what you are trying to do (copying a function's machine code to data RAM?) might not be possible.
- The malloc chunk you get might not have the same alignment requirements as a function, meaning that the address might not be suitable.
- `*(c+s) =` is a fairly obvious array out of bounds bug. Don't write obscure loops that iterate downwards for no good reason. In this case simply use `memcpy`.
- When trying out your code on some x86_64, the functions `a` and `b` end up as the raw binary `ret` instruction followed by a bunch of binary corresponding to the calling convention. In my case, the functions stop having the same machine code before you reach the end of either of them. This is what the binary looks like:
But when I print the contents copied down to `c` (after fixing the other bugs mentioned in this answer) there's just 2 bytes holding C3 66, but as you can see from the pic that's in the middle of an instruction. I'm not great at decoding raw x86 asm, but the algorithm appears to be fundamentally wrong.
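To make the `unsigned char` and `memcpy` points above concrete, here is a minimal sketch of the comparison and copy steps written on plain byte buffers. The helper names `common_prefix_len` and `copy_and_format_hex` are hypothetical, invented for this illustration, and the sketch deliberately does not execute the copied bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: number of leading bytes that x and y share,
   never reading more than max bytes from either buffer. */
static size_t common_prefix_len(const unsigned char *x,
                                const unsigned char *y,
                                size_t max)
{
    assert(x != NULL && y != NULL);
    size_t n = 0;
    while (n < max && x[n] == y[n]) {
        n++;
    }
    return n;
}

/* Hypothetical helper: copy len bytes from src to dst with memcpy, then
   format the copy as upper-case hex ("C3 66 ") into out.
   out must have room for at least 3 * len + 1 characters. */
static void copy_and_format_hex(const unsigned char *src, size_t len,
                                unsigned char *dst, char *out)
{
    assert(src != NULL && dst != NULL && out != NULL);
    memcpy(dst, src, len);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* unsigned char keeps a byte such as 0xC3 as 195; with a signed
           char it could become negative and print with extra digits. */
        sprintf(out + 3 * i, "%02X ", (unsigned)dst[i]);
    }
}
```

Pointing these helpers at real functions, for example with `(const unsigned char *)&a`, still relies on the function-to-object pointer conversion criticized above, so treat that as a compiler extension rather than portable C. The explicit `max` bound is there because nothing guarantees the two functions ever differ within readable memory.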