I was inspecting the memory maps of a Python process on a Linux system and found something very surprising. Normally, when I inspect the maps for a Python process they look like this:
00400000-00401000 r-xp 00000000 fe:01 2904802 python3.9
00600000-00601000 r--p 00000000 fe:01 2904802 python3.9
00601000-00602000 rw-p 00001000 fe:01 2904802 python3.9
00637000-00abe000 rw-p 00000000 00:00 0 [heap]
...
7f67d8565000-7f67d8593000 rw-p 00000000 00:00 0
7f67d8593000-7f67d88ea000 r-xp 00000000 fe:01 2904547 libpython3.9.so.1.0
7f67d88ea000-7f67d8ae9000 ---p 00357000 fe:01 2904547 libpython3.9.so.1.0
7f67d8ae9000-7f67d8aef000 r--p 00356000 fe:01 2904547 libpython3.9.so.1.0
7f67d8aef000-7f67d8b29000 rw-p 0035c000 fe:01 2904547 libpython3.9.so.1.0
7f67d8b29000-7f67d8b4b000 rw-p 00000000 00:00 0
...
7fff72a4f000-7fff72a70000 rw-p 00000000 00:00 0 [stack]
7fff72a7c000-7fff72a80000 r--p 00000000 00:00 0 [vvar]
7fff72a80000-7fff72a82000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
This has the following structure:
- All the maps associated with the same binary/shared object are contiguous.
- The maps for the executable (python3.9) appear first, and the maps for any shared library that is loaded appear after those of the executable. This makes sense because the executable is loaded first and the loader then loads the shared objects listed in its DT_NEEDED entries.
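The first observation can be checked mechanically. The following is a minimal sketch, not an established tool: the helper names parse_maps and runs_per_object are made up for illustration, and the sample text is a shortened copy of the listing above. It counts how many separate runs of consecutive lines each file-backed object occupies in a maps listing.

```python
from collections import Counter
from itertools import groupby

# Shortened copy of the "normal" listing above (pathnames as shown there).
SAMPLE = """\
00400000-00401000 r-xp 00000000 fe:01 2904802 python3.9
00600000-00601000 r--p 00000000 fe:01 2904802 python3.9
00601000-00602000 rw-p 00001000 fe:01 2904802 python3.9
00637000-00abe000 rw-p 00000000 00:00 0 [heap]
...
7f67d8593000-7f67d88ea000 r-xp 00000000 fe:01 2904547 libpython3.9.so.1.0
7f67d88ea000-7f67d8ae9000 ---p 00357000 fe:01 2904547 libpython3.9.so.1.0
7f67d8b29000-7f67d8b4b000 rw-p 00000000 00:00 0
"""


def parse_maps(text):
    """Yield (start, end, perms, offset, pathname) for each maps line."""
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5 or "-" not in fields[0]:
            continue  # skip blank lines and elisions such as "..."
        start, end = (int(part, 16) for part in fields[0].split("-"))
        pathname = fields[5].strip() if len(fields) == 6 else ""
        yield start, end, fields[1], int(fields[2], 16), pathname


def runs_per_object(text):
    """Count the runs of consecutive lines for each file-backed object."""
    names = [entry[4] for entry in parse_maps(text)]
    runs = Counter()
    for name, _group in groupby(names):
        # Ignore anonymous mappings and pseudo-paths such as [heap].
        if name and not name.startswith("["):
            runs[name] += 1
    return runs


if __name__ == "__main__":
    for name, count in runs_per_object(SAMPLE).items():
        print(f"{name}: {count} run(s)")
```

An object that reports more than one run is scattered in the sense discussed here. To inspect a live process, the same function could be fed the contents of /proc/<pid>/maps.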
But the maps that I found look like this:
00400000-00401000 r-xp 00000000 fd:00 67488961 python3.9
00600000-00601000 r--p 00000000 fd:00 67488961 python3.9
00601000-00602000 rw-p 00001000 fd:00 67488961 python3.9
0067b000-00a58000 rw-p 00000000 00:00 0 [heap]
...
7f7b46014000-7f7b46484000 r--p 0050b000 fd:00 1059871 libpython3.9.so.1.0
7f7b46484000-7f7b46485000 ---p 00000000 00:00 0
7f7b46485000-7f7b46cda000 rw-p 00000000 00:00 0
7f7b46cda000-7f7b46d16000 r--p 00a3d000 fd:00 1059871 libpython3.9.so.1.0
7f7b46d16000-7f7b46d6f000 rw-p 00000000 00:00 0
7f7b46d6f000-7f7b46d92000 r--p 00001000 fd:00 67488961 python3.9
7f7b46d92000-7f7b46d93000 ---p 00000000 00:00 0
7f7b46d93000-7f7b475d3000 rw-p 00000000 00:00 0
...
7f7b5a35d000-7f7b5a827000 r-xp 00000000 fd:00 1059871 libpython3.9.so.1.0
7f7b5a827000-7f7b5aa27000 ---p 004ca000 fd:00 1059871 libpython3.9.so.1.0
7f7b5aa27000-7f7b5aa2c000 r--p 004ca000 fd:00 1059871 libpython3.9.so.1.0
7f7b5aa2c000-7f7b5aa67000 rw-p 004cf000 fd:00 1059871 libpython3.9.so.1.0
7f7b5aa67000-7f7b5aa8b000 rw-p 00000000 00:00 0
...
7fff26f8e000-7fff27020000 rw-p 00000000 00:00 0 [stack]
7fff27102000-7fff27106000 r--p 00000000 00:00 0 [vvar]
7fff27106000-7fff27108000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
This has the following oddities:
- There is no contiguous chunk of segments associated with python3.9 or with the libpython3.9.so.1.0 shared object. Instead, the mappings for each are scattered.
- There are maps for the executable that appear after the ones for the shared library:
7f7b46cda000-7f7b46d16000 r--p 00a3d000 fd:00 1059871 libpython3.9.so.1.0
7f7b46d16000-7f7b46d6f000 rw-p 00000000 00:00 0
7f7b46d6f000-7f7b46d92000 r--p 00001000 fd:00 67488961 python3.9
7f7b46d92000-7f7b46d93000 ---p 00000000 00:00 0
Do you know what can cause this effect or in what conditions this can happen? How is it possible that a memory map for the executable is loaded after several shared objects?
Note: This is using kernel 5.13.12-100.fc33.x86_64.
Answer:
Do you know what can cause this effect or in what conditions this can happen?
An executable can trivially mmap (parts of) itself. This could be done, e.g., to examine its own symbol table (necessary to print a crash stack trace) or to extract some embedded resource.
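As a rough illustration of how little it takes, here is a minimal Linux-only sketch in which the running interpreter maps its own binary and then looks for the new entry in /proc/self/maps. The helper names map_file_privately and maps_entries_for are invented for this example. Nothing here claims that this is what produced the mappings in the question.

```python
import mmap
import os
import sys


def map_file_privately(path):
    """Map the whole file read-only and private (shown as r--p in maps)."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. The mmap object keeps its own
        # duplicate of the descriptor, so closing f does not unmap it.
        return mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE,
                         prot=mmap.PROT_READ)


def maps_entries_for(path):
    """Return the lines of /proc/self/maps whose pathname is `path`."""
    real = os.path.realpath(path)
    entries = []
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.rstrip("\n").split(None, 5)
            if len(fields) == 6 and fields[5] == real:
                entries.append(line.rstrip("\n"))
    return entries


if __name__ == "__main__":
    before = maps_entries_for(sys.executable)
    mapping = map_file_privately(sys.executable)
    after = maps_entries_for(sys.executable)
    # Print only the entries that appeared because of the extra mmap call.
    for line in after:
        if line not in before:
            print(line)
    mapping.close()
```

The kernel places the new mapping wherever it finds room in the mmap area, so the extra entry for the executable normally shows up among the shared libraries rather than next to the original segments.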
The maps for the executable (python3.9) appear first and the map for a shared library that is opened appear after the ones in the executable.
This is only true by accident, and only for non-PIE executables.
Non-PIE executables on x86_64 are traditionally linked to load at address 0x400000, and shared libraries are normally loaded starting from below the main stack.
If you link a non-PIE executable to load at e.g. 0x7ff000000000, then it will likely appear in /proc/$pid/maps after the shared libraries.
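Whether a given binary is non-PIE can be read from its ELF header. The sketch below assumes the standard ELF layout, in which the 16-bit e_type field follows the 16-byte e_ident array. The function names elf_type and describe are hypothetical. Note that ET_DYN covers both PIE executables and ordinary shared objects, so this only separates fixed-address binaries from relocatable ones.

```python
import struct
import sys

ET_EXEC = 2  # fixed-address executable (non-PIE)
ET_DYN = 3   # position-independent: shared object or PIE executable


def elf_type(header):
    """Return the e_type field from the first bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def describe(path):
    """Summarise whether the ELF file at `path` has a fixed load address."""
    with open(path, "rb") as f:
        e_type = elf_type(f.read(18))
    labels = {
        ET_EXEC: "ET_EXEC (non-PIE, fixed load address)",
        ET_DYN: "ET_DYN (PIE or shared object, relocatable)",
    }
    return labels.get(e_type, f"other e_type ({e_type})")


if __name__ == "__main__":
    print(describe(sys.executable))
```

For the python3.9 binary in the question, which is mapped at 0x400000, one would expect the ET_EXEC case, although that has not been verified here.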
Update:
the python binary here is certainly not mmapping itself, so that explanation doesn't apply
- You can't know that -- you almost certainly haven't read all the code in Python 3.9 and every module which you load.
- There is no need to guess where these mmapped regions are coming from; you can just look.
To look, run your program under GDB and use "catch syscall mmap", followed by "where" each time the catchpoint triggers. This will allow you to see where each and every mapping came from.