A file descriptor contains the index of an entry within the process file table. However, the index alone is not enough to locate a particular entry in the [process] file table. Knowledge about the address of the first entry within the table is also required. So, my question is this: How does the kernel, only provided with the file descriptor as an argument in system calls such as read and write, manage to determine the location of the intended entry within the process file table?
I tried to see what happens under the hood by converting the following C code into x86-64 assembly, but all I got was an additional assembly open instruction.
#include <stdio.h>

int main(int argc, char* argv[]) {
    FILE* fd = fopen("home/mhdi/miles","r");
    return 0;
}
.file "open.c"
.intel_syntax noprefix
.text
.section .rodata
.LC0:
.string "r"
.LC1:
.string "home/mhdi/miles"
.text
.globl main
.type main, @function
main:
.LFB6:
.cfi_startproc
endbr64
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
sub rsp, 32
mov DWORD PTR -20[rbp], edi
mov QWORD PTR -32[rbp], rsi
lea rax, .LC0[rip]
mov rsi, rax
lea rax, .LC1[rip]
mov rdi, rax
call fopen@PLT
mov QWORD PTR -8[rbp], rax
mov eax, 0
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE6:
.size main, .-main
.ident "GCC: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0"
.section .note.GNU-stack,"",@progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
CodePudding user response:
A file descriptor contains the index of an entry within the process file table. However, the index alone is not enough to locate a particular entry in the [process] file table. Knowledge about the address of the first entry within the table is also required. So, my question is this: How does the kernel, only provided with the file descriptor as an argument in system calls such as read and write, manage to determine the location of the intended entry within the process file table?
A file descriptor is an integer that the kernel hands to a process to identify one of that process's open files; it is an index into the per-process file table. Because a user process cannot read or write kernel memory, it needs some way to tell the kernel which of its (possibly many) open files an operation applies to, and the descriptor is that handle. The process never needs the address of the table, and it has no way to reach it anyway: the table lives in the kernel's private data for that process and is not mapped into the process's user address space. The kernel, on the other hand, always knows which process made the system call, so it finds that process's table in its own bookkeeping and uses the descriptor as the index.
Historically, the table was stored in a per-process private area called the u-area. The layout has changed a great deal since then, but the per-process data still includes things like the inode of the process's root directory (used for lookups that start at the root), the inode of the current working directory (used for lookups relative to the current directory), the resource limits and related parameters (in-core memory limit, maximum file size, maximum execution time, maximum memory to allocate, the umask, the user and group IDs...), the open file table array (whose index is the file descriptor), the session ID, and the kernel stack used while the process runs in kernel mode. In a multithreaded operating system there is also a per-thread structure in the kernel that holds things such as the saved user-mode CPU register contents.
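To make the lookup concrete, here is a toy user-space model of the idea. It is not real kernel code: every name in it (struct process, struct open_file, current_process, fd_install, fd_lookup, MAX_FDS) is made up for illustration, and a real kernel's structures are far more involved. The only point is that the table's address comes from "which process is running", not from the caller.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 16  /* hypothetical limit for this toy model */

/* Stand-in for the kernel's per-open-file object. */
struct open_file {
    const char *name;
    long offset;
};

/* Stand-in for the kernel's per-process record (the historical u-area). */
struct process {
    int pid;
    struct open_file *fd_table[MAX_FDS];  /* indexed by file descriptor */
};

/* The kernel always knows which process is currently running. */
struct process *current_process;

/* Put f in the lowest free slot of p's table; return the slot index
 * (the new descriptor), or -1 if the table is full. */
int fd_install(struct process *p, struct open_file *f)
{
    assert(p != NULL && f != NULL);
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (p->fd_table[fd] == NULL) {
            p->fd_table[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* First step of a read()/write() in this model: resolve fd against
 * the table of the current process. The caller passes only the index. */
struct open_file *fd_lookup(int fd)
{
    if (current_process == NULL || fd < 0 || fd >= MAX_FDS)
        return NULL;
    return current_process->fd_table[fd];
}
```

Note that the same descriptor value can resolve to different files depending on which process is current, which is why a descriptor is meaningless outside the process that owns it.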
I tried to see what happens under the hood by converting the following C code into x86-64 assembly, but all I got was an additional assembly open instruction.
What you got was a call to the fopen(3) library routine, not a system call.
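If you want to see the descriptor hiding behind the FILE* that fopen(3) returns, POSIX provides fileno(3). The following is a minimal sketch assuming a POSIX system; the helper name write_through_hidden_fd is hypothetical and only serves the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Open path with fopen(3), then write msg with write(2) using the
 * descriptor stored inside the FILE object.
 * Returns the number of bytes written, or -1 on any failure. */
long write_through_hidden_fd(const char *path, const char *msg, size_t len)
{
    assert(path != NULL && msg != NULL);

    FILE *fp = fopen(path, "w");   /* library call: yields a FILE*, not an int */
    if (fp == NULL)
        return -1;

    int fd = fileno(fp);           /* the small integer the kernel understands */
    long n = (fd < 0) ? -1 : (long)write(fd, msg, len);

    fclose(fp);                    /* also closes the underlying descriptor */
    return n;
}
```

For example, write_through_hidden_fd("/dev/null", "hello", 5) should report 5 bytes written on a typical Unix system. The FILE object is user-space bookkeeping (buffer, flags, the descriptor), while the descriptor is the only part the kernel ever sees.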
To get under the hood, you need to start in the kernel source code. Reading the assembly will only take you as far as one special instruction: the interface to the kernel is normally a single instruction that forces a software trap, and you cannot trace past it from user space. On 32-bit Linux/x86 that instruction is INT 0x80; on x86-64, which is what your listing targets, it is SYSCALL.
In this case, you have disassembled code that calls fopen(3), which is not a system call but a standard library function, so what you see is an ordinary subroutine call rather than the special instruction mentioned above. Had you called open(2) directly (the system call that fopen(3) ends up making), you would still see a similar call open instruction, because every system call is wrapped in a C function. The wrapper does the housekeeping needed to place the parameters where the kernel expects them, executes the trap instruction (INT 0x80 on 32-bit x86, SYSCALL on x86-64), which enters the kernel through a fixed entry point and raises the processor's privilege level to ring 0, and then processes the result on return, for example by storing the error code in errno. Any pending signal handlers are also run before your code resumes. What happens inside the kernel is hidden from you, because kernel memory is not accessible to the running process. From the process's point of view, a system call behaves like the execution of a single machine instruction: just as you cannot observe the intermediate CPU states inside one instruction, you cannot observe what happened between executing the trap instruction and the next instruction of your program.
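As a rough sketch of the wrapper idea, assuming Linux with glibc: syscall(2) lets you name the system call number yourself, which is about as close to the trap instruction as you can get from C. The two helper names below are hypothetical. Both paths hand the kernel nothing but the descriptor, the buffer and the length.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask for the write through the ordinary libc wrapper, which loads the
 * arguments into registers and executes the trap instruction for us. */
long write_via_wrapper(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return (long)write(fd, buf, len);
}

/* Ask for the same write through syscall(2), passing the system call
 * number explicitly. On failure it returns -1 and sets errno, just
 * like the wrapper does. */
long write_via_syscall(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

If you disassemble a program that uses either helper, you will again see only a call into libc; the trap instruction itself sits inside the library, and everything after it happens out of your sight.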