I am taking an Operating Systems course and trying to understand exactly how fork() works. I know the basic definition: "fork() creates a new process by duplicating the calling process."
I was told the child process will only execute the code below the fork() statement. What I don't understand is this: say we have a 10-line program. If line 6 is fork(), will the whole code from lines 1 to 10 run in the child process, or only lines 6 to 10?
How will this code work:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(){
    pid_t smith;
    int a=2; int b=3;
    smith = fork();
    if(smith == 0){
        fork();
        a++;
        fork();
    }
    else if(smith > 0){
        b++;
        fork();
    }
    printf("%d %d",a,b);
}
The fork() calls in the if blocks are confusing. How will they act? When a child process is created from the fork() in the else if, what code will it run?
CodePudding user response:
"I was told the child process will only execute the code below the fork() statement. What I don't understand is this: say we have a 10-line program. If line 6 is fork(), will the whole code from lines 1 to 10 run in the child process, or only lines 6 to 10?"
Imagine I give you a piece of paper containing this list:
1. Apple
2. Banana
3. Cranberry
4. Daffodil
5. Eggplant
6. Fuchsia
7. Germanium
8. Hollyhock
9. Indigo
10. Jasmine
Imagine you start reading this list out loud, from top to bottom. Imagine you have just said "Five. Eggplant." Imagine that at this point I point a magic duplication ray at you, creating a second, identical copy of Baran Açıkgöz and a second identical list. The two Baran Açıkgözs are virtually identical: they have the same memories and the same thoughts in their heads. They're both doing the same thing that the original Baran Açıkgöz had been doing at the moment of duplication. So, what do the two Baran Açıkgözs do next?
Both of them carry on from "Six. Fuchsia."; neither goes back to the top of the list. That's how fork works.
CodePudding user response:
What you need to understand is that fork (conceptually) copies the entire memory space of the parent process and runs the child process in that copy. The instructions are part of what is copied, so both processes run exactly the same program, and the child resumes at the point where fork() returns.
If you have difficulties seeing this through ifs and elses, looking at the assembly might help as the execution is more streamlined.
This code:
int main(){
    pid_t pid;
    int a=2; int b=3;
    pid = fork();
    if(pid == 0){
        a++;
    }
    else if(pid > 0){
        b++;
    }
    printf("%d %d",a,b);
}
Compiles into this:
// String literal
.LC0:
    .ascii "%d %d\000"
// int a=2; int b=3;
    movs r3, #2
    str r3, [r7, #12]
    movs r3, #3
    str r3, [r7, #8]
// pid = fork();
    bl fork
// Both parent and child resume here, each with its own return value in r0
// if (pid == 0)
    mov r3, r0 // return value is in r0
    str r3, [r7, #4]
    ldr r3, [r7, #4]
    cmp r3, #0
    bne .L2
// a++; only the child (pid == 0) gets here
    ldr r3, [r7, #12]
    adds r3, r3, #1
    str r3, [r7, #12]
// Look at how the next section is skipped.
    b .L3
// else if (pid > 0)
.L2:
    ldr r3, [r7, #4]
    cmp r3, #0
    ble .L3
// b++; only the parent (pid > 0) gets here
    ldr r3, [r7, #8]
    adds r3, r3, #1
    str r3, [r7, #8]
// Now both parent and child execute this last part
.L3:
    ldr r2, [r7, #8]
    ldr r1, [r7, #12]
    movw r0, #:lower16:.LC0
    movt r0, #:upper16:.LC0
// And then we have the printf
    bl printf
I hope it is clear now that parent and child execute the same assembly; the conditionals on the return value of fork determine which part of the code each process runs.
If, for instance, you don't test the return value of fork at all, both child and parent will run exactly the same code.
CodePudding user response:
If you instrument the program as discussed in comment 1 and comment 2, you end up with code like this (source: fork67.c, executable: fork67):
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    printf("PID = %5d, PPID = %5d\n", getpid(), getppid());
    int a = 2;
    int b = 3;
    pid_t smith = fork();
    if (smith == 0)
    {
        fork();
        a++;
        fork();
    }
    else if (smith > 0)
    {
        b++;
        fork();
    }
    printf("PID = %5d, PPID = %5d: a = %d, b = %d\n", getpid(), getppid(), a, b);
    /* Reap every child; the number of children becomes the exit status */
    int corpse;
    int status;
    int count = 0;
    while ((corpse = wait(&status)) > 0)
    {
        printf("PID = %5d, PPID = %5d: child %5d exited with status 0x%.4X\n",
               getpid(), getppid(), corpse, status);
        count++;
    }
    printf("PID = %5d, PPID = %5d: reported on %5d children\n",
           getpid(), getppid(), count);
    return count;
}
With one example run, I got the output:
PID = 93578, PPID = 1320
PID = 93578, PPID = 1320: a = 2, b = 4
PID = 93580, PPID = 93578: a = 2, b = 4
PID = 93580, PPID = 93578: reported on 0 children
PID = 93579, PPID = 93578: a = 3, b = 3
PID = 93578, PPID = 1320: child 93580 exited with status 0x0000
PID = 93582, PPID = 93579: a = 3, b = 3
PID = 93581, PPID = 93579: a = 3, b = 3
PID = 93582, PPID = 93579: reported on 0 children
PID = 93583, PPID = 93581: a = 3, b = 3
PID = 93583, PPID = 93581: reported on 0 children
PID = 93581, PPID = 93579: child 93583 exited with status 0x0000
PID = 93579, PPID = 93578: child 93582 exited with status 0x0000
PID = 93581, PPID = 93579: reported on 1 children
PID = 93579, PPID = 93578: child 93581 exited with status 0x0100
PID = 93579, PPID = 93578: reported on 2 children
PID = 93578, PPID = 1320: child 93579 exited with status 0x0200
PID = 93578, PPID = 1320: reported on 2 children
My login shell has the PID 1320. You can see the processes' execution paths fairly clearly.
A still more instrumented version captures the return value from each fork() and prints that information too, leading to fork79.c and program fork79:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    printf("PID = %5d, PPID = %5d\n", getpid(), getppid());
    int a = 2;
    int b = 3;
    pid_t smith = fork();
    printf("PID = %5d, PPID = %5d: c0 = %5d\n", getpid(), getppid(), smith);
    if (smith == 0)
    {
        int c1 = fork();
        printf("PID = %5d, PPID = %5d: c1 = %5d\n", getpid(), getppid(), c1);
        a++;
        int c2 = fork();
        printf("PID = %5d, PPID = %5d: c2 = %5d\n", getpid(), getppid(), c2);
    }
    else if (smith > 0)
    {
        b++;
        int c3 = fork();
        printf("PID = %5d, PPID = %5d: c3 = %5d\n", getpid(), getppid(), c3);
    }
    printf("PID = %5d, PPID = %5d: a = %d, b = %d\n", getpid(), getppid(), a, b);
    /* Reap every child; the number of children becomes the exit status */
    int corpse;
    int status;
    int count = 0;
    while ((corpse = wait(&status)) > 0)
    {
        printf("PID = %5d, PPID = %5d: child %5d exited with status 0x%.4X\n",
               getpid(), getppid(), corpse, status);
        count++;
    }
    printf("PID = %5d, PPID = %5d: reported on %5d children\n",
           getpid(), getppid(), count);
    return count;
}
Sample run:
PID = 93985, PPID = 1320
PID = 93985, PPID = 1320: c0 = 93986
PID = 93985, PPID = 1320: c3 = 93987
PID = 93985, PPID = 1320: a = 2, b = 4
PID = 93986, PPID = 93985: c0 = 0
PID = 93987, PPID = 93985: c3 = 0
PID = 93987, PPID = 93985: a = 2, b = 4
PID = 93987, PPID = 93985: reported on 0 children
PID = 93986, PPID = 93985: c1 = 93988
PID = 93988, PPID = 93986: c1 = 0
PID = 93985, PPID = 1320: child 93987 exited with status 0x0000
PID = 93986, PPID = 93985: c2 = 93989
PID = 93986, PPID = 93985: a = 3, b = 3
PID = 93989, PPID = 93986: c2 = 0
PID = 93989, PPID = 93986: a = 3, b = 3
PID = 93988, PPID = 93986: c2 = 93990
PID = 93988, PPID = 93986: a = 3, b = 3
PID = 93989, PPID = 93986: reported on 0 children
PID = 93990, PPID = 93988: c2 = 0
PID = 93990, PPID = 93988: a = 3, b = 3
PID = 93986, PPID = 93985: child 93989 exited with status 0x0000
PID = 93990, PPID = 93988: reported on 0 children
PID = 93988, PPID = 93986: child 93990 exited with status 0x0000
PID = 93988, PPID = 93986: reported on 1 children
PID = 93986, PPID = 93985: child 93988 exited with status 0x0100
PID = 93986, PPID = 93985: reported on 2 children
PID = 93985, PPID = 1320: child 93986 exited with status 0x0200
PID = 93985, PPID = 1320: reported on 2 children
Because there is printing before and while the forking occurs, this code is vulnerable to the printf anomaly described in "printf anomaly after fork()". Here's the output when it is piped through cat:
$ fork79 | cat
PID = 94002, PPID = 1320
PID = 94002, PPID = 1320: c0 = 94004
PID = 94005, PPID = 94002: c3 = 0
PID = 94005, PPID = 94002: a = 2, b = 4
PID = 94005, PPID = 94002: reported on 0 children
PID = 94002, PPID = 1320
PID = 94004, PPID = 94002: c0 = 0
PID = 94004, PPID = 94002: c1 = 94006
PID = 94007, PPID = 94004: c2 = 0
PID = 94007, PPID = 94004: a = 3, b = 3
PID = 94007, PPID = 94004: reported on 0 children
PID = 94002, PPID = 1320
PID = 94004, PPID = 94002: c0 = 0
PID = 94006, PPID = 94004: c1 = 0
PID = 94008, PPID = 94006: c2 = 0
PID = 94008, PPID = 94006: a = 3, b = 3
PID = 94008, PPID = 94006: reported on 0 children
PID = 94002, PPID = 1320
PID = 94004, PPID = 94002: c0 = 0
PID = 94006, PPID = 94004: c1 = 0
PID = 94006, PPID = 94004: c2 = 94008
PID = 94006, PPID = 94004: a = 3, b = 3
PID = 94006, PPID = 94004: child 94008 exited with status 0x0000
PID = 94006, PPID = 94004: reported on 1 children
PID = 94002, PPID = 1320
PID = 94004, PPID = 94002: c0 = 0
PID = 94004, PPID = 94002: c1 = 94006
PID = 94004, PPID = 94002: c2 = 94007
PID = 94004, PPID = 94002: a = 3, b = 3
PID = 94004, PPID = 94002: child 94007 exited with status 0x0000
PID = 94004, PPID = 94002: child 94006 exited with status 0x0100
PID = 94004, PPID = 94002: reported on 2 children
PID = 94002, PPID = 1320
PID = 94002, PPID = 1320: c0 = 94004
PID = 94002, PPID = 1320: c3 = 94005
PID = 94002, PPID = 1320: a = 2, b = 4
PID = 94002, PPID = 1320: child 94005 exited with status 0x0000
PID = 94002, PPID = 1320: child 94004 exited with status 0x0200
PID = 94002, PPID = 1320: reported on 2 children
$
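Note that the line PID = 94002, PPID = 1320 appears 6 times, once for each of the 6 processes that run: 94002, 94004, 94005, 94006, 94007, 94008. When the output goes to a pipe it is fully buffered, so each child inherits the unflushed line in its copy of the standard I/O buffer and writes it out again when it exits.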
There are many variations that you can experiment with. On my machine, the original code (from the question) produces:
$ fork37
2 42 43 33 33 33 3$
because no newlines are output, so the command prompt appears right after the last 3 3. Curiously, I never got the prompt to appear before the last numbers; that surprises me. Adding the basic monitoring proposed in the comments gives output such as:
$ fork59
PID = 95109, PPID = 1320: a = 2 b = 4
PID = 95111, PPID = 1: a = 2 b = 4
PID = 95110, PPID = 1: a = 3 b = 3
PID = 95113, PPID = 95110: a = 3 b = 3
PID = 95112, PPID = 1: a = 3 b = 3
PID = 95114, PPID = 95112: a = 3 b = 3
$
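When the parent PID (PPID) is 1, it means the original process that forked the current one has already exited, so the orphaned child was re-parented to the process with PID 1. With the wait loop, each parent waits for its children before exiting, so the processes are more synchronized.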