Erlang gen_tcp accept vs OS-Thread accept

I have two models of listening sockets and acceptors in Erlang:

------------FIRST------------

-module(listeners).
....

start() ->
    {ok, Listen} = gen_tcp:listen(....),
    accept(Listen).

%%%%%%%%%%%%%%%%%%%%%

accept(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    spawn(fun() -> handle(Socket) end),
    accept(Listen).

%%%%%%%%%%%%%%%%%%%%%

handle(Socket) ->
    ....

---------SECOND----------

-module(listener).
....

start() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%%%%%%%%%%%%%

init([]) ->
    {ok, Listen} = gen_tcp:listen(....),
    %% A helper process asks for the 5 acceptors, because the
    %% supervisor cannot start children from inside its own init/1.
    spawn(fun() -> free_acceptors(5) end),
    {ok, {{simple_one_for_one, 5, 1},
          [{child, {?MODULE, accept, [Listen]}, ....}]}}.

%%%%%%%%%%%%%

free_acceptors(N) ->
    [supervisor:start_child(?MODULE, []) || _ <- lists:seq(1, N)],
    ok.

%%%%%%%%%%%%%

accept(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    handle(Socket).

%%%%%%%%%%%%%%

handle(Socket) ->
    ....

The first code is simple: the main process creates a listen socket and waits to accept new connections. When a connection comes in, it accepts it, spawns a new process to handle it, and returns to accept other new connections.

The second code is also simple: the main process creates a supervision tree. The supervisor creates a listen socket and starts 5 children, passing the listen socket as an argument to each of them. (It spawns a separate process to run free_acceptors/1, because that function calls the supervisor process, and the supervisor cannot start children while it is still inside its own init/1; the new process therefore waits until the supervisor finishes its initialization.) The five children then all start accepting new incoming connections at the SAME time.

So we run the two codes, each on a separate machine with a single-core CPU, and 5 clients try to connect at the same time to the first server and another 5 to the second server. At first glance I thought the second server would be faster, because all connections would be accepted in parallel at the same time, while in the first code the fifth client must wait for the server to accept the four preceding ones. But digging deeper into ERTS: we have a single OS thread per core to run Erlang processes, and since a socket is an OS structure, gen_tcp:listen will call OS-Thread:listen (this is just pseudo-code for understanding) to create an OS socket, and gen_tcp:accept calls OS-Thread:accept to accept a new connection. The latter can accept only one connection at a time, so the fifth client still waits for the four preceding ones. So is there any difference between the two codes? I hope you understand me.

Even if the code didn't involve sockets, the Erlang processes would still be concurrent rather than parallel, because there is just one core; but the scheduler switches tasks between processes so fast that it comes close to a parallel run. So the problem lies in the use of sockets, which make OS calls through the single OS thread.

NOTE: Ejabberd uses the first implementation and Cowboy uses the second.

CodePudding user response:

At the OS level, a listen socket has an associated queue of OS threads waiting to accept connections. Whether this queue has OS threads blocked on it or is empty does not matter much, because each case is simply handled differently (busy-waiting non-blocking accept, select, epoll...).

The BEAM does not have a single OS thread, even if you run it on a system with a single CPU; it has OS threads of several different types (schedulers, async I/O threads, and so on).
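Even on a single-core host this is easy to observe from the shell with standard erlang:system_info/1 queries (the values shown below are just examples and depend on the host and on emulator flags):

1> erlang:system_info(schedulers_online).
1
2> erlang:system_info(thread_pool_size).   %% async I/O thread pool
1
3> erlang:system_info(dirty_cpu_schedulers).
1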

Regarding your question, I suspect it is, if anything, better to have multiple acceptor Erlang processes continuously blocking on the gen_tcp:accept call, because that way ERTS knows there is Erlang code willing to accept more connections, while with the single accept-spawn loop this knowledge is hidden. (The handle(Socket) in your second example should spawn a worker, or send the accepted socket to a worker, and get back to accepting connections; see the sketch below.)
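A minimal sketch of that suggestion, applied to the accept/1 child from the second example (names are the ones already used there; this assumes the socket stays in its default active mode so its messages follow the owning process):

accept(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    %% Hand the socket to a dedicated worker so this acceptor
    %% can return to gen_tcp:accept/1 immediately.
    Worker = spawn(fun() -> handle(Socket) end),
    %% Transfer ownership so the socket's messages go to the worker.
    ok = gen_tcp:controlling_process(Socket, Worker),
    accept(Listen).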

I'm not familiar enough with the code to know the nuances, but it seems to handle multiple accepts nicely, queueing them internally, so it might be marginally better to have multiple acceptors. That is, in the first example there is, for every single request, a moment when nobody is accepting connections, while in the second example you need a higher number of simultaneous requests before that happens.

CodePudding user response:

I'm coming back from a lot of searching about this, and I think I have found the answer; I just want you to correct me, Mr José, if I am wrong about anything:

1- When we call gen_tcp:listen, ERTS opens an ERLANG port (the listen socket) to communicate with a linked-in C driver; this driver, which runs under a MAIN OS thread, opens a REAL SOCKET. (See the shell check after step 5.)

2- When we call gen_tcp:accept, ERTS uses this port to call into the driver, passing a specific macro as an argument to erlang:port_control. The driver's MAIN OS thread will spawn an OS thread that runs a REAL accept on the opened socket (a blocking accept). But this is just my view; I too am not familiar with the C accept function, and anyway that is the Ericsson team's job.

3- When a client sends a request to connect to this socket, the OS thread accepts the connection and creates a new socket for communication with this client, and on the Erlang side a new port is created and linked to this OS thread, which acts as the driver for this specific communication with that client.

4- When the Erlang process sends data via this new port, the new driver sends the data via the new socket, and the same goes for receiving data.

5- The MAIN OS thread of the driver will not spawn a new OS thread on every Erlang accept; it balances connections across OS threads (again, this is Ericsson's design). These threads manage connections with one of the well-known multiplexing functions (select, poll, epoll, ...), generally epoll on Linux and kqueue on BSD systems, and each OS thread runs this function on two sides: one side interacting with the client sockets, and one side interacting with the Erlang ports.
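Parts of points 1 and 5 can be observed from the Erlang shell, assuming the classic inet driver backend (the default; exact port numbers and values will differ per host): the listen socket really is an Erlang port driven by tcp_inet, and erlang:system_info(kernel_poll) reports whether a kernel-poll mechanism such as epoll or kqueue is in use (older emulators needed the +K true flag to enable it).

1> {ok, L} = gen_tcp:listen(0, []).
{ok,#Port<0.5>}
2> erlang:port_info(L, name).
{name,"tcp_inet"}
3> erlang:system_info(kernel_poll).
true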

This is exactly the job of any driver: it hides these things and lets the emulator behave as if it were doing the work directly.

The answer to the first question is that the second code is more efficient: as you told me, when there are many Erlang acceptors, the driver knows about this and spawns many OS acceptors. But here another problem appears: how many acceptors can I spawn for one socket?

The free-acceptors design is for accepting connections in PARALLEL, and clearly one OS thread cannot accept two connections at the same time. So if the number of acceptors is larger than the number of cores (for example, 8 cores, 10 acceptors, and 20 clients arriving at the same time), we get 8 accepted connections in parallel, then another 8 in parallel, then the remaining 4; this has the same efficiency as creating just 8 free acceptors. (I am talking about the corrected version of the code, where we always have 8 free acceptors: when an acceptor accepts a connection, it spawns a process to handle it and returns to accept other connections, as in the sketch below.)
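For reference, a compact, self-contained sketch of that corrected version (illustrative only, not ejabberd's or Cowboy's actual code; the module name acceptor_pool, the socket options, and the echo handler are all placeholders):

-module(acceptor_pool).
-export([start/2, acceptor/1]).

%% Open one listen socket and start N permanently-free acceptors.
start(Port, NumAcceptors) ->
    {ok, Listen} = gen_tcp:listen(Port, [binary,
                                         {active, false},
                                         {reuseaddr, true}]),
    [spawn_link(?MODULE, acceptor, [Listen]) || _ <- lists:seq(1, NumAcceptors)],
    {ok, Listen}.

%% Each acceptor blocks in accept, hands the new socket to a
%% worker process, and immediately goes back to accepting.
acceptor(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Worker = spawn(fun() -> handle(Socket) end),
    ok = gen_tcp:controlling_process(Socket, Worker),
    acceptor(Listen).

%% Placeholder worker: echo data back until the peer closes.
handle(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            ok = gen_tcp:send(Socket, Data),
            handle(Socket);
        {error, closed} ->
            ok
    end.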

Networking is the most important part of designing fault-tolerant and scalable servers in Erlang/OTP, and I want to understand it well before doing anything. So please, Mr José, if I am wrong about something, just tell me. Thank you.
