Unnecessary pushing of registers to the stack when others with the same value exist, why?

I'm trying to learn socket programming with assembly and I'm told to write this code:

section .text
global _start

_start:
    push    byte 6
    push    byte 1
    push    byte 2
    mov     ecx, esp
    mov     ebx, 1
    mov     eax, 102
    int     80h
 
    ;confusing part
    mov     edi, eax ;why is this (and pushing it) necessary?
    push    dword 0x00000000
    push    word 0x2923
    push    word 2
    mov     ecx, esp
    push    byte 16
    push    ecx      
    push    edi      ;why not push eax directly? Doesn't sys_socketcall put the file descriptor back into eax?
    mov     ecx, esp 
    mov     ebx, 2
    mov     eax, 102
    int     80h

I added comments to the parts I don't understand. The full code with the irrelevant parts can be found here.

Why do they move a register's value to another? Couldn't they just push the original one?

CodePudding user response:

mov edi,eax is one way to keep the fd available for use after the second int 80h. If you've decided you're going to keep the socket fd in EDI from now on, as its permanent home, it makes sense to push edi. But yes, push eax would work because at this point there's still a copy of it in EAX.

It seems rather pointless in the "lesson 30" where they first introduce this; they just exit right away. But if you look at later "lessons" they build on this code, using it unchanged and adding more system calls after (like listen, which also uses push edi). In later code, you still want to push the fd from socket(), not the return value from bind().
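To make the point about later lessons concrete, here is a minimal sketch of what the code might look like once a listen call is appended. It assumes 32-bit Linux, socketcall subfunction 4 for listen, and a backlog of 1; the file name and build commands in the comments are illustrative assumptions, not taken from the tutorial.

```nasm
; Sketch: 32-bit Linux, NASM syntax. Assumed build commands:
;   nasm -f elf32 listen.asm && ld -m elf_i386 listen.o -o listen
section .text
global _start

_start:
    ; socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) via socketcall subfunction 1
    push    byte 6              ; IPPROTO_TCP
    push    byte 1              ; SOCK_STREAM
    push    byte 2              ; AF_INET
    mov     ecx, esp            ; ECX -> argument array
    mov     ebx, 1              ; SYS_SOCKET
    mov     eax, 102            ; sys_socketcall
    int     80h

    mov     edi, eax            ; EDI is the fd's home from here on

    ; bind(fd, &addr, 16) via subfunction 2
    push    dword 0x00000000    ; sin_addr = INADDR_ANY
    push    word 0x2923         ; sin_port = 9001 in network byte order
    push    word 2              ; sin_family = AF_INET
    mov     ecx, esp            ; ECX -> struct sockaddr_in
    push    byte 16             ; addrlen
    push    ecx                 ; pointer to the address struct
    push    edi                 ; fd (push eax would also work at this point)
    mov     ecx, esp
    mov     ebx, 2              ; SYS_BIND
    mov     eax, 102
    int     80h                 ; EAX now holds bind's result, not the fd

    ; listen(fd, 1) via subfunction 4
    push    byte 1              ; backlog (assumed value)
    push    edi                 ; must come from EDI: EAX was overwritten by bind
    mov     ecx, esp
    mov     ebx, 4              ; SYS_LISTEN
    mov     eax, 102
    int     80h

    xor     ebx, ebx            ; exit status 0
    mov     eax, 1              ; sys_exit
    int     80h
```

The int 80h ABI preserves every register except EAX, which is why EDI still holds the fd after each call.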

The code is a bit hacky, e.g. it never removes the structs and argument arrays it keeps pushing, so if you did this in a loop you'd eventually run out of stack space. Normally you'd use add esp, N (where N is the number of bytes you pushed) to undo the pushes after a system call.
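As a rough sketch of that cleanup, the same two calls could restore ESP after each int 80h. The byte counts in the comments are my own tally of the pushes, and the file name and build commands are assumptions.

```nasm
; Sketch: 32-bit Linux, NASM syntax. Assumed build commands:
;   nasm -f elf32 cleanup.asm && ld -m elf_i386 cleanup.o -o cleanup
section .text
global _start

_start:
    ; socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)
    push    byte 6
    push    byte 1
    push    byte 2
    mov     ecx, esp
    mov     ebx, 1              ; SYS_SOCKET
    mov     eax, 102            ; sys_socketcall
    int     80h
    add     esp, 12             ; undo 3 dword pushes; add changes flags, not EAX
    mov     edi, eax            ; keep the fd

    ; bind(fd, &addr, 16)
    ; Only 8 bytes of the 16-byte sockaddr_in are pushed, as in the original;
    ; the sin_zero padding is whatever lies above on the stack.
    push    dword 0x00000000    ; 4 bytes
    push    word 0x2923         ; 2 bytes
    push    word 2              ; 2 bytes
    mov     ecx, esp
    push    byte 16             ; 4 bytes
    push    ecx                 ; 4 bytes
    push    edi                 ; 4 bytes
    mov     ecx, esp
    mov     ebx, 2              ; SYS_BIND
    mov     eax, 102
    int     80h
    add     esp, 20             ; 4+2+2+4+4+4 bytes: ESP is back at its entry value

    xor     ebx, ebx            ; exit status 0
    mov     eax, 1              ; sys_exit
    int     80h
```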

Alternatively, use mov stores to the stack to overwrite the existing block of args with the new one. If the file descriptor is in the same slot, you can just leave it there. (Or, in the current code, push dword [esp+24] or something in the next block of code, but keeping the fd in a register makes a lot more sense if you're going to store it again.)
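A minimal sketch of the mov-store approach, assuming one 12-byte argument block reserved up front. The choice of calls is mine, purely for illustration: bind is skipped, and listen and shutdown (socketcall subfunctions 4 and 13) are used only because both take the fd as their first argument. Return values are not checked.

```nasm
; Sketch: 32-bit Linux, NASM syntax. Assumed build commands:
;   nasm -f elf32 reuse.asm && ld -m elf_i386 reuse.o -o reuse
section .text
global _start

_start:
    sub     esp, 12                 ; one reusable 3-dword argument block

    ; socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)
    mov     dword [esp], 2          ; AF_INET
    mov     dword [esp+4], 1        ; SOCK_STREAM
    mov     dword [esp+8], 6        ; IPPROTO_TCP
    mov     ecx, esp                ; ECX -> argument block; int 80h preserves it
    mov     ebx, 1                  ; SYS_SOCKET
    mov     eax, 102                ; sys_socketcall
    int     80h
    mov     edi, eax                ; keep the fd in a register as well

    ; listen(fd, 1): overwrite the first two slots in place
    mov     [esp], edi              ; fd
    mov     dword [esp+4], 1        ; backlog
    mov     ebx, 4                  ; SYS_LISTEN
    mov     eax, 102
    int     80h

    ; shutdown(fd, SHUT_RDWR): the fd is already at [esp], so leave it there
    mov     dword [esp+4], 2        ; SHUT_RDWR
    mov     ebx, 13                 ; SYS_SHUTDOWN
    mov     eax, 102
    int     80h

    add     esp, 12                 ; release the block once, at the end
    xor     ebx, ebx                ; exit status 0
    mov     eax, 1                  ; sys_exit
    int     80h
```

ESP never moves between calls here, so there is nothing to clean up per call and nothing accumulates in a loop.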

It's also weird to write something like push byte 1. That still adjusts ESP by 4 and stores 0x00000001, unlike push word 2, which only adjusts ESP by 2 and stores 0x0002. (See the Q&A "How many bytes does the push instruction push onto the stack when I don't specify the operand size?")

(In 32-bit mode, x86 only supports word and dword operand-size for push, but the immediate can be encoded either as a sign-extended imm8 or as a full-width imm16/imm32. If you wanted to force a 1-byte immediate, the syntax is push strict byte 1, but NASM already picks the short encoding because optimization has been enabled by default in NASM for at least a decade.)
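One way to check those sizes yourself is to measure how far ESP moves after each push. This is a small sketch for 32-bit Linux; encoding the two deltas into the exit status is just a convenience I chose for illustration.

```nasm
; Sketch: 32-bit Linux, NASM syntax. Assumed build commands:
;   nasm -f elf32 pushsize.asm && ld -m elf_i386 pushsize.o -o pushsize
;   ./pushsize; echo $?
section .text
global _start

_start:
    mov     esi, esp        ; ESP before the first push
    push    byte 1          ; imm8 encoding, but dword operand-size
    mov     eax, esi
    sub     eax, esp        ; EAX = bytes pushed by "push byte 1"

    mov     esi, esp        ; ESP before the second push
    push    word 2          ; 16-bit operand-size
    mov     ebx, esi
    sub     ebx, esp        ; EBX = bytes pushed by "push word 2"

    imul    eax, eax, 10    ; pack both results into one number:
    add     ebx, eax        ; status = first_delta * 10 + second_delta
    mov     eax, 1          ; sys_exit with EBX as the status
    int     80h
```

With the sizes described above (4 and 2), the exit status works out to 42.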

BTW, since Linux 4.3, there are separate system-call numbers for i386 for every system call in the sockets API, like bind and listen, so you don't need to push structs of args on the stack and call the generic socketcall entry point. The man page says x86-64 and ARM don't have a socketcall system call at all; they only have a separate syscall entry point for each socket function.
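A sketch of the same socket, bind, listen sequence using the direct i386 calls, with args passed in EBX, ECX, EDX as for any other int 80h system call. The numbers 359 (socket), 361 (bind) and 363 (listen) are what I recall from the kernel's 32-bit syscall table; treat them as an assumption and verify them against your system's unistd_32.h. It needs Linux 4.3 or later.

```nasm
; Sketch: 32-bit Linux >= 4.3, NASM syntax. Assumed build commands:
;   nasm -f elf32 direct.asm && ld -m elf_i386 direct.o -o direct
section .text
global _start

_start:
    ; socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)
    mov     eax, 359            ; socket (assumed number, see above)
    mov     ebx, 2              ; AF_INET
    mov     ecx, 1              ; SOCK_STREAM
    mov     edx, 6              ; IPPROTO_TCP
    int     80h
    mov     edi, eax            ; keep the fd

    ; Build a full 16-byte struct sockaddr_in on the stack
    push    dword 0             ; sin_zero, upper 4 bytes
    push    dword 0             ; sin_zero, lower 4 bytes
    push    dword 0x00000000    ; sin_addr = INADDR_ANY
    push    word 0x2923         ; sin_port = 9001 in network byte order
    push    word 2              ; sin_family = AF_INET

    ; bind(fd, &addr, 16)
    mov     eax, 361            ; bind (assumed number)
    mov     ebx, edi            ; fd
    mov     ecx, esp            ; pointer to the struct
    mov     edx, 16             ; addrlen
    int     80h
    add     esp, 16             ; drop the struct: 4+4+4+2+2 bytes

    ; listen(fd, 1)
    mov     eax, 363            ; listen (assumed number)
    mov     ebx, edi            ; fd
    mov     ecx, 1              ; backlog
    int     80h

    xor     ebx, ebx            ; exit status 0
    mov     eax, 1              ; sys_exit
    int     80h
```

Only the sockaddr_in still has to live in memory; the argument arrays that socketcall needed are gone.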