Usage of pusha/popa in function prologue/epilogue?

Time:12-31

Reading assembly tutorials, I saw that a "function" prologue/epilogue consists of:

    push bp
    mov bp, sp
    ; ... function body ...
    pop bp

But I also saw some other tutorials using pusha and then popa to preserve registers. So why doesn't the function prologue/epilogue perform a pusha/popa to save the register context, in addition to setting up bp?

CodePudding user response:

They don't save all registers because you don't always need all registers saved, and saving and restoring them is slow. Yes, pusha is a single small instruction, which looks like a saving, but it still costs time and stack space (eight registers are pushed whether you use them or not). To get an idea of which registers actually have to be preserved, look at the calling conventions.

https://en.wikipedia.org/wiki/X86_calling_conventions
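As a rough illustration of "save only what the convention requires", here is a minimal sketch in NASM syntax. The function name `sum3` and its three-argument signature are made up for this example, and it uses EBX on purpose (a compiler would more likely pick a scratch register) to show that a single callee-saved register costs one push/pop pair rather than a pusha/popa.

```nasm
; Hypothetical cdecl function: int sum3(int a, int b, int c)
; EAX, ECX and EDX may be clobbered freely; EBX must be preserved if used.
sum3:
        push ebx                ;EBX is callee-saved and used below, so save it
        mov eax,[esp+8]         ;a  ([esp] = saved EBX, [esp+4] = return address)
        mov ebx,[esp+12]        ;b
        add eax,ebx
        add eax,[esp+16]        ;c
        pop ebx                 ;restore the one register that was touched
        ret                     ;eax = returned value
```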

PUSHA/PUSHAD—Push All General-Purpose Registers

These are slow instructions. On Skylake, PUSHA takes 19 uops with a reciprocal throughput of 8 cycles; POPA takes 18 uops, also with a reciprocal throughput of 8 cycles.

Also, PUSHA/PUSHAD (and POPA/POPAD) are invalid in 64-bit mode. They were rightfully purged by AMD and then by Intel when x86 was extended to 64 bits.

Modern compilers go in the other direction and avoid saving registers if possible. LLVM performs an optimization called shrink wrapping, where the prologue is pushed forward past early-exit checks so that a fast early exit does not pay for register saves it never needs.

https://llvm.org/doxygen/ShrinkWrap_8cpp_source.html
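To make the idea concrete, here is a hand-written sketch of what shrink-wrapped code can look like. It is not actual LLVM output; the function `sum_pair` and its use of EBX/ESI are assumptions chosen only to give the slow path something to save.

```nasm
; Hypothetical cdecl function: int sum_pair(const int *p)
; Returns 0 when p is NULL, otherwise p[0] + p[1].
sum_pair:
        mov eax,[esp+4]         ;eax = p (no registers pushed yet)
        test eax,eax
        jz .done                ;early exit: eax is already 0, nothing to restore
        push ebx                ;saves happen only on the path that needs them
        push esi
        mov ebx,[eax]           ;ebx = p[0]
        mov esi,[eax+4]         ;esi = p[1]
        lea eax,[ebx+esi]       ;eax = returned value
        pop esi
        pop ebx
.done:
        ret
```

With a pusha/popa prologue and epilogue, the NULL case would pay for eight register saves and restores that it never uses.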

These are terrible, horrible, no good, very bad instructions.

CodePudding user response:

Reading assembly tutorials, I saw that "function" prologue/epilogue...

For true assembly language (where you don't have to comply with the calling conventions of a different language) the words "function prologue/epilogue" don't make sense.

For "assembly language designed to comply with some other language's calling conventions", you only need to save/restore some registers (possibly none).

For example, for CDECL, the contents of EAX, ECX, and EDX can be trashed by the callee and never need to be saved/restored by the callee (the caller needs to save them if it cares); and if a function doesn't use any other registers, the callee doesn't need to save or restore any other registers either. Also note that "EBP as frame pointer" is antiquated rubbish (it existed because debuggers weren't very good, and became pointless when debugging info improved, e.g. DWARF debugging info). These things combined mean that something like this has an acceptable prologue and epilogue for CDECL:

    myFunction:
            mov eax,12345      ;eax = returned value
            ret

If "lots" of registers do need to be saved and restored, pusha is slow (micro-coded), and a series of push instructions can also be slow, at least on CPUs without a stack engine (the address of each store depends on the value in ESP, which was just modified by the previous push). The typical way is to do it yourself, like:

                        ;Don't bother saving EAX, ECX, EDX.
    sub esp,16          ;Create space to save 4 registers (but maybe more for local variables)

    mov [esp],ebx
    mov [esp+4],esi
    mov [esp+8],edi
    mov [esp+12],ebp
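
The snippet above only shows the save side. A matching epilogue, assuming the same 16-byte layout and no additional local variables, might look like this:

```nasm
    mov ebx,[esp]       ;Reload the registers from the slots they were saved in
    mov esi,[esp+4]
    mov edi,[esp+8]
    mov ebp,[esp+12]

    add esp,16          ;Release the space reserved by "sub esp,16"
    ret
```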

However, the cost is code size. In a boot loader where code size is extremely limited (e.g. "part of 512 bytes"), a smart programmer will use true assembly language (where "function prologue/epilogue" don't make sense) and a beginner might use pusha to save space (without realizing that they have no reason to care about another programming language's calling conventions).
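
For comparison, this is roughly what the space-saving pusha style looks like in 16-bit boot code. The routine name `print_char` and its "character in AL" convention are assumptions made for this sketch; the BIOS teletype call (int 0x10 with AH = 0x0E) is the standard one.

```nasm
bits 16

; Hypothetical helper: print the character in AL, preserving all registers.
print_char:
        pusha                   ;1 byte: pushes AX, CX, DX, BX, SP, BP, SI, DI
        mov ah,0x0E             ;BIOS teletype output
        mov bx,0x0007           ;BH = page 0, BL = colour (used in graphics modes)
        int 0x10
        popa                    ;1 byte: restores everything, including AX
        ret
```

The two one-byte instructions are the whole attraction here. Code that keeps track of which registers each routine actually clobbers can usually drop them entirely.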
