I am new to assembly and have a simple C program that prints a string to stdout. In assembly, this translates to main calling _IO_puts. The implementation of _IO_puts, taken from https://code.woboq.org/userspace/glibc/libio/ioputs.c.html, is given below.
int
_IO_puts (const char *str)
{
  int result = EOF;
  size_t len = strlen (str);
  _IO_acquire_lock (stdout);
  if ((_IO_vtable_offset (stdout) != 0
       || _IO_fwide (stdout, -1) == -1)
      && _IO_sputn (stdout, str, len) == len
      && _IO_putc_unlocked ('\n', stdout) != EOF)
    result = MIN (INT_MAX, len + 1);
  _IO_release_lock (stdout);
  return result;
}
I am unable to figure out why the number of dynamic instructions changes, and sometimes decreases, as the string length increases on a simulated MIPS processor.
CodePudding user response:
As noted in comments, glibc's _IO_puts does a strlen, and _IO_sputn then involves a memcpy of that size.
glibc strlen on MIPS uses the pure C bithack that checks 4 bytes at a time (unlike most other ISAs, which have hand-written asm). See "Why does glibc's strlen need to be so complicated to run quickly?" for details. It's fairly complex, and its startup strategy depends on the alignment of the start of the string.
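More relevant here, where the terminating 0 byte falls within a word matters: finding it at the 4th byte of one word vs. the 1st byte of the next (or 8th vs. 1st for MIPS64) changes how much per-byte checking happens once the word loop stops, so a string that is one byte longer can probably cost fewer instructions. That's probably why you're seeing non-monotonic scaling of the dynamic instruction count.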
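To make that concrete, here is a minimal sketch in C of a word-at-a-time strlen with a counter. It is not glibc's code: sketch_strlen, steps and steps_for_length are hypothetical names, a "step" is one word test or one byte test rather than a real MIPS instruction, and the word size is assumed to be 4 bytes. It only illustrates how the position of the terminator inside a word can make a longer string cheaper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter: one step per word test or byte test.
   This models relative work, not real MIPS instruction counts. */
static unsigned long steps;

/* Simplified word-at-a-time strlen in the spirit of the bithack.
   An illustration only, not glibc's implementation. */
static size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Startup: test single bytes until p is 4-byte aligned. */
    while ((uintptr_t)p % 4 != 0) {
        steps++;
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);          /* aligned 4-byte load */
        steps++;
        /* Nonzero if and only if some byte of w is zero. */
        if ((w - 0x01010101u) & ~w & 0x80808080u) {
            /* Cleanup: test bytes in order to locate the zero. */
            for (int i = 0; i < 4; i++) {
                steps++;
                if (p[i] == '\0')
                    return (size_t)(p - s) + (size_t)i;
            }
        }
        p += 4;
    }
}

/* Build an aligned, zero-padded string of 'a' with the given length
   and return how many steps sketch_strlen needs for it. */
static unsigned long steps_for_length(size_t len)
{
    uint32_t words[16] = {0};             /* 64 bytes, 4-byte aligned */
    char *buf = (char *)words;

    assert(len < sizeof words - 4);       /* keep word loads in bounds */
    memset(buf, 'a', len);
    steps = 0;
    size_t got = sketch_strlen(buf);
    assert(got == len);
    return steps;
}
```

Under this model, steps_for_length(3) is 5 (one word test plus four byte tests), while steps_for_length(4) is 3 (two word tests plus one byte test), so the longer string costs less. Real glibc code and a real MIPS pipeline will give different numbers, but the sawtooth shape has the same cause.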
memcpy will also take fewer instructions for a multiple of 4 (or 8) bytes, and it picks strategies by branching on the size and maybe on the alignment of src and dst. (MIPS before MIPS32/64r6 doesn't have guaranteed efficient unaligned stores.) Also, sequential calls to stdio functions that don't flush the buffer leave the write position in the stdout buffer at different offsets, so the destination alignment can differ from one call to the next.
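The same kind of sketch works for the copy. This is an assumed, simplified strategy (byte stores until the destination is 4-byte aligned, then 4-byte chunks, then a byte tail), not glibc's memcpy. The names sketch_copy, copy_ops and ops_for are hypothetical, and an "op" is one store-sized copy rather than a real instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter: one op per byte copy or 4-byte chunk copy. */
static unsigned long copy_ops;

/* Simplified copy that branches on size and destination alignment.
   An illustration only, not glibc's memcpy. */
static void sketch_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Head: single bytes until dst is 4-byte aligned. */
    while (n > 0 && (uintptr_t)dst % 4 != 0) {
        *dst++ = *src++;
        n--;
        copy_ops++;
    }
    /* Body: one op per 4-byte chunk. */
    while (n >= 4) {
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
        n -= 4;
        copy_ops++;
    }
    /* Tail: leftover bytes when n is not a multiple of 4. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
        copy_ops++;
    }
}

/* Count ops for copying n bytes to a destination that starts
   dst_offset bytes past a 4-byte boundary. */
static unsigned long ops_for(size_t dst_offset, size_t n)
{
    uint32_t dst_words[32] = {0};         /* 128 bytes, 4-byte aligned */
    unsigned char src[64];
    unsigned char *dst = (unsigned char *)dst_words + dst_offset;

    assert(dst_offset < 4 && n <= sizeof src);
    memset(src, 'x', sizeof src);
    copy_ops = 0;
    sketch_copy(dst, src, n);
    assert(memcmp(dst, src, n) == 0);     /* the copy must be correct */
    return copy_ops;
}
```

In this model, ops_for(0, 7) is 4 but ops_for(0, 8) is 2, and moving the destination by one byte makes the same 8-byte copy cost 5 (ops_for(1, 8)). Since each puts call appends len + 1 bytes to the stdout buffer, an earlier call changes the destination offset seen by the next one.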