memset_pg.h
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#define LONG_ALIGN_MASK (sizeof(long) - 1)
typedef size_t Size;
#define MEMSET_LOOP_LIMIT 1024
/*
* MemSet
* Exactly the same as standard library function memset(), but considerably
* faster for zeroing small word-aligned structures (such as parsetree nodes).
* This has to be a macro because the main point is to avoid function-call
* overhead. However, we have also found that the loop is faster than
* native libc memset() on some platforms, even those with assembler
* memset() functions. More research needs to be done, perhaps with
* MEMSET_LOOP_LIMIT tests in configure.
*/
#define MemSet(start, val, len) \
do \
{ \
/* must be void* because we don't know if it is integer aligned yet */ \
void *_vstart = (void *) (start); \
int _val = (val); \
Size _len = (len); \
\
printf("_vstart: %p\n", _vstart); \
if ((((uintptr_t) _vstart) & LONG_ALIGN_MASK) == 0 && \
(_len & LONG_ALIGN_MASK) == 0 && \
_val == 0 && \
_len <= MEMSET_LOOP_LIMIT && \
/* \
* If MEMSET_LOOP_LIMIT == 0, optimizer should find \
* the whole "if" false at compile time. \
*/ \
MEMSET_LOOP_LIMIT != 0) \
{ \
long *_start = (long *) _vstart; \
long *_stop = (long *) ((char *) _start + _len); \
while (_start < _stop) \
*_start++ = 0; \
printf("non-standard MemSet invoked\n"); \
} \
else { \
memset(_vstart, _val, _len); \
printf("standard memset invoked\n"); \
} \
} while (0)
#define TEST "test"
memset_pg.c
/*
gcc -Wall -Werror memset_pg.c && ./a.out
*/
#include "memset_pg.h"
#include <stdio.h>
#include <inttypes.h>
#include <assert.h>
int main(void)
{
printf("LONG_ALIGN_MASK:%zu\n", LONG_ALIGN_MASK);
// char str[] = "beautiful earth";
char str[] = "earth567";
printf("strlen=%zu\n", strlen(str));
MemSet(str,0,strlen(str));
printf("via MemSet: str return |%s|\n",str);
printf("str pointer:%" PRIuPTR "\n", (uintptr_t) str);
return 0;
}
I am not sure what this part, ((uintptr_t) _vstart) & LONG_ALIGN_MASK, means. I understand that the pointer, cast to an unsigned integer, must end in the three bits 000, but I don't know what that pattern signifies.
typedef struct POD_OnlyStruct{
int a;
int b;
char d;
}POD_OnlyStruct;
POD_OnlyStruct t;
MemSet(&t,0, sizeof t);
The above will not invoke the non-standard MemSet.
However, the following will invoke the non-standard MemSet.
typedef struct POD_OnlyStruct{
int a;
int b;
int c;
char d;
}POD_OnlyStruct;
POD_OnlyStruct t;
MemSet(&t,0, sizeof t);
(_len & LONG_ALIGN_MASK) == 0
means that _len is a multiple of sizeof(long), which is 8 on my platform.
In long *_stop = (long *) ((char *) _start + _len);
I am not sure what the (char *) cast is for.
CodePudding user response:
I am not sure what this part, ((uintptr_t) _vstart) & LONG_ALIGN_MASK, means.
_vstart is a void pointer. Casting it to uintptr_t turns it into an integer we can work with, which is needed because the & operator cannot be applied to a pointer. By applying & LONG_ALIGN_MASK we check whether this pointer is aligned to a sizeof(long) boundary. According to the rest of your post, long is 8 bytes on your platform, so the mask is 7 and the test checks that the lowest three bits are zero.
The comment above the macro tells you why it is done. To me (purely opinion here) it needs to have a massive advantage over the memset in the libraries to be worth it, because the code is hard to read.
CodePudding user response:
I am not sure what this part, ((uintptr_t) _vstart) & LONG_ALIGN_MASK, means.
This checks whether the start address is suitably aligned for a long. If it is not, then the expression (long *) _vstart has undefined behaviour.
Note that nowadays compilers know that memset() clears memory, and will actually inline it if they see you are only setting a small amount of memory. So this MemSet() macro is completely unnecessary. In fact, some compilers might even see that the while-loop in that code is equivalent to a memset(), and replace it with a function call if they think that is more efficient (note that compilers can be told to optimize for size over performance).
In long *_stop = (long *) ((char *) _start + _len); I am not sure what the (char *) cast is for.
This is because _start is a pointer to long. If you add _len to that, it would advance it by _len times the size of long. To make sure it just adds _len bytes, you need to cast it to char * first. Also remember that ptr + offset is equivalent to &ptr[offset].