template <char terminator = '\0'>
size_t strlen (const char *str)
{
  const char *char_ptr;
  const unsigned long int *longword_ptr;
  unsigned long int longword, magic_bits, himagic, lomagic;

  /* Handle the first few characters by reading one character at a time.
     Do this until CHAR_PTR is aligned on a longword boundary.  */
  for (char_ptr = str; ((unsigned long int) char_ptr
                        & (sizeof (longword) - 1)) != 0;
       ++char_ptr)
    if (*char_ptr == '\0')
      return char_ptr - str;

  longword_ptr = (unsigned long int *) char_ptr;
  himagic = 0x80808080L;
  lomagic = 0x01010101L;

  for (;;)
    {
      longword = *longword_ptr++;
      if (((longword - lomagic) & himagic) != 0)
        {
          /* Which of the bytes was the zero?  */
          const char *cp = (const char *) (longword_ptr - 1);
          if (cp[0] == 0)
            return cp - str;
          if (cp[1] == 0)
            return cp - str + 1;
          if (cp[2] == 0)
            return cp - str + 2;
          if (cp[3] == 0)
            return cp - str + 3;
        }
    }
}
The above is the glibc strlen()
code. It relies on a trick to determine whether a word contains a zero byte, to make it fast.
However, I wish to make the function work with any terminating character, not just '\0'
, using a template. Is it possible to do something similar?
CodePudding user response:
Use std::memchr to take advantage of libc's hand-written asm
It returns a pointer to the found byte, so you can get the length by subtracting. It returns NULL on not-found, but you said you can assume there will be a match, so we don't need to check except as a debug assert.
Even better, use rawmemchr if you can assume GNU functions are available, so you don't even have to pass a length.
#include <cstring>

size_t lenx(const char *p, int c, size_t len)
{
    const void *match = std::memchr(p, c, len);  // old C functions take the char in an int
    return static_cast<const char*>(match) - p;
}
Any decent libc implementation for a modern mainstream CPU will have a fast memchr
implementation that checks multiple bytes at once, often hand-written in asm. Very similar to a strlen
implementation, but with a length-based loop-exit condition in the unrolled loop, not just the match-finding loop exit condition.
memchr is somewhat cheaper than strchr
, which has to check every byte for being a potential 0
; that's an amount of work that doesn't go down with unrolling and SIMD. If the data isn't hot in L1 cache, a good strchr can typically still keep up with available bandwidth on most CPUs for most ISAs, though. Checking for 0
s is also a correctness problem for arrays that contain a 0
byte before the byte you're looking for.
If available, it will even use SIMD instructions to check 16 or 32 bytes at once. A pure C bithack (with strict-aliasing UB) like the one you found is only used in portable fallback code in real C libraries (Why does glibc's strlen need to be so complicated to run quickly? explains this and has some links to glibc's asm implementations), or on targets where it compiles to asm as good as could be written by hand (e.g. MIPS for glibc).
(But being wrapped up in a library function, the strict-aliasing UB is dealt with by some means, perhaps as simple as just not being able to inline into other code that accesses that data a different way. If you wanted to do it yourself, you'd want a typedef with something like GNU C __attribute__((may_alias))
. See the link earlier in this paragraph.)
You certainly don't want a bithack that's only going to check 4 bytes at a time, especially if unsigned long
is an 8-byte type on a 64-bit CPU!
If you don't know the buffer length, use len = -1
Use rawmemchr if available, otherwise use memchr(ptr, c, -1)
. That's equivalent to passing SIZE_MAX
.
It's guaranteed not to read past the match, or at least to behave as if it didn't, i.e. not faulting. So it doesn't read into the next page, just like an optimized strlen
, and probably (for performance reasons) doesn't read into the next cache line. (At least since C++17 / C11, according to cppreference, but real implementations have almost certainly been safe to use this way for longer, at least for performance reasons.)
The ISO C++ standard itself defers to the C standard for <cstring>
functions; I don't know exactly why cppreference says C++17 and C11, or what changed in the standard at that point. (I also don't know if there are real-world implementations that violate that standard; I'd guess not, and that it was more a matter of documenting a guarantee that real implementations were already providing.)
The POSIX man page for memchr
guarantees stopping on a match; I don't know how far back this guarantee goes for POSIX systems.
Implementations shall behave as if they read the memory byte by byte from the beginning of the bytes pointed to by s and stop at the first occurrence of c (if it is found in the initial n bytes).
Without a guarantee like this, it's hypothetically possible for an implementation to just use unaligned loads starting with the address you pass it, as long as it stays far enough from the ptr[size-1]
end of the buffer you told it about. That's unlikely for performance reasons, though.
rawmemchr()
If you're on a GNU system, glibc has rawmemchr
, which assumes there will be a match and doesn't take a size limit. So it can loop exactly like strlen
does, without a 2nd exit condition based on either length or checking every byte for a 0
as well as the given character.
Fun fact: AArch64 glibc implements it as memchr(ptr, c, -1)
, or as strlen if the character happens to be 0
. On other ISAs, it might actually duplicate the memchr
code but without checking against the end of a buffer.