Assuming no string less than 4 bytes is ever passed, is there anything wrong with this optimization? And yes it is a significant speedup on the machines I've tested it on when comparing mostly dissimilar strings.
#define STRCMP(a, b) ( (*(int32_t*)a) == (*(int32_t*)b) && strcmp(a, b) == 0)
And assuming strings are no less than 4 bytes, is there a faster way to do this without resorting to assembly, etc?
CodePudding user response:
Casting the address of a char array to an int * and dereferencing it is always a strict aliasing violation, in addition to possibly violating alignment restrictions.
Example
See UDP checksum calculation not working with newer version of gcc for just one example of the dangers of strict aliasing violations.
Note that C implementations themselves are free to make use of undefined behavior internally. The implementers have knowledge and complete control over the implementation, neither of which someone using someone else's compiler will in general have.
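One way to keep the "compare the first 4 bytes" idea without the aliasing or alignment problem is to copy the bytes out with memcpy(). This is a minimal sketch, not a benchmarked replacement: the name streq4 is made up for illustration, and it relies on the question's premise that each string holds at least 4 bytes. Compilers commonly turn a fixed-size memcpy() like this into a single load, but that is worth confirming in your own profile.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the strings are equal, 0 otherwise.
   Assumes (as in the question) that both strings hold at least 4 bytes. */
static inline int streq4(const char *a, const char *b)
{
    uint32_t wa, wb;
    memcpy(&wa, a, sizeof wa);   /* defined behavior for any alignment */
    memcpy(&wb, b, sizeof wb);
    /* Cheap 4-byte check first; full strcmp() only when the prefixes match. */
    return wa == wb && strcmp(a, b) == 0;
}
```

Because this is a function rather than a macro, each argument is also evaluated exactly once.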
CodePudding user response:
*(int32_t*)a assumes that a is 4-byte aligned. In general, that is not the case.
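To see why alignment cannot be assumed, consider a pointer into the middle of a buffer. The sketch below checks whether a given address happens to be suitably aligned for int32_t. The helper name is hypothetical, and converting a pointer to uintptr_t is implementation-defined, so treat this as a diagnostic aid for mainstream platforms rather than a portable guarantee.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is suitably aligned for int32_t here.
   Relies on the implementation-defined pointer-to-integer conversion. */
static inline int aligned_for_int32(const char *p)
{
    return (uintptr_t)p % alignof(int32_t) == 0;
}
```

On a platform where int32_t needs 4-byte alignment, a buffer that starts aligned gives a misaligned address at buf + 1, which is exactly what a tokenizer or substring pointer may hand to the macro.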
CodePudding user response:
is there anything wrong with this optimization?
Alignment
Yes, (int32_t*)a risks undefined behavior because a may not meet the alignment required for int32_t.
Inverted meaning
strcmp() returns 0 on match. STRCMP() returns 1 on match. Consider alternatives like STREQ().
Multiple and inconsistent a evaluations
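Consider STRCMP(s++, t). s will get incremented 1 or 2 times: once for the 4-byte comparison, and a second time only if that comparison succeeds and strcmp() is evaluated.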
And assuming strings are no less than 4 bytes, is there a faster way to do this without resorting to assembly, etc?
Test 1 character
Try profiling the code below. It might not be faster than OP's UB code, but it may still be faster than calling strcmp() alone when the strings mostly differ in their first character.
//#define STRCMP(a, b) ( (*(int32_t*)a) == (*(int32_t*)b) && strcmp(a, b) == 0)
#define STREQ(a, b) ( (*(const unsigned char *)(a)) == (*(const unsigned char *)(b)) && strcmp((a), (b)) == 0)
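A macro of this shape still evaluates each argument more than once. A small function avoids that, and compilers will usually inline it. This is a sketch with a hypothetical name (streq_first), offered as something to profile, not as a guaranteed speedup.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the strings are equal, 0 otherwise.
   Being a function, it evaluates each argument exactly once. */
static inline int streq_first(const char *a, const char *b)
{
    /* Compare the first character, then fall back to the full strcmp(). */
    return *(const unsigned char *)a == *(const unsigned char *)b
        && strcmp(a, b) == 0;
}
```

With this version, a call such as streq_first(*p++, t) advances p once regardless of whether the first characters match.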
Step back and look at the larger picture for performance improvements.