I have fixed-length arrays like this:
char x[10] = "abcdefghij"; // could also contain non-printable chars
How can I efficiently test whether this array (which could be a null-terminated string or just raw binary data) is present in a fixed set of 10,000 arrays of the same length?
In Python, I would use a set/hashtable/dict and not a list (because of very fast O(1) lookup):
s = "abcdefghij"
S = {"foo1234567", "bar1234567", ..., "baz9876543"}
print(s in S) # True or False
How to do the equivalent in C? (not C++)
Note: the linked question How to check if a string is in an array of strings in C? is about a naive way to do it, with no performance requirement (it loops over all the strings and uses strcmp). Here it's different: with 10,000 arrays, another method is needed for performance (a hashtable maybe?).
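For reference, something like this minimal sketch is roughly what I have in mind: an open-addressing hash set keyed on the raw bytes, using an FNV-1a hash and linear probing. The names (KEYLEN, TABSIZE, insert, contains, hash_key) are just illustrative, not from any library, and the table is deliberately oversized so the load factor stays low.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define KEYLEN  10          /* fixed length of every array */
#define NKEYS   3           /* would be 10,000 in the real case */
#define TABSIZE 32768       /* power of two, well above 2 * NKEYS */

/* FNV-1a hash over the fixed-length key */
static uint64_t hash_key(const unsigned char *k)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (int i = 0; i < KEYLEN; i++) {
        h ^= k[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

static const unsigned char *table[TABSIZE];   /* NULL = empty slot */

/* insert a key with linear probing (table must never become full) */
static void insert(const unsigned char *k)
{
    uint64_t i = hash_key(k) & (TABSIZE - 1);
    while (table[i] != NULL)
        i = (i + 1) & (TABSIZE - 1);
    table[i] = k;
}

/* O(1) average lookup: probe until an empty slot or a matching key */
static int contains(const unsigned char *k)
{
    uint64_t i = hash_key(k) & (TABSIZE - 1);
    while (table[i] != NULL) {
        if (memcmp(table[i], k, KEYLEN) == 0)
            return 1;
        i = (i + 1) & (TABSIZE - 1);
    }
    return 0;
}

int main(void)
{
    static const unsigned char S[NKEYS][KEYLEN] = {
        "foo1234567", "bar1234567", "baz9876543"
    };
    for (int i = 0; i < NKEYS; i++)
        insert(S[i]);
    printf("%d\n", contains((const unsigned char *)"abcdefghij"));  /* 0: not in the set */
    printf("%d\n", contains((const unsigned char *)"bar1234567"));  /* 1: in the set */
}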
CodePudding user response:
I think the following binary search works (thanks to @WeatherVane). If the array of arrays is sorted, the complexity is O(log n). We can use memcmp for the comparison.
#include <stdio.h>
#include <string.h>

// https://en.wikipedia.org/wiki/Binary_search_algorithm#Algorithm
int binarysearch(unsigned char *T, unsigned char *t, int num, int size)
{
    int a = 0;
    int b = num - 1;
    int m, res;
    while (a <= b) {
        m = (a + b) / 2;
        res = memcmp(&T[0] + m * size, &t[0], size);  // compare the m-th array with the target
        if (res < 0) {
            a = m + 1;      // target can only be in the upper half
        } else if (res > 0) {
            b = m - 1;      // target can only be in the lower half
        } else {
            return 1;       // exact match
        }
    }
    return 0;               // not present
}
int main()
{
    unsigned char T[][4] = {          // must be kept sorted for the binary search
        { 0x00, 0x6e, 0xab, 0xcd },
        { 0xea, 0x6e, 0xab, 0xcd },
        { 0xeb, 0x6e, 0xab, 0xcd },
        { 0xec, 0x6e, 0xab, 0xcd },
        { 0xed, 0x6e, 0xab, 0xcd },
        { 0xef, 0x6e, 0xab, 0xcd },
        { 0xfa, 0x6e, 0xab, 0xcd },
        { 0xfb, 0x6e, 0xab, 0xcd },
        { 0xff, 0x6e, 0xab, 0xcd }
    };
    unsigned char t1[4] = { 0x33, 0x6e, 0xab, 0xcd };
    unsigned char t2[4] = { 0xfb, 0x6e, 0xab, 0xcd };
    printf("%s\n", binarysearch(T[0], t1, 9, 4) ? "found" : "not found");
    printf("%s\n", binarysearch(T[0], t2, 9, 4) ? "found" : "not found");
}
Result:
not found
found
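If the 10,000 arrays are not already sorted, the standard library can handle both halves of the job: qsort to sort the table once (O(n log n)) and bsearch for each O(log n) lookup, both driven by the same memcmp-based comparator. Here is a minimal sketch reusing the 4-byte records from the example above; cmp_rec and RECSIZE are just illustrative names.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RECSIZE 4   /* record length, as in the example above */

/* memcmp-based comparator usable by both qsort and bsearch */
static int cmp_rec(const void *a, const void *b)
{
    return memcmp(a, b, RECSIZE);
}

int main(void)
{
    unsigned char T[][RECSIZE] = {
        { 0xff, 0x6e, 0xab, 0xcd },   /* deliberately unsorted */
        { 0x00, 0x6e, 0xab, 0xcd },
        { 0xeb, 0x6e, 0xab, 0xcd }
    };
    unsigned char t[RECSIZE] = { 0xeb, 0x6e, 0xab, 0xcd };
    size_t n = sizeof T / sizeof T[0];

    qsort(T, n, RECSIZE, cmp_rec);                    /* sort once */
    void *hit = bsearch(t, T, n, RECSIZE, cmp_rec);   /* NULL if not present */
    printf("%s\n", hit ? "found" : "not found");      /* prints "found" */
}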