So I need to turn the first 5 letters of a word into an unsigned int that keeps their relative value in the alphabet, such as that "fabulous" becomes 5 0 1 20 11 or 5012011 to then be able to use it as a key in a trie.
right now i have:
unsigned int hash(const char *word)
{
unsigned int output = 0;
for(int i = 0; i < 5; i )
{
int multiplier = 10000;
if (i > 0)
{
multiplier = 10000 / (i * pow(10.0, i-1) / i);
}
output = (toupper(word[i]) - 'A') * multiplier;
}
return output;
}
now it outputs 50311 which is the correct sum of all the ints * their respective place (the 5 * 10,000 for being in the first place of 5 and so on) but any number over 10 "bleeds through" the previous value, so 20 which is the U bleeds through and affects the b, and the 11 affects the last digit of the 20, but the same it would happen with every letter that happens to be after the J.
I also thought of adding each value to an array so output[0] = 5 and so on, but then i need to convert the array into an unsigned int to return it and have no clue how to go about doing that.
CodePudding user response:
Make each character use 2 digits in the result, then there won't be any overlap between each letter's digits. Do this by using powers of 100 rather than 10. The result for your example will be 500012011
.
#define LIMIT 5
unsigned int hash(const char *word)
{
unsigned int output = 0;
size_t len = strlen(word);
int max = len < LIMIT ? len : LIMIT;
unsigned int multiplier = 100000000;
for(int i = 0; i < max; i )
{
output = (toupper(word[i]) - 'A') * multiplier;
multiplier /= 100;
}
return output;
}
CodePudding user response:
There is a mistake with multiplier
Try this one:
unsigned int hash(const char *word)
{
unsigned int output = 0;
for(int i = 0; i < 5; i )
{
int charVal = toupper(word[i]) - 'A';
if (charVal > 9)
{
output = output * 100 charVal;
}
else
{
output = output * 10 charVal;
}
}
return output;
}
CodePudding user response:
OP: "...but then i need to convert the array into an unsigned int to return it and have no clue how to go about doing that."
It's not hard. With some bit masking (low order 5 bits) and shifting (20, 15, 10, 5, 0), one can quickly assemble an integer value. 5x5=25 bits, so 'signed/unsigned' in 32 bits is irrelevant.
uint32_t hash( const char *word ) {
const int LTRS_USED = 5;
uint32_t val = 0;
int shft = LTRS_USED * 5;
for( size_t c = 0; c < LTRS_USED && word[c]; c )
val |= (word[c] & 037) << (shft -= 5); // see disclaimer below
return val;
}
int main() {
const char *words[] = {
"The", "quick", "brown", "fox", "jumps", "jumped", "over", "a", "lazy", "dog"
};
const size_t nw = sizeof words/sizeof words[0];
for( size_t i = 0; i < nw; i )
printf( "d %s\n", hash( words[i] ), words[i] );
return 0;
}
Notice that the values below (particularly "jumps/jumped") show that the integer values indicate achieving a "sorted" hash sequence. (If 'e' was mapped to 17, 's' would be 31. It works.)
21238784 The
18523243 quick
2703086 brown
6807552 fox
11187731 jumps
11187717 jumped
16455232 over
1048576 a
12643104 lazy
4692992 dog
One could experiment with using 4 letters or 6 to see how that affects the speed and volume requirements.
**(disclaimer: This answer is predicated that all characters are 7 bit ASCII. Other character sets will require other solutions.)