I am trying to convert strings to integers and sort them based on the integer value. These values should be unique to the string, no other string should be able to produce the same value. And if a string1 is bigger than string2, its integer value should be greater. Ex: since "orange" > "apple", "orange" should have a greater integer value. How can I do this?
I know there are an infinite number of possibilities between just 'a' and 'b' but I am not trying to fit every single possibility into a number. I am just trying to possibly sort, let say 1 million values, not an infinite amount.
I was able to get the values to be unique using the following:
long int order = 0;
for (auto letter : word)
order = order * 26 letter - 'a' 1;
return order;
but this obviously does not work since the value for "apple" will be greater than the value for "z".
This is not a homework assignment or a puzzle, this is something I thought of myself. Your help is appreciated, thank you!
CodePudding user response:
You are almost there ... just a minor tweaks are needed:
you are multiplying by 26
however you have letters
(a..z)
and empty space so you should multiply by 27 instead !!!Add zeropading
in order to make starting letter the most significant digit you should zeropad/align the strings to common length... if you are using 32bit integers then max size of string is:
floor(log27(2^32)) = 6 floor(32/log2(27)) = 6
Here small example:
int lexhash(char *s)
{
int i,h;
for (h=0,i=0;i<6;i ) // process string
{
if (s[i]==0) break;
h*=27;
h =s[i]-'a' 1;
}
for (;i<6;i ) h*=27; // zeropad missing letters
return h;
}
returning these:
14348907 a
28697814 b
43046721 c
373071582 z
15470838 abc
358171551 xyz
23175774 apple
224829626 orange
ordered by hash:
14348907 a
15470838 abc
23175774 apple
28697814 b
43046721 c
224829626 orange
358171551 xyz
373071582 z
This will handle all lowercase a..z
strings up to 6 characters length which is:
26^6 26^5 26^4 26^3 26^2 26^1 = 321272406 possibilities
For more just use bigger bitwidth for the hash. Do not forget to use unsigned type if you use the highest bit of it too (not the case for 32bit)
CodePudding user response:
You can use position of char:
std::string s("apple");
int result = 0;
for (size_t i = 0; i < s.size(); i)
result = (s[i] - 'a') * static_cast<int>(i 1);
return result;
By the way, you are trying to get something very similar to hash function.