Home > Back-end >  How to sort non-numeric strings by converting them to integers? Is there a way to convert strings to
How to sort non-numeric strings by converting them to integers? Is there a way to convert strings to

Time:03-17

I am trying to convert strings to integers and sort them based on the integer value. These values should be unique to the string, no other string should be able to produce the same value. And if a string1 is bigger than string2, its integer value should be greater. Ex: since "orange" > "apple", "orange" should have a greater integer value. How can I do this?

I know there are an infinite number of possibilities between just 'a' and 'b' but I am not trying to fit every single possibility into a number. I am just trying to possibly sort, let say 1 million values, not an infinite amount.

I was able to get the values to be unique using the following:

long int order = 0;
for (auto letter : word)
    order = order * 26   letter - 'a'   1;
return order;

but this obviously does not work since the value for "apple" will be greater than the value for "z".

This is not a homework assignment or a puzzle, this is something I thought of myself. Your help is appreciated, thank you!

CodePudding user response:

You are almost there ... just a minor tweaks are needed:

  1. you are multiplying by 26

    however you have letters (a..z) and empty space so you should multiply by 27 instead !!!

  2. Add zeropading

    in order to make starting letter the most significant digit you should zeropad/align the strings to common length... if you are using 32bit integers then max size of string is:

    floor(log27(2^32)) = 6
    floor(32/log2(27)) = 6
    

Here small example:

int lexhash(char *s)
    {
    int i,h;
    for (h=0,i=0;i<6;i  ) // process string
        {
        if (s[i]==0) break;
        h*=27;
        h =s[i]-'a' 1;
        }
    for (;i<6;i  ) h*=27; // zeropad missing letters
    return h;
    }

returning these:

  14348907      a
  28697814      b
  43046721      c
 373071582      z
  15470838    abc
 358171551    xyz
  23175774  apple
 224829626 orange

ordered by hash:

  14348907      a
  15470838    abc
  23175774  apple
  28697814      b
  43046721      c
 224829626 orange
 358171551    xyz
 373071582      z

This will handle all lowercase a..z strings up to 6 characters length which is:

26^6   26^5  26^4   26^3   26^2   26^1 = 321272406 possibilities

For more just use bigger bitwidth for the hash. Do not forget to use unsigned type if you use the highest bit of it too (not the case for 32bit)

CodePudding user response:

You can use position of char:

std::string s("apple");
int result = 0;
for (size_t i = 0; i < s.size();   i)
    result  = (s[i] - 'a') * static_cast<int>(i   1);
return result;

By the way, you are trying to get something very similar to hash function.

  • Related