Home > Back-end >  How to calculate the index of a string value in a sorted list that contains all possible combination
How to calculate the index of a string value in a sorted list that contains all possible combination

Time:11-27

Given a lexicographically sorted list that contains all possible combinations from a set of characters up to a maximum length, how could the index of a particular string value be calculated?

For example, if the character set is '1', 'a', and 'b' and the maximum string length is 3, the sorted list of all possible combinations would be:

 0: ""        10: "1b"      20: "aa1"     30: "b1a"
 1: "1"       11: "1b1"     21: "aaa"     31: "b1b"
 2: "11"      12: "1ba"     22: "aab"     32: "ba"
 3: "111"     13: "1bb"     23: "ab"      33: "ba1"
 4: "11a"     14: "a"       24: "ab1"     34: "baa"
 5: "11b"     15: "a1"      25: "aba"     35: "bab"
 6: "1a"      16: "a11"     26: "abb"     36: "bb"
 7: "1a1"     17: "a1a"     27: "b"       37: "bb1"
 8: "1aa"     18: "a1b"     28: "b1"      38: "bba"
 9: "1ab"     19: "aa"      29: "b11"     39: "bbb"

How could the zero-based index of "a", "ab", or "ba1" be calculated in an efficient way?

CodePudding user response:

First you need a way to calculate how many strings there are in such a list.

Let count(n) be the number of strings of length <= n. There are a lot of ways to calculate that:

  • recursively: count(n) = 1 3 * count(n-1)
  • sum of lengths: count(n) = sum_from_0_to_n(3n)
  • formula: count(n) = (3n 1-1) / 2

Then, using count(n), it's easy to find the index of any string recursively.

Let index(s,n) be the index of s in the list of strings of length <= n, and let s.tail be s with the first character removed:

  • if s == "", then index(s,n) = 0
  • if s == "1..." then index(s,n) = 1 index(s.tail, n-1)
  • if s == "a..." then index(s,n) = 1 count(n-1) index(s.tail, n-1)
  • if s == "b..." then index(s,n) = 1 2 * count(n-1) index(s.tail, n-1)

Look at the recursive formula for count(n), and notice that these choices divide count(n) into 4 parts.

  • Related