So, i want to create program, that will fastly return index of string in permutations list:
a
b
c
aa
ab
ac
ba
bb
bc
ca
cb
cc
aaa
aab
aac
aba
abb
abc
I use something like this to generate this list:
from itertools import product
charset = 'abc'
max_len = 3
for i in range(1, max_len 1):
for combo in list(product(charset, repeat=i)):
print(''.join(combo))
So, it will be 6 for 'ac', 23 for 'bab', 'caa' for 31, etc.
I copied some code from other questions, but it doesn't really works fine
import math
class Page:
@staticmethod
def from_index(index: int, chars: str) -> str:
perm = str()
for i in range(math.ceil(index / len(chars)) - 1, -1, -1):
perm = chars[int((index / (len(chars) ** i)) % len(chars))]
return perm
class Index:
@staticmethod
def from_page(perm: str, chars: str) -> int:
ids = [chars.find(x) 1 for x in perm]
base = (len(chars) ** len(perm)) / len(perm)
index = 0
for i in range(len(perm)):
index = base * ids[i]
base /= len(perm)
return int(index)
p.s. I dont want to generate whole list, but use algorithm to get values. It should work fast :)
CodePudding user response:
You haven't really properly defined what you want in you question, but from your examples it seems like you want to find the index of a string in an ordered list of cartesian products of lengths from 1 to max_len
, for a given charset.
You can do this without building the list by counting the contributions of each character from left to right. Note that for a given length x
, there are len(charset)^x
elements. You can use this to progressively build up the answer.
So, it will be 6 for 'ac', 23 for 'bab', 'caa' for 31, etc.
First of all, both the length of the charset and the max_len
are 3
, so we have 3^1
possibilities for length 1 elements, 3^2
for length 2 elements and 3^3
for length 3 elements.
ac
: since we have a length2
string, we already skipped all length1
strings, of which there are3
.A
c
on the third position means we skipped2
, so we are the the6
th.bab
: first of all, length 3 means we skipped all length 1s and 2s so we count from3 3^2=12
(leftmost)
b
means we have skipped all that start witha
and continue with any of the3^2=9
length2
elements.a
in the second means we haven't skipped any for the secondb
in the last contributes1
So we are at all of the above
1
:12 9 1 1=23
.
Similarly for caa
.
And so on. Python code:
def find_index(string, max_len, charset):
str_len = len(string)
charset_len = len(charset)
index = 1 sum(charset_len ** i for i in range(1, str_len))
for i, c in enumerate(string):
prev_pow = charset_len**(str_len - i - 1)
contrib = charset.find(c) * prev_pow
index = contrib
return index
print(find_index('caa', 3, 'abc'))
You can probably further optimize this by not calling **
at each step, but it's quite fast as it is.
Similar reasoning should allow you to go from an index to the string at that position.
CodePudding user response:
I found answer for my question. Numeral system with base of charset len is something that I wanted to create, it works in such way.
from functools import lru_cache
from modules.config import charset
@lru_cache
def encode(n):
try:
return charset[n]
except IndexError:
raise Exception(f"Cannot encode {n}")
@lru_cache
def decode(s):
try:
return charset.index(s)
except ValueError:
raise Exception(f"Cannot decode {s}")
@lru_cache
def dec_to_base(dec=0, base=16):
if dec < base:
return encode(dec)
else:
return dec_to_base(dec // base, base) encode(dec % base)
@lru_cache
def base_to_dec(s, base=16, rec_pow=0):
if s == str():
return 0
else:
return decode(s[-1]) * (base ** rec_pow) base_to_dec(s[0:-1], base, rec_pow 1)
class Page:
@staticmethod
def from_index(index: int) -> str:
return dec_to_base(index, len(charset))
class Index:
@staticmethod
def from_page(page: str) -> int:
return base_to_dec(page, len(charset))
config.py
import json
config = json.load(open('config.json'))
charset = ''.join(sorted(list(set(' ' config['charset']))))
It returns ' abc'
, with leading space (space is zero)
print(Index.from_page('abcabcabc')) # 112347
print(Page.from_index(112347)) # abcabcabc
It works in way I really needed!