Get index by permutation and permutation by index without making a list in python-CodePudding

So, i want to create program, that will fastly return index of string in permutations list:

a
b
c
aa
ab
ac
ba
bb
bc
ca
cb
cc
aaa
aab
aac
aba
abb
abc

I use something like this to generate this list:

from itertools import product

charset = 'abc'
max_len = 3

for i in range(1, max_len   1):
    for combo in list(product(charset, repeat=i)):
        print(''.join(combo))

So, it will be 6 for 'ac', 23 for 'bab', 'caa' for 31, etc.

I copied some code from other questions, but it doesn't really works fine

import math


class Page:
    @staticmethod
    def from_index(index: int, chars: str) -> str:
        perm = str()
        for i in range(math.ceil(index / len(chars)) - 1, -1, -1):
            perm  = chars[int((index / (len(chars) ** i)) % len(chars))]

        return perm


class Index:
    @staticmethod
    def from_page(perm: str, chars: str) -> int:
        ids = [chars.find(x)   1 for x in perm]

        base = (len(chars) ** len(perm)) / len(perm)

        index = 0

        for i in range(len(perm)):
            index  = base * ids[i]
            base /= len(perm)

        return int(index)

p.s. I dont want to generate whole list, but use algorithm to get values. It should work fast :)

CodePudding user response：

You haven't really properly defined what you want in you question, but from your examples it seems like you want to find the index of a string in an ordered list of cartesian products of lengths from 1 to max_len, for a given charset.

You can do this without building the list by counting the contributions of each character from left to right. Note that for a given length x, there are len(charset)^x elements. You can use this to progressively build up the answer.

So, it will be 6 for 'ac', 23 for 'bab', 'caa' for 31, etc.

First of all, both the length of the charset and the max_len are 3, so we have 3^1 possibilities for length 1 elements, 3^2 for length 2 elements and 3^3 for length 3 elements.

ac: since we have a length 2 string, we already skipped all length 1 strings, of which there are 3.

A c on the third position means we skipped 2, so we are the the 6th.
bab: first of all, length 3 means we skipped all length 1s and 2s so we count from 3 3^2=12

(leftmost) b means we have skipped all that start with a and continue with any of the 3^2=9 length 2 elements.

a in the second means we haven't skipped any for the second

b in the last contributes 1

So we are at all of the above 1: 12 9 1 1=23.

Similarly for caa.

And so on. Python code:

def find_index(string, max_len, charset):
    str_len = len(string)
    charset_len = len(charset)
    index = 1   sum(charset_len ** i for i in range(1, str_len))
    for i, c in enumerate(string):
        prev_pow = charset_len**(str_len - i - 1)
        contrib = charset.find(c) * prev_pow
        index  = contrib
    return index
        
        
print(find_index('caa', 3, 'abc'))

You can probably further optimize this by not calling ** at each step, but it's quite fast as it is.

Similar reasoning should allow you to go from an index to the string at that position.

CodePudding user response：

I found answer for my question. Numeral system with base of charset len is something that I wanted to create, it works in such way.

from functools import lru_cache
from modules.config import charset


@lru_cache
def encode(n):
    try:
        return charset[n]
    except IndexError:
        raise Exception(f"Cannot encode {n}")


@lru_cache
def decode(s):
    try:
        return charset.index(s)
    except ValueError:
        raise Exception(f"Cannot decode {s}")


@lru_cache
def dec_to_base(dec=0, base=16):
    if dec < base:
        return encode(dec)
    else:
        return dec_to_base(dec // base, base)   encode(dec % base)


@lru_cache
def base_to_dec(s, base=16, rec_pow=0):
    if s == str():
        return 0
    else:
        return decode(s[-1]) * (base ** rec_pow)   base_to_dec(s[0:-1], base, rec_pow   1)


class Page:
    @staticmethod
    def from_index(index: int) -> str:
        return dec_to_base(index, len(charset))


class Index:
    @staticmethod
    def from_page(page: str) -> int:
        return base_to_dec(page, len(charset))

config.py

import json

config = json.load(open('config.json'))

charset = ''.join(sorted(list(set(' '   config['charset']))))

It returns ' abc', with leading space (space is zero)

print(Index.from_page('abcabcabc'))  # 112347
print(Page.from_index(112347))  # abcabcabc

It works in way I really needed!