Home > Software design >  Efficient method to find the index of 4-char string among all possible 4-char strings with given let
Efficient method to find the index of 4-char string among all possible 4-char strings with given let

Time:02-26

This:

A = b'abcdefghijklmnopqrstuvwxyz0123456789!'
n = len(A)

def f(s):
    t = 0
    for i, c in enumerate(s):
        t  = A.index(c) * n ** (3 - i)
    return t

print(f(b'aaaa'))  # first possible 4-char string, should be 0
print(f(b'aaab'))  # 2nd possible 4-char string, should be 1
print(f(b'!!!!'))  # last possible 4-char string, should be 37^4 - 1

works to find the index of a 4-char string among all possible 4-char strings made with an alphabet A, but it does not seem efficient because of the many A.index() calls.

How to speed-efficiently find the index of a 4-char string among all possible 4-char strings made with an alphabet A?

CodePudding user response:

I'm not sure why you're concerned with the speed efficiency of your code. The A.index() is only executed once for each character, so 4 times in your example. Of course you could make that more speed efficiently by using a dictionary. So, first creating a dictionary like:

alphabet_index_lookup = {'a': 0, 'b': 1, ..., '!': 37} 

and then use that to lookup the index of the character. This is significantly faster, because python internally uses a hash-map to find the key instead of just iterating through the list. More background & comparison: https://towardsdatascience.com/faster-lookups-in-python-1d7503e9cd38

But like previously mentioned: Unless you're applying f() for really long strings, it won't make a significant difference.

CodePudding user response:

I mean, the only way I can think of is using a stupid branchless formula to calculate the index as-is, but I have nothing else to offer you.

A = b'abcdefghijklmnopqrstuvwxyz0123456789!'
n = len(A)

def f(s,A):
    t = 0
    for i, c in enumerate(s):
        t  = A.index(c) * n ** (3 - i)
    return t

def g(s,A):
    t = 0
    for i, c in enumerate(s):
        t  = (c - 97   (c<97)*75   (c<48)*25) * n ** (3 - i)
    return t


print(f(b'aaaa',A),g(b'aaaa',A))  # first possible 4-char string, should be 0
print(f(b'aaab',A),g(b'aaab',A))  # 2nd possible 4-char string, should be 1
print(f(b'!!!!',A),g(b'!!!!',A))  # last possible 4-char string, should be 37^4 - 1

import timeit
ft = timeit.timeit(lambda: 'f(b"aop!")',number=1000000)
gt = timeit.timeit(lambda: 'g(b"aop!")',number=1000000)
print(f'f: {ft}, g: {gt}, factor: {ft/gt}')

Output:

0 0
1 1
1874160 1874160
f: 0.12712620000820607, g: 0.06313459994271398, factor: 2.0135741752312635

Edit: I tried one other thing:

def h(s):
    arr = np.array(list(s))
    return sum( (arr - 97   (arr<97)*75   (arr<48)*25) * n ** (3 - np.array([0,1,2,3])) )

Output:

0 0 0
1 1 1
1874160 1874160 1874160
f: 0.10835059999953955, g: 0.07444929995108396, h: 0.06341919989790767,
factor f/g: 1.4553608975602195, factor f/h: 1.7084826073801391
  • Related