Home > Software design >  What is a good way to generate all strings of length n over a given alphabet within a range in dicti
What is a good way to generate all strings of length n over a given alphabet within a range in dicti

Time:12-05

I want to write a generator s_generator(alphabet, length, start_s, end_s) that generates strings of length n over a given alphabet in dictionary order starting with start_s and ending at end_s.

For example, s_generator('ab', 4, 'aaaa', 'bbbb') generates ['aaaa', 'aaab', 'aaba', 'aabb', 'abaa', 'abab', 'abba', 'abbb', 'baaa', 'baab', 'baba', 'babb', 'bbaa', 'bbab', 'bbba', 'bbbb']. And s_generator('ab', 4, 'abaa', 'abaa') generates ['abaa', 'abab', 'abba', 'abbb', 'baaa']

What is a good way to implement it?

I thought about assigning a number to each character in alphabet, treating the string as a base-n number (n is size of alphabet) and using addition to get the next number, and then convert the number back to string. For example, 'abab' is [0, 1, 0, 1] and the next number is [0, 1, 1, 0], which is 'abba'. This method seems complicated. Is there a simpler solution?

CodePudding user response:

use itertools and comprehension list

from itertools import product

def s_generator(alphabet, length, start_s, end_s):
    products = product(alphabet, repeat=length)
    return [''.join(x) for x in products if ''.join(x) >= start_s and ''.join(x) <= end_s]


print(s_generator('ab', 4, 'aaaa', 'bbbb'))

CodePudding user response:

It seems the counting method I mentioned in my question is the most efficient and cleanest solution.

def str_generator(start_s, end_s):
    # assuming start_s < end_s and are valid strings
    start_nums, end_nums = str2nums(start_s), str2nums(end_s)
    nums = start_nums
    while nums <= end_nums:
        yield nums2str(nums)

        # increment by 1
        i = len(nums) - 1
        while i >= 0:
            nums[i] = nums[i]   1
            if nums[i] < SIZE_OF_ALPHABET:
                break
            nums[i] = 0
            i -= 1
        if i < 0:
            return

str2nums(s) and nums2str(nums) are just functions that handles the mapping between strings and numbers. They can be returning hash table entries. Or in my case of using a-zA-Z as my alphabet, simply shifting the ASCII values will do:

# ASCII value
# A-Z: 65-90, a-z: 97-122

def str2nums(s):
    nums = []
    for c in s:
        if c.islower():  # 0-25 is a-z
            nums.append(ord(c) - ord('a'))
        else:  # 26-51 is A-Z
            nums.append(ord(c)   26 - ord('A'))
    return nums

def nums2str(nums):
    str_builder = []
    for n in nums:
        if n < 26:  # 0-25 is a-z
            str_builder.append(chr(n   ord('a')))
        else:  # 26-51 is A-Z
            str_builder.append(chr(n - 26   ord('A')))
    return ''.join(str_builder)
  • Related