Home > Software design >  Algorithmic problem of wordlist generation
Algorithmic problem of wordlist generation

Time:08-06

I want to generate x number of variants of an address, but with a list of predefined symbols.

adress = "RUE JEAN ARGENTIN"
a_variants = ["À","Á","Â","Ä","Ã","Å"]
e_variants = ["É","È","Ê","Ë"]
i_variants = ["Ì","Í","Î","Ï"]
u_variants = ["Ù","Ú","Û","Ü"]
r_variants = ["Ŕ","Ŗ"]
o_variants = ["Ò","Ó","Ô","Ö","Õ"]
s_variants = ["Ś","Š","Ş"]
n_variants = ["Ń","Ň"]
d_variants = ["D","Đ"]

Example of output

# RUE JEAN ARGENTINS
# ŔUE JEAN ARGENTINS
# ŖUE JEAN ARGENTINS
# RÙE JEAN ARGENTINS
# RÚE JEAN ARGENTINS
# RÛE JEAN ARGENTINS
# RÙE JEAN ARGENTINS
# RÜE JEAN ARGENTINS
# ....
# ŖUE JEAN ARGENTÌNS
# ....
# ....
# ŖÛÉ JÉÀŇ ÀŖGÈNTÏŇŞ
# ....

I don't especially want a code answer in python or anything, but mostly an algorithmic idea to achieve this. Or even a mathematical calculation to find the number of possible variations.

Thank you very much

CodePudding user response:

It should be something like:

  • first R has 3 versions (r_variants original R)
  • next U has 5 versions (u_variants original U) etc.

and you have to multipicate all values

(number of items in r_variants   1) * (number of items in u_variants   1) * .... 

If you count repeated chars in text then you can reduce it to

((number of items in a_variants   1) * number of chars A in text) * ((number of items in e_variants   1) * number of chars E in text) * .... 

But in code I will use first method:

address = "RUE JEAN ARGENTIN"

variation = {
'A': ["À","Á","Â","Ä","Ã","Å"],
'E': ["É","È","Ê","Ë"],
'I': ["Ì","Í","Î","Ï"],
'U': ["Ù","Ú","Û","Ü"],
'R': ["Ŕ","Ŗ"],
'O': ["Ò","Ó","Ô","Ö","Õ"],
'S': ["Ś","Š","Ş"],
'N': ["Ń","Ň"],
'D': ["D","Đ"],
}

result = 1

for char in address:
    if char in variation:
        result *= (len(variation[char])   1)
        
print(result)        

Result:

37209375   # 37 209 375

EDIT:

You can substract 1 to skip original version "RUE JEAN ARGENTIN"

37209374   # 37 209 374

EDIT:

If you use shorter text and smaller number of variation then you can easly generate all version on paper (or display on screen) and count them - and check if calculation is correct.

For example for RN it gives 8 variation (9 versions - original)

You can use itertools.product() to generate all versions

import itertools

for number, chars in enumerate(itertools.product(['R', "Ŕ","Ŗ"], ['N', "Ń","Ň"]), 1):
    print(f'{number} | {"".join(chars)}')

Result:

1 | RN
2 | RŃ
3 | RŇ
4 | ŔN
5 | ŔŃ
6 | ŔŇ
7 | ŖN
8 | ŖŃ
9 | ŖŇ

CodePudding user response:

An easy way to implement it is using recursion. However, the number of combinations will be gigantic, so I caution you against using this in any practical situation with long words or lots of combinations of characters. Something like this, in pseudocode:

results = [];

function getVariations(str, depth):
    if depth == str.length:
        results.add(str);
        return;

    getVariations(str, depth 1);     // For original letter

    c <- str[depth];
    for v in variations[c]:
        str[depth] <- v;
        getVariations(str, depth 1); // For variations of letter

getVariations(yourString, 0);

  • Related