More efficient way to iterate through n-nested for loops in Python-CodePudding

I'm currently working on a "dehashing" script that lets user type in an input, together with the hashing method, and the script iterates through a list of characters, builds strings of different lengths, and tries to check if any of the character combinations (of lengths 1-8), hashed, are equal to the input provided by the user.

For example, the user provides the hashed version of 'password', and the algorithm takes all the possibilities, starting from length 1:

Length 1: a, b, c, d, ..., z

Length 2: aa, ab, ac, ..., zz

Length 3: aaa, aab, aac, ..., zzz

and so on, until it reaches Length 8 (including it).

It hashes all the possibilities, one by one, and it checks if they are equal to the user's input. If so, the program outputs the unhashed string and stops the searching.

I firstly thought about using 1 for() loop for length 1, 2 nested for() loops for length 2, and so on, but thought that I might copy and paste too much of the same code, so I Googled for some other options, and I found out that I can use itertools.

This is how I'm generating my n-nested for() loops:

chars = "abcdefghijklmnopqrstuvwxyz"
ranges = []
for i in range(0, length):
    ranges.append(range(0, len(chars)))
for xs in itertools.product(*ranges):
    # build the string here, hash it and check if it maches the user's input

I'm not providing the full implementation, because there's more than just checking (writing into files if something is found, outputting stuff, etc.). The idea is, I realized that this algorithm works pretty well for lengths 1-4. Strings with length 1, 2 or 3 are found in less than a second, while strings with length 4 can also require several minutes.

I also "improved" the searching, using multiprocessing and searching for groups of two lengths per process.

The problem is, the algorithm is still not efficient enough. If I want to search for a string with length 5, for example, I'll have to wait even hours, and I'm pretty sure that's a more efficient way of implementing what I actually did.

Also tested the execution time of n-nested normal for() loops vs this type of itertool implementation, and found out that the for() loops are 2x faster. Shouldn't it have been exactly the reverse?

Do you have any advice on how to improve my algorithm?

CodePudding user response：

You can use chars directly as the iterable for itertools.product. In addition, product accepts an optional argument repeat if you want the product of an iterable with itself. Refer to the documentation.

product generates tuples. To get a string out of a tuple of strings, use ''.join().

from itertools import product

def find_password(hashed, length, chars = "abcdefghijklmnopqrstuvwxyz"):
    for p in product(chars, repeat=length):
        if hash(''.join(p)) == hashed:
            return ''.join(p)
    return None

password = 'aaabc'
print( find_password(hash(password), len(password)) )
# aaabc

Additionally, you could use from string import ascii_lowercase instead of hardcoding your own alphabet:

from string import ascii_lowercase

print(ascii_lowercase)
# abcdefghijklmnopqrstuvwxyz