Filter a list of strings by a char in the same position

I am trying to write a simple function that takes three inputs: a list of words, a list of guessed letters, and a pattern. The pattern is a word with some letters hidden behind underscores (for example, the word apple and the pattern '_pp_e'). For context, it is part of a hangman game, where you try to guess a word, and this function gives a hint. I want the function to return a filtered list of the input words that do not contain any letter from the list of guessed letters and that have the same letters in the same positions as the given pattern. I tried to do this with three loops.

The first loop keeps only the words that have the same length as the pattern.
The second loop compares the pattern with each remaining word. If the word contains a letter from the pattern but not in the same position, I filter it out.
The final loop checks that the remaining word does not contain any letter from the given guessed list.

I have not had much success getting it to work and would love some help. Any tips for making the code shorter (without using third-party libraries) would also be very much appreciated. Thanks in advance!

Example: with the pattern "d _ _ _ _ a _ _ _ _", the guessed letter list ['b','c'], and a word list containing all the words in English, the output list is ['delegating', 'derogation', 'dishwasher'].

This is the code, for more context:

def filter_words_list(words, pattern, wrong_guess_lst):
    lst_return = []
    lst_return_2 = []
    lst_return_3 = []
    new_word = ''
    for i in range(len(words)):
        if len(words[i]) == len(pattern):
            lst_return.append(words[i])
    pattern = list(pattern)
    for i in range(len(lst_return)):
        count = 0
        word_to_check = list(lst_return[i])
        for j in range(len(pattern)):
            if pattern[j] == word_to_check[j] or (pattern[j] == '_' and
                                                  (not (word_to_check[j] in
                                                        pattern))):
                count += 1
        if count == len(pattern):
            lst_return_2.append(new_word.join(word_to_check))
    for i in range(len(lst_return_2)):
        word_to_check = lst_return_2[i]
        for j in range(len(wrong_guess_lst)):
            if word_to_check.find(wrong_guess_lst[j]) == -1:
                lst_return_3.append(word_to_check)

    return lst_return_3

CodePudding user response：

Probably not the most efficient, but this should work:

def filter_words_list(words, pattern, wrong_guess_lst):
    fewer_words = [w for w in words if not any([wgl in w for wgl in wrong_guess_lst])]
    equal_len_words = [w for w in fewer_words if len(w) == len(pattern)]
    pattern_indices = [idl for idl, ltr in enumerate(pattern) if ltr != '_']
    word_indices = [[idl for idl, ltr in enumerate(w) if ((ltr in pattern) and (ltr != '_'))] for w in equal_len_words]
    out = [w for wid, w in zip(word_indices, equal_len_words) if ((wid == pattern_indices) and all(w[pid] == pattern[pid] for pid in pattern_indices))]
    return out

The idea is to first remove all words that contain letters from your wrong_guess_lst. Then remove every word that does not have the same length as the pattern (you could also merge this condition into the first one). Next, you build a mask for the pattern (the positions of its non-'_' letters) and one for each remaining word (the positions where the word has one of those letters). To be a candidate, a word's mask has to be identical to the pattern's mask AND the letters in these positions have to be identical as well.
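To make the mask idea concrete, here is a small sketch for single words. The helper name positions_of_revealed_letters is made up for illustration and is not part of the function above.

```python
def positions_of_revealed_letters(word, pattern):
    # Indices in `word` whose letter is one of the revealed letters of `pattern`.
    revealed = set(pattern) - {"_"}
    return [i for i, ch in enumerate(word) if ch in revealed]

pattern = "_pp_e"
# Positions of the non-'_' letters in the pattern itself.
pattern_mask = [i for i, ch in enumerate(pattern) if ch != "_"]

print(pattern_mask)                                     # [1, 2, 4]
print(positions_of_revealed_letters("apple", pattern))  # [1, 2, 4]
print(positions_of_revealed_letters("upper", pattern))  # [1, 2, 3]
```

"apple" has the same mask as the pattern, so it stays a candidate; "upper" has an "e" in a position the pattern still hides, so its mask differs and it is dropped.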

Note that I replaced a lot of the for loops in your code with list comprehensions. List comprehensions are a very useful construct, especially if you don't want to use other libraries.

Edit: I cannot really tell you where your code went wrong, as it was a little too long for me.
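That said, one likely culprit is the final loop: it appends a word once for every wrong letter the word does not contain, so a word can end up in the result several times, can be added even though it contains one of the other wrong letters, and is never added when wrong_guess_lst is empty. A minimal sketch of that step using all(), assuming the earlier loops already produced the list of candidates (the name drop_words_with_wrong_letters is hypothetical):

```python
def drop_words_with_wrong_letters(candidates, wrong_guess_lst):
    # Keep a word only if none of the wrong letters occurs in it.
    return [
        word
        for word in candidates
        if all(letter not in word for letter in wrong_guess_lst)
    ]

print(drop_words_with_wrong_letters(["apple", "ample", "amble"], ["b", "c"]))
# ['apple', 'ample']
```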

CodePudding user response：

The easiest, and likely quite efficient, way to do this would be to translate your pattern into a regular expression, if regular expressions are in your "toolbox". (The re module is in the standard library.)

In a regular expression, . matches any single character. So, we replace all _s with .s and add "^" and "$" to anchor the regular expression to the whole string.

import re

def filter_words(words, pattern, wrong_guesses):
    re_pattern = re.compile("^" + re.escape(pattern).replace("_", ".") + "$")
    
    # get words that 
    #   (a) are the correct length 
    #   (b) aren't in the wrong guesses 
    #   (c) match the pattern
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            re_pattern.match(word)
        )
    ]

all_words = [
    "cat",
    "dog",
    "mouse",
    "horse",
    "cow",
]

print(filter_words(all_words, "c_t", []))
print(filter_words(all_words, "c__", []))
print(filter_words(all_words, "c__", ["cat"]))

prints out

['cat']
['cat', 'cow']
['cow']
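As a side note, the anchors can be left out if the compiled pattern is applied with fullmatch, which only succeeds when the whole string matches. A small sketch of that variant (the function name pattern_to_regex is invented here):

```python
import re

def pattern_to_regex(pattern):
    # "_" becomes ".", every other character is matched literally.
    return re.compile(re.escape(pattern).replace("_", "."))

regex = pattern_to_regex("c_t")
print(bool(regex.fullmatch("cat")))   # True
print(bool(regex.fullmatch("cats")))  # False
print(bool(regex.fullmatch("cow")))   # False
```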

If you don't care for using regexps, you can instead translate the pattern to a dict mapping each defined position to the character that should be found there:

def filter_words_without_regex(words, pattern, wrong_guesses):
    # map each defined position in the pattern to the letter expected there
    letter_map = {i: letter for i, letter in enumerate(pattern) if letter != "_"}
    # get words that
    #   (a) are the correct length
    #   (b) aren't in the wrong guesses
    #   (c) have the correct letters in the correct positions
    return [
        word
        for word in words
        if (
            len(word) == len(pattern) and
            word not in wrong_guesses and
            all(word[i] == ch for i, ch in letter_map.items())
        )
    ]

The result is the same.
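Both functions above treat wrong_guesses as a list of whole words. In the question it holds single letters, and a hidden position should presumably not contain a letter that is already revealed elsewhere in the pattern (otherwise it would have been revealed too). A sketch under those two assumptions; the name filter_words_by_letters is made up:

```python
def filter_words_by_letters(words, pattern, wrong_letters):
    revealed = {ch for ch in pattern if ch != "_"}
    excluded = set(wrong_letters)

    def fits(word):
        if len(word) != len(pattern):
            return False
        # Reject words containing any wrongly guessed letter.
        if excluded & set(word):
            return False
        for p, ch in zip(pattern, word):
            if p == "_":
                # Assumption: a hidden slot cannot hold an already revealed letter.
                if ch in revealed:
                    return False
            elif p != ch:
                return False
        return True

    return [word for word in words if fits(word)]

words = ["apple", "ample", "amble", "eppee", "apply"]
print(filter_words_by_letters(words, "_pp_e", ["b", "c"]))
# ['apple']
```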