I have a list, file_contents, whose elements are lists of word tokens. I would like to write a function that takes two word strings and an integer size as its input and returns every token list in which the two words appear within size words of each other.
So far I have tokenized the words for each element in the list using nltk, but I'm not sure where to go from here. Could anyone point me to an nltk method or Python code that could do this?
Here is what the function should look like:
file_contents = [['man', 'once', 'upon', 'time', 'love', 'princess'], ['python', 'code', 'cool', 'uses', 'java'], ['man', 'help', 'test', 'weird', 'love'], ...]
def check_words_within(word1: str, word2: str, size: int) -> list:
    # how to implement?
check_words_within('man','love', 4)
would return [['man', 'once', 'upon', 'time', 'love'], ['man', 'help', 'test', 'weird', 'love']]
check_words_within('man','upon', 1)
would return [['man', 'once', 'upon']]
check_words_within('man','document',4)
would return []
Does nltk have a function to help me do this?
CodePudding user response:
Create a list of dicts to look up word positions from.
dat = [{word: idx for idx, word in enumerate(el)} for el in file_contents]
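To see the intermediate structure this builds, here is the comprehension applied to the first sample list; each dict maps a token to its index in that list:

```python
tokens = ['man', 'once', 'upon', 'time', 'love', 'princess']
lookup = {word: idx for idx, word in enumerate(tokens)}
print(lookup)
# {'man': 0, 'once': 1, 'upon': 2, 'time': 3, 'love': 4, 'princess': 5}
```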
def foo(w1, w2, dist, f, fdat):
    arr = []
    for i, v in enumerate(fdat):
        i1 = v.get(w1)  # index of w1 in list i, or None if absent
        i2 = v.get(w2)  # index of w2 in list i, or None if absent
        if (i1 is not None) and (i2 is not None) and (i2 - i1 <= dist + 1):
            arr.append(f[i][i1:i2 + 1])
    return arr
foo("man", "upon", 1, file_contents, dat)
# [['man', 'once', 'upon']]
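Note that the dict lookup keeps only the last position of a repeated token and assumes w1 appears before w2. A sketch that scans every position of both words instead (the function name matches the question; the inclusive size + 1 window mirrors the answer above) could look like this:

```python
def check_words_within(word1, word2, size, contents):
    """Return each token span where word1 and word2 sit within `size` words."""
    results = []
    for tokens in contents:
        # record every position of both words, so repeated tokens are covered
        pos1 = [i for i, t in enumerate(tokens) if t == word1]
        pos2 = [i for i, t in enumerate(tokens) if t == word2]
        for i1 in pos1:
            for i2 in pos2:
                lo, hi = min(i1, i2), max(i1, i2)  # order-independent
                if hi - lo <= size + 1:  # same window as the answer above
                    results.append(tokens[lo:hi + 1])
    return results

file_contents = [['man', 'once', 'upon', 'time', 'love', 'princess'],
                 ['python', 'code', 'cool', 'uses', 'java'],
                 ['man', 'help', 'test', 'weird', 'love']]
print(check_words_within('man', 'love', 4, file_contents))
# [['man', 'once', 'upon', 'time', 'love'], ['man', 'help', 'test', 'weird', 'love']]
```

This passes contents as a parameter rather than relying on a global, and also matches the other two examples from the question.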