Find a set of words in a list

Time:10-26

In the made-up example below, I am trying to find the words "STEM Employment" in the text list. If I find that sequence of words in order, I would like to get the index of the first word so I can use the same index in the width and height lists, since they are parallel (all three lists always have the same length, although that length varies).

teststring = ("STEM Employment").split(" ")

data = {"text":["some","more","STEM","Employment","data"],
        "width":[100,45,50,90,354],
        "height":[500,320,320,432,554]}

So for this example, the answer would be 50 and 320, because the first word is STEM. However, I am not just looking for STEM; I also have to make sure that Employment comes immediately after STEM in the list.

I tried writing a for loop for this, but it stops short once it confirms the first word, STEM. I am not sure how to fix it:

testchecker = 0
for testword in range(len(data)):
    print(data["text"][testword])
    for m in teststring:
        # print(m)
        print(testchecker)
        if m in data["text"][testword]:
            print("true")
            testchecker = testchecker + 1
            if testchecker == len(teststring):
                print("match")
                print(testword - testchecker + 1)
            pass
        else:
            testchecker = 0
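For reference, the loop above stops short because `len(data)` is the number of keys in the dictionary (3), not the number of words, so `range(len(data))` only visits indexes 0 to 2 and ends right after reaching "STEM". The inner loop also tests every search word against the same list item, and `in` between two strings is a substring test rather than an equality test. A minimal sketch of a corrected loop, assuming the words must match exactly and in order, compares a slice of `data["text"]` of the same length as `teststring` at each position; the function name `find_phrase` is hypothetical.

```python
def find_phrase(words, phrase):
    """Return the index where `phrase` (a list of words) starts in `words`, or -1."""
    n = len(phrase)
    # Slide a window of n words across the list; the range is empty if phrase is longer.
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            return i
    return -1


teststring = "STEM Employment".split(" ")
data = {"text": ["some", "more", "STEM", "Employment", "data"],
        "width": [100, 45, 50, 90, 354],
        "height": [500, 320, 320, 432, 554]}

idx = find_phrase(data["text"], teststring)
if idx != -1:
    # The lists are parallel, so the same index works for width and height.
    print(data["width"][idx], data["height"][idx])  # prints: 50 320
```

Comparing whole slices with `==` checks every word at once, so there is no counter to reset when a partial match fails.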

Answer:

You can join the items of data["text"] into a single string with " ".join and check whether "STEM Employment" appears in it. If it does, look up the index of "STEM" with list.index.

teststring = "STEM Employment"
data = {"text":["some","more","STEM","Employment","data"],
        "width":[100,45,50,90,354],
        "height":[500,320,320,432,554]}
if teststring in " ".join(data["text"]):
    idx = data["text"].index(teststring.split(' ')[0])
    print(data["width"][idx], data["height"][idx])

Output:

50 320
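One caveat with the join approach: `data["text"].index("STEM")` returns the first "STEM" in the list, which is not necessarily the one followed by "Employment", and the substring test also matches inside longer words (for example "STEM Employments"). A sketch that keeps the join idea but pads both the joined string and the phrase with spaces, so that only whole words match, then counts the spaces before the match to recover the list index. It assumes no item in `data["text"]` contains a space; the function name `phrase_index` is hypothetical.

```python
def phrase_index(words, phrase):
    """Return the list index where `phrase` starts in `words`, or -1 if absent.

    Assumes no item in `words` contains a space.
    """
    # Surround both strings with spaces so only whole words can match.
    haystack = " " + " ".join(words) + " "
    pos = haystack.find(" " + phrase + " ")
    if pos == -1:
        return -1
    # The text before the match is the leading pad plus the k earlier words
    # joined by spaces, which contains exactly k spaces.
    return haystack[:pos].count(" ")


words = ["STEM", "jobs", "STEM", "Employment"]
print(words.index("STEM"))                     # prints: 0 (the wrong STEM)
print(phrase_index(words, "STEM Employment"))  # prints: 2
```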

Another option:

teststring = "STEM Employment".split(' ')
# Make sure all words in teststring are in data["text"]
if all(s in data["text"] for s in teststring):
    # Get the indexes of each word
    indexes = [data["text"].index(s) for s in teststring]
    # Make sure all indexes are sequential
    if all(b - a == 1 for a, b in zip(indexes, indexes[1:])):
        print(data["width"][indexes[0]], data["height"][indexes[0]])
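A similar caveat applies to this option: `list.index` also returns the first occurrence of each word, so a repeated word can hide a real match. With `["Employment", "STEM", "Employment"]` the indexes come out as `[1, 0]`, which are not sequential, even though "STEM Employment" does appear at positions 1 and 2. A sketch that avoids `list.index` entirely zips shifted copies of the list to produce every run of consecutive words and returns all start positions; the function name `all_phrase_starts` is hypothetical.

```python
def all_phrase_starts(words, phrase):
    """Return every index at which the words of `phrase` appear consecutively."""
    target = tuple(phrase)
    # zip over len(phrase) shifted copies yields each run of consecutive words as a tuple.
    windows = zip(*(words[k:] for k in range(len(target))))
    return [i for i, window in enumerate(windows) if window == target]


text = ["Employment", "STEM", "Employment"]
teststring = ["STEM", "Employment"]
print([text.index(s) for s in teststring])    # prints: [1, 0]
print(all_phrase_starts(text, teststring))    # prints: [1]
```

Returning a list of start positions also covers the case where the phrase occurs more than once; take the first element if only one match is needed.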