Find phrase in list and get index

I have tried researching this but can't find anything like what I am looking for.

I am trying to find a specific phrase in a list. Here is a test list:

data = {"text":["Map:","Internet",
"Subscriptions","","Map:",
"Adult","Literacy","and",
"Numeracy","|","8"]}

I want to get the index of the first word of a phrase I am looking for, such as: Map: Adult Literacy and Numeracy. The answer here would be 4, since that is where the phrase's first word, Map:, appears. However, Map: occurs twice in the list, and I only need the occurrence that is part of the whole phrase Map: Adult Literacy and Numeracy.

Here is what I tried:

teststring = "Map: Adult Literacy and Numeracy"
teststring_split = teststring.split(" ")

data = {"text":["Map:","Internet",
"Subscriptions","","Map:",
"Adult","Literacy","and",
"Numeracy","|","8"]}

if teststring in " ".join(data["text"]):
    idx = data["text"].index(teststring.split(' ')[0])
    print(idx)

However, it prints 0, which makes sense because index returns the first Map: in the list, and I am not sure how to get the specific Map: that is part of the phrase.
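The 0 comes from list.index returning the position of the first matching element only. As a minimal sketch (find_phrase_start is a hypothetical helper name, not from the original post), the optional start argument of list.index can be used to step through each occurrence of the first word and compare the slice that begins there against the whole phrase:

```python
def find_phrase_start(words, phrase):
    """Return the index where `phrase` starts in `words`, or -1 if absent.

    `find_phrase_start` is a hypothetical helper name used for illustration.
    """
    target = phrase.split(" ")
    start = 0
    while True:
        try:
            # list.index(value, start) searches from position `start` onwards
            i = words.index(target[0], start)
        except ValueError:
            return -1  # no further occurrence of the first word
        if words[i:i + len(target)] == target:
            return i  # the whole phrase starts here
        start = i + 1  # try the next occurrence of the first word


data = {"text": ["Map:", "Internet", "Subscriptions", "", "Map:",
                 "Adult", "Literacy", "and", "Numeracy", "|", "8"]}

print(data["text"].index("Map:"))  # 0: always the first occurrence
print(find_phrase_start(data["text"], "Map: Adult Literacy and Numeracy"))  # 4
```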

EDIT: I am close thanks to @Alexander's answer. I would have accepted it as correct, but it only checks the first two words of the phrase's split string. I need to check every word, as the lists and phrases are dynamic and some phrases are very similar in wording.

Here is the code I have so far now:

for i in range(len(data['text'])):
    if data['text'][i] == teststring_split[0]:
        for m in range(len(teststring_split)):
            if data['text'][i + m] == teststring_split[m]:
                print(teststring_split[m])

This will output:

Map:
Map:
Adult
Literacy
and
Numeracy

So I can confirm the phrase as it prints out, but I am not sure how to get the index 4 after confirming the last word, Numeracy.

CodePudding user response：

A list comprehension will work. Iterate over the indices of the data, keeping each index where the value equals the first word of teststring (Map:) and the slice starting at that index equals the whole split phrase.

teststring = "Map: Adult Literacy and Numeracy"
teststring_split = teststring.split(" ")

data = {"text":["Map:","Internet",
"Subscriptions","","Map:",
"Adult","Literacy","and",
"Numeracy","|","8"]}

idxs = [i for i in range(len(data['text']))
        if data['text'][i] == teststring_split[0]
        and data['text'][i:i + len(teststring_split)] == teststring_split]
print(idxs)

Output:

[4]
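The same slice comparison can be wrapped in a reusable function. This is only a sketch: phrase_indices, words and phrase are illustrative names, and the range is shortened so that a full-length window always fits inside the list. Using a generator with next gives the first match without building the full list:

```python
def phrase_indices(words, phrase_words):
    """Yield every index at which `phrase_words` occurs as a contiguous run.

    `phrase_indices` is a hypothetical helper name used for illustration.
    """
    n = len(phrase_words)
    # Stop early enough that a window of length n always fits in the list.
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase_words:
            yield i


words = ["Map:", "Internet", "Subscriptions", "", "Map:",
         "Adult", "Literacy", "and", "Numeracy", "|", "8"]
phrase = "Map: Adult Literacy and Numeracy".split(" ")

first = next(phrase_indices(words, phrase), None)  # None if there is no match
print(first)                                  # 4
print(list(phrase_indices(words, ["Map:"])))  # [0, 4]
```

Returning None for a missing phrase is a design choice made here so the caller can tell "not found" apart from index 0.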

CodePudding user response：

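Instead of converting teststring to a list, you could use " ".join(data["text"]) to turn the list into one string. This makes it much easier to scan through.

Then use a regular expression to search for your phrase:

import re

teststring = "Map: Adult Literacy and Numeracy"
data = {"text":["Map:","Internet",
"Subscriptions","","Map:",
"Adult","Literacy","and",
"Numeracy","|","8"]}

# Join the list items (not the dict itself) with spaces so the phrase can match
text = " ".join(data["text"])
# re.escape keeps characters in the phrase from being read as regex syntax
match = re.search(re.escape(teststring), text)
# match.start() is a character offset into text, not an index into the list
print(match)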
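A regex search reports a character offset into the joined string, not a position in the list. One possible way to map it back is sketched below, under two assumptions: the items are joined with a single space, and no item contains a space itself. In that case the number of spaces before the match equals the list index. The helper name phrase_index_via_regex is hypothetical.

```python
import re


def phrase_index_via_regex(words, phrase):
    """Return the list index where `phrase` starts, or None if not found.

    Assumes no element of `words` contains a space (hypothetical helper).
    """
    text = " ".join(words)
    # (?<!\S) and (?!\S) force the match to begin and end at item boundaries,
    # so a phrase cannot match in the middle of a longer word.
    pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
    match = re.search(pattern, text)
    if match is None:
        return None
    # Every item before the match contributes exactly one separator space.
    return text.count(" ", 0, match.start())


words = ["Map:", "Internet", "Subscriptions", "", "Map:",
         "Adult", "Literacy", "and", "Numeracy", "|", "8"]

print(phrase_index_via_regex(words, "Map: Adult Literacy and Numeracy"))  # 4
```

If items can contain spaces, the offset no longer maps cleanly to an index, and comparing list slices directly is likely the safer route.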

CodePudding user response：

I came up with an answer at the same time @alexander fixed his. His is better as it's less code, but here is the version I came up with before I saw his answer:

for i in range(len(data['text'])):
    if data['text'][i] == teststring_split[0]:
        testindexchecker = 0
        for m in range(len(teststring_split)):
            if data['text'][i + m] == teststring_split[m]:
                print(teststring_split[m])
                testindexchecker = testindexchecker + 1
            if testindexchecker == len(teststring_split):
                idxs = i
                print(i)
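The nested loop above keeps comparing after a mismatch and would raise an IndexError if the first word appeared too close to the end of the list. A possible variant of the same explicit-loop idea is sketched below, assuming a non-empty phrase; find_phrase_loop is a hypothetical name. It uses Python's for/else, where the else branch runs only when the inner loop finishes without a break:

```python
def find_phrase_loop(words, phrase_words):
    """Return the first index where `phrase_words` starts, or None.

    Hypothetical helper; assumes `phrase_words` is not empty.
    """
    for i in range(len(words)):
        if words[i] != phrase_words[0]:
            continue
        # Candidates too close to the end cannot hold the whole phrase.
        if i + len(phrase_words) > len(words):
            break
        for m in range(len(phrase_words)):
            if words[i + m] != phrase_words[m]:
                break  # first mismatch rules out this candidate
        else:
            # Inner loop ended without a break: every word matched.
            return i
    return None


data = {"text": ["Map:", "Internet", "Subscriptions", "", "Map:",
                 "Adult", "Literacy", "and", "Numeracy", "|", "8"]}
teststring_split = "Map: Adult Literacy and Numeracy".split(" ")

print(find_phrase_loop(data["text"], teststring_split))  # 4
```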