Python - How to loop through the results of a re.split to extract information from each string?

Time:07-28

Let's say I have an input string like this:

text = "Sir John Doe is 45 and lives in London Sir Jack Doe is 42 and lives in Dublin Miss Jane Doe is 29 and lives in Berlin"

I want to create an object for each person with their information. First, I use this function to split the text into one substring per person:

import re

def split(text):
    # Split on the titles that introduce each person
    split_text = re.split('Sir|Miss', text)
    return split_text

So the split function returns one substring per person:

[John Doe is 45 and lives in London]
[Jack Doe is 42 and lives in Dublin]
[Jane Doe is 29 and lives in Berlin]

This works fine.
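
One detail worth checking here: because the text starts with a delimiter ("Sir"), `re.split` also returns an empty string as the first element, and each piece keeps its surrounding spaces. The sketch below shows this and one possible way to clean it up; the helper name `split_people` is hypothetical and not part of the original code.

```python
import re

text = ("Sir John Doe is 45 and lives in London "
        "Sir Jack Doe is 42 and lives in Dublin "
        "Miss Jane Doe is 29 and lives in Berlin")

# Same call as in split(): the text begins with "Sir", so the first
# element of the result is an empty string.
raw = re.split('Sir|Miss', text)

def split_people(text):
    """Hypothetical variant of split() that trims whitespace and drops empty pieces."""
    return [part.strip() for part in re.split(r'Sir|Miss', text) if part.strip()]

people = split_people(text)
for person in people:
    print(repr(person))
```

With the cleaned list, every element passed on to the Q/A step is a non-empty string.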

However, I now want the function below (personal_data) to go through all the results, no matter how many people are in the list, run the Q/A for each one, and return a person_info for each person.

I use a Q/A AI model to extract the exact information. As arguments, it takes the question (What is the age? What is the city? etc.) and the string to search (here: subtext):

def personal_data(split_text, nlp):
    person_info = {"Age": None, "City": None}
    for subtext in split_text:
        question = "How old is this person?"
        response = nlp({"question": question, "context": subtext})
        person_info["Age"] = response["answer"]
        question = "Where does this person live?"
        response = nlp({"question": question, "context": subtext})
        person_info["City"] = response["answer"]

So the output should be:

[{"Age": 45, "City": "London"}, {"Age": 42, "City": "Dublin"}, {"Age": 29, "City": "Berlin"}]

However, I cannot find the right way to make the personal_data function work. Most of the time it tells me that subtext isn't a string. What is the proper way to achieve this?

CodePudding user response:

You need to create a new person_info dictionary each time through the loop, append each one to a list, and return that list at the end.

def personal_data(split_text, nlp):
    result = []  # one dictionary per person
    for subtext in split_text:
        # Create a fresh dictionary for this person
        person_info = {}
        question = "How old is this person?"
        response = nlp({"question": question, "context": subtext})
        person_info["Age"] = response["answer"]
        question = "Where does this person live?"
        response = nlp({"question": question, "context": subtext})
        person_info["City"] = response["answer"]
        result.append(person_info)
    # Return the full list after the loop has finished
    return result
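
To try the loop without loading a model, here is a minimal sketch under a few assumptions. `fake_nlp` is a hypothetical stand-in that mimics only the call shape used in the question (a dict with "question" and "context" in, a dict with "answer" out) and answers with regular expressions. The sketch also skips empty pieces, since `re.split` yields an empty first string when the text starts with a delimiter; that may or may not be related to the error in the question.

```python
import re

def fake_nlp(inputs):
    """Hypothetical stand-in for the Q/A model: uses a regex instead of a model."""
    question, context = inputs["question"], inputs["context"]
    if "old" in question:
        match = re.search(r"is (\d+)", context)
    else:
        match = re.search(r"lives in (\w+)", context)
    return {"answer": match.group(1) if match else None}

def personal_data(split_text, nlp):
    questions = {
        "Age": "How old is this person?",
        "City": "Where does this person live?",
    }
    result = []
    for subtext in split_text:
        subtext = subtext.strip()
        if not subtext:
            continue  # skip empty pieces left over from re.split
        # Build a new dictionary for each person
        person_info = {
            key: nlp({"question": question, "context": subtext})["answer"]
            for key, question in questions.items()
        }
        result.append(person_info)
    return result

text = ("Sir John Doe is 45 and lives in London "
        "Sir Jack Doe is 42 and lives in Dublin "
        "Miss Jane Doe is 29 and lives in Berlin")

people = personal_data(re.split(r"Sir|Miss", text), fake_nlp)
print(people)
```

Note that the answers here come back as strings, so the ages would need an explicit `int(...)` conversion if numbers are wanted.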