Let's say I have a string input like this:
text = "Sir John Doe is 45 and lives in London Sir Jack Doe is 42 and lives in Dublin Miss Jane Doe is 29 and lives in Berlin"
I want to create an object for each person with their information. First, I use this function to split the text into a substring for each person:
import re

def split(text):
    split_text = re.split('Sir|Miss', text)
    return split_text
So the split function returns:
['John Doe is 45 and lives in London',
 'Jack Doe is 42 and lives in Dublin',
 'Jane Doe is 29 and lives in Berlin']
This works fine.
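One caveat: because the text starts with "Sir", re.split actually returns an empty string as its first element, and every piece keeps its leading and trailing spaces. An empty string passed on as the context can trip up the Q/A step. A minimal sketch of a drop-in split that discards those pieces (the \b word boundaries are my own addition, meant to avoid splitting inside words such as "Missy"):

```python
import re

text = ("Sir John Doe is 45 and lives in London Sir Jack Doe is 42 and lives in Dublin "
        "Miss Jane Doe is 29 and lives in Berlin")

# What the original pattern really returns: an empty string first (the text
# starts with "Sir"), and each piece keeps its surrounding spaces.
raw = re.split('Sir|Miss', text)
print(raw)

def split(text):
    # \b stops the titles from matching inside longer words;
    # (?:...) is a non-capturing group, so the titles are not kept in the result.
    parts = re.split(r'\b(?:Sir|Miss)\b', text)
    # Drop empty pieces and trim surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]

print(split(text))
```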
However, I now want the function below (personal_data) to go through all the results, no matter how many people are in the list, run the Q/A for each one, and return a person_info for each person.
I use a Q/A AI model to extract the exact information. As arguments, it takes the question (What is the age? What is the city? etc.) and the context string (here: subtext):
def personal_data(split_text, nlp):
    person_info = {"Age": None, "City": None}
    for subtext in split_text:
        question = "How old is this person?"
        response = nlp({"question": question, "context": subtext})
        person_info["Age"] = response["answer"]
        question = "Where does this person live?"
        response = nlp({"question": question, "context": subtext})
        person_info["City"] = response["answer"]
So the output should be:
[{"Age": 45, "City": "London"}, {"Age": 42, "City": "Dublin"}, {"Age": 29, "City": "Berlin"}]
However, I can't find the right way to make the personal_data function work; most of the time it tells me that subtext isn't a string. What is the proper way to achieve this?
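One way to narrow down a "not a string" error is to run the loop against a stand-in that checks its input before anything else. The fake_nlp below is a hypothetical stub, not a real model and not any library's API: it only mimics the call shape used above (a dict with "question" and "context" in, a dict with an "answer" string out) and understands only sentences like the example text.

```python
import re

def fake_nlp(inputs):
    """Hypothetical stand-in for a Q/A model with the same call shape:
    takes {"question": ..., "context": ...} and returns {"answer": str}."""
    question = inputs["question"]
    context = inputs["context"]
    # Fail loudly with the actual type, so a bad element in split_text is easy to spot.
    if not isinstance(context, str):
        raise TypeError(f"context must be a str, got {type(context).__name__}: {context!r}")
    # Toy pattern matching that only handles sentences like the example text.
    if "old" in question:
        match = re.search(r"\bis (\d+)\b", context)
    else:
        match = re.search(r"\blives in (\w+)", context)
    return {"answer": match.group(1) if match else ""}

# The second context is a list, not a string, so the stub reports its type.
for subtext in ["John Doe is 45 and lives in London", ["Jack Doe is 42"]]:
    try:
        print(fake_nlp({"question": "How old is this person?", "context": subtext}))
    except TypeError as err:
        print(err)
```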
Answer:
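You need to create a new person_info dictionary each time through the loop, append each one to a list, and return that list. In your version, a single dictionary is overwritten on every iteration, and the function never returns anything.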
def personal_data(split_text, nlp):
    result = []
    for subtext in split_text:
        # Start a new dictionary for every person
        person_info = {}
        question = "How old is this person?"
        response = nlp({"question": question, "context": subtext})
        person_info["Age"] = response["answer"]
        question = "Where does this person live?"
        response = nlp({"question": question, "context": subtext})
        person_info["City"] = response["answer"]
        # Store this person's info before moving on to the next one
        result.append(person_info)
    return result
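If more fields are added later, the repeated question/answer lines can be driven by a mapping instead. Below is a minimal sketch of that variant; QUESTIONS, the questions parameter and demo_nlp are hypothetical additions of mine, and demo_nlp is only a toy stand-in so the example runs without a model.

```python
QUESTIONS = {
    "Age": "How old is this person?",
    "City": "Where does this person live?",
}

def personal_data(split_text, nlp, questions=QUESTIONS):
    """Ask every question about every person; return one dict per person."""
    result = []
    for subtext in split_text:
        # Build a fresh dict for each person so earlier results are not overwritten.
        person_info = {
            field: nlp({"question": question, "context": subtext})["answer"]
            for field, question in questions.items()
        }
        result.append(person_info)
    return result

def demo_nlp(inputs):
    # Hypothetical stand-in for the Q/A model, only for trying the loop:
    # it returns the 4th word for the age question and the last word otherwise.
    words = inputs["context"].split()
    return {"answer": words[3] if "old" in inputs["question"] else words[-1]}

people = personal_data(
    ["John Doe is 45 and lives in London",
     "Jack Doe is 42 and lives in Dublin",
     "Jane Doe is 29 and lives in Berlin"],
    demo_nlp,
)
# Q/A answers come back as text, so convert the age to get integers as in the expected output.
for person in people:
    person["Age"] = int(person["Age"])
print(people)
```

The final loop converts ages with int() because Q/A models typically return answers as text (the Hugging Face question-answering pipeline, for example, returns "answer" as a string). A real model may answer with something like "45 years", so that conversion may need a guard.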