I have data in a JSON file that contains a feature text and location indices stored as string values. My requirement is to consolidate all of these values into a list of lists, with the indices converted to proper numbers. Below is one sample from the dataset.
{
    "feature_text": "No-relief-with-asthma-inhaler",
    "location": "['334 366']"
},
{
    "feature_text": "Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale",
    "location": "['25 30', '149 153']"
}
The expected output is:
[[334, 366, 'No-relief-with-asthma-inhaler'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale']]
To handle these scenarios I have written some Python code (which may not be the best approach), but it is not working properly because of a 'list of lists' problem: any update I make to the working list also impacts my master list of lists. I can see the problem but am unable to formulate a solution to it. Below is my code; please advise how I can fix it.
# Using the json data, get the part of speech and BILUO tags. Save the details to a dataframe
# Import and load the spacy model
import json
import spacy

nlp = spacy.load("en_core_web_sm")
# Get the ner component
ner = nlp.get_pipe('ner')

all_entities = []
df_prep = []
entity = []
data = json.loads(j)

for text in data:
    doc = nlp(text['pn_history'])
    xx = [token.text for token in doc]
    yy = [token.pos_ for token in doc]
    for x in text['entities']:
        a, b = x.values()
        a = a.replace("[", " ")
        a = a.replace("]", " ")
        a = a.replace("'", " ")
        a = a.strip()
        if len(a) > 0:
            entity.clear()
            if ',' in a:
                for y in a.split(','):
                    y = y.strip()
                    for m in y.split(' '):
                        b_ = int(m)
                        entity.append(b_)
                    entity.append(b)
                    all_entities.append(entity)
                    print(all_entities)
            else:
                entity.clear()
                a = a.strip()
                for d in a.split(' '):
                    f_ = int(d)
                    entity.append(f_)
                entity.append(b)
                all_entities.append(entity)
                print(all_entities)
    break
Below is the output of the print statements in the code. I can see that every element from the for loop is getting added to the all_entities list, but the older values in that list are also getting updated whenever I append new values to the entity list.
[[188, 197, 'Subjective-fevers']]
[[7, 11, 'Male'], [7, 11, 'Male']]
[[2, 6, '17-year'], [2, 6, '17-year'], [2, 6, '17-year']]
[[66, 86, 'Recent-upper-respiratory-symptoms'], [66, 86, 'Recent-upper-respiratory-symptoms'], [66, 86, 'Recent-upper-respiratory-symptoms'], [66, 86, 'Recent-upper-respiratory-symptoms']]
[[31, 59, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic']]
[[31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic']]
[[379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma']]
[[31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain']]
[[630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing']]
[[334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler']]
[[25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale']]
[[25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale']]
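I can even reproduce the effect in isolation. Below is a minimal sketch (independent of the spacy code above) showing that list.append stores a reference to the list object rather than a copy, so every entry of all_entities is the same entity list:

entity = []
all_entities = []

entity.append(1)
all_entities.append(entity)   # stores a reference to entity, not a copy
entity.append(2)
all_entities.append(entity)   # the same list object is appended again

print(all_entities)                        # [[1, 2], [1, 2]]
print(all_entities[0] is all_entities[1])  # True: both entries are one list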
CodePudding user response:
You can use ast.literal_eval to parse the location strings:
from ast import literal_eval

data = [
    {
        "feature_text": "No-relief-with-asthma-inhaler",
        "location": "['334 366']",
    },
    {
        "feature_text": "Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale",
        "location": "['25 30', '149 153']",
    },
]

out = []
for d in data:
    # literal_eval turns the string "['334 366']" into a real Python list
    location = literal_eval(d["location"])
    for numbers in location:
        # split "334 366" into two ints and append the feature text
        out.append([*map(int, numbers.split()), d["feature_text"]])

print(out)
Prints:
[
[334, 366, "No-relief-with-asthma-inhaler"],
[25, 30, "Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale"],
[149, 153, "Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale"],
]
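To fix the original loop directly rather than rewriting it: the root cause is that one entity list is cleared and re-appended, so all_entities ends up holding many references to a single list. Building a fresh list per location span avoids the aliasing; this is only a sketch, assuming the same text['entities'] structure as in the question:

# Minimal fix for the question's inner loop: create a new list for every
# location span instead of clearing and mutating one shared `entity` list.
for x in text['entities']:
    a, b = x.values()
    a = a.replace("[", " ").replace("]", " ").replace("'", " ").strip()
    if a:
        for y in a.split(','):
            entity = [int(m) for m in y.split()]  # fresh list per span
            entity.append(b)
            all_entities.append(entity)

Appending a copy (all_entities.append(list(entity))) would also stop the aliasing, but building a fresh list per span additionally yields one sublist per location, which matches the desired output.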