I have data in a JSON file that contains a feature text and location indices stored as string values. My requirement is to consolidate all of these values into a list of lists, with the indices converted to proper numbers. Below is one sample from the dataset.
{
    "feature_text": "No-relief-with-asthma-inhaler",
    "location": "['334 366']"
},
{
    "feature_text": "Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale",
    "location": "['25 30', '149 153']"
}
The expected output is:
[[334, 366, 'No-relief-with-asthma-inhaler'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale']]
To handle these scenarios I have written some Python code (which may not be the best approach), but it is not working properly because of a 'list of lists' problem: any update I make to the working list also impacts my master list of lists. I can see the problem but am unable to formulate a solution to it. Below is my code; please advise how I can fix it.
# Using the json data, get the part of speech and BILUO tags. Save the details to a dataframe
# Import and load the spacy model
import json
import spacy

nlp = spacy.load("en_core_web_sm")
# Get the ner component
ner = nlp.get_pipe('ner')

all_entities = []
df_prep = []
entity = []
data = json.loads(j)

for text in data:
    doc = nlp(text['pn_history'])
    xx = [token.text for token in doc]
    yy = [token.pos_ for token in doc]
    for x in text['entities']:
        a, b = x.values()
        a = a.replace("[", " ")
        a = a.replace("]", " ")
        a = a.replace("'", " ")
        a = a.strip()
        if len(a) > 0:
            entity.clear()
            if ',' in a:
                for y in a.split(','):
                    y = y.strip()
                    for m in y.split(' '):
                        b_ = int(m)
                        entity.append(b_)
                    entity.append(b)
                    all_entities.append(entity)
                    print(all_entities)
            else:
                entity.clear()
                a = a.strip()
                for d in a.split(' '):
                    f_ = int(d)
                    entity.append(f_)
                entity.append(b)
                all_entities.append(entity)
                print(all_entities)
    break
Below is the output of the print statements in the code. I can see that every element from the for loop is getting added to the all_entities list, but the older values in that list are also getting updated whenever I append new values to the entity list.
[[188, 197, 'Subjective-fevers']]
[[7, 11, 'Male'], [7, 11, 'Male']]
[[2, 6, '17-year'], [2, 6, '17-year'], [2, 6, '17-year']]
[[66, 86, 'Recent-upper-respiratory-symptoms'], [66, 86, 'Recent-upper-respiratory-symptoms'], [66, 86, 'Recent-upper-respiratory-symptoms'], [66, 86, 'Recent-upper-respiratory-symptoms']]
[[31, 59, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic']]
[[31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic'], [31, 59, 'Worse-with-deep-breath-OR-pleuritic', 149, 171, 'Worse-with-deep-breath-OR-pleuritic']]
[[379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma'], [379, 402, 'Exercise-induced-asthma']]
[[31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain'], [31, 41, 'Chest-pain']]
[[630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing'], [630, 641, 'Recent-heavy-lifting-at-work-OR-recent-rock-climbing']]
[[334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler'], [334, 366, 'No-relief-with-asthma-inhaler']]
[[25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale']]
[[25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale'], [25, 30, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale', 149, 153, 'Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale']]
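I can even reproduce the effect in isolation. Below is a minimal sketch (independent of the spacy code above) showing that list.append stores a reference to the list object rather than a copy, so every entry of all_entities is the same entity list:

entity = []
all_entities = []

entity.append(1)
all_entities.append(entity)   # stores a reference to entity, not a copy
entity.append(2)
all_entities.append(entity)   # the same list object is appended again

print(all_entities)                        # [[1, 2], [1, 2]]
print(all_entities[0] is all_entities[1])  # True: both entries are one list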
CodePudding user response:
You can use ast.literal_eval to parse the location strings:
from ast import literal_eval

data = [
    {
        "feature_text": "No-relief-with-asthma-inhaler",
        "location": "['334 366']",
    },
    {
        "feature_text": "Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale",
        "location": "['25 30', '149 153']",
    },
]

out = []
for d in data:
    # literal_eval turns the string "['334 366']" into a real Python list
    location = literal_eval(d["location"])
    for numbers in location:
        # split "334 366" into two ints and append the feature text
        out.append([*map(int, numbers.split()), d["feature_text"]])

print(out)
Prints:
[
[334, 366, "No-relief-with-asthma-inhaler"],
[25, 30, "Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale"],
[149, 153, "Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale"],
]
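To fix the original loop directly rather than rewriting it: the root cause is that one entity list is cleared and re-appended, so all_entities ends up holding many references to a single list. Building a fresh list per location span avoids the aliasing; this is only a sketch, assuming the same text['entities'] structure as in the question:

# Minimal fix for the question's inner loop: create a new list for every
# location span instead of clearing and mutating one shared `entity` list.
for x in text['entities']:
    a, b = x.values()
    a = a.replace("[", " ").replace("]", " ").replace("'", " ").strip()
    if a:
        for y in a.split(','):
            entity = [int(m) for m in y.split()]  # fresh list per span
            entity.append(b)
            all_entities.append(entity)

Appending a copy (all_entities.append(list(entity))) would also stop the aliasing, but building a fresh list per span additionally yields one sublist per location, which matches the desired output.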