Append a list of dictionaries to the value in another dictionary

I am trying to create nested dictionaries as I loop through tokens output by my NER model. This is the code that I have so far:

from transformers import pipeline

token_classifier = pipeline('ner', model='./fine_tune_nerbert_output/', tokenizer='./fine_tune_nerbert_output/', aggregation_strategy="average")
sentence = "alisa brown i live in san diego, california and sometimes in kansas city, missouri"
tokens = token_classifier(sentence)

which outputs:

[{'entity_group': 'LABEL_1',
  'score': 0.99938214,
  'word': 'alisa',
  'start': 0,
  'end': 5},
 {'entity_group': 'LABEL_2',
  'score': 0.9972813,
  'word': 'brown',
  'start': 6,
  'end': 11},
 {'entity_group': 'LABEL_0',
  'score': 0.99798816,
  'word': 'i live in',
  'start': 12,
  'end': 21},
 {'entity_group': 'LABEL_3',
  'score': 0.9993938,
  'word': 'san',
  'start': 22,
  'end': 25},
 {'entity_group': 'LABEL_4',
  'score': 0.9988097,
  'word': 'diego',
  'start': 26,
  'end': 31},
 {'entity_group': 'LABEL_0',
  'score': 0.9996742,
  'word': ',',
  'start': 31,
  'end': 32},
 {'entity_group': 'LABEL_3',
  'score': 0.9985813,
  'word': 'california',
  'start': 33,
  'end': 43},
 {'entity_group': 'LABEL_0',
  'score': 0.9997311,
  'word': 'and sometimes in',
  'start': 44,
  'end': 60},
 {'entity_group': 'LABEL_3',
  'score': 0.9995384,
  'word': 'kansas',
  'start': 61,
  'end': 67},
 {'entity_group': 'LABEL_4',
  'score': 0.9988242,
  'word': 'city',
  'start': 68,
  'end': 72},
 {'entity_group': 'LABEL_0',
  'score': 0.99949193,
  'word': ',',
  'start': 72,
  'end': 73},
 {'entity_group': 'LABEL_3',
  'score': 0.99960154,
  'word': 'missouri',
  'start': 74,
  'end': 82}]

I then run a for loop:

ner_dict = dict()
nested_dict = dict()
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        if token['entity_group'] in ner_dict:
            nested_dict[token['entity_group']] = {}
            nested_dict[token['entity_group']][token['word']] = token['score']
            ner_dict.update({token['entity_group']: (ner_dict[token['entity_group']], nested_dict[token['entity_group']])})
        else:
            ner_dict[token['entity_group']] = {}
            ner_dict[token['entity_group']][token['word']] = token['score']

this outputs:

{'LABEL_1': {'devyn': 0.9995816},
 'LABEL_2': {'donahue': 0.9996502},
 'LABEL_3': ((({'san': 0.9994766}, {'california': 0.998961}),
   {'san': 0.99925905}),
  {'california': 0.9987863}),
 'LABEL_4': ({'francisco': 0.99923646}, {'diego': 0.9992399})}

which is close to what I want but this is my ideal output:

{'LABEL_1': {'devyn': 0.9995816},
 'LABEL_2': {'donahue': 0.9996502},
 'LABEL_3': ({'san': 0.9994766}, {'california': 0.998961}, {'san': 0.99925905},
  {'california': 0.9987863}),
 'LABEL_4': ({'francisco': 0.99923646}, {'diego': 0.9992399})}

How would I do this without ending up with each entry nested in a different tuple? Thanks in advance.

CodePudding user response：

Based on the input provided, your output for LABEL_4 should contain diego and city, something like this:

{
 'LABEL_1': {'alisa': 0.99938214},
 'LABEL_2': {'brown': 0.9972813},
 'LABEL_3': {'san': 0.9993938, 'california': 0.9985813, 'kansas': 0.9995384, 'missouri': 0.99960154},
 'LABEL_4': {'diego': 0.9988097, 'city': 0.9988242}
}

If the above output is what you want, change the code to:

ner_dict = dict()
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        nested_dict = ner_dict.setdefault(token['entity_group'], {})
        nested_dict[token['word']] = token['score']
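
One caveat with this shape: because each word is used as a dictionary key, a word that appears twice under the same label keeps only its last score. A small sketch of that behaviour, using a hand-written token list rather than real pipeline output (the two 'san' scores are copied from the output shown in the question):

```python
# Hand-written tokens for illustration only; not produced by the model.
tokens = [
    {'entity_group': 'LABEL_3', 'word': 'san', 'score': 0.9994766},
    {'entity_group': 'LABEL_0', 'word': ',', 'score': 0.9996742},
    {'entity_group': 'LABEL_3', 'word': 'san', 'score': 0.99925905},
]

ner_dict = {}
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        # setdefault returns the existing inner dict, or inserts a new empty one
        ner_dict.setdefault(token['entity_group'], {})[token['word']] = token['score']

print(ner_dict)  # {'LABEL_3': {'san': 0.99925905}}
```

If every occurrence has to be kept, a list of small dictionaries per label (as in the question's ideal output) is probably the safer structure.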

CodePudding user response：

Here is an example that you can use with your code:

ner_dict = {}
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        ner_dict.setdefault(token['entity_group'], {})[token['word']] = token['score']
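
If you want the shape from the question instead, with one small dictionary per token collected under each label, appending to a list is easier than growing a tuple. A minimal sketch, assuming a list is acceptable in place of the tuple; the tokens below are a hand-copied subset of the pipeline output so that the snippet runs without the model:

```python
from collections import defaultdict

# Subset of the pipeline output from the question, copied by hand.
tokens = [
    {'entity_group': 'LABEL_1', 'score': 0.99938214, 'word': 'alisa'},
    {'entity_group': 'LABEL_0', 'score': 0.99798816, 'word': 'i live in'},
    {'entity_group': 'LABEL_3', 'score': 0.9993938, 'word': 'san'},
    {'entity_group': 'LABEL_3', 'score': 0.9985813, 'word': 'california'},
]

# Each label maps to a list; missing labels start as an empty list.
grouped = defaultdict(list)
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        grouped[token['entity_group']].append({token['word']: token['score']})

ner_dict = dict(grouped)  # convert back to a plain dict
print(ner_dict)
# {'LABEL_1': [{'alisa': 0.99938214}], 'LABEL_3': [{'san': 0.9993938}, {'california': 0.9985813}]}
```

Unlike the merged-dictionary version, this keeps repeated words as separate entries. If you really need tuples, you can convert each list afterwards with tuple(...).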