I am trying to create nested dictionaries as I loop through tokens output by my NER model. This is the code that I have so far:
from transformers import pipeline

token_classifier = pipeline('ner', model='./fine_tune_nerbert_output/', tokenizer='./fine_tune_nerbert_output/', aggregation_strategy="average")
sentence = "alisa brown i live in san diego, california and sometimes in kansas city, missouri"
tokens = token_classifier(sentence)
which outputs:
[{'entity_group': 'LABEL_1',
'score': 0.99938214,
'word': 'alisa',
'start': 0,
'end': 5},
{'entity_group': 'LABEL_2',
'score': 0.9972813,
'word': 'brown',
'start': 6,
'end': 11},
{'entity_group': 'LABEL_0',
'score': 0.99798816,
'word': 'i live in',
'start': 12,
'end': 21},
{'entity_group': 'LABEL_3',
'score': 0.9993938,
'word': 'san',
'start': 22,
'end': 25},
{'entity_group': 'LABEL_4',
'score': 0.9988097,
'word': 'diego',
'start': 26,
'end': 31},
{'entity_group': 'LABEL_0',
'score': 0.9996742,
'word': ',',
'start': 31,
'end': 32},
{'entity_group': 'LABEL_3',
'score': 0.9985813,
'word': 'california',
'start': 33,
'end': 43},
{'entity_group': 'LABEL_0',
'score': 0.9997311,
'word': 'and sometimes in',
'start': 44,
'end': 60},
{'entity_group': 'LABEL_3',
'score': 0.9995384,
'word': 'kansas',
'start': 61,
'end': 67},
{'entity_group': 'LABEL_4',
'score': 0.9988242,
'word': 'city',
'start': 68,
'end': 72},
{'entity_group': 'LABEL_0',
'score': 0.99949193,
'word': ',',
'start': 72,
'end': 73},
{'entity_group': 'LABEL_3',
'score': 0.99960154,
'word': 'missouri',
'start': 74,
'end': 82}]
I then run a for loop:
ner_dict = dict()
nested_dict = dict()
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        if token['entity_group'] in ner_dict:
            nested_dict[token['entity_group']] = {}
            nested_dict[token['entity_group']][token['word']] = token['score']
            ner_dict.update({token['entity_group']: (ner_dict[token['entity_group']], nested_dict[token['entity_group']])})
        else:
            ner_dict[token['entity_group']] = {}
            ner_dict[token['entity_group']][token['word']] = token['score']
this outputs:
{'LABEL_1': {'devyn': 0.9995816},
'LABEL_2': {'donahue': 0.9996502},
'LABEL_3': ((({'san': 0.9994766}, {'california': 0.998961}),
{'san': 0.99925905}),
{'california': 0.9987863}),
'LABEL_4': ({'francisco': 0.99923646}, {'diego': 0.9992399})}
which is close to what I want but this is my ideal output:
{'LABEL_1': {'devyn': 0.9995816},
'LABEL_2': {'donahue': 0.9996502},
'LABEL_3': ({'san': 0.9994766}, {'california': 0.998961}, {'san': 0.99925905},
{'california': 0.9987863}),
'LABEL_4': ({'francisco': 0.99923646}, {'diego': 0.9992399})}
How would I do this without each new entry being wrapped in another nested tuple? Thanks in advance.
CodePudding user response:
Based on the input you provided, the LABEL_4 entries should be diego and city, so the result would look something like this:
{
'LABEL_1': {'alisa': 0.99938214},
'LABEL_2': {'brown': 0.9972813},
'LABEL_3': {'san': 0.9993938, 'california': 0.9985813, 'kansas': 0.9995384},
'LABEL_4': {'diego': 0.9988097, 'city': 0.9988242}
}
If that is the output you want, change the code to:
ner_dict = dict()
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        # setdefault returns the existing inner dict, or inserts an empty one first
        nested_dict = ner_dict.setdefault(token['entity_group'], {})
        nested_dict[token['word']] = token['score']
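Note that a dict keyed by word keeps only one score per word, so a word that repeats under the same label (like 'san' appearing twice under LABEL_3 in the ideal output from the question) would be overwritten by its later score. If you need to keep every occurrence, one option is to collect a flat list of one-entry dicts per label. This is only a minimal sketch: it assumes a list is acceptable in place of the tuple, and the sample tokens below are made up for illustration, not real model output.

```python
from collections import defaultdict

# Hypothetical sample tokens (illustrative scores, not real model output).
tokens = [
    {'entity_group': 'LABEL_3', 'score': 0.9994, 'word': 'san'},
    {'entity_group': 'LABEL_4', 'score': 0.9992, 'word': 'francisco'},
    {'entity_group': 'LABEL_0', 'score': 0.9997, 'word': ','},
    {'entity_group': 'LABEL_3', 'score': 0.9990, 'word': 'california'},
    {'entity_group': 'LABEL_3', 'score': 0.9993, 'word': 'san'},
    {'entity_group': 'LABEL_4', 'score': 0.9991, 'word': 'diego'},
]

# Each label maps to a flat list, so repeats are appended instead of nested.
ner_lists = defaultdict(list)
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        ner_lists[token['entity_group']].append({token['word']: token['score']})

ner_lists = dict(ner_lists)  # convert back to a plain dict
print(ner_lists)
```

If you really need tuples, you can convert at the end with {label: tuple(items) for label, items in ner_lists.items()}.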
CodePudding user response:
Here is an example that you can use with your code:
ner_dict = {}
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        ner_dict.setdefault(token['entity_group'], {})[token['word']] = token['score']
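The same grouping can probably also be written with collections.defaultdict, which creates the inner dict on first access so no setdefault call is needed. A self-contained sketch follows; the token list is a shortened stand-in for the pipeline output in the question, with rounded scores, so treat the values as illustrative.

```python
from collections import defaultdict

# Shortened stand-in for the pipeline output (scores rounded for illustration).
tokens = [
    {'entity_group': 'LABEL_1', 'score': 0.9994, 'word': 'alisa'},
    {'entity_group': 'LABEL_2', 'score': 0.9973, 'word': 'brown'},
    {'entity_group': 'LABEL_0', 'score': 0.9980, 'word': 'i live in'},
    {'entity_group': 'LABEL_3', 'score': 0.9994, 'word': 'san'},
    {'entity_group': 'LABEL_4', 'score': 0.9988, 'word': 'diego'},
    {'entity_group': 'LABEL_3', 'score': 0.9986, 'word': 'california'},
]

# defaultdict(dict) builds an empty inner dict the first time a label is seen.
ner_dict = defaultdict(dict)
for token in tokens:
    if token['entity_group'] != 'LABEL_0':
        ner_dict[token['entity_group']][token['word']] = token['score']

ner_dict = dict(ner_dict)  # plain dict, so missing labels raise KeyError again
print(ner_dict)
```

Converting back to a plain dict at the end avoids accidentally creating empty entries when you later look up a label that was never seen.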