I have a list like this:
[{'name': 'NOUN-NOUN', 'start_char': 91, 'end_char': 105, 'lemma': 'digital groupe', 'text': 'digital groupe'},
{'name': 'NOUN', 'start_char': 91, 'end_char': 98, 'lemma': 'digital', 'text': 'digital'},
{'name': 'NOUN', 'start_char': 99, 'end_char': 105, 'lemma': 'groupe', 'text': 'groupe'},
{'name': 'NOUN-PROPN', 'start_char': 99, 'end_char': 113, 'lemma': 'groupe siparex', 'text': 'groupe siparex'},
{'name': 'NOUN-NOUN-PROPN', 'start_char': 91, 'end_char': 113, 'lemma': 'digital groupe siparex', 'text': 'digital groupe siparex'},
{'name': 'PROPN-PROPN', 'start_char': 0, 'end_char': 12, 'lemma': 'Jean François', 'text': 'Jean François'}
]
I want to clean this list and keep only the longest strings based on start_char and end_char (removing the other entries from the list), so the output should be:
[{'name': 'NOUN-NOUN-PROPN', 'start_char': 91, 'end_char': 113, 'lemma': 'digital groupe siparex', 'text': 'digital groupe siparex'},
{'name': 'PROPN-PROPN', 'start_char': 0, 'end_char': 12, 'lemma': 'Jean François', 'text': 'Jean François'}]
Thank you.
Answer:
Try this (it returns the single entry with the longest span):
lst = [{'name': 'NOUN-NOUN', 'start_char': 91, 'end_char': 105, 'lemma': 'digital groupe', 'text': 'digital groupe'},
{'name': 'NOUN', 'start_char': 91, 'end_char': 98, 'lemma': 'digital', 'text': 'digital'},
{'name': 'NOUN', 'start_char': 99, 'end_char': 105, 'lemma': 'groupe', 'text': 'groupe'},
{'name': 'NOUN-PROPN', 'start_char': 99, 'end_char': 113, 'lemma': 'groupe siparex', 'text': 'groupe siparex'},
{'name': 'NOUN-NOUN-PROPN', 'start_char': 91, 'end_char': 113, 'lemma': 'digital groupe siparex', 'text': 'digital groupe siparex'},
{'name': 'PROPN-PROPN', 'start_char': 0, 'end_char': 12, 'lemma': 'Jean François', 'text': 'Jean François'}
]
# key = span length (end_char - start_char); max() returns only one dict, the first with the largest span
print(max(lst, key=lambda e: e['end_char'] - e['start_char']))
Answer:
If you want every entry that shares the maximum length (in case of ties), maybe something like:
# compute the maximum span length once, not once per item
longest = max(d['end_char'] - d['start_char'] for d in lst)
print([d for d in lst if d['end_char'] - d['start_char'] == longest])
This is arguably less concise than the first answer, but it keeps ties instead of returning only one entry.
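Note that neither snippet reproduces the output requested in the question: max() returns a single dict, and filtering on the maximum length drops 'Jean François' because its span (12 characters) is shorter than the 22-character 'digital groupe siparex'. The expected output looks more like "drop every entry whose span lies inside another entry's span". Below is a minimal sketch assuming that interpretation; the helper names span_len and keep_longest_spans are hypothetical, not from either answer.

```python
def span_len(e):
    """Length of an entity's character span."""
    return e['end_char'] - e['start_char']


def keep_longest_spans(entities):
    """Assumed rule: drop an entity if its [start_char, end_char] span is
    contained in a longer entity's span (or is an exact duplicate of an
    earlier one). Everything else is kept in its original order."""
    kept = []
    for i, e in enumerate(entities):
        covered = any(
            o['start_char'] <= e['start_char'] and e['end_char'] <= o['end_char']
            # strictly longer container, or identical span seen earlier
            and (span_len(o) > span_len(e) or (span_len(o) == span_len(e) and j < i))
            for j, o in enumerate(entities) if j != i
        )
        if not covered:
            kept.append(e)
    return kept


lst = [
    {'name': 'NOUN-NOUN', 'start_char': 91, 'end_char': 105, 'lemma': 'digital groupe', 'text': 'digital groupe'},
    {'name': 'NOUN', 'start_char': 91, 'end_char': 98, 'lemma': 'digital', 'text': 'digital'},
    {'name': 'NOUN', 'start_char': 99, 'end_char': 105, 'lemma': 'groupe', 'text': 'groupe'},
    {'name': 'NOUN-PROPN', 'start_char': 99, 'end_char': 113, 'lemma': 'groupe siparex', 'text': 'groupe siparex'},
    {'name': 'NOUN-NOUN-PROPN', 'start_char': 91, 'end_char': 113, 'lemma': 'digital groupe siparex', 'text': 'digital groupe siparex'},
    {'name': 'PROPN-PROPN', 'start_char': 0, 'end_char': 12, 'lemma': 'Jean François', 'text': 'Jean François'},
]

result = keep_longest_spans(lst)
print([d['name'] for d in result])  # prints ['NOUN-NOUN-PROPN', 'PROPN-PROPN']
```

This compares every pair, so it is O(n²), which is fine for the handful of entities per sentence a tagger typically produces. Spans that only partially overlap (for example 91-105 and 99-113 with no 91-113 entry) are both kept under this rule; if you want to resolve those too, you would need a different policy, such as sorting by length and greedily keeping non-overlapping spans.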