I have a list of dictionaries, and I want to fetch the maximum floating-point number from 'confidence' among the entries whose keys ('key') start with the same prefix.
ab = [{'key': 'gdpr.gdpr_compliance.1', 'value': 'Yes', 'idref': '69dbdba4-14d4-4ac8-a318-0d658e4d5b07', 'xpath': '/html/body/p[24]', 'confidence': 0.985},
{'key': 'gdpr.gdpr_compliance.2', 'value': 'Yes', 'idref': '69e2589a-bbf2-49c3-96fc-01fbee5dde03', 'xpath': '/html/body/p[27]', 'confidence': 0.989},
{'key': 'data_collected.personally_identifiable_information.1', 'value': 'Yes', 'idref': 'f6819b54-07a7-4839-b0cc-8343eed28342', 'xpath': '/html/body/ul[6]/li[1]', 'confidence': 0.562},
{'key': 'data_collected.personally_identifiable_information.2', 'value': 'Yes', 'idref': '496400e5-9665-4697-96bc-c55176cdbd02', 'xpath': '/html/body/ul[6]/li[2]', 'confidence': 0.661}]
Here you can see that the first two dictionaries have gdpr, but the third and fourth have data_collected.
I don't understand how to get the max value for each group.
I tried it this way:
lis = []
for i in ab:
    spl = i['key'].split('.')[0]
    i['key'] = spl
    if i['key'] == spl:
        lis.append(i['confidence'])
print(lis)
The expected output is: [0.989, 0.661]
CodePudding user response:
I'm not sure why you want a list when your data is key-based. I'd use a dict myself, but then again, maybe you only want to compare neighbouring values, which you can do with itertools.groupby. I'll include both methods below.
dict
maxes = {}
for d in ab:
    confidence = d['confidence']
    spl = d['key'].split('.')[0]
    if spl not in maxes or confidence > maxes[spl]:
        maxes[spl] = confidence
print(maxes)
{'gdpr': 0.989, 'data_collected': 0.661}
groupby
from itertools import groupby
grouper = groupby(ab, lambda d: d['key'].split('.')[0])
maxes = [(k, max(d['confidence'] for d in g)) for k, g in grouper]
print(maxes)
[('gdpr', 0.989), ('data_collected', 0.661)]
Here I'm keeping the keys, but you could very well discard them.
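# groupby returns a one-shot iterator, so rebuild it before reusing it
grouper = groupby(ab, lambda d: d['key'].split('.')[0])
lis = [max(d['confidence'] for d in g) for _k, g in grouper]
print(lis)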
[0.989, 0.661]
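One caveat: groupby only merges adjacent items, so the groupby snippets rely on entries with the same prefix sitting next to each other in ab. If that isn't guaranteed, a minimal sketch is to sort by the same key function first. The names records and prefix are made up for illustration, and the data is a reordered, trimmed copy of the question's list.

```python
from itertools import groupby

# Hypothetical sample: same confidences as in the question, but with the
# prefixes interleaved so equal prefixes are no longer adjacent.
records = [
    {'key': 'gdpr.gdpr_compliance.1', 'confidence': 0.985},
    {'key': 'data_collected.personally_identifiable_information.1', 'confidence': 0.562},
    {'key': 'gdpr.gdpr_compliance.2', 'confidence': 0.989},
    {'key': 'data_collected.personally_identifiable_information.2', 'confidence': 0.661},
]

def prefix(d):
    # Everything before the first dot, e.g. 'gdpr'
    return d['key'].split('.')[0]

# Sorting by the grouping key first makes equal prefixes adjacent.
sorted_records = sorted(records, key=prefix)
maxes = {k: max(d['confidence'] for d in g)
         for k, g in groupby(sorted_records, key=prefix)}
print(maxes)
```

Note that the groups then come out in sorted order rather than input order, and the sort makes this O(n log n); the plain dict loop avoids both.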
CodePudding user response:
Where you went wrong
- You split i['key'] and then assigned the result back to i['key']. That doesn't make sense.
- Second, right after assigning spl to i['key'], you checked whether they are equal. Obviously they always will be.
A correct approach
Dictionary
highest_value_dict = {}
for i in ab:
    spl = i['key'].split('.')[0]
    # if there is no such key yet, add it;
    # otherwise check whether this confidence is greater than the one stored in highest_value_dict
    if spl not in highest_value_dict or highest_value_dict[spl] < i['confidence']:
        highest_value_dict[spl] = i['confidence']
Output :
{'gdpr': 0.989, 'data_collected': 0.661}
If you really want the values as a list
list(highest_value_dict.values())
Output :
[0.989, 0.661]
CodePudding user response:
Something like the below. The idea is to use a defaultdict that maps each key prefix to its max confidence.
from collections import defaultdict
ab = [{'key': 'gdpr.gdpr_compliance.1', 'value': 'Yes', 'idref': '69dbdba4-14d4-4ac8-a318-0d658e4d5b07',
'xpath': '/html/body/p[24]', 'confidence': 0.985},
{'key': 'gdpr.gdpr_compliance.2', 'value': 'Yes', 'idref': '69e2589a-bbf2-49c3-96fc-01fbee5dde03',
'xpath': '/html/body/p[27]', 'confidence': 0.989},
{'key': 'data_collected.personally_identifiable_information.1', 'value': 'Yes',
'idref': 'f6819b54-07a7-4839-b0cc-8343eed28342', 'xpath': '/html/body/ul[6]/li[1]', 'confidence': 0.562},
{'key': 'data_collected.personally_identifiable_information.2', 'value': 'Yes',
'idref': '496400e5-9665-4697-96bc-c55176cdbd02', 'xpath': '/html/body/ul[6]/li[2]', 'confidence': 0.661}]
data = defaultdict(float)
for entry in ab:
    value = entry['confidence']
    key = entry['key'].split('.')[0]
    if data[key] < value:
        data[key] = value
for k, v in data.items():
    print(f'{k} -> {v}')
Output:
gdpr -> 0.989
data_collected -> 0.661
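Why no explicit "is the key already there" check is needed: defaultdict(float) creates a missing key with float(), i.e. 0.0, the first time it is read. A small sketch of just that behaviour (best and the sample pairs are illustrative names and values, not part of the code above). It assumes confidences are never negative; otherwise the 0.0 default would win the comparison.

```python
from collections import defaultdict

best = defaultdict(float)   # missing keys are created as float() == 0.0

# Reading a key that was never set returns 0.0 (and stores it).
print(best['gdpr'])         # 0.0

# So the first positive confidence always replaces the default.
for prefix, confidence in [('gdpr', 0.985), ('gdpr', 0.989)]:
    if best[prefix] < confidence:
        best[prefix] = confidence

print(dict(best))           # {'gdpr': 0.989}
```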
CodePudding user response:
I suggest a solution with O(n) time and memory complexity:
from typing import List

def get_maximal_values(data: List[dict]) -> List[float]:
    # Lazily extract the (key prefix, confidence) pairs we need
    preparing_data = ((x['key'].split('.')[0], x['confidence']) for x in data)
    # Find the maximum for each unique key prefix
    # (the default of 0 assumes confidences are non-negative)
    result = {}
    for key, confidence in preparing_data:
        result[key] = max(result.get(key, 0), confidence)
    # Return only the confidence values
    return list(result.values())