How to get the max value from a dictionary based on conditions

I have a list of dictionaries, and I want to fetch the maximum floating-point 'confidence' among the entries whose 'key' starts with the same prefix.

ab = [{'key': 'gdpr.gdpr_compliance.1', 'value': 'Yes', 'idref': '69dbdba4-14d4-4ac8-a318-0d658e4d5b07', 'xpath': '/html/body/p[24]', 'confidence': 0.985},
      {'key': 'gdpr.gdpr_compliance.2', 'value': 'Yes', 'idref': '69e2589a-bbf2-49c3-96fc-01fbee5dde03', 'xpath': '/html/body/p[27]', 'confidence': 0.989},
      {'key': 'data_collected.personally_identifiable_information.1', 'value': 'Yes', 'idref': 'f6819b54-07a7-4839-b0cc-8343eed28342', 'xpath': '/html/body/ul[6]/li[1]', 'confidence': 0.562},
      {'key': 'data_collected.personally_identifiable_information.2', 'value': 'Yes', 'idref': '496400e5-9665-4697-96bc-c55176cdbd02', 'xpath': '/html/body/ul[6]/li[2]', 'confidence': 0.661}]

Here you can see that the first two dictionaries have the prefix gdpr, while the last two have data_collected.

I don't understand how to get the max value for each prefix.

I tried it this way:

lis = []
for i in ab:
    spl = i['key'].split('.')[0]
    i['key'] = spl
    if i['key'] == spl:
        lis.append(i['confidence'])
print(lis)

expected output should be: [0.989, 0.661]

CodePudding user response：

I'm not sure why you want to get a list when your data is key-based. I'd use a dict myself, but then again, maybe you only want to compare neighbouring values, which you can do with itertools.groupby (it only groups consecutive items that share a key). I'll include both methods below.

dict

maxes = {}
for d in ab:
    confidence = d['confidence']
    spl = d['key'].split('.')[0]
    if spl not in maxes or confidence > maxes[spl]:
        maxes[spl] = confidence
print(maxes)

{'gdpr': 0.989, 'data_collected': 0.661}

groupby

from itertools import groupby

grouper = groupby(ab, lambda d: d['key'].split('.')[0])
maxes = [(k, max(d['confidence'] for d in g)) for k, g in grouper]
print(maxes)

[('gdpr', 0.989), ('data_collected', 0.661)]

Here I'm keeping the keys, but you could very well discard them.

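# groupby iterators are single-use, so build a fresh one before iterating again
grouper = groupby(ab, lambda d: d['key'].split('.')[0])
lis = [max(d['confidence'] for d in g) for _k, g in grouper]
print(lis)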

[0.989, 0.661]
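
One caveat with the groupby route: it only merges runs of adjacent items, so it relies on entries with the same prefix sitting next to each other, as they do in the question's list. The sketch below uses a hypothetical, interleaved `records` list (not from the question) to show the difference, and sorts by the same key function first as one possible fix. Note that sorting also changes the order of the groups.

```python
from itertools import groupby

# Hypothetical sample: same shape as `ab`, but the prefixes are interleaved.
records = [
    {'key': 'gdpr.gdpr_compliance.1', 'confidence': 0.985},
    {'key': 'data_collected.personally_identifiable_information.1', 'confidence': 0.562},
    {'key': 'gdpr.gdpr_compliance.2', 'confidence': 0.989},
    {'key': 'data_collected.personally_identifiable_information.2', 'confidence': 0.661},
]


def prefix(d):
    """Return the part of 'key' before the first dot."""
    return d['key'].split('.')[0]


# Without sorting, every run of equal prefixes becomes its own group.
unsorted_maxes = [(k, max(d['confidence'] for d in g))
                  for k, g in groupby(records, prefix)]

# Sorting by the same key function first makes equal prefixes adjacent.
sorted_maxes = [(k, max(d['confidence'] for d in g))
                for k, g in groupby(sorted(records, key=prefix), prefix)]

print(unsorted_maxes)  # four groups, one per record
print(sorted_maxes)    # [('data_collected', 0.661), ('gdpr', 0.989)]
```

The dict approach does not have this restriction, which is one reason to prefer it when the input order is not guaranteed.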

CodePudding user response：

Where you went wrong

First, you split i['key'] and assigned the prefix back to i['key'], which overwrites the original key in your data and doesn't help with grouping.
Second, right after assigning spl to i['key'], you checked whether the two are equal. They always are, so every confidence gets appended.
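
To make the two points concrete, here is the loop from the question run on a trimmed copy of `ab` (only the fields the loop touches are kept, which is my simplification). The comments mark where each problem occurs.

```python
# Trimmed copy of the question's data: only 'key' and 'confidence' are kept.
ab = [
    {'key': 'gdpr.gdpr_compliance.1', 'confidence': 0.985},
    {'key': 'gdpr.gdpr_compliance.2', 'confidence': 0.989},
    {'key': 'data_collected.personally_identifiable_information.1', 'confidence': 0.562},
    {'key': 'data_collected.personally_identifiable_information.2', 'confidence': 0.661},
]

lis = []
for i in ab:
    spl = i['key'].split('.')[0]
    i['key'] = spl          # overwrites the full key with just its prefix
    if i['key'] == spl:     # always True right after the assignment above
        lis.append(i['confidence'])

print(lis)           # [0.985, 0.989, 0.562, 0.661] -> nothing was filtered
print(ab[0]['key'])  # gdpr -> the input list was modified as a side effect
```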

A correct approach

Dictionary

highest_value_dict = {}
for i in ab:
    spl = i['key'].split('.')[0]
    # If the prefix is not in the dict yet, add it.
    # Otherwise, check whether this confidence is greater than the one stored in highest_value_dict.
    if spl not in highest_value_dict or highest_value_dict[spl] < i['confidence']:
        highest_value_dict[spl] = i['confidence']

Output :

{'gdpr': 0.989, 'data_collected': 0.661}

If you really want the values as list

list(highest_value_dict.values())

Output :

[0.989, 0.661]

CodePudding user response：

Something like the below. The idea is to use a defaultdict that maps each key prefix to its max confidence.

from collections import defaultdict
ab = [{'key': 'gdpr.gdpr_compliance.1', 'value': 'Yes', 'idref': '69dbdba4-14d4-4ac8-a318-0d658e4d5b07',
       'xpath': '/html/body/p[24]', 'confidence': 0.985},
      {'key': 'gdpr.gdpr_compliance.2', 'value': 'Yes', 'idref': '69e2589a-bbf2-49c3-96fc-01fbee5dde03',
       'xpath': '/html/body/p[27]', 'confidence': 0.989},
      {'key': 'data_collected.personally_identifiable_information.1', 'value': 'Yes',
       'idref': 'f6819b54-07a7-4839-b0cc-8343eed28342', 'xpath': '/html/body/ul[6]/li[1]', 'confidence': 0.562},
      {'key': 'data_collected.personally_identifiable_information.2', 'value': 'Yes',
       'idref': '496400e5-9665-4697-96bc-c55176cdbd02', 'xpath': '/html/body/ul[6]/li[2]', 'confidence': 0.661}]

data = defaultdict(float)
for entry in ab:
    value = entry['confidence']
    key = entry['key'].split('.')[0]
    if data[key] < value:
        data[key] = value

for k, v in data.items():
    print(f'{k} -> {v}')

output

gdpr -> 0.989
data_collected -> 0.661
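
A small caveat, sketched under the assumption that the values could ever be negative (the confidences in the question are not): `defaultdict(float)` starts every key at 0.0, so a group whose values are all below zero would report 0.0 instead of its real maximum. The `scores` list below is hypothetical, and using `float('-inf')` as the starting value is one possible way around it.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs where one group contains only negative values.
scores = [('a', -0.5), ('a', -0.2), ('b', 0.3)]

# Default of 0.0: the negative group never beats the starting value.
with_zero_default = defaultdict(float)
for key, value in scores:
    if with_zero_default[key] < value:
        with_zero_default[key] = value

# Default of -inf: any real number beats the starting value.
with_inf_default = defaultdict(lambda: float('-inf'))
for key, value in scores:
    if with_inf_default[key] < value:
        with_inf_default[key] = value

print(dict(with_zero_default))  # {'a': 0.0, 'b': 0.3}
print(dict(with_inf_default))   # {'a': -0.2, 'b': 0.3}
```

For confidences in the 0 to 1 range, as in the question, the plain `defaultdict(float)` version gives the same result.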

CodePudding user response：

I suggest a solution with O(n) time and memory complexity:

from typing import List


def get_maximal_values(data: List[dict]) -> List[float]:
    # Create iterator for extracting needed data
    preparing_data = ((x['key'].split('.')[0], x['confidence']) for x in data)
    
    # Find maximum for each unique key
    result = {}
    for key, confidence in preparing_data:
        result[key] = max(result.get(key, 0), confidence)
    # return only confidence values
    return list(result.values())
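
A usage sketch for the function above. It is repeated here in condensed form so the block runs on its own, and `ab` is trimmed to the two fields the function reads (my simplification). Like the original, it starts each group at 0 via `result.get(key, 0)`, so it assumes confidences are never negative.

```python
from typing import Dict, List


def get_maximal_values(data: List[dict]) -> List[float]:
    """Return the highest confidence per key prefix, in first-seen order."""
    result: Dict[str, float] = {}
    for entry in data:
        key = entry['key'].split('.')[0]
        # Assumes non-negative confidences, since 0 is the starting value.
        result[key] = max(result.get(key, 0), entry['confidence'])
    return list(result.values())


# Trimmed copy of the question's data: only 'key' and 'confidence' are kept.
ab = [
    {'key': 'gdpr.gdpr_compliance.1', 'confidence': 0.985},
    {'key': 'gdpr.gdpr_compliance.2', 'confidence': 0.989},
    {'key': 'data_collected.personally_identifiable_information.1', 'confidence': 0.562},
    {'key': 'data_collected.personally_identifiable_information.2', 'confidence': 0.661},
]

print(get_maximal_values(ab))  # [0.989, 0.661]
```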