Home > Enterprise >  Calculate sum of columns from nested Python dictionary
Calculate sum of columns from nested Python dictionary

Time:09-08

I have a dictionary in the following format:

{
    '1000': {
        'owner_id':'1000',
        'in_hours':100,
        'us_hours':200,
        'total_revenue':100
    },
    '1084': {
        'owner_id':'10084',
        'in_hours':100,
        'us_hours':200,
        'total_revenue':100
    }
}

Now I need the sum like the following:

{
    'in_hours':200,
    'us_hours':400,
    'total_revenue':200
}

How to do this? I know I can do this in a loop, but I have a number of keys and I don't want to specify all of them individually.

CodePudding user response:

The way to do this depends on how you determine which values are relevant. Let's assume that you want to sum all values that are integers. In this way we don't need to know what the keys are but just their values.

data = {
    '1000': {
        'owner_id':'1000',
        'in_hours':100,
        'us_hours':200,
        'total_revenue':100
    },
    '1084': {
        'owner_id':'10084',
        'in_hours':100,
        'us_hours':200,
        'total_revenue':100
    }
}

result = {}

for v in data.values():
    assert isinstance(v, dict)
    for k, v in v.items():
        if isinstance(v, int):
            result[k] = result.get(k, 0)   v

print(result)

Output:

{'in_hours': 200, 'us_hours': 400, 'total_revenue': 200}

CodePudding user response:

if you know that your keys start with key prefix you can do:

from collections import defaultdict

result = defaultdict(int)
for v in data.values():
    for key, value in v.items():
        if key.startswith('key'):
            result[key]  = value

print(dict(result))

Output:

{
    'key1': 200, 
    'key2': 400, 
    'key3': 200
}

CodePudding user response:

You can use the collection interface

collections.Counter()    
for d in data.values():
        counter.update(d)
result = dict(counter)
    
output 
    {'key3': 200, 'key2': 400, 'key1': 200, 'owner_id': '100841000'}

CodePudding user response:

Use a collections.defaultdict and sum up all values of the inner dictionaries which are numbers (int or float) (so you do not need to know the name). I assume you do not want to concatenate the string IDs (which seem to be redundant, anyway, since they are already the keys to the outer dictionary?):

from collections import defaultdict

in_dict = {
    '1000': {
        'owner_id': '1000',
        'key1': 100,
        'key2': 200,
        'key3': 100,
        'time': 0.7
    },
    '1084': {
        'owner_id': '1084',
        'key1': 100,
        'key2': 200,
        'key3': 100,
        'time': 4.5
    }
}

out_dict = defaultdict(int)
for outer in in_dict.values():
    for key, value in outer.items():
        if isinstance(value, (int, float)):
            out_dict[key]  = value

print(out_dict)
# defaultdict(<class 'int'>, {'key1': 200, 'key2': 400, 'key3': 200, 'time': 5.2})

A version with try and except is also possible, so we do not specialize for int and float:

out_dict = defaultdict(int)
for outer in in_dict.values():
    for key, value in outer.items():
        try:
            out_dict[key]  = value
        except TypeError:
            # need to remove key due to side effects of defaultdict
            del out_dict[key]

print(out_dict)
# defaultdict(<class 'int'>, {'key1': 200, 'key2': 400, 'key3': 200, 'time': 5.2})
  • Related