Sum values based on same keys in dict and make array

Time:10-23

Hi guys, I have data like this:

[
{
    'name': 'snow 7', 
    'count': 1, 
    'rows_processed': None, 
    'pipelines': 1
}, 
{
    'name': 'snow 6',
    'count': 1,
    'rows_processed': None,
    'pipelines': 1
},
{
    'name': 'snow 6',
    'count': 1,
    'rows_processed': None,
    'pipelines': 1
}, 
{
    'name': 'snow 6',
    'count': 2,
    'rows_processed': None,
    'pipelines': 2
},
{
    'name': 'snow 5',
    'count': 2,
    'rows_processed': 4,
    'pipelines': 2
},
{
    'name': 'snow 4',
    'count': 2,
    'rows_processed': None,
    'pipelines': 2
}]

I want to sum the values of rows_processed and pipelines grouped by the name key. For example, for snow 6 the pipelines sum will be 4, and so on. The final data should look like this:

    {
     "Rows Processed": [0, 0, 4, 0],
     "Pipelines Processed": [1, 4, 2, 2]
    }

How can I build data like the above? This is what I have done so far:

    rows_processed = {}
    pipeline_processed = {}
    for batch in batches:
        for label in batch.keys():
            rows_processed[label] = rows_processed.get(batch['rows_processed'], 0) + batch['rows_processed'] if batch['rows_processed'] else 0
    for batch in batches:
        for label in batch.keys():
            pipeline_processed[label] = pipeline_processed.get(batch['pipelines'], 0) + batch['pipelines'] if \
            batch['pipelines'] else 0

CodePudding user response:

One way using a two-level defaultdict and Boolean Operations:

>>> from collections import defaultdict
>>>
>>> d = defaultdict(lambda: defaultdict(int))
>>> for batch in batches:
...     d['Rows Processed'][batch['name']] += batch['rows_processed'] or 0
...     d['Pipelines Processed'][batch['name']] += batch['pipelines'] or 0
... 
>>> list(d['Rows Processed'].values())
[0, 0, 4, 0]
>>> list(d['Pipelines Processed'].values())
[1, 4, 2, 2]
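
For anyone who wants this as a plain script rather than a REPL session, here is a minimal sketch of the same idea that also assembles the final dict in the shape the question asks for. The `batches` list is the sample data from the question, and `summarize` is just a name picked for illustration. It assumes Python 3.7+, where dicts keep insertion order, so each list follows the order in which names first appear.

```python
from collections import defaultdict

# Sample data copied from the question.
batches = [
    {'name': 'snow 7', 'count': 1, 'rows_processed': None, 'pipelines': 1},
    {'name': 'snow 6', 'count': 1, 'rows_processed': None, 'pipelines': 1},
    {'name': 'snow 6', 'count': 1, 'rows_processed': None, 'pipelines': 1},
    {'name': 'snow 6', 'count': 2, 'rows_processed': None, 'pipelines': 2},
    {'name': 'snow 5', 'count': 2, 'rows_processed': 4, 'pipelines': 2},
    {'name': 'snow 4', 'count': 2, 'rows_processed': None, 'pipelines': 2},
]


def summarize(batches):
    """Sum rows_processed and pipelines per name, treating None as 0."""
    rows = defaultdict(int)
    pipelines = defaultdict(int)
    for batch in batches:
        # `value or 0` turns None into 0 before adding to the running total.
        rows[batch['name']] += batch['rows_processed'] or 0
        pipelines[batch['name']] += batch['pipelines'] or 0
    return {
        'Rows Processed': list(rows.values()),
        'Pipelines Processed': list(pipelines.values()),
    }


result = summarize(batches)
print(result)
# {'Rows Processed': [0, 0, 4, 0], 'Pipelines Processed': [1, 4, 2, 2]}
```

If you also need to know which name each position belongs to, `list(rows.keys())` inside the function would give the matching labels in the same order.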

CodePudding user response:

Hey guys, I resolved the above question with the following code; however, I'm not sure whether this is the right approach. If anyone has a better approach, please let me know.

    rows_processed = {}
    pipeline_processed = {}
    for batch in batches:
        # (value or 0) treats None as 0 without resetting the running total
        rows_processed[batch['name']] = rows_processed.get(batch['name'], 0) + (batch['rows_processed'] or 0)
    for batch in batches:
        pipeline_processed[batch['name']] = pipeline_processed.get(batch['name'], 0) + (batch['pipelines'] or 0)
    print(list(rows_processed.values()))
    print(list(pipeline_processed.values()))
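
One thing to watch with this kind of accumulation: a conditional expression binds more loosely than `+`, so `total + v if v else 0` is parsed as `(total + v) if v else 0`. A `None` (or `0`) value then discards whatever has been summed so far for that name. The sample data in the question happens not to trigger this, which is why the output still looks right. A small sketch with made-up data (the name `'a'` and its values are hypothetical) shows the difference:

```python
# Hypothetical sample: the same name has a real value followed by a None.
batches = [
    {'name': 'a', 'rows_processed': 4},
    {'name': 'a', 'rows_processed': None},
]

buggy = {}
fixed = {}
for batch in batches:
    name, value = batch['name'], batch['rows_processed']
    # Parsed as (buggy.get(name, 0) + value) if value else 0,
    # so a falsy value throws away the running total.
    buggy[name] = buggy.get(name, 0) + value if value else 0
    # Parenthesised fallback: only the missing value becomes 0.
    fixed[name] = fixed.get(name, 0) + (value or 0)

print(buggy)  # {'a': 0}
print(fixed)  # {'a': 4}
```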