I have the sample list of dictionaries below:
errors = [{'PartitionKey': '34', 'RowKey': '14', 'Component': 'mamba', 'Environment': 'QA', 'Error': '404 not found', 'Group': 'Test', 'Job': 'cutting', 'JobType': 'automated'}, {'PartitionKey': '35', 'RowKey': '15', 'Component': 'mamba', 'Environment': 'QA', 'Error': '404 not found', 'Group': 'Test', 'Job': 'cutting', 'JobType': 'automated'}, {'PartitionKey': '36', 'RowKey': '16', 'Component': 'mamba', 'Environment': 'Dev', 'Error': '404 not found', 'Group': 'random', 'Job': 'moping', 'JobType': 'manual'}, {'PartitionKey': '37', 'RowKey': '17', 'Component': 'mamba', 'Environment': 'QA', 'Error': '404 not found', 'Group': 'Test', 'Job': 'cutting', 'JobType': 'automated'}, {'PartitionKey': '38', 'RowKey': '18', 'Component': 'mamba', 'Environment': 'Dev', 'Error': '404 not found', 'Group': 'random', 'Job': 'moping', 'JobType': 'manual'},{'PartitionKey': '39', 'RowKey': '19', 'Component': 'Scorpio', 'Environment': 'Dev', 'Error': '500 internal error', 'Group': 'minerva', 'Job': 'cleaning', 'JobType': 'manual'},{'PartitionKey': '39', 'RowKey': '19', 'Component': 'Scorpio', 'Environment': 'Dev', 'Error': '500 internal error', 'Group': 'minerva', 'Job': 'cleaning', 'JobType': 'manual'}]
Using Python, I am trying to find, for each environment, which types of errors are observed and how many times each one occurs. Something like:
{
'QA': {
'404 not found': 10,
'500 internal error': 20,
'503 xyz': 30
},
'DEV': {
'404 not found': 10,
'500 internal error': 20,
'503 xyz': 30
}
}
I am trying to achieve this with Python's itertools.groupby. Here is a snippet of what I have tried, but it does not produce what I want. Any help would be appreciated.
import collections
from itertools import groupby

grouped = collections.defaultdict(list)
newgrouped = collections.defaultdict(list)
# First pass: bucket the records by environment
for item in errors:
    grouped[item['Environment']].append(item)
# Second pass: bucket by error (this mixes all environments together again)
for key, vals in grouped.items():
    for val in vals:
        newgrouped[val['Error']].append(val)
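Since the question asks specifically about itertools.groupby, here is a minimal sketch of how it could be used for this. It assumes only the 'Environment' and 'Error' keys matter, so the sample below is a trimmed copy of the question's data. Because groupby only merges consecutive items, the records are sorted by environment first; collections.Counter is swapped in to tally the errors within each group.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Trimmed copy of the question's sample data (only the two keys used here)
errors = [
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
]

by_env = itemgetter('Environment')
summary = {}
# groupby only merges *adjacent* items, so sort by the grouping key first
for env, items in groupby(sorted(errors, key=by_env), key=by_env):
    # Count the error strings within this environment's run of records
    summary[env] = dict(Counter(item['Error'] for item in items))

print(summary)
# {'Dev': {'404 not found': 2, '500 internal error': 2}, 'QA': {'404 not found': 3}}
```

Note that sorting changes the key order: environments come out alphabetically rather than in order of first appearance.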
Answer:
You can use dict.setdefault to initialize a missing environment key with a sub-dict in which the error counts are tracked:
from operator import itemgetter

summary = {}
for env, error in map(itemgetter('Environment', 'Error'), errors):
    # Create the per-environment sub-dict if needed, then increment this error's count
    summary.setdefault(env, {})[error] = summary.get(env, {}).get(error, 0) + 1
Given your sample input, summary would become:
{'QA': {'404 not found': 3}, 'Dev': {'404 not found': 2, '500 internal error': 2}}
Demo: https://replit.com/@blhsing/BogusVirtualKnowledge
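The one-liner above looks up the environment's sub-dict twice (once via setdefault, once via get). A slightly more readable variant, sketched here on a trimmed copy of the sample data, binds the sub-dict to a local name once and increments it:

```python
from operator import itemgetter

# Trimmed copy of the question's sample data (only the two keys used here)
errors = [
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
]

summary = {}
for env, error in map(itemgetter('Environment', 'Error'), errors):
    # Look up (or create) this environment's sub-dict once, then bump the count
    counts = summary.setdefault(env, {})
    counts[error] = counts.get(error, 0) + 1

print(summary)
# {'QA': {'404 not found': 3}, 'Dev': {'404 not found': 2, '500 internal error': 2}}
```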
Answer:
It seems what you want is a dict of dict(int), i.e. a mapping from each environment to a mapping from error to count:
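from collections import defaultdict

group = defaultdict(dict)
for a in errors:
    # The first time an environment is seen, swap its empty dict for a counting dict
    if not group[a['Environment']]:
        group[a['Environment']] = defaultdict(int)
    group[a['Environment']][a['Error']] += 1
print(group)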
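The same structure can be built without the explicit emptiness check by giving the outer defaultdict a factory that creates the inner defaultdict(int). A minimal sketch on a trimmed copy of the sample data, converting to plain dicts at the end so the printed result is easier to read:

```python
from collections import defaultdict

# Trimmed copy of the question's sample data (only the two keys used here)
errors = [
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
]

# The inner defaultdict(int) is created automatically the first time an environment is seen
group = defaultdict(lambda: defaultdict(int))
for a in errors:
    group[a['Environment']][a['Error']] += 1

# Convert to plain dicts so printing does not show the defaultdict wrappers
result = {env: dict(counts) for env, counts in group.items()}
print(result)
# {'QA': {'404 not found': 3}, 'Dev': {'404 not found': 2, '500 internal error': 2}}
```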
Answer:
I am not familiar with MongoDB, so I just converted the data to a DataFrame:
import pandas as pd

# Build the DataFrame directly from the list of dicts
# (DataFrame.append was deprecated and then removed in pandas 2.0)
errors_df = pd.DataFrame(errors)
errors_env = errors_df.groupby(['Environment', 'Error']).count()
                                PartitionKey  RowKey  Component  Group  Job
Environment Error
Dev         404 not found                  2       2          2      2    2
            500 internal error             2       2          2      2    2
QA          404 not found                  3       3          3      3    3
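Since count() repeats the same number in every remaining column, GroupBy.size() may be a more direct fit: it returns one count per (Environment, Error) pair. A minimal sketch on a trimmed copy of the sample data, which then reshapes that MultiIndex Series into the nested dict the question asks for:

```python
import pandas as pd

# Trimmed copy of the question's sample data (only the two keys used here)
errors = [
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'QA', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '404 not found'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
    {'Environment': 'Dev', 'Error': '500 internal error'},
]

errors_df = pd.DataFrame(errors)
# size() counts rows per (Environment, Error) pair, giving a single Series
counts = errors_df.groupby(['Environment', 'Error']).size()

# Split by the outer index level and turn each inner Series into {error: count}
summary = {
    env: sub.droplevel('Environment').to_dict()
    for env, sub in counts.groupby(level='Environment')
}
print(summary)
```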
Answer:
With pandas, it could be done like this:
import pandas as pd

res = (pd.DataFrame(errors).groupby('Environment')['Error']
       .apply(lambda x: x.value_counts().items()).map(dict).to_dict())
>>> res
{'Dev': {'404 not found': 2, '500 internal error': 2}, 'QA': {'404 not found': 3}}