Would it be considered "pythonic" to use a nested defaultdict where bottom level is defaul...

I am building something to sort and add values from an API response. I ended up going with an interesting structure, and I just want to make sure there's nothing inherently wrong with it.

from collections import defaultdict

# Factory for the inner level: a defaultdict whose
# missing keys start at 0 (named for readability)
def dict_counter():
    return defaultdict(lambda: 0)

# Creates the nested defaultdict object
ad_data = defaultdict(dict_counter)

# Sorts each instance into its channel, and
# adds the dict values incrementally
for ad in example:
    # Collects channel and metrics
    channel = ad['ad_group']['type_']
    metrics = dict(
        impressions= int(ad['metrics']['impressions']),
        clicks     = int(ad['metrics']['clicks']),
        cost       = int(ad['metrics']['cost_micros'])
    )
    
    # Accumulates the metrics into the channel's running totals
    ad_data[channel]['impressions'] += metrics['impressions']
    ad_data[channel]['clicks'] += metrics['clicks']
    ad_data[channel]['cost'] += metrics['cost']

The output is as desired. Again, I just want to make sure I'm not reinventing the wheel or doing something really inefficient here.

defaultdict(<function __main__.dict_counter()>,
            {'DISPLAY_STANDARD': defaultdict(<function __main__.dict_counter.<locals>.<lambda>()>,
                         {'impressions': 14, 'clicks': 4, 'cost': 9}),
             'SEARCH_STANDARD': defaultdict(<function __main__.dict_counter.<locals>.<lambda>()>,
                         {'impressions': 6, 'clicks': 2, 'cost': 4})})

Here's what my input data would look like:

example = [
    {
        'campaign': 
        {
            'resource_name': 'customers/12345/campaigns/12345',
            'status': 'ENABLED',
            'name': 'test_campaign_2'
        },
        'ad_group': {
            'resource_name': 'customers/12345/adGroups/12345',
            'type_': 'DISPLAY_STANDARD'},
        'metrics': {
            'clicks': '1', 'cost_micros': '3', 'impressions': '5'
        },
        'ad_group_ad': {
            'resource_name': 'customers/12345/adGroupAds/12345~12345',
            'ad': {
                'resource_name': 'customers/12345/ads/12345'
            }
        }
    },
    {
        'campaign': 
        {
            'resource_name': 'customers/12345/campaigns/12345',
            'status': 'ENABLED',
            'name': 'test_campaign_2'
        },
        'ad_group': {
            'resource_name': 'customers/12345/adGroups/12345',
            'type_': 'SEARCH_STANDARD'},
        'metrics': {
            'clicks': '2', 'cost_micros': '4', 'impressions': '6'
        },
        'ad_group_ad': {
            'resource_name': 'customers/12345/adGroupAds/12345~12345',
            'ad': {
                'resource_name': 'customers/12345/ads/12345'
            }
        }
    },
    {
        'campaign': 
        {
            'resource_name': 'customers/12345/campaigns/12345',
            'status': 'ENABLED',
            'name': 'test_campaign_2'
        },
        'ad_group': {
            'resource_name': 'customers/12345/adGroups/12345',
            'type_': 'DISPLAY_STANDARD'},
        'metrics': {
            'clicks': '3', 'cost_micros': '6', 'impressions': '9'
        },
        'ad_group_ad': {
            'resource_name': 'customers/12345/adGroupAds/12345~12345',
            'ad': {
                'resource_name': 'customers/12345/ads/12345'
            }
        }
    }
]

Thanks!

CodePudding user response：

There's nothing wrong with the code you have, but the code for copying the values from one dict to another is a bit repetitive and a little vulnerable to mis-pasting a key name. I'd suggest putting the mapping between the keys in a dict so that there's a single source of truth for what keys you're copying from the input metrics dicts and what keys that data will live under in the output:

fields = {
    # Map input metrics dicts to per-channel metrics dicts.
    'impressions': 'impressions',  # same
    'clicks': 'clicks',            # same
    'cost_micros': 'cost',         # different
}

Since each dict in your output is going to contain the keys from fields.values(), you have the option of creating these as plain dicts with their values initialized to zero rather than as defaultdicts (this doesn't have any major benefits over defaultdict(int), but it does make pretty-printing a bit easier):

# Create defaultdict of per-channel metrics dicts.
ad_data = defaultdict(lambda: dict.fromkeys(fields.values(), 0))

and then you can do a simple nested iteration to populate ad_data:

# Aggregate input metrics into per-channel metrics.
for ad in example:
    channel = ad['ad_group']['type_']
    for k, v in ad['metrics'].items():
        ad_data[channel][fields[k]] += int(v)

which for your example input produces:

{'DISPLAY_STANDARD': {'impressions': 14, 'clicks': 4, 'cost': 9},
 'SEARCH_STANDARD': {'impressions': 6, 'clicks': 2, 'cost': 4}}
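
One caveat with the loop above: it iterates over every key in ad['metrics'], so a metric that isn't listed in fields would raise a KeyError. Here is a minimal, self-contained sketch of a variant that iterates over the mapping instead. The `sample` list is a trimmed-down stand-in for your `example` input, and `aggregate` is just a hypothetical helper name, so adjust both to taste:

```python
from collections import defaultdict

# Trimmed-down stand-in for the question's `example` list
# (only the keys this code reads are kept).
sample = [
    {'ad_group': {'type_': 'DISPLAY_STANDARD'},
     'metrics': {'clicks': '1', 'cost_micros': '3', 'impressions': '5'}},
    {'ad_group': {'type_': 'SEARCH_STANDARD'},
     'metrics': {'clicks': '2', 'cost_micros': '4', 'impressions': '6'}},
    {'ad_group': {'type_': 'DISPLAY_STANDARD'},
     'metrics': {'clicks': '3', 'cost_micros': '6', 'impressions': '9'}},
]

# Input metric name -> output metric name.
fields = {
    'impressions': 'impressions',
    'clicks': 'clicks',
    'cost_micros': 'cost',
}


def aggregate(ads):
    """Sum the mapped metrics per channel (hypothetical helper)."""
    totals = defaultdict(lambda: dict.fromkeys(fields.values(), 0))
    for ad in ads:
        channel = ad['ad_group']['type_']
        # Drive the loop from the mapping, so metrics that are not in
        # `fields` are ignored and absent ones count as 0.
        for src, dst in fields.items():
            totals[channel][dst] += int(ad['metrics'].get(src, 0))
    return dict(totals)


result = aggregate(sample)
print(result)
```

Whether silently skipping unknown metrics is what you want depends on the API; if an unexpected key should be treated as an error, the original loop's KeyError is arguably the better behaviour.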

CodePudding user response：

I think you overthought this one a bit. Consider this simple function that sums two dicts:

def add_dicts(a, b):
    return {
        k: int(a.get(k, 0)) + int(b.get(k, 0))
        for k in a | b  # dict union, requires Python 3.9+
    }

With this function, the main loop becomes trivial:

stats = {}

for obj in example:
    t = obj['ad_group']['type_']
    stats[t] = add_dicts(stats.get(t, {}), obj['metrics'])

That's it. No defaultdicts needed.
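
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. It assumes a trimmed-down `sample` list in place of the question's `example`, and it uses set(a) | set(b) instead of a | b so that it also runs on Python versions before 3.9:

```python
def add_dicts(a, b):
    """Return a new dict with the values of a and b summed key by key."""
    # set(a) | set(b) is the union of both key sets; unlike dict | dict
    # it does not need Python 3.9.
    return {
        k: int(a.get(k, 0)) + int(b.get(k, 0))
        for k in set(a) | set(b)
    }


# Trimmed-down stand-in for the question's `example` list.
sample = [
    {'ad_group': {'type_': 'DISPLAY_STANDARD'},
     'metrics': {'clicks': '1', 'cost_micros': '3', 'impressions': '5'}},
    {'ad_group': {'type_': 'SEARCH_STANDARD'},
     'metrics': {'clicks': '2', 'cost_micros': '4', 'impressions': '6'}},
    {'ad_group': {'type_': 'DISPLAY_STANDARD'},
     'metrics': {'clicks': '3', 'cost_micros': '6', 'impressions': '9'}},
]

stats = {}
for obj in sample:
    t = obj['ad_group']['type_']
    stats[t] = add_dicts(stats.get(t, {}), obj['metrics'])

print(stats)
```

Note that this version keeps the input key names, so the totals end up under 'cost_micros' rather than 'cost'. If the renamed key matters, it needs a separate mapping step.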