Python function looping through a dictionary of sets removing and adding items updating only some it

Time:07-04

Thanks in advance for your assistance.

I have two dictionaries: parent_dict has sets as its values, and pub_plat_dict is a look-up dictionary that maps the names of the items in those sets to their corrected names.

The function update_dict leaves a name unchanged if it contains a '.'. If the name has no '.', the function tries to find it in pub_plat_dict. If it is found, the function will .remove the old name from the set and .add the updated name. If the name isn't in pub_plat_dict, I want the program to move on to the next item.
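The per-name rule described above can be written as a small standalone helper. This is a minimal sketch; the name `correct_name` is hypothetical and not part of the original code, and it uses `dict.get` with a default instead of try/except.

```python
def correct_name(name, lookup):
    """Return the corrected name for a single item (hypothetical helper)."""
    if "." in name:
        # Already schema-qualified, e.g. 'servicer.trial_balance': leave it alone.
        return name
    # Short name: use the correction if the lookup has one, else keep the name.
    return lookup.get(name, name)


# Sample look-up data copied from the question.
pub_plat_dict = {
    "loan": "src_platform.loan",
    "opportunity_compliance_flag": "src_platform.opportunity_compliance_flag",
    "opportunity_history": "src_platform.opportunity_history",
    "loan_agreement": "src_platform_mosaic_live.loan_agreement",
}

print(correct_name("loan", pub_plat_dict))                    # src_platform.loan
print(correct_name("servicer.trial_balance", pub_plat_dict))  # servicer.trial_balance
print(correct_name("zodiac", pub_plat_dict))                  # zodiac
```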

When I run it, update_dict corrects the first item in parent_dict and correctly skips several items that don't need to be updated, but then it doesn't .remove or .add the other wrongly named items.

Sample Data

parent_dict = {
    '49d238407e0102ba':{'opportunity_history'}
    , 'f9d53c74ec1d2ff6':{'servicer.trial_balance','src_platform.loan','src_platform.loan_disbursement'}
    , 'fc35a98e0cfaab3d':{'loan', 'loan_agreement', 'opportunity_compliance_flag','zodiac'}
}

pub_plat_dict = {'loan':'src_platform.loan',
              'opportunity_compliance_flag':'src_platform.opportunity_compliance_flag',
              'opportunity_history':'src_platform.opportunity_history',
              'loan_agreement': 'src_platform_mosaic_live.loan_agreement'}

Function

def update_dict(parent_dict):
    for tbls in parent_dict.values():
        for tbl in tbls:
            if tbl.find(".") != -1:
                pass
            else:
                try:
                    update = pub_plat_dict[tbl]
                    tbls.remove(tbl)
                    tbls.add(update)
                except:
                    pass
            return(parent_dict)

Output

{'49d238407e0102ba': {'src_platform.opportunity_history'}, 'f9d53c74ec1d2ff6': {'src_platform.loan', 'src_platform.loan_disbursement', 'servicer.trial_balance'}, 'fc35a98e0cfaab3d': {'opportunity_compliance_flag', 'loan_agreement', 'loan', 'zodiac'}}

NOTE: the first item is updated correctly but everything else is left unchanged.

To track down my error, I wrote the following loop, keeping it as close to the update_dict code as I could:

for tbls in parent_dict.values():
    for tbl in tbls:
        if tbl.find('.') != -1:
            print("UNCHANGED-" + tbl)
        else:
            try:
                print("CHANGED-" + pub_plat_dict[tbl])
            except KeyError:
                print("FAILURE-" + tbl)

It gives me the following output:

UNCHANGED-src_platform.opportunity_history
UNCHANGED-src_platform.loan
UNCHANGED-src_platform.loan_disbursement
UNCHANGED-servicer.trial_balance
CHANGED-src_platform.opportunity_compliance_flag
CHANGED-src_platform_mosaic_live.loan_agreement
CHANGED-src_platform.loan
FAILURE-zodiac

Aside from the capitalized prefixes, this is what I would expect parent_dict to look like now, so my .remove and .add calls aren't working consistently.

EDIT: I also tried .discard instead of .remove; the output did not change.

Any assistance would be greatly appreciated.

Answer:

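I couldn't get the function to work as I had written it. The main issue is where the return statement appears: it is indented inside the inner for loop, so the function returns right after processing the first item of the first set and never reaches the rest. A return inside a loop exits the function immediately, even if the iteration isn't finished. A second problem is that the loop removes items from and adds items to each set while iterating over that same set; mutating a set during iteration is unsafe, since items can be skipped or revisited, and Python raises a RuntimeError if the set's size changes. There is also some redundant logic in the function.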
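For comparison, here is a minimal sketch that keeps the original in-place, set-based approach but fixes both problems: it iterates over a snapshot (`list(tbls)`) so the set itself can be changed safely, and it returns only after every set has been processed. The name `update_dict_in_place` is hypothetical, and `pub_plat_dict.get` stands in for the original try/except.

```python
# Sample data copied from the question.
pub_plat_dict = {
    "loan": "src_platform.loan",
    "opportunity_compliance_flag": "src_platform.opportunity_compliance_flag",
    "opportunity_history": "src_platform.opportunity_history",
    "loan_agreement": "src_platform_mosaic_live.loan_agreement",
}

parent_dict = {
    "49d238407e0102ba": {"opportunity_history"},
    "f9d53c74ec1d2ff6": {"servicer.trial_balance", "src_platform.loan",
                         "src_platform.loan_disbursement"},
    "fc35a98e0cfaab3d": {"loan", "loan_agreement",
                         "opportunity_compliance_flag", "zodiac"},
}


def update_dict_in_place(parent_dict):
    """Correct names in every set of parent_dict, mutating the sets in place."""
    for tbls in parent_dict.values():
        # Loop over a snapshot (a list copy) so that changing the set itself
        # can't cause items to be skipped or revisited.
        for tbl in list(tbls):
            if "." in tbl:
                continue  # already qualified, leave it alone
            update = pub_plat_dict.get(tbl)
            if update is not None:
                tbls.remove(tbl)
                tbls.add(update)
            # Names missing from pub_plat_dict (e.g. 'zodiac') are left as-is.
    # Return only after every set has been processed, outside both loops.
    return parent_dict


print(update_dict_in_place(parent_dict))
```

Because the sets are modified in place, the caller's parent_dict changes too; the print order of set items is arbitrary.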

I decided to build a new dictionary and use lists as its values instead of sets. I also simplified the function's logic and dropped the if statement, which seemed redundant given the try clause.

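from collections import defaultdict

def update_dict(parent_dict):
    # Build a fresh result on every call, so repeated calls don't accumulate
    # duplicate entries and the input is never changed while it is iterated.
    cln_parent_dict = defaultdict(list)
    for key in parent_dict:
        for value in parent_dict[key]:
            try:
                # Known short name: store the corrected name.
                cln_parent_dict[key].append(pub_plat_dict[value])
            except KeyError:
                # Already qualified or unknown (e.g. 'zodiac'): keep as-is.
                cln_parent_dict[key].append(value)
    # Return only after every key has been processed.
    return cln_parent_dict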

When I run the function, I get what I expect:

Function Output

defaultdict(<class 'list'>, {'49d238407e0102ba': ['src_platform.opportunity_history'], 'f9d53c74ec1d2ff6': ['servicer.trial_balance', 'src_platform.loan_disbursement', 'src_platform.loan'], 'fc35a98e0cfaab3d': ['zodiac', 'src_platform_mosaic_live.loan_agreement', 'src_platform.opportunity_compliance_flag', 'src_platform.loan']})

Overall, the change seems to work for the 100K items in the dataset.
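If the set semantics of the original data should be kept, the same lookup-with-fallback idea also fits in a dictionary comprehension. This is a sketch rather than the answer's code: the name `corrected_copy` is hypothetical, and it uses `dict.get(name, name)` in place of try/except. It builds a new dict of new sets and leaves parent_dict untouched.

```python
# Sample data copied from the question.
pub_plat_dict = {
    "loan": "src_platform.loan",
    "opportunity_compliance_flag": "src_platform.opportunity_compliance_flag",
    "opportunity_history": "src_platform.opportunity_history",
    "loan_agreement": "src_platform_mosaic_live.loan_agreement",
}

parent_dict = {
    "49d238407e0102ba": {"opportunity_history"},
    "f9d53c74ec1d2ff6": {"servicer.trial_balance", "src_platform.loan",
                         "src_platform.loan_disbursement"},
    "fc35a98e0cfaab3d": {"loan", "loan_agreement",
                         "opportunity_compliance_flag", "zodiac"},
}


def corrected_copy(parent_dict, lookup):
    """Return a new dict of sets with every known short name corrected."""
    # lookup.get(name, name) returns the corrected name when one exists and
    # falls back to the original name otherwise (qualified or unknown names).
    return {
        key: {lookup.get(name, name) for name in names}
        for key, names in parent_dict.items()
    }


cleaned = corrected_copy(parent_dict, pub_plat_dict)
print(cleaned)
print(parent_dict)  # the input is unchanged
```

One design difference from the list-based version: sets deduplicate, so a set holding both 'loan' and 'src_platform.loan' ends up with a single 'src_platform.loan', whereas appending to lists would keep both.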

Thanks to everyone for taking a look.
