Thanks in advance for your assistance.
I have two dictionaries: parent_dict has sets as its values, and pub_plat_dict is a look-up dictionary used to correct the names of the items in those sets.
The function update_dict lets any name containing a '.' pass unchanged. If it doesn't find a '.', it tries to look the name up in pub_plat_dict. If found, it should .remove the old name and .add the updated name. If the name isn't present, I want the program to move on to the next item.
When I run the function, update_dict corrects the first item in parent_dict and correctly skips multiple items that don't need to be updated, but then it doesn't .remove or .add the other wrongly named items.
Sample Data
parent_dict = {
    '49d238407e0102ba': {'opportunity_history'},
    'f9d53c74ec1d2ff6': {'servicer.trial_balance', 'src_platform.loan', 'src_platform.loan_disbursement'},
    'fc35a98e0cfaab3d': {'loan', 'loan_agreement', 'opportunity_compliance_flag', 'zodiac'},
}
pub_plat_dict = {
    'loan': 'src_platform.loan',
    'opportunity_compliance_flag': 'src_platform.opportunity_compliance_flag',
    'opportunity_history': 'src_platform.opportunity_history',
    'loan_agreement': 'src_platform_mosaic_live.loan_agreement',
}
Function
def update_dict(parent_dict):
    for tbls in parent_dict.values():
        for tbl in tbls:
            if tbl.find(".") != -1:
                pass
            else:
                try:
                    update = pub_plat_dict[tbl]
                    tbls.remove(tbl)
                    tbls.add(update)
                except:
                    pass
        return(parent_dict)
Output
{'49d238407e0102ba': {'src_platform.opportunity_history'}, 'f9d53c74ec1d2ff6': {'src_platform.loan', 'src_platform.loan_disbursement', 'servicer.trial_balance'}, 'fc35a98e0cfaab3d': {'opportunity_compliance_flag', 'loan_agreement', 'loan', 'zodiac'}}
NOTE: the first item is updated correctly but everything else is left unchanged.
I did the following loop to try to figure out my error (keeping it as close to the update_dict code as I could).
for tbls in parent_dict.values():
    for tbl in tbls:
        if tbl.find('.') != -1:
            print("UNCHANGED-" + tbl)
        else:
            try:
                print("CHANGED-" + pub_plat_dict[tbl])
            except:
                print("FAILURE-" + tbl)
It gives me the following output:
UNCHANGED-src_platform.opportunity_history
UNCHANGED-src_platform.loan
UNCHANGED-src_platform.loan_disbursement
UNCHANGED-servicer.trial_balance
CHANGED-src_platform.opportunity_compliance_flag
CHANGED-src_platform_mosaic_live.loan_agreement
CHANGED-src_platform.loan
FAILURE-zodiac
Aside from the capitalized prefixes, this is what I would expect parent_dict to look like now. So my .remove and .add aren't working consistently.
EDIT: I also substituted .discard for .remove, the output did not change.
Any assistance would be greatly appreciated.
Answer
I couldn't get the function to work as I created it. Part of the issue is where the return statement appears: a return inside a loop exits the function immediately, even if the iteration has not finished. Another issue might be the redundant logic in the function.
I decided to create a new dictionary and use lists as the dict's values instead of sets. I simplified the logic in the function and got rid of the if statement, which seemed redundant in light of the try clause.
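from collections import defaultdict

# Cleaned copy of parent_dict; each key maps to a list of corrected names
cln_parent_dict = defaultdict(list)

def update_dict(parent_dict):
    for key in parent_dict:
        for value in parent_dict[key]:
            try:
                # Use the corrected name when the look-up has one
                cln_parent_dict[key].append(pub_plat_dict[value])
            except KeyError:
                # No entry in the look-up: keep the original name
                cln_parent_dict[key].append(value)
    return cln_parent_dict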
When I run the function, I get what I expect:
Function Output
defaultdict(<class 'list'>, {'49d238407e0102ba': ['src_platform.opportunity_history'], 'f9d53c74ec1d2ff6': ['servicer.trial_balance', 'src_platform.loan_disbursement', 'src_platform.loan'], 'fc35a98e0cfaab3d': ['zodiac', 'src_platform_mosaic_live.loan_agreement', 'src_platform.opportunity_compliance_flag', 'src_platform.loan']})
Overall the change seems to work for the 100K items in the dataset.
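If the values need to stay sets, the same idea can probably be kept with a small variation. The sketch below builds a new set for each key instead of calling .remove and .add on a set while looping over it, which is generally not safe in Python. The name update_dict_sets is hypothetical, and the sample data is copied from the question. dict.get(tbl, tbl) falls back to the original name when the look-up has no entry for it.

```python
parent_dict = {
    '49d238407e0102ba': {'opportunity_history'},
    'f9d53c74ec1d2ff6': {'servicer.trial_balance', 'src_platform.loan',
                         'src_platform.loan_disbursement'},
    'fc35a98e0cfaab3d': {'loan', 'loan_agreement',
                         'opportunity_compliance_flag', 'zodiac'},
}
pub_plat_dict = {
    'loan': 'src_platform.loan',
    'opportunity_compliance_flag': 'src_platform.opportunity_compliance_flag',
    'opportunity_history': 'src_platform.opportunity_history',
    'loan_agreement': 'src_platform_mosaic_live.loan_agreement',
}


def update_dict_sets(parent_dict, lookup):
    # Build a fresh dict of fresh sets, leaving the input untouched.
    # Names with no entry in the look-up are kept unchanged.
    return {
        key: {lookup.get(tbl, tbl) for tbl in tbls}
        for key, tbls in parent_dict.items()
    }


cleaned = update_dict_sets(parent_dict, pub_plat_dict)
print(cleaned)
```

Because the result is made of sets, the order of names within each value is not guaranteed, and duplicates created by the renaming collapse into one entry.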
Thanks to everyone for taking a look.