I am new to working with defaultdicts.
I have a matching script that uses a unique identifier as the key and stores a list of potential matches for that identifier in a defaultdict(list). Each match holds a company name, an address, and matching scores (from several matching algorithms). Sometimes it is a 1-1 match, meaning the key has exactly one match, but sometimes the algorithms catch close matches, so a key can have multiple matches. For those, I'd like to select the highest-scored match.
Goal: Extract data from the defaultdict(list) for each unique identifier. If an identifier has more than one value, extract the entry with the highest Lev score, Fuzzy score, and Jaro score.
Here's a preview of the data:
#imports
from collections import defaultdict
test_dic_stack = defaultdict(list)
#testing data (unique1 has a 1-1 match & unique2 has a 1-5 match)
test_dic_stack['unique1'].append({'Account Name': 'company1', 'Matching Account': 'company1', 'Account_Address': '123 Road', 'Address_match': '123 Road', 'Lev_score': 98.0, 'Fuzzy_score': 100, 'Jaro_Score': 99.0})
test_dic_stack['unique2'].append({'Account Name': 'company1', 'Matching Account': 'company1', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome street', 'Lev_score': 91.0, 'Fuzzy_score': 89, 'Jaro_Score': 99.0})
test_dic_stack['unique2'].append({'Account Name': 'company2', 'Matching Account': 'company2', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome st', 'Lev_score': 71.0, 'Fuzzy_score': 82, 'Jaro_Score': 84.0})
test_dic_stack['unique2'].append({'Account Name': 'company3', 'Matching Account': 'company3', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome street suite 1', 'Lev_score': 88.0, 'Fuzzy_score': 89, 'Jaro_Score': 90.0})
test_dic_stack['unique2'].append({'Account Name': 'company4', 'Matching Account': 'company4', 'Account_Address': '1 awesome street', 'Address_match': '1 awe street', 'Lev_score': 81.0, 'Fuzzy_score': 90, 'Jaro_Score': 86.0})
test_dic_stack['unique2'].append({'Account Name': 'company5', 'Matching Account': 'company5', 'Account_Address': '1 awesome street', 'Address_match': '1 awe st', 'Lev_score': 70.0, 'Fuzzy_score': 86, 'Jaro_Score': 89.0})
#defaultdict preview
defaultdict(list,
{'unique1': [{'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '123 Road',
'Address_match': '123 Road',
'Lev_score': 98.0,
'Fuzzy_score': 100,
'Jaro_Score': 99.0}],
'unique2': [{'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome street',
'Lev_score': 91.0,
'Fuzzy_score': 89,
'Jaro_Score': 99.0},
{'Account Name': 'company2',
'Matching Account': 'company2',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome st',
'Lev_score': 71.0,
'Fuzzy_score': 82,
'Jaro_Score': 84.0},
{'Account Name': 'company3',
'Matching Account': 'company3',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome street suite 1',
'Lev_score': 88.0,
'Fuzzy_score': 89,
'Jaro_Score': 90.0},
{'Account Name': 'company4',
'Matching Account': 'company4',
'Account_Address': '1 awesome street',
'Address_match': '1 awe street',
'Lev_score': 81.0,
'Fuzzy_score': 90,
'Jaro_Score': 86.0},
{'Account Name': 'company5',
'Matching Account': 'company5',
'Account_Address': '1 awesome street',
'Address_match': '1 awe st',
'Lev_score': 70.0,
'Fuzzy_score': 86,
'Jaro_Score': 89.0}]})
Here's my requested result:
Extract the unique1 data and the "best matched" unique2 data. Note that the best match isn't always first.
results = [{'unique1': {'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '123 Road',
'Address_match': '123 Road',
'Lev_score': 98.0,
'Fuzzy_score': 100,
'Jaro_Score': 99.0},
'unique2': {'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome street',
'Lev_score': 91.0,
'Fuzzy_score': 89,
'Jaro_Score': 99.0}}]
CodePudding user response:
You could use a dictionary comprehension with max, using the sum of the three scores as the key. Assuming d is the input dictionary:
out = {k:max(v, key=lambda x: sum((x['Fuzzy_score'], x['Lev_score'], x['Jaro_Score'])))
for k,v in d.items()}
Output:
{'unique1': {'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '123 Road',
'Address_match': '123 Road',
'Lev_score': 98.0,
'Fuzzy_score': 100,
'Jaro_Score': 99.0},
'unique2': {'Account Name': 'company1',
'Matching Account': 'company1',
'Account_Address': '1 awesome street',
'Address_match': '1 awesome street',
'Lev_score': 91.0,
'Fuzzy_score': 89,
'Jaro_Score': 99.0}}
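For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The helper name best_matches, the SCORE_KEYS tuple, and the empty-list guard are my own additions for illustration, and the sample is a trimmed-down, reordered version of the question's data (same scores, best match deliberately not first).

```python
from collections import defaultdict

# Score fields taken from the question's data; the tuple name is hypothetical.
SCORE_KEYS = ('Lev_score', 'Fuzzy_score', 'Jaro_Score')

def best_matches(matches, score_keys=SCORE_KEYS):
    """Return {identifier: best candidate}, ranking by the sum of the scores."""
    return {
        key: max(candidates, key=lambda m: sum(m[s] for s in score_keys))
        for key, candidates in matches.items()
        if candidates  # assumption: skip empty lists, where max() would raise ValueError
    }

# Trimmed-down sample using scores from the question, reordered so the
# best match for 'unique2' is not the first entry.
d = defaultdict(list)
d['unique1'].append({'Account Name': 'company1', 'Lev_score': 98.0, 'Fuzzy_score': 100, 'Jaro_Score': 99.0})
d['unique2'].append({'Account Name': 'company2', 'Lev_score': 71.0, 'Fuzzy_score': 82, 'Jaro_Score': 84.0})
d['unique2'].append({'Account Name': 'company1', 'Lev_score': 91.0, 'Fuzzy_score': 89, 'Jaro_Score': 99.0})
d['unique2'].append({'Account Name': 'company4', 'Lev_score': 81.0, 'Fuzzy_score': 90, 'Jaro_Score': 86.0})

best = best_matches(d)
print(best['unique2']['Account Name'])  # company1 (sum 279 vs 237 and 257)
```

Note that max returns the first maximal element it encounters, so if two candidates have the same total, the one that appears earlier in the list wins. If one score should take priority over the others, a tuple key such as (x['Lev_score'], x['Fuzzy_score'], x['Jaro_Score']) would compare them in that order instead of summing.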