The problem that I am attempting to solve is to reconcile rates applied to the actual accounts shown in "AccountTable" versus what the should be set in "RateTable". For each Account its rate can be set at different levels, either at the Account level or at the Parent level. More than one account can be linked to the same parent but it would differ by currency for example. While I can get it to compare, my solution involves a lot of code which is repeated and its not scalable where this example only looks at 2 different grouping where I could have up to 9 different grouping combo's to compare.
Here is sample AccountTable:
import pandas as pd
import numpy as np
AccountTable = pd.DataFrame([[1234567890,456,'EUR',3.5],
[7854567890,15,'USD',2.7],
[9632587415,56,'GBP',1.4]],
columns = ['Account','ParentID','Cur','Rate'])
AccountTable
Output:
Account ParentID Cur Rate
0 1234567890 456 EUR 3.5
1 7854567890 15 USD 2.7
2 9632587415 56 GBP 1.4
Here is the RateTable:
RateTable = pd.DataFrame([['Account',1234567890,'EUR',3.5], # Rate set at account level and shuold return a match
['ParentID',456,'EUR',3.5], # should be Unused as match found at account level
['ParentID',15,'USD',2.7],# rate set at account level and matches
['ParentID',15,'CAD',1.5],# CAD not in Account Table therfore unused
['Account',9876542190,'EUR',3.5], # Account Table therfore unused
['ParentID',56,'GBP',1.5]], # rate set on parent level but rates don't match so return should be mismatch here
columns = ['Level_Type','ID','Cur','Set_Rate'])
Output:
Level_Type ID Cur Set_Rate
0 Account 1234567890 EUR 3.5
1 ParentID 456 EUR 3.5
2 ParentID 15 USD 2.7
3 ParentID 15 CAD 1.5
4 Account 9876542190 EUR 3.5
5 ParentID 56 GBP 1.5
My solutions is as follows, where I split the RateTable into multiply Dataframes based on the different level. In this case 2 - Account level and Parent Level. Then I left join them independently to the AccountTable using the Groupby function and compare the rates.
option1 = ['Account']
option2 = ['ParentID']
AccountView = RateTable[RateTable['Level_Type'].isin(option1)]
ParentView = RateTable[RateTable['Level_Type'].isin(option2)]
AccountView = AccountView.rename(columns={'Set_Rate':'Account_Set_Rate'})
ParentView = ParentView.rename(columns={'Set_Rate':'Parent_Set_Rate'})
AccountView = AccountView.rename(columns={'ID':'Account_ID'})
ParentView = ParentView.rename(columns={'ID':'Parent_ID'})
# new view to identify matches at Account level Only
df = pd.merge(AccountTable, AccountView, left_on=['Account','Cur'], right_on=['Account_ID','Cur'], how='left')
df['Account_level_RateMatch'] = np.where(df['Rate'] == df['Account_Set_Rate'],'1','0').astype(int)
Account ParentID Cur Rate Level_Type Account_ID Account_Set_Rate Account_level_RateMatch
0 1234567890 456 EUR 3.5 Account 1.234568e 09 3.5 1
1 7854567890 15 USD 2.7 NaN NaN NaN 0
2 9632587415 56 GBP 1.4 NaN NaN NaN 0
The above is repeated but matching on the parent level now:
df = pd.merge(AccountTable, ParentView, left_on=['ParentID','Cur'], right_on=['Parent_ID','Cur'], how='left')
df['Parent_level_RateMatch'] = np.where(df['Rate'] == df['Parent_Set_Rate'],'1','0').astype(int) # compare rates
Output:
Account ParentID Cur Rate Level_Type Parent_ID Parent_Set_Rate Parent_level_RateMatch
0 1234567890 456 EUR 3.5 ParentID 456 3.5 1
1 7854567890 15 USD 2.7 ParentID 15 2.7 1
2 9632587415 56 GBP 1.4 ParentID 56 1.5 0
I need to have a better way to compare the Account Rates to the Rate table rather than doing separate views. Also, the logic needs to be if a match is found at the first level "Account level" its stops there and doesn't need to check the next level i.e. parent level such as in # row 1 it matches at both account and parent level.
Any thoughts or solutions would be greatly appreciated.
Desired Output:
Account ParentID Cur Rate IsMatch LevelFound
0 1234567890 456 EUR 3.5 1 Account
1 7854567890 15 USD 2.7 1 Parent
2 9632587415 56 GBP 1.4 0 Parent
CodePudding user response:
EDIT solution, similar to original one but to fit better OP's expected output
#define the order of the levels
ordered_levels = ['Account','ParentID']
# fnd all the matching rates
res = (
pd.concat(
[AccountTable
.merge(RateTable.loc[RateTable['Level_Type'].eq(lvl),#row with good level
# columns for comparision with AccountTable
['ID','Cur','Set_Rate']]
.rename(columns={'ID':lvl, 'Cur':f'Cur_opt'}),
on=lvl, how='inner')
.query('Cur == Cur_opt') #EDIT to query same cur
.assign(LevelFound=lvl,
#EDIT if rate not the same then 0
Is_Match=lambda x: x['Rate'].eq(x['Set_Rate']).astype(int))
for lvl in ordered_levels]) # do the merge operation on each level
#EDIT for selecting first 1 if any, then first 0
.sort_values('Is_Match', ascending=False)
# keep the first matched per initial AccountTable or higher level non-match
.drop_duplicates(ordered_levels)
[AccountTable.columns.tolist() ['LevelFound','Is_Match']]
)
print(res)
# Account ParentID Cur Rate LevelFound Is_Match
# 0 1234567890 456 EUR 3.5 Account 1
# 1 7854567890 15 USD 2.7 ParentID 1
# 3 9632587415 56 GBP 1.4 ParentID 0
Original solution
Here is a solution, you need first to define the order of the levels, then you can loop over each one, select the rows wanted in RateTable
, then merge
with Account, and keep only the matched cur and rate (query
). concat
all the matched data and keep only the first match per AccountTable initial row.
#define the order of the levels
ordered_levels = ['Account','ParentID']
# fnd all the matching rates
matched = (
pd.concat(
[AccountTable
.merge(RateTable.loc[RateTable['Level_Type'].eq(lvl),#row with good level
# columns for comparision with AccountTable
['ID','Cur','Set_Rate']]
.rename(columns={'ID':lvl, 'Cur':f'Cur_opt'}),
on=lvl, how='inner')
# keep only the matching data
.query('Cur == Cur_opt and Rate == Set_Rate')
# add the two columns for the ouput
.assign(LevelFound=opt, Is_Match=1)
for lvl in ordered_levels]) # do the merge operation on each level
.drop_duplicates(ordered_levels) # keep the first matched per initial AccountTable
[AccountTable.columns.tolist() ['LevelFound','Is_Match']]
)
print(matched) # note that the row wihtout match is missing
# Account ParentID Cur Rate LevelFound Is_Match
# 0 1234567890 456 EUR 3.5 Account 1
# 1 7854567890 15 USD 2.7 ParentID 1
If you want to add the rows with no match, then you can do
res = AccountTable.merge(matched, how='left')
print(res)
# Account ParentID Cur Rate LevelFound Is_Match
# 0 1234567890 456 EUR 3.5 Account 1.0
# 1 7854567890 15 USD 2.7 ParentID 1.0
# 2 9632587415 56 GBP 1.4 NaN NaN