I have the following dataframe where I am trying to match Account Codes. Hypothetically columns Account_Spread_v2 and Account_Codes_v2 were merged into the dataframe. And the idea is to match column Account_Codes_v2 against Account_Codes. See function below to apply this.
df = pd.DataFrame([[31,1234567890,'USD',3.5,'D12',3.5,'D3'],
[10,7854567890,'USD',2.7,'TT',2.7,'TT'],
[10,7854567899,'AUS',8,'D1',8,'D1'],
[6,7854567893,'USD',2.7,'D55',2.7,'H1'],
[10,7854567893,'EUR',2.7,'JG',2.7,'JG'],
[31,9632587415,'USD',1.4,'D55',1.4,'D2']],
columns = ['branch','Account','Cur','Account_Spread','Account_Codes','Account_Spread_v2','Account_Codes_v2'])
Output:
branch Account Cur Account_Spread Account_Codes Account_Spread_v2 Account_Codes_v2
0 31 1234567890 USD 3.5 D12 3.5 D3
1 10 7854567890 USD 2.7 TT 2.7 TT
2 10 7854567899 AUS 8.0 D1 8.0 D1
3 6 7854567893 USD 2.7 D55 2.7 H1
4 10 7854567893 EUR 2.7 JG 2.7 JG
5 31 9632587415 USD 1.4 D55 1.4 D2
Fucntion:
def compute_match_codes(row):
codes = ['D1','D2','D4','D3']
m = 'NA'
if row['Account_Codes'] == row['Account_Codes_v2']:
m = 'MatchOnCodes'
else:
m = 'MismatchOnCodes'
return(m)
df = (pd.concat([df,(df.apply(compute_match_codes, axis=1, result_type='expand')),], axis=1))
branch Account Cur Account_Spread Account_Codes Account_Spread_v2 Account_Codes_v2 0
0 31 1234567890 USD 3.5 D12 3.5 D3 MismatchOnCodes
1 10 7854567890 USD 2.7 TT 2.7 TT MatchOnCodes
2 10 7854567899 AUS 8.0 D1 8.0 D1 MatchOnCodes
3 6 7854567893 USD 2.7 D55 2.7 H1 MismatchOnCodes
4 10 7854567893 EUR 2.7 JG 2.7 JG MatchOnCodes
5 31 9632587415 USD 1.4 D55 1.4 D2 MismatchOnCodes
The challenge I have is that if an account is USD, on branch 31 and its Account Code is either D12 and D55 in column "Account_Codes" it can substitute for any of the codes in the list called "codes".
By applying this row 0 and 5 will actual match then. I tried using the isin() method but isn't working out. Any idea on how to up to edit the function to accommodate this?
CodePudding user response:
I would use a nested np.where()
to eliminate all exact matches first and then ddress the more complex logic you need. I believe this would also be a faster solution as its vectorized than using apply
with concat
and a custom function. The code would look like this:
codes = ['D1','D2','D3','D4']
df['Match'] = np.where(df['Account_Codes'] == df['Account_Codes_v2'],'MatchOnCodes',
np.where((df['Cur'] == 'USD') & (df['branch'] == 31) & (df['Account_Codes'].isin(['D12','D55'])) & (df['Account_Codes_v2'].isin(codes)),'MatchOnCodes','NoMatchOnCodes'))
This outputs:
branch Account Cur ... Account_Spread_v2 Account_Codes_v2 Match
0 31 1234567890 USD ... 3.5 D3 MatchOnCodes
1 10 7854567890 USD ... 2.7 TT MatchOnCodes
2 10 7854567899 AUS ... 8.0 D1 MatchOnCodes
3 6 7854567893 USD ... 2.7 H1 NoMatchOnCodes
4 10 7854567893 EUR ... 2.7 JG MatchOnCodes
5 31 9632587415 USD ... 1.4 D2 MatchOnCodes
Per OPs comment:
codes = ['D1','D2','D3','D4']
def matching_func(row):
if row['Account_Codes'] == row['Account_Codes_v2']:
return 'MatchOnCodes'
elif (row['Cur'] == 'USD') & (row['branch'] == 31) & (row['Account_Codes'] in ['D12','D55']) & (row['Account_Codes_v2'] in codes):
return 'MatchOnCodes'
else:
return 'NoMatchOnCodes'
df['Match'] = df.apply(lambda x: matching_func(x),axis=1)
Output:
branch Account Cur ... Account_Spread_v2 Account_Codes_v2 Match
0 31 1234567890 USD ... 3.5 D3 MatchOnCodes
1 10 7854567890 USD ... 2.7 TT MatchOnCodes
2 10 7854567899 AUS ... 8.0 D1 MatchOnCodes
3 6 7854567893 USD ... 2.7 H1 NoMatchOnCodes
4 10 7854567893 EUR ... 2.7 JG MatchOnCodes
5 31 9632587415 USD ... 1.4 D2 MatchOnCodes