Home > Back-end >  Matching Columns in a dataframe and factoring in a List
Matching Columns in a dataframe and factoring in a List

Time:11-17

I have the following dataframe where I am trying to match Account Codes. Hypothetically columns Account_Spread_v2 and Account_Codes_v2 were merged into the dataframe. And the idea is to match column Account_Codes_v2 against Account_Codes. See function below to apply this.

df = pd.DataFrame([[31,1234567890,'USD',3.5,'D12',3.5,'D3'],
                    [10,7854567890,'USD',2.7,'TT',2.7,'TT'],
                    [10,7854567899,'AUS',8,'D1',8,'D1'],
                    [6,7854567893,'USD',2.7,'D55',2.7,'H1'],
                    [10,7854567893,'EUR',2.7,'JG',2.7,'JG'],      
                    [31,9632587415,'USD',1.4,'D55',1.4,'D2']],
columns = ['branch','Account','Cur','Account_Spread','Account_Codes','Account_Spread_v2','Account_Codes_v2'])

Output:

branch  Account     Cur Account_Spread  Account_Codes   Account_Spread_v2   Account_Codes_v2
0   31  1234567890  USD 3.5             D12             3.5                 D3
1   10  7854567890  USD 2.7             TT              2.7                 TT
2   10  7854567899  AUS 8.0             D1              8.0                 D1
3   6   7854567893  USD 2.7             D55             2.7                 H1
4   10  7854567893  EUR 2.7             JG              2.7                 JG
5   31  9632587415  USD 1.4             D55             1.4                 D2

Fucntion:

def compute_match_codes(row):
    codes = ['D1','D2','D4','D3']
    m = 'NA'
    if row['Account_Codes'] == row['Account_Codes_v2']:
        m = 'MatchOnCodes'
    else:
        m = 'MismatchOnCodes'
    return(m)
df = (pd.concat([df,(df.apply(compute_match_codes, axis=1, result_type='expand')),], axis=1))

branch  Account     Cur Account_Spread  Account_Codes   Account_Spread_v2   Account_Codes_v2 0
0   31  1234567890  USD 3.5             D12             3.5                 D3         MismatchOnCodes
1   10  7854567890  USD 2.7             TT              2.7                 TT           MatchOnCodes
2   10  7854567899  AUS 8.0             D1              8.0                 D1         MatchOnCodes
3   6   7854567893  USD 2.7             D55             2.7                 H1         MismatchOnCodes
4   10  7854567893  EUR 2.7             JG              2.7                 JG        MatchOnCodes
5   31  9632587415  USD 1.4             D55             1.4                 D2     MismatchOnCodes

The challenge I have is that if an account is USD, on branch 31 and its Account Code is either D12 and D55 in column "Account_Codes" it can substitute for any of the codes in the list called "codes".

By applying this row 0 and 5 will actual match then. I tried using the isin() method but isn't working out. Any idea on how to up to edit the function to accommodate this?

CodePudding user response:

I would use a nested np.where() to eliminate all exact matches first and then ddress the more complex logic you need. I believe this would also be a faster solution as its vectorized than using apply with concat and a custom function. The code would look like this:

codes = ['D1','D2','D3','D4']
df['Match'] = np.where(df['Account_Codes'] == df['Account_Codes_v2'],'MatchOnCodes',
                       np.where((df['Cur'] == 'USD') & (df['branch'] == 31) & (df['Account_Codes'].isin(['D12','D55'])) & (df['Account_Codes_v2'].isin(codes)),'MatchOnCodes','NoMatchOnCodes'))

This outputs:

   branch     Account  Cur  ...  Account_Spread_v2 Account_Codes_v2           Match
0      31  1234567890  USD  ...                3.5               D3    MatchOnCodes
1      10  7854567890  USD  ...                2.7               TT    MatchOnCodes
2      10  7854567899  AUS  ...                8.0               D1    MatchOnCodes
3       6  7854567893  USD  ...                2.7               H1  NoMatchOnCodes
4      10  7854567893  EUR  ...                2.7               JG    MatchOnCodes
5      31  9632587415  USD  ...                1.4               D2    MatchOnCodes

Per OPs comment:

codes = ['D1','D2','D3','D4']
def matching_func(row):
  if row['Account_Codes'] == row['Account_Codes_v2']:
    return 'MatchOnCodes'
  elif (row['Cur'] == 'USD') & (row['branch'] == 31) & (row['Account_Codes'] in ['D12','D55']) & (row['Account_Codes_v2'] in codes):
    return 'MatchOnCodes'
  else:
    return 'NoMatchOnCodes'
df['Match'] = df.apply(lambda x: matching_func(x),axis=1)

Output:

   branch     Account  Cur  ...  Account_Spread_v2 Account_Codes_v2           Match
0      31  1234567890  USD  ...                3.5               D3    MatchOnCodes
1      10  7854567890  USD  ...                2.7               TT    MatchOnCodes
2      10  7854567899  AUS  ...                8.0               D1    MatchOnCodes
3       6  7854567893  USD  ...                2.7               H1  NoMatchOnCodes
4      10  7854567893  EUR  ...                2.7               JG    MatchOnCodes
5      31  9632587415  USD  ...                1.4               D2    MatchOnCodes
  • Related