Home > Software design >  Matching Zeros and Nulls as equal in dataframes
Matching Zeros and Nulls as equal in dataframes

Time:02-08

I have the following dataframe/function which reconciles interest rates and codes between a mainframe system and its rate depositary/warehouse.

import pandas as pd
import numpy as np

# Test Dataframe
df = pd.DataFrame([[5551455514,3.5,'GG',3.5,'GG',],
                    [9631475400,0,'GG',np.NaN,'GG',],
                   [8529875405,0,'GG',0,'GG',],
                    [3265415409,np.NaN,'GG',0,'GG',],
                    [9631475405,1.25,'GG',0,'GG',]],
columns = ['Account','Mainframe_System_interest','Mainframe_System_code','Rate_warehouse_interest','Rate_warehouse_code'])

# Function 
def matching_func(row):
    if row['Mainframe_System_code'] == row['Rate_warehouse_code'] and row['Mainframe_System_interest'] == row['Rate_warehouse_interest']:
        return 'Match'
    else:
        return 'Mismatch'
# New generated Column
df['Rate_Match'] = df.apply(lambda x: matching_func(x), axis=1)

# Current Output:

Account Mainframe_System_interest   Mainframe_System_code   Rate_warehouse_interest Rate_warehouse_code Rate_Match
0       5551455514  3.50            GG                      3.5                   GG      Match
1       9631475400  0.00            GG                      NaN                   GG      Mismatch
2       8529875405  0.00            GG                      0.0                   GG      Match
3       3265415409  NaN             GG                      0.0                   GG      Mismatch
4       9631475405  1.25            GG                      0.0                   GG      Mismatch

When matching the interest rates, I need to add in the factor that if the system has a null rate this can be treated as a zero and vice versa. The same concept is applied to the rate warehouse too. Currently function can only match like for like.

I need to factor in where 0s can be nulls and vice versa. Any ideas how to add this to the function provided?

The expected result should see Line 1 and 3 as matched and not mismatched, and the others should not change.

Any thoughts how to achieve this?

CodePudding user response:

Use fillna before:

m1 = df['Mainframe_System_interest'].fillna(df['Rate_warehouse_interest'])
         == df['Rate_warehouse_interest'].fillna(df['Mainframe_System_interest'])

m2 = df['Mainframe_System_code'].fillna(df['Rate_warehouse_code']) \
         == df['Rate_warehouse_code'].fillna(df['Mainframe_System_code'])

df['Rate_Match'] = np.where(m1 & m2, 'Match', 'Mismatch')

Output:

>>> df[['Account', 'Rate_Match']]
      Account Rate_Match
0  5551455514      Match
1  9631475400      Match
2  8529875405      Match
3  3265415409      Match
4  9631475405   Mismatch

>>> df
      Account  Mainframe_System_interest Mainframe_System_code  Rate_warehouse_interest Rate_warehouse_code Rate_Match
0  5551455514                       3.50                    GG                      3.5                  GG      Match
1  9631475400                       0.00                    GG                      NaN                  GG      Match
2  8529875405                       0.00                    GG                      0.0                  GG      Match
3  3265415409                        NaN                    GG                      0.0                  GG      Match
4  9631475405                       1.25                    GG                      0.0                  GG   Mismatch
  •  Tags:  
  • Related