Home > Software engineering >  Parsing through a pandas Dataframe and applying rules based on different conditions
Parsing through a pandas Dataframe and applying rules based on different conditions

Time:11-12

I have a made up dataframe to replicate the real problem I am trying to work through in python in reconciling account rates held on mainframe system to what they should be set to from rates tables.

I have 3 tables but they have been merged into one dataframe for this example.

  1. Account Information with Interest Rate Conditions (first 5 columns of df). These rates are the actual rates applied to the accounts and need to be matched off to ensure they were set up correctly
  2. Non Std Rates - Certain accounts will apply these non-std rates once certain conditions are met
  3. Std Rates- Same as above these will apply once certain conditions are met
import pandas as pd
import numpy as np
df = pd.DataFrame([[1234567890,3.5,'GG','N','N','Y',np.NaN,np.NaN,'N','N',3.5,'GG'],
                    [7854567890,np.NaN,'GG','N','N','N',np.NaN,'GG','N','N',3.5,'GG'],
                    [9876542190,3.5,'FF','N','N','Y',np.NaN,np.NaN,'N','Y',3.5,'FI'],
                    [9632587415,3.5,'GG','N','N','N',3,'GG','N','N',3.5,'GG']],
columns = ['Account','Account_Spread','Account_Swing','indict_1','indict_2','Negotiated_Rate',
           'Non_std_Spread','Non_std_Code','Non_std_indict_1','Non_std_indict_2','Std_Spread','Std_Swing'])
df

Conditions:

  • Account Data (Account Spread and Account Swing) should only be matched against the non-std rates if columns "Non_std_Spread" or "Non_std_Code" or both are populated and the "Negogiated_Rate" column is set to N.
  • Account Data (Spread and Swing) should only be matched against the std rates if columns "Non_std_Spread" or "Non_std_Code" are both null and the "Negogiated_Rate" column is set to N or Y.
  • for those accounts where the above indicator is set to Y, the indicator in the non-std data "Non_std_indict_1"and "Non_std_indict_2" need to be compared to "indict_1" and "indict_2" respectively and report matches and mismatches.

The Desired Outcome:

  • A new column added to dataframe identifying if a match or mismatch was detected comparing the account spread and code to either its equivalent in the non-std rates or std rates. Something like "MatchOnNSR" or "MismatchOnSR".
  • Another column or columns comparing if a mismatch occurred between the indicator columns when the Negogiated_Rate was flagged as Y

Sample Output with expected results:

df=pd.DataFrame([[1234567890,3.5,'GG','N','N','Y',np.NaN,np.NaN,'N','N',3.5,'GG','MatchOnSR',True,True],
                    [7854567890,np.NaN,'GG','N','N','N',np.NaN,'GG','N','N',3.5,'GG','MatchOnNSR',np.NaN,np.NaN],
                    [9876542190,3.5,'FF','N','N','Y',np.NaN,np.NaN,'N','Y',3.5,'FI','MismatchOnSR',True,False],
                    [9632587415,3.5,'GG','N','N','N',3,'GG','N','N',3.5,'GG','MismatchOnSNR',np.NaN,np.NaN]],
columns = ['Account','Account_Spread','Account_Swing','indict_1','indict_2','Negotiated_Rate',
           'Non_std_Spread','Non_std_Code','Non_std_indict_1','Non_std_indict_2','Std_Spread','Std_Swing','Is_Match','Match_indict_1','Match_indict_2'])
df

At present I don't have anything to share to work this problem out. I'm struggling to know what the best approach would be to get even started. Any help is much appreciated.

CodePudding user response:

Finally got it:

def compute_match(row):
    m = match_indict_1 = match_indict_2 = np.nan

    if row['Non_std_Spread'] == 'nan' and row['Non_std_Code'] == 'nan':
        match_indict_1 = row['indict_1'] == row['Non_std_indict_1']
        match_indict_2 = row['indict_2'] == row['Non_std_indict_2']
        if row['Account_Spread'] == row['Std_Spread'] and row['Account_Swing'] == row['Std_Swing']:
            m = 'MatchOnSR'
        else:
            m = 'MismatchOnSR'

    elif row['Non_std_Spread'] != 'nan' and row['Non_std_Code'] != 'nan' and row['Negotiated_Rate'] == 'N':
        match_indict_1 = match_indict_2 = np.nan
        if row['Account_Spread'] == row['Non_std_Spread'] and row['Account_Swing'] == row['Non_std_Code']:
            m = 'MatchOnNSR'
        else:
            m = 'MismatchOnNSR'

    return (m, match_indict_1, match_indict_2)


df = (
    pd.concat([
        df,
        (
            df
            .fillna('nan')
            .apply(compute_match, axis=1, result_type='expand')
            .rename({0:'Is_Match', 1:'Match_indict_1', 2:'Match_indict_2'}, axis=1)
        ),
    ], axis=1)
)

Test:

      Account  Account_Spread Account_Swing indict_1 indict_2 Negotiated_Rate  Non_std_Spread Non_std_Code Non_std_indict_1 Non_std_indict_2  Std_Spread Std_Swing       Is_Match Match_indict_1 Match_indict_2
0  1234567890             3.5            GG        N        N               Y             NaN          NaN                N                N         3.5        GG      MatchOnSR           True           True
1  7854567890             NaN            GG        N        N               N             NaN           GG                N                N         3.5        GG            NaN            NaN            NaN
2  9876542190             3.5            FF        N        N               Y             NaN          NaN                N                Y         3.5        FI   MismatchOnSR           True          False
3  9632587415             3.5            GG        N        N               N             3.0           GG                N                N         3.5        GG  MismatchOnNSR            NaN            NaN

Notice that Is_Match row #2 is NaN - that's becuse Non_std_Spread row #2 is NaN but Non_std_Code row #2 is not NaN.

CodePudding user response:

In a pure Pandas way, but not too much idiomatic and may be not so efficient:

nandf = df.query("Non_std_Spread.isna() and Non_std_Code.isna()")
nandf["match_indict_1"] = nandf["indict_1"] == nandf["Non_std_indict_1"]
nandf["match_indict_2"] = nandf["indict_2"] == nandf["Non_std_indict_2"]
nandf["Is_Match"] = np.where(
    (nandf["Account_Spread"] == nandf["Std_Spread"]) & (nandf["Account_Swing"] == nandf["Std_Swing"]),
    "MatchOnSR", "MismatchOnNSR",
)

nonandf = df.query("not(Non_std_Spread.isna()) and not(Non_std_Code.isna())")
nonandf["Is_Match"] = np.where(
    (nonandf["Account_Spread"] == nonandf["Non_std_Spread"]) & (nonandf["Account_Swing"] == nonandf["Non_std_Code"]),
    "MatchOnSR", "MismatchOnNSR",
)

df = nandf.combine_first(df)
df = nonandf.combine_first(df)
  • Related