I have two pandas DataFrames like this:
df = pd.DataFrame([
    ['ABC0876', 20200103],
    ['DEG9871', 20210201],
    ['USQ5321', 20201101],
    ['RCF2345', 20211004]
], columns=['Key', 'PricingDate'])
master = pd.DataFrame([
    ['ABC0876', 'ZS01', 20190101, 20200130],
    ['ABC0876', 'ZS02', 20200101, 20200213],
    ['DEG9871', 'ZSD1', 20210201, 20220121],
    ['USQ5321', 'ZS01', 20201001, 20220908],
    ['RCF2345', 'ZS02', 20201004, 20211013]
], columns=['Key', 'Condition', 'Valid_From', 'Valid_To'])
The aim is to get the unique valid conditions for each key. I tried the following:
def Numconditions(Key, PricingDate):
    unqconds = master.loc[(master['Key'] == Key) & (master['Valid_To'] >= PricingDate), 'Condition'].unique()
    return unqconds

df['Unqconds'] = df.apply(lambda row: Numconditions(row['Key'], row['PricingDate']), axis=1)
The dataframes are huge and this runs for a long time. Can anybody suggest a quicker solution that needs less run time?
CodePudding user response:
You can merge your two dataframes once, filter the merged rows on the condition, and then reduce them to a list of unique values per key. This avoids the slow row-by-row `apply`:
out = pd.merge(df, master, on='Key', how='left') \
        .query('Valid_To >= PricingDate') \
        .groupby('Key', sort=False)['Condition'].unique() \
        .rename('Unqconds')
df = df.set_index('Key').join(out).reset_index()
Output:
>>> df
       Key  PricingDate      Unqconds
0  ABC0876     20200103  [ZS01, ZS02]
1  DEG9871     20210201        [ZSD1]
2  USQ5321     20201101        [ZS01]
3  RCF2345     20211004        [ZS02]
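For reference, here is the whole thing as one self-contained script, combining the question's sample data with the merge-based approach above (this is a sketch of the same technique, not a new method):

```python
import pandas as pd

# Rebuild the example frames from the question.
df = pd.DataFrame([
    ['ABC0876', 20200103],
    ['DEG9871', 20210201],
    ['USQ5321', 20201101],
    ['RCF2345', 20211004]
], columns=['Key', 'PricingDate'])

master = pd.DataFrame([
    ['ABC0876', 'ZS01', 20190101, 20200130],
    ['ABC0876', 'ZS02', 20200101, 20200213],
    ['DEG9871', 'ZSD1', 20210201, 20220121],
    ['USQ5321', 'ZS01', 20201001, 20220908],
    ['RCF2345', 'ZS02', 20201004, 20211013]
], columns=['Key', 'Condition', 'Valid_From', 'Valid_To'])

# One merge plus one vectorized filter replaces the per-row apply:
# merge attaches every candidate condition to its key, query keeps
# only the still-valid ones, and groupby collects the unique set.
out = (
    pd.merge(df, master, on='Key', how='left')
      .query('Valid_To >= PricingDate')
      .groupby('Key', sort=False)['Condition'].unique()
      .rename('Unqconds')
)

# Attach the result back to df via the Key index.
df = df.set_index('Key').join(out).reset_index()
print(df)
```

The `sort=False` keeps keys in their order of first appearance, and the final `join` preserves the original row order of `df`.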