Home > Net >  Pandas : Executing a function based on the values from two dataframes
Pandas : Executing a function based on the values from two dataframes

Time:05-12

I have two dataframes as shown below:

df1 = 

                        A     B     C      D            
timestamp               
2022-04-08 10:07:00     40    50    NaN    50        
2022-04-08 10:07:01     70    80    80     10        
2022-04-08 10:07:02     200   220   NaN    10         


df2 = 

             A_1       B_1    
                
C            10        10
D            20        10   

The columns C and D of df1 are present as an index in df2. For every row in df1, values in column A and B are noted as a tuple. For example P1 = (A,B). From the df2, for every index, the values in columns A1 and B1 are noted as another tuple. For example P2 = (A1,B1). These tuples are passed to a function as shown below:

def func(P1,P2):
    .....
    ans = (using P1 and P2)
    
    return ans

The answer from the func is updated in df1 in a different column. The desired operation is explained using the dataframe below:

df1 = 

                        A     B     C      D       ans_C                         ans_D      
timestamp               
2022-04-08 10:07:00     40    50    NaN    50     func(P1=(40,50),P2=(10,10))   func(P1=(40,50),p2=(20,10))
2022-04-08 10:07:01     70    80    80     10     func(P1=(70,80),P2=(10,10))   func(P1=(70,80),P2=(20,10)) 
2022-04-08 10:07:02     200   220   NaN    10     func(P1=(200,220),P2=(10,10)) func(P1=(200,220),P2=(20,10))

Is there an easier way to do this?

Thanks in advance!

CodePudding user response:

Here is one approach:

for c in ('C', 'D'):
    P2 = tuple(df2.loc[c])
    df1[f'ans_{c}'] = [func(P1, P2) for P1 in zip(df1['A'], df1['B'])]

CodePudding user response:

def func(P1, P2):
    return sum(P1)   sum(P2)

df1[['ans_C', 'ans_D']] = df1.apply(lambda x: [func((x.A,x.B),(df2.loc[z][y] for y in df2.columns)) for z in df2.index], axis=1, result_type='expand')
print(df1)

Output:

                       A    B     C   D  ans_C  ans_D
timestamp
2022-04-08 10:07:00   40   50   NaN  50  110.0  120.0
2022-04-08 10:07:01   70   80  80.0  10  170.0  180.0
2022-04-08 10:07:02  200  220   NaN  10  440.0  450.0
  • Related