I have two dataframes as shown below:
df1 =
A B C D
timestamp
2022-04-08 10:07:00 40 50 NaN 50
2022-04-08 10:07:01 70 80 80 10
2022-04-08 10:07:02 200 220 NaN 10
df2 =
A_1 B_1
C 10 10
D 20 10
The columns C and D of df1
are present as an index in df2
. For every row in df1
, values in column A and B are noted as a tuple. For example P1 = (A,B). From the df2
, for every index, the values in columns A1 and B1 are noted as another tuple. For example P2 = (A1,B1). These tuples are passed to a function as shown below:
def func(P1,P2):
.....
ans = (using P1 and P2)
return ans
The answer from the func
is updated in df1
in a different column. The desired operation is explained using the dataframe below:
df1 =
A B C D ans_C ans_D
timestamp
2022-04-08 10:07:00 40 50 NaN 50 func(P1=(40,50),P2=(10,10)) func(P1=(40,50),p2=(20,10))
2022-04-08 10:07:01 70 80 80 10 func(P1=(70,80),P2=(10,10)) func(P1=(70,80),P2=(20,10))
2022-04-08 10:07:02 200 220 NaN 10 func(P1=(200,220),P2=(10,10)) func(P1=(200,220),P2=(20,10))
Is there an easier way to do this?
Thanks in advance!
CodePudding user response:
Here is one approach:
for c in ('C', 'D'):
P2 = tuple(df2.loc[c])
df1[f'ans_{c}'] = [func(P1, P2) for P1 in zip(df1['A'], df1['B'])]
CodePudding user response:
def func(P1, P2):
return sum(P1) sum(P2)
df1[['ans_C', 'ans_D']] = df1.apply(lambda x: [func((x.A,x.B),(df2.loc[z][y] for y in df2.columns)) for z in df2.index], axis=1, result_type='expand')
print(df1)
Output:
A B C D ans_C ans_D
timestamp
2022-04-08 10:07:00 40 50 NaN 50 110.0 120.0
2022-04-08 10:07:01 70 80 80.0 10 170.0 180.0
2022-04-08 10:07:02 200 220 NaN 10 440.0 450.0