I'm trying to apply a function row by row. It takes 5 inputs, 3 of which are lists, and I want these lists to come from each row of 3 corresponding DataFrames.
I've tried using 'apply' and 'lambda' as follows:
sol['tf_dd'] = sol.apply(lambda tsol, rfsol, rbsol:
                         taurho_difdif(xy=xy,
                                       l=l,
                                       t=tsol,
                                       rf=rfsol,
                                       rb=rbsol),
                         axis=1)
However, I get the error: <lambda>() missing 2 required positional arguments: 'rfsol' and 'rbsol'
The DataFrame sol and the DataFrames tsol, rfsol and rbsol all have the same length. For each row, I want the entire row from tsol, rfsol and rbsol to be input as three lists.
Here is a much simplified example, first with single lists, which I then want to replicate row by row with DataFrames. The output with single lists is a single value (120). With DataFrames as inputs, I want an output DataFrame of length 10 where all values are 120:
import pandas as pd

t = [1, 2, 3, 4, 5]
rf = [6, 7, 8, 9, 10]
rb = [11, 12, 13, 14, 15]

def simple_func(t, rf, rb):
    x = sum(t)
    y = sum(rf)
    z = sum(rb)
    return x + y + z

out = simple_func(t, rf, rb)  # 120
# dataframe rows as lists
tsol = pd.DataFrame((t, t, t, t, t, t, t, t, t, t))
rfsol = pd.DataFrame((rf, rf, rf, rf, rf, rf, rf, rf, rf, rf))
rbsol = pd.DataFrame((rb, rb, rb, rb, rb, rb, rb, rb, rb, rb))
out2 = pd.DataFrame(index=range(len(tsol)), columns=['output'])
out2['output'] = out2.apply(lambda tsol, rfsol, rbsol:
                            simple_func(t=tsol.tolist(),
                                        rf=rfsol.tolist(),
                                        rb=rbsol.tolist()),
                            axis=1)
CodePudding user response:
Try using the "name" attribute of the row Series to get its index value, then use that value to fetch the same row from each of the other DataFrames:
import pandas as pd
import numpy as np

def postional_sum(inot, df1, df2, df3):
    """
    Take the index of the input row and sum the row at the same
    position in each of the other DataFrames.
    """
    # .name is the row's index label; it equals the position here
    # because out2 uses the default 0..n-1 index
    position = inot.name
    x = df1.iloc[position].sum()
    y = df2.iloc[position].sum()
    z = df3.iloc[position].sum()
    return x + y + z

# dataframe rows as lists
tsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))
rfsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))
rbsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))
out2 = pd.DataFrame(index=range(len(tsol)), columns=["output"])
out2["output"] = out2.apply(lambda x: postional_sum(x, tsol, rfsol, rbsol), axis=1)
out2
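One caveat with the code above: inside apply with axis=1, the "name" of the row is its index label, not its integer position, so iloc only works when the index is the default 0..n-1. Here is a minimal sketch, assuming all three DataFrames share the same index labels, that looks rows up by label with loc instead. The function name label_sum and the letter index are made up for illustration.

```python
import pandas as pd

t = [1, 2, 3, 4, 5]
rf = [6, 7, 8, 9, 10]
rb = [11, 12, 13, 14, 15]

# Hypothetical non-default index shared by all three frames
idx = ["a", "b", "c"]
tsol = pd.DataFrame([t] * 3, index=idx)
rfsol = pd.DataFrame([rf] * 3, index=idx)
rbsol = pd.DataFrame([rb] * 3, index=idx)

def label_sum(row, df1, df2, df3):
    # row.name is the index label of the row currently being processed
    label = row.name
    return df1.loc[label].sum() + df2.loc[label].sum() + df3.loc[label].sum()

out = tsol.apply(lambda row: label_sum(row, tsol, rfsol, rbsol), axis=1)
print(out)
```

With iloc in place of loc, this example would fail because "a" is not an integer position.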
Hope this helps!
CodePudding user response:
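When you run df.apply() with axis=1, it does not pass the columns to the function as individual arguments; it passes each row as a single Series object, as explained here. The lambda must therefore take exactly one argument. Assuming the DataFrame you call apply on has columns named tsol, rfsol and rbsol, each holding a list, the correct way to do this would be
out2['output'] = out2.apply(lambda row:
                            simple_func(t=row["tsol"],
                                        rf=row["rfsol"],
                                        rb=row["rbsol"]),
                            axis=1)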
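If the rows live in three separate DataFrames, as in the question, rather than in columns of one DataFrame, one option is to skip apply and walk the rows of the three frames in parallel. This is a minimal sketch, assuming the frames have the same number of rows in the same order; it reuses simple_func and the sample lists from the question.

```python
import pandas as pd

def simple_func(t, rf, rb):
    return sum(t) + sum(rf) + sum(rb)

t = [1, 2, 3, 4, 5]
rf = [6, 7, 8, 9, 10]
rb = [11, 12, 13, 14, 15]

tsol = pd.DataFrame([t] * 10)
rfsol = pd.DataFrame([rf] * 10)
rbsol = pd.DataFrame([rb] * 10)

# to_numpy().tolist() turns each frame into a list of row lists,
# and zip pairs up the rows that sit at the same position
rows = zip(tsol.to_numpy().tolist(),
           rfsol.to_numpy().tolist(),
           rbsol.to_numpy().tolist())

out2 = pd.DataFrame({"output": [simple_func(a, b, c) for a, b, c in rows]},
                    index=tsol.index)
print(out2)
```

This keeps the function signature unchanged: each call receives three plain Python lists, one per DataFrame row.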
CodePudding user response:
You can eliminate the simple function using this:
out2["output"] = tsol.sum(axis=1) + rfsol.sum(axis=1) + rbsol.sum(axis=1)
Here is the complete code:
import pandas as pd

t = [1, 2, 3, 4, 5]
rf = [6, 7, 8, 9, 10]
rb = [11, 12, 13, 14, 15]

# dataframe rows as lists
tsol = pd.DataFrame((t, t, t, t, t, t, t, t, t, t))
rfsol = pd.DataFrame((rf, rf, rf, rf, rf, rf, rf, rf, rf, rf))
rbsol = pd.DataFrame((rb, rb, rb, rb, rb, rb, rb, rb, rb, rb))
out2 = pd.DataFrame(index=range(len(tsol)), columns=["output"])
# row-wise sums of each frame, added together element by element
out2["output"] = tsol.sum(axis=1) + rfsol.sum(axis=1) + rbsol.sum(axis=1)
print(out2)
OUTPUT:
   output
0     120
1     120
2     120
3     120
4     120
5     120
6     120
7     120
8     120
9     120
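Note that adding the sum(axis=1) results aligns them by index label, not by position, so this shortcut relies on the DataFrames sharing the same index. Here is a small sketch, with made-up Series and indexes, of what happens when the labels differ and how to add by position instead:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])

# Labels 0 and 3 appear in only one Series, so those entries become NaN
by_label = a + b

# Converting to NumPy arrays drops the index and adds element by element
by_position = a.to_numpy() + b.to_numpy()

print(by_label)
print(by_position)
```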