Say I have a dataframe containing two columns, my_id and date. I want to apply a function that takes those two as arguments but that was not written by me.
Now the function returns a one-row dataframe. I want to assign the row values for a large number of columns in that dataframe back to the original dataframe, either directly or via an intermediate dataframe.
So consider:
import pandas as pd
df=pd.DataFrame([])
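df['date']=pd.date_range(start='02/02/2020', periods=2, freq=pd.DateOffset(months=1)) # monthly steps from 2020-02-02, matching the sample data (freq='M' would give month-end dates)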
df['my_id']='my_id_not_your_id'
display(df) # <---- here are your sample data
my_lambda = lambda x: my_func(x.my_id, report_date=x.date, extra_kwarg=False)
df2 = df.apply(my_lambda, axis=1)
# Now merge df2 back into df (see sample output data)
If I add result_type='expand' to apply(), it breaks.
Sample input data:
date my_id
0 2020-02-02 afdd094e-5be5-4fd3-b404-5f4b05b81765
1 2020-03-02 afdd094e-5be5-4fd3-b404-5f4b05b81765
Sample output data:
date my_id processed_col_1 processed_col_2 processed_col_3
0 2020-02-02 afdd094e-5be5-4fd3-b404-5f4b05b81765 a c e
1 2020-03-02 afdd094e-5be5-4fd3-b404-5f4b05b81765 b d f
Where
my_func('afdd094e-5be5-4fd3-b404-5f4b05b81765','2020-02-02') = \
pd.DataFrame({'processed_col_1':['a'],'processed_col_2':['c'],'processed_col_3':['e']})
CodePudding user response:
If the function returns a dataframe with one row, then you need to change your my_lambda func so that it returns a Series instead of a dataframe. When the applied function returns a Series, apply(axis=1) expands it into columns.
First fix my_lambda:
my_lambda = lambda x: my_func(x.my_id, report_date=x.date, extra_kwarg=False).iloc[0]
# the trailing .iloc[0] picks the single row of the returned dataframe as a Series
Then apply and join the result with the original df:
df.join(df.apply(my_lambda, axis=1))
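Here is a minimal end-to-end sketch of the two steps above. The my_func below is only a hypothetical stand-in for the real third-party function (its body and return values are made up for illustration); the only assumption carried over from the question is that it returns a one-row dataframe.

```python
import pandas as pd

# Hypothetical stand-in for the third-party function.
# Assumption: like the real one, it returns a one-row dataframe.
def my_func(my_id, report_date, extra_kwarg=False):
    return pd.DataFrame({
        'processed_col_1': [f'{my_id}-1'],
        'processed_col_2': [report_date.month],
        'processed_col_3': [extra_kwarg],
    })

df = pd.DataFrame({
    'date': pd.to_datetime(['2020-02-02', '2020-03-02']),
    'my_id': 'my_id_not_your_id',
})

# .iloc[0] turns each one-row dataframe into a Series,
# which apply(axis=1) then expands into one column per Series entry.
my_lambda = lambda x: my_func(x['my_id'], report_date=x['date'], extra_kwarg=False).iloc[0]
processed = df.apply(my_lambda, axis=1)

# join aligns on the index, so each processed row lands next to its source row.
result = df.join(processed)
print(result)
```

Note that join returns a new dataframe rather than modifying df in place, so assign the result if you want to keep it.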