In pandas, how do I expand into multiple columns an applied function that returns a dataframe?

Say I have a dataframe containing two columns, my_id and date. I want to apply a function that takes those two as arguments but that was not written by me.

Now the function returns a one-row dataframe. I want to assign the row values for a large number of columns in that dataframe back to the original dataframe, either directly or via an intermediate dataframe.

So consider:

import pandas as pd

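df = pd.DataFrame()
# A month-based DateOffset keeps the day of month (2020-02-02, 2020-03-02), matching the sample data;
# freq='M' would give month-end dates instead
df['date'] = pd.date_range(start='2020-02-02', periods=2, freq=pd.DateOffset(months=1))
df['my_id'] = 'my_id_not_your_id'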

display(df) # <---- here are your sample data

my_lambda = lambda x: my_func(x.my_id, report_date=x.date, extra_kwarg=False)
df2 = df.apply(my_lambda, axis=1)

# Now merge df2 back into df (see sample output data)

If I add result_type='expand' to apply() it breaks.

Sample input data:

         date                                 my_id
0  2020-02-02  afdd094e-5be5-4fd3-b404-5f4b05b81765
1  2020-03-02  afdd094e-5be5-4fd3-b404-5f4b05b81765

Sample output data:

         date                                 my_id  processed_col_1  processed_col_2  processed_col_3
0  2020-02-02  afdd094e-5be5-4fd3-b404-5f4b05b81765                a                c                e
1  2020-03-02  afdd094e-5be5-4fd3-b404-5f4b05b81765                b                d                f

Where

# my_func('afdd094e-5be5-4fd3-b404-5f4b05b81765', '2020-02-02') returns a one-row dataframe like:
pd.DataFrame({'processed_col_1': ['a'], 'processed_col_2': ['c'], 'processed_col_3': ['e']})

CodePudding user response:

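If the function returns a one-row dataframe, change my_lambda so that it returns a series instead. With axis=1, apply() expands a returned series into columns, one per index label, but it does not do the same for a returned dataframe.

First, take the first (and only) row of the result in my_lambda: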

my_lambda = lambda x: my_func(x.my_id, report_date=x.date, extra_kwarg=False).iloc[0]
#                                                                             -------
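To see what .iloc[0] does here, the following is a minimal sketch. The name one_row is a hypothetical stand-in for whatever my_func returns, built from the sample values in the question; the real function may return different columns.

```python
import pandas as pd

# Hypothetical stand-in for the one-row dataframe that my_func returns
one_row = pd.DataFrame({
    'processed_col_1': ['a'],
    'processed_col_2': ['c'],
    'processed_col_3': ['e'],
})

# .iloc[0] selects the first row by position and returns it as a Series
# whose index labels are the dataframe's column names
row = one_row.iloc[0]

print(type(row).__name__)
print(list(row.index))
```

Because the index labels of the series are the column names, apply() with axis=1 can turn each label into its own column.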

Then apply it and join the result to the original df (join() returns a new dataframe rather than modifying df in place):

df.join(df.apply(my_lambda, axis=1))
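Putting the two steps together, here is a self-contained sketch you can run. The my_func below is a made-up placeholder, since the real function is not shown in the question; it only mimics the described behaviour of returning a one-row dataframe, and the values it computes are purely illustrative.

```python
import pandas as pd

def my_func(my_id, report_date=None, extra_kwarg=True):
    # Placeholder for the third-party function (assumption): returns a one-row dataframe
    return pd.DataFrame({
        'processed_col_1': [my_id[:8]],
        'processed_col_2': [report_date.month_name()],
        'processed_col_3': [report_date.strftime('%Y-%m-%d')],
    })

df = pd.DataFrame({
    'date': pd.to_datetime(['2020-02-02', '2020-03-02']),
    'my_id': 'afdd094e-5be5-4fd3-b404-5f4b05b81765',
})

# .iloc[0] turns the one-row dataframe into a Series, which apply() expands into columns
my_lambda = lambda x: my_func(x.my_id, report_date=x.date, extra_kwarg=False).iloc[0]

# join() aligns on the index, so each processed row lands next to its source row
result = df.join(df.apply(my_lambda, axis=1))
print(result)
```

The join works because apply() with axis=1 keeps the index of df, so no key column is needed to line the rows up.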