Below I am creating three dataframes. df2 and df3 are both nested inside df1. I am then trying to use .apply() on all the nested dataframes, and ultimately add a new column to the outer dataframe that holds a revised version of each nested dataframe.
I would like to apply the function below to every element (dataframe) in the 'df_name' column of df1. I also need to pass other column values from the same row of df1 into the function, i.e. the value 'sp' needs to be known when the function runs on df2.
In the attempt below, I would be grateful for some insight on:
- how to access the nested dataframes with .apply() and refer to values from the same row but a different column of df1.
- whether there is a way to approach this using vectorization.
import pandas as pd

cols = ['sales', 'sku']
names = [
    [100, 'asdf'],
    [200, 'qwer'],
    [250, 'zxcv'],
    [175, 'yuop']
]
df2 = pd.DataFrame(names, columns=cols)

cols = ['sales', 'sku']
names = [
    [80, 'nyer'],
    [60, 'cawe']
]
df3 = pd.DataFrame(names, columns=cols)

cols = ['name', 'cmpgn_type', 'df_name']
names = [
    ['dustin', 'sp', df2],
    ['jenny', 'sb', df3]
]
df1 = pd.DataFrame(names, columns=cols)
sp_cols_order = ['sales', 'sku', 'Record Type']
sb_cols_order = ['Record Type', 'sku', 'sales']

def cmpngs(df, type):
    df_shape = df.shape[0]
    for x in range(df_shape):
        df['Record Type'] = 'hello'
    if type == 'sp':
        df = df[sp_cols_order]
    elif type == 'sb':
        df = df[sb_cols_order]
    return df

df1['ul_cmpgn'] = df1['df_name'].apply(cmpngs, args=(df1['cmpgn_type'],))
print(df1['ul_cmpgn'].iloc[0])
print(df1['ul_cmpgn'].iloc[1])
expected output for df1:
     name cmpgn_type df_name ul_cmpgn
0  dustin         sp     df2     df2a
1   jenny         sb     df3     df3a
expected output for df2:
   sales   sku  Record Type
0    100  asdf        hello
1    200  qwer        hello
2    250  zxcv        hello
3    175  yuop        hello
expected output for df3:
   Record Type  sales   sku
0        hello     80  nyer
1        hello     60  cawe
CodePudding user response:
Try changing your cmpngs function to take a single parameter, row, and call apply on the whole dataframe instead of just the df_name column, with axis=1:
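# Column orders from the question, with the new column spelled the same way in both
sp_cols_order = ['sales', 'sku', 'Record Type']
sb_cols_order = ['Record Type', 'sku', 'sales']

def cmpngs(row):
    df = row['df_name']
    cmpgn_type = row['cmpgn_type']  # renamed from `type` to avoid shadowing the builtin
    # A scalar assignment fills every row, so no loop over the rows is needed.
    # Note that this also adds the column to the original nested dataframe.
    df['Record Type'] = 'hello'
    if cmpgn_type == 'sp':
        df = df[sp_cols_order]
    elif cmpgn_type == 'sb':
        df = df[sb_cols_order]
    return df

df1['ul_cmpgn'] = df1.apply(cmpngs, axis=1)
print(df1['ul_cmpgn'].iloc[0])
print(df1['ul_cmpgn'].iloc[1])
Output:
   sales   sku  Record Type
0    100  asdf        hello
1    200  qwer        hello
2    250  zxcv        hello
3    175  yuop        hello
   Record Type   sku  sales
0        hello  nyer     80
1        hello  cawe     60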
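As a variation on the row-wise apply, here is a minimal sketch that pairs each nested dataframe with its campaign type using zip and looks the column order up in a dict. The names COLS_ORDER and revise are hypothetical (they are not in the question), and the sketch assumes the new column is spelled 'Record Type' in both orders. It uses assign, which returns a new dataframe, so df2 and df3 themselves are left unmodified.

```python
import pandas as pd

# Same data as in the question
df2 = pd.DataFrame([[100, 'asdf'], [200, 'qwer'], [250, 'zxcv'], [175, 'yuop']],
                   columns=['sales', 'sku'])
df3 = pd.DataFrame([[80, 'nyer'], [60, 'cawe']], columns=['sales', 'sku'])
df1 = pd.DataFrame([['dustin', 'sp', df2], ['jenny', 'sb', df3]],
                   columns=['name', 'cmpgn_type', 'df_name'])

# Hypothetical lookup table: campaign type -> column order
COLS_ORDER = {
    'sp': ['sales', 'sku', 'Record Type'],
    'sb': ['Record Type', 'sku', 'sales'],
}

def revise(df, cmpgn_type):
    """Return a reordered copy of df with the extra column added."""
    out = df.assign(**{'Record Type': 'hello'})  # assign returns a new frame
    return out[COLS_ORDER[cmpgn_type]]

# Walk the two columns in parallel, one (dataframe, type) pair per row of df1
df1['ul_cmpgn'] = [revise(df, t)
                   for df, t in zip(df1['df_name'], df1['cmpgn_type'])]

print(df1['ul_cmpgn'].iloc[0])
print(df1['ul_cmpgn'].iloc[1])
```

This is still a Python-level loop, just like apply with axis=1. The dict lookup raises a KeyError for an unknown campaign type instead of silently returning the frame unchanged, which may or may not be what you want.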
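You can't really vectorize operations with nested dataframes: pandas stores each nested dataframe as an opaque Python object in an object-dtype column, so every operation on them has to loop over the rows.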
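If the nested dataframes share the same columns, one possible workaround is to stack them into a single flat dataframe with pd.concat and carry the campaign type along as an ordinary column. The sketch below assumes that shared layout and that the values in 'name' are unique; the variable name flat is hypothetical. It only covers the "add a column" part of the question, since a per-type column order has no meaning inside one flat frame.

```python
import pandas as pd

# Same data as in the question
df2 = pd.DataFrame([[100, 'asdf'], [200, 'qwer'], [250, 'zxcv'], [175, 'yuop']],
                   columns=['sales', 'sku'])
df3 = pd.DataFrame([[80, 'nyer'], [60, 'cawe']], columns=['sales', 'sku'])
df1 = pd.DataFrame([['dustin', 'sp', df2], ['jenny', 'sb', df3]],
                   columns=['name', 'cmpgn_type', 'df_name'])

# Stack the nested frames; each one is labelled with the 'name' of its row in df1
flat = pd.concat(df1['df_name'].tolist(),
                 keys=df1['name'].tolist(),
                 names=['name', 'row'])

# Turn the label into a regular column and attach the campaign type
flat = flat.reset_index(level='name')
flat = flat.merge(df1[['name', 'cmpgn_type']], on='name', how='left')

# From here on, operations are ordinary column-wise pandas calls
flat['Record Type'] = 'hello'
print(flat)
print(flat.groupby('cmpgn_type')['sales'].sum())
```

If the per-campaign frames are needed again afterwards, flat.groupby('name') hands back one sub-frame per original row of df1.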