Using .apply() on a dataframe of dataframes

Time:03-15

Below I am creating 3 dataframes. df2 and df3 are both nested dataframes of df1. I am then trying to use .apply() on all the nested dataframes, and ultimately add a new column to the outer dataframe that is essentially a revised version of the nested dataframes.

I would like to apply the function below to every element (dataframe) in the 'df_name' column of df1. I also need to pass other values from the same row of df1 into the function, i.e. the value 'sp' needs to be known when the function runs on df2.

In the attempt below, I would be grateful for some insight on:
- how to access the nested dataframes with the .apply() function and refer to values from the same row but a different column of df1.
- whether there is a way to approach this using vectorization.

import pandas as pd

cols = ['sales', 'sku']
names = [
    [100, 'asdf'],
    [200, 'qwer'],
    [250, 'zxcv'],
    [175, 'yuop']
]
df2 = pd.DataFrame(names, columns = cols)


cols = ['sales', 'sku']
names = [
    [80, 'nyer'],
    [60, 'cawe']
]
df3 = pd.DataFrame(names, columns = cols)


cols = ['name', 'cmpgn_type', 'df_name']
names = [
    ['dustin', 'sp', df2],
    ['jenny', 'sb', df3]
]
df1 = pd.DataFrame(names, columns = cols)


sp_cols_order = ['sales', 'sku', 'Record_Type']
sb_cols_order = ['Record_Type', 'sku', 'sales']


def cmpngs(df, type):
    df_shape = df.shape[0]
    for x in range(df_shape):
        df['Record_Type'] = 'hello'
        if type == 'sp':
            df = df[sp_cols_order]
        elif type == 'sb':
            df = df[sb_cols_order]
    return df


df1['ul_cmpgn'] = df1['df_name'].apply(cmpngs, args=(df1['cmpgn_type'],))

print(df1['ul_cmpgn'].iloc[0])
print(df1['ul_cmpgn'].iloc[1])

expected output for df1:

     name cmpgn_type df_name ul_cmpgn
0  dustin         sp     df2     df2a
1   jenny         sb     df3     df3a

expected output for df2:

   sales   sku Record_Type
0    100  asdf       hello
1    200  qwer       hello
2    250  zxcv       hello
3    175  yuop       hello

expected output for df3:

  Record_Type   sku  sales
0       hello  nyer     80
1       hello  cawe     60

CodePudding user response:

Try changing your cmpngs function to take a single parameter - row, and call apply on the whole dataframe instead of just the df_name column, and with axis=1:

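# Column orders from the question, with 'Record_Type' spelled consistently
sp_cols_order = ['sales', 'sku', 'Record_Type']
sb_cols_order = ['Record_Type', 'sku', 'sales']

def cmpngs(row):
    # Work on a copy so the nested dataframe stored in df1 is not modified
    df = row['df_name'].copy()
    cmpgn_type = row['cmpgn_type']
    # A scalar assignment fills every row, so no loop is needed
    df['Record_Type'] = 'hello'
    if cmpgn_type == 'sp':
        df = df[sp_cols_order]
    elif cmpgn_type == 'sb':
        df = df[sb_cols_order]
    return df

df1['ul_cmpgn'] = df1.apply(cmpngs, axis=1)

print(df1['ul_cmpgn'].iloc[0])
print(df1['ul_cmpgn'].iloc[1])

Output:

   sales   sku Record_Type
0    100  asdf       hello
1    200  qwer       hello
2    250  zxcv       hello
3    175  yuop       hello

  Record_Type   sku  sales
0       hello  nyer     80
1       hello  cawe     60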
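
For context, the call in the question passes the entire cmpgn_type Series as the extra argument, so inside the function `type == 'sp'` produces a boolean Series rather than a single True/False. If you would rather keep a two-argument function, a possible alternative to row-wise apply is to pair the two columns with zip. This is only a sketch: `COL_ORDER` and `revise` are hypothetical names that do not appear in the question.

```python
import pandas as pd

# Same data as in the question, rebuilt so the sketch runs on its own.
df2 = pd.DataFrame([[100, 'asdf'], [200, 'qwer'], [250, 'zxcv'], [175, 'yuop']],
                   columns=['sales', 'sku'])
df3 = pd.DataFrame([[80, 'nyer'], [60, 'cawe']], columns=['sales', 'sku'])
df1 = pd.DataFrame([['dustin', 'sp', df2], ['jenny', 'sb', df3]],
                   columns=['name', 'cmpgn_type', 'df_name'])

# Hypothetical lookup table standing in for the if/elif chain.
COL_ORDER = {
    'sp': ['sales', 'sku', 'Record_Type'],
    'sb': ['Record_Type', 'sku', 'sales'],
}


def revise(df, cmpgn_type):
    """Return a reordered copy of df with a constant Record_Type column."""
    out = df.assign(Record_Type='hello')  # assign() returns a new dataframe
    return out[COL_ORDER[cmpgn_type]]


# Pair each nested dataframe with the cmpgn_type value from the same row.
df1['ul_cmpgn'] = [
    revise(df, t) for df, t in zip(df1['df_name'], df1['cmpgn_type'])
]

print(df1['ul_cmpgn'].iloc[0].columns.tolist())
print(df1['ul_cmpgn'].iloc[1].columns.tolist())
```

Because `revise` uses assign() instead of writing into its argument, df2 and df3 keep their original two columns.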

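You can't really vectorize operations on nested dataframes: each cell of the column holds an arbitrary Python object (the column has dtype object), so pandas has to call the function once per row.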
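
If vectorization matters, one workaround (not part of the original answer) is to drop the nesting: stack the inner dataframes into one long table, tag each row with the values from its outer row, and run column-wise operations on that table. The sketch below assumes the question's data; `flat` and `sp_rows` are hypothetical names.

```python
import pandas as pd

# Same data as in the question, rebuilt so the sketch runs on its own.
df2 = pd.DataFrame([[100, 'asdf'], [200, 'qwer'], [250, 'zxcv'], [175, 'yuop']],
                   columns=['sales', 'sku'])
df3 = pd.DataFrame([[80, 'nyer'], [60, 'cawe']], columns=['sales', 'sku'])
df1 = pd.DataFrame([['dustin', 'sp', df2], ['jenny', 'sb', df3]],
                   columns=['name', 'cmpgn_type', 'df_name'])

# Stack the nested dataframes into one long table, tagging every row
# with the outer-row values it belongs to.
flat = pd.concat(
    [
        df.assign(name=name, cmpgn_type=t)
        for name, t, df in zip(df1['name'], df1['cmpgn_type'], df1['df_name'])
    ],
    ignore_index=True,
)

# Column-wise operations now run once over all six rows.
flat['Record_Type'] = 'hello'

# Recover one campaign type, in its own column order, when needed.
sp_rows = flat.loc[flat['cmpgn_type'] == 'sp', ['sales', 'sku', 'Record_Type']]

print(flat)
print(sp_rows)
```

The trade-off is that the per-type column order has to be applied when you split the table back out, since a single dataframe has only one column order.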
