How to write this GroupBy-Apply-Aggregate in one line?

Is there a better way to write this such that I don't need to add a new column to the existing dataframe?

## Sample Data -- df_big
Groups, TotalOwned
0, 1
0, 5
1, 3
2, 2
2, 1

## My Code
df_mult = df_big[['Groups', 'TotalOwned']].copy()
df_mult['Multiple'] = np.where(df_mult['TotalOwned'] > 1, 1, 0)
df_mult.groupby('Groups')['Multiple'].mean().sort_index()

## Output
Groups
0    0.162074
1    0.996627
2    0.133447
3    0.097553

CodePudding user response：

You could try using DataFrame.apply(), which calls a function on each column. If your "Groups" column isn't numeric, you could try something like:

def data_transform(df_column):

    binarize_func = lambda x: 1 if x > 1 else 0

    # apply the function only if every value is a non-negative integer
    # (str.isnumeric() is False for strings like '-1' or '1.5')
    if df_column.astype(dtype=str).str.isnumeric().all():
        return df_column.apply(binarize_func)
    else:
        return df_column

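# apply() keeps the original column names, so aggregate 'TotalOwned' (there is no 'Multiple' column here)
df.apply(data_transform).groupby('Groups')['TotalOwned'].mean().sort_index()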

Alternatively, you could check the column name inside the function:

def data_transform(df_column):
    binarize_func = lambda x: 1 if x > 1 else 0

    # Only apply the function if column name is not "Groups"
    if df_column.name != "Groups":
        return df_column.apply(binarize_func)
    else:
        return df_column

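# apply() keeps the original column names, so aggregate 'TotalOwned' (there is no 'Multiple' column here)
df.apply(data_transform).groupby('Groups')['TotalOwned'].mean().sort_index()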

CodePudding user response：

Your data and the result don't match. However, focusing just on turning those three lines into a single line, this is one way to accomplish it:

df.assign(mult=np.where(df['TotalOwned'] > 1, 1, 0)
         ).groupby('Groups')['mult'].mean()

Result, based on the provided data and the code to be combined into a single line:

Groups
0    0.5
1    1.0
2    0.5
Name: mult, dtype: float64
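
For anyone who wants to check this, here is a minimal sketch that rebuilds the sample data from the question and compares the original three-line version with the one-liner. The frame name `df` and the variable names `three_line` and `one_line` are only assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Groups": [0, 0, 1, 2, 2],
    "TotalOwned": [1, 5, 3, 2, 1],
})

# Original three-step version: works on a copy with a helper column
df_mult = df[["Groups", "TotalOwned"]].copy()
df_mult["Multiple"] = np.where(df_mult["TotalOwned"] > 1, 1, 0)
three_line = df_mult.groupby("Groups")["Multiple"].mean().sort_index()

# One-line version: assign() returns a new frame, so df itself is not modified
one_line = df.assign(mult=np.where(df["TotalOwned"] > 1, 1, 0)).groupby("Groups")["mult"].mean()

print(one_line)
```

Both versions should give the same per-group values, and `df` keeps only its two original columns.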

CodePudding user response：

IIUC, you can do this in one line:

df_out = df.assign(TotalOwned=(df['TotalOwned'] > 1)).groupby('Groups').mean()
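
If the goal is to avoid creating or overwriting any column at all, one possible variant is to group the boolean Series directly by the `Groups` column. This is only a sketch, assuming the sample data from the question sits in a frame named `df`; `share_multiple` is a made-up name.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Groups": [0, 0, 1, 2, 2], "TotalOwned": [1, 5, 3, 2, 1]})

# Compare first, then group the resulting boolean Series by the Groups column.
# The mean of booleans is the fraction of True values in each group.
share_multiple = df["TotalOwned"].gt(1).groupby(df["Groups"]).mean()

print(share_multiple)
```

Since nothing is assigned back, `df` is left exactly as it was.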