Is there a better way to write this such that I don't need to add a new column to the existing dataframe?
## Sample Data -- df_big
Groups, TotalOwned
0, 1
0, 5
1, 3
2, 2
2, 1
## My Code
df_mult = df_big[['Groups', 'TotalOwned']].copy()
df_mult['Multiple'] = np.where(df_mult['TotalOwned'] > 1, 1, 0)
df_mult.groupby('Groups')['Multiple'].mean().sort_index()
## Output
Groups
0 0.162074
1 0.996627
2 0.133447
3 0.097553
CodePudding user response:
You could try DataFrame.apply(). If your "Groups" column weren't numeric, you could binarize only the numeric columns with something like:
def data_transform(df_column):
    binarize_func = lambda x: 1 if x > 1 else 0
    # if all elements are numerical, apply function to all
    if df_column.astype(dtype=str).str.isnumeric().all():
        return df_column.apply(binarize_func)
    else:
        return df_column

# apply() transforms 'TotalOwned' in place of adding a 'Multiple' column,
# so select 'TotalOwned' after grouping
df.apply(data_transform).groupby('Groups')['TotalOwned'].mean().sort_index()
Alternatively, you could check the column name inside the function, which also works when "Groups" is numeric:
def data_transform(df_column):
    binarize_func = lambda x: 1 if x > 1 else 0
    # Only apply the function if column name is not "Groups"
    if df_column.name != "Groups":
        return df_column.apply(binarize_func)
    else:
        return df_column

# 'TotalOwned' now holds the 0/1 values; no 'Multiple' column exists
df.apply(data_transform).groupby('Groups')['TotalOwned'].mean().sort_index()
CodePudding user response:
Your data and the result don't match. However, focusing just on collapsing these three lines into a single statement, this is one way to do it:
df.assign(mult=np.where(df['TotalOwned'] > 1, 1, 0)
).groupby('Groups')['mult'].mean()
Result for the provided sample data, using the combined statement:
Groups
0 0.5
1 1.0
2 0.5
Name: mult, dtype: float64
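To reproduce that result, here is a minimal self-contained sketch. It assumes the five sample rows from the question are the whole frame and that the frame is called `df`, as in this answer; `result` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Groups": [0, 0, 1, 2, 2],
    "TotalOwned": [1, 5, 3, 2, 1],
})

# assign() returns a new frame, so df itself never gains a 'mult' column
result = (
    df.assign(mult=np.where(df["TotalOwned"] > 1, 1, 0))
      .groupby("Groups")["mult"]
      .mean()
)
print(result)
```

Because `assign()` works on a copy, the original frame keeps only its two columns, which seems to be what the question is after.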
CodePudding user response:
IIUC, you can do it in one line:
df_out = df.assign(TotalOwned=(df['TotalOwned'] > 1)).groupby('Groups').mean()
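If the goal is to avoid creating any column at all, even a temporary one, a possible variation is to group the boolean Series directly by the "Groups" column. This is only a sketch on the question's sample rows; the names `owns_multiple` and `share` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Groups": [0, 0, 1, 2, 2],
    "TotalOwned": [1, 5, 3, 2, 1],
})

# Boolean Series: True where more than one item is owned
owns_multiple = df["TotalOwned"].gt(1)

# Group the Series by the Groups column; the mean of booleans
# is the fraction of True values in each group
share = owns_multiple.groupby(df["Groups"]).mean()
print(share)
```

This returns a Series indexed by "Groups" rather than a DataFrame, and `df` is left untouched.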