I have a dataframe which as a column for grouping by and several other columns. Play dataframe:
d = {'group_col': ["a","b","b","a"],'col1': [1, 2, 3, 4], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)
When using a groupby on this dataframe followed by a default function, the groupby column is set as an index and not included in the results:
# using sum as an example
df.groupby('group_col').sum()
But when I define a custom function and use apply
, I get an unwanted additional column:
# Sum function for use by apply
def sum_2(x):
return x.sum()
df.groupby('group_col').apply(sum_2)
How do I avoid having this additional column?
The actual function I want to use is the following:
def tss(x):
return ((x - x.mean(numeric_only = True))**2).sum()
df.groupby('group_col').apply(tss)
CodePudding user response:
You can try to use .agg
instead of .apply
:
def tss(x):
return ((x - x.mean()) ** 2).sum()
print(df.groupby("group_col").agg(tss))
Prints:
col1 col2
group_col
a 4.5 4.5
b 0.5 0.5
CodePudding user response:
IIUC, use as_index= False to avoid teh addl column
def sum_2(x):
return x.sum()
df.groupby('group_col', as_index=False).apply(sum_2)
group_col col1 col2
0 aa 5 9
1 bb 5 9