Suppose I have a data frame:
import pandas as pd
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C', 'C'], 'score': [1, 2, 3, 4, 5, 6]})
First, say I want to compute each group's sum of scores. I usually do:
def group_func(x):
    d = {}
    d['sum_scores'] = x['score'].sum()
    return pd.Series(d)

df.groupby('group').apply(group_func).reset_index()
Now suppose I want to modify group_func, but the modification requires knowing the group identity of the current input x. I tried x['group'] and x[group].iloc[0] within the function's definition, and neither worked.
Is there a way for the function group_func(x) to know the defining coordinates of the current input x?
In this toy example, say, I just want to get:
pd.DataFrame({'group':['A','B','C'],'sum_scores':[3,7,11],'name_of_group':['A','B','C']})
where the last column obviously just repeats the first one. I'd like to know how to build this last column from inside a function like group_func(x). That is, as group_func processes the x that corresponds to group 'A' and generates the value 3 for sum_scores, how do I extract the current identity 'A' within the local scope of group_func?
CodePudding user response:
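Just use x.name. Inside the function passed to apply, each group's sub-frame x carries its group key in the .name attribute: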
def group_func(x):
    d = {}
    d['sum_scores'] = x['score'].sum()
    d['group_name'] = x.name  # d['group_name'] = x['group'].iloc[0]
    return pd.Series(d)

df.groupby('group').apply(group_func)
Out[63]:
       sum_scores group_name
group
A               3          A
B               7          B
C              11          C
As for fixing your own attempt, see the comment on the marked line above: x[group] needs quotes around the column name, i.e. x['group'].iloc[0].
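For completeness, here is a minimal self-contained sketch that produces the exact frame asked for in the question, with the column named name_of_group. Selecting [['score']] before apply is my own addition, not part of the answer above: it keeps the function from depending on the grouping column being present in x. As far as I know, recent pandas versions (2.2 onward) warn about apply operating on grouping columns, so x.name is the safer way to get the key.

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'score': [1, 2, 3, 4, 5, 6]})

def group_func(x):
    # x is the sub-frame for a single group; x.name holds that group's key
    d = {}
    d['sum_scores'] = x['score'].sum()
    d['name_of_group'] = x.name
    return pd.Series(d)

# Select only 'score' so group_func never relies on the grouping column,
# then turn the 'group' index back into an ordinary column.
result = df.groupby('group')[['score']].apply(group_func).reset_index()
print(result)
```

The resulting columns are group, sum_scores and name_of_group, one row per group. If you group by several columns, x.name should be a tuple of the keys rather than a single value.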