Group dataframe by two columns and then find average count based on one of the groups-CodePudding

Really struggling to get this solution. Suppose I have the dataframe below:

SEX	ITEM	Some other column
M	Socks	233
M	Socks	1
M	Hat	2
F	Socks	3
F	Hat	3
F	Hat	6
F	Hat	2

I would like to find the average number of occurrences of each ITEM based on the SEX group

SEX	ITEM	Average
M	Socks	0.6666
M	Hat	0.3333
F	Socks	0.25
F	Hat	0.75

Can anyone help me with this?

CodePudding user response：

Let us try crosstab

out = pd.crosstab(df['SEX'], df['ITEM'],normalize='index').stack().reset_index(name='average')
Out[36]: 
  SEX   ITEM   average
0   F    Hat  0.750000
1   F  Socks  0.250000
2   M    Hat  0.333333
3   M  Socks  0.666667

CodePudding user response：

You could try the function groupby() with the count() aggregation, and then divide the relevant columns to get the ratio

df2 = df.groupby(['SEX']).count().reset_index()
df1 = df.groupby(['SEX', 'ITEM']).count().reset_index()
df1['SEX_count'] = df1['SEX'].apply(lambda x: df2['ITEM'][df2['SEX']==x].values[0])
df1['Average'] = df1['Someothercolumn'] / df1['SEX_count']
df1 = df1.drop(columns=['Someothercolumn', 'SEX_count'])
print(df1)

Output:

  SEX   ITEM   Average
0   F    Hat  0.750000
1   F  Socks  0.250000
2   M    Hat  0.333333
3   M  Socks  0.666667