Really struggling to get this solution. Suppose I have the dataframe below:
SEX | ITEM | Some other column |
---|---|---|
M | Socks | 233 |
M | Socks | 1 |
M | Hat | 2 |
F | Socks | 3 |
F | Hat | 3 |
F | Hat | 6 |
F | Hat | 2 |
I would like to find the average number of occurrences of each ITEM based on the SEX group
SEX | ITEM | Average |
---|---|---|
M | Socks | 0.6666 |
M | Hat | 0.3333 |
F | Socks | 0.25 |
F | Hat | 0.75 |
Can anyone help me with this?
CodePudding user response:
Let us try crosstab
out = pd.crosstab(df['SEX'], df['ITEM'],normalize='index').stack().reset_index(name='average')
Out[36]:
SEX ITEM average
0 F Hat 0.750000
1 F Socks 0.250000
2 M Hat 0.333333
3 M Socks 0.666667
CodePudding user response:
You could try the function groupby()
with the count()
aggregation, and then divide the relevant columns to get the ratio
df2 = df.groupby(['SEX']).count().reset_index()
df1 = df.groupby(['SEX', 'ITEM']).count().reset_index()
df1['SEX_count'] = df1['SEX'].apply(lambda x: df2['ITEM'][df2['SEX']==x].values[0])
df1['Average'] = df1['Someothercolumn'] / df1['SEX_count']
df1 = df1.drop(columns=['Someothercolumn', 'SEX_count'])
print(df1)
Output:
SEX ITEM Average
0 F Hat 0.750000
1 F Socks 0.250000
2 M Hat 0.333333
3 M Socks 0.666667