I have a dataframe that looks like this:
id name industry income
1 apple telecommunication 100
2 oil gas 100
3 samsung telecommunication 200
4 coinbase crypto 100
5 microsoft telecommunication 30
What I want to do is find the average income of each industry. For this data it would be: telecommunication 110, gas 100, crypto 100.
What I've done so far is find the frequency of each industry:
df['industry'].value_counts()
which results in:
industry
telecommunication 3
gas 1
crypto 1
I've also found the total income of each industry:
df.groupby('industry')['income'].sum()
which results in:
industry
crypto 100
gas 100
telecommunication 330
Now I'm stuck on how to get from these two results to the average.
Answer:
You're looking for mean:
means = df.groupby('industry')['income'].mean()
Output:
>>> means
industry
crypto 100.0
gas 100.0
telecommunication 110.0
Name: income, dtype: float64
>>> means['telecommunication']
110.0
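For what it's worth, the two results from the question already contain everything needed: the mean is the sum divided by the count, and pandas aligns the two Series on their industry labels before dividing. A minimal sketch, rebuilding the example frame from the question's table:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

counts = df["industry"].value_counts()         # rows per industry
sums = df.groupby("industry")["income"].sum()  # total income per industry

# Division aligns on the industry labels, so the different ordering of the
# two Series (count-sorted vs. alphabetical) does not matter
averages = sums / counts
print(averages)
```

This gives the same numbers as groupby(...).mean(); mean() is simply the more direct way to ask for it.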
Answer:
If you want to keep all the other columns, use groupby with transform:
df['mean']=df.groupby('industry')['income'].transform('mean')
id name industry income mean
0 1 apple telecommunication 100 110.0
1 2 oil gas 100 100.0
2 3 samsung telecommunication 200 110.0
3 4 coinbase crypto 100 100.0
4 5 microsoft telecommunication 30 110.0
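The key difference from a plain mean() is the shape of the result: mean() returns one value per industry, while transform('mean') broadcasts each group's mean back onto every original row, so it lines up with df's index and can be assigned straight in as a new column. A quick check on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

grouped = df.groupby("industry")["income"]
per_group = grouped.mean()           # indexed by industry, one row per group
per_row = grouped.transform("mean")  # indexed like df, one row per original row

print(len(per_group), len(per_row))
# Each row's transformed value equals the mean of its own industry
print((per_row == df["industry"].map(per_group)).all())
```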
If you need a summarised frame instead:
df.groupby('industry')['income'].mean().to_frame('mean_income')
mean_income
industry
crypto 100.0
gas 100.0
telecommunication 110.0
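If you'd rather have industry as an ordinary column than as the index (for example, to merge or export the summary), passing as_index=False to groupby does that. The rename to mean_income is only there to match the column name used above:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

# as_index=False keeps 'industry' as a regular column in the result
summary = (
    df.groupby("industry", as_index=False)["income"]
      .mean()
      .rename(columns={"income": "mean_income"})
)
print(summary)
```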
Answer:
You can use agg to get the count, mean and sum in one call instead of running separate operations:
# sort=False keeps the industries in order of first appearance
out = df.groupby('industry', sort=False).agg(size=('income', 'size'),
                                             mean=('income', 'mean'),
                                             sum=('income', 'sum')).reset_index()
print(out)
# Output:
industry size mean sum
0 telecommunication 3 110.0 330
1 gas 1 100.0 100
2 crypto 1 100.0 100
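Since that agg call returns size and sum alongside mean, it also makes it easy to confirm that mean is the sum divided by the count, which is the calculation the question was heading towards. A small sketch on the question's data; the check column is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

out = df.groupby("industry", sort=False).agg(
    size=("income", "size"),
    mean=("income", "mean"),
    sum=("income", "sum"),
).reset_index()

# Recompute the mean by hand and compare it with agg's result
out["check"] = out["sum"] / out["size"]
print(out)
print((out["check"] == out["mean"]).all())
```

Note that 'size' counts every row, including rows where income is NaN, while mean skips NaN; if the column can have missing values, 'count' is the denominator that matches mean.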