Find average of each

Time:12-22

I have a dataframe that looks like this:

id  name       industry           income
1   apple      telecommunication  100
2   oil        gas                100
3   samsung    telecommunication  200
4   coinbase   crypto             100
5   microsoft  telecommunication  30

What I want to do is find the average income of each industry. The result would be: telecommunication 110, gas 100, crypto 100.

What I've done so far is find the frequency of each industry:

df['industry'].value_counts()

which results in:

industry
telecommunication       3
gas                     1
crypto                  1

I've also found the sum of income for each industry:

df.groupby(['industry']).sum()['income']

which results in:

industry
telecommunication       330
gas                     100
crypto                  100

Now I'm kind of stuck on how to continue...

CodePudding user response:

You're looking for mean:

means = df.groupby('industry')['income'].mean()

Output:

>>> means
industry
crypto               100.0
gas                  100.0
telecommunication    110.0
Name: income, dtype: float64

>>> means['telecommunication']
110.0
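
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the frame is built by hand from the values in the question, and the name `manual` is only illustrative. It also checks that the mean is the same as the sum divided by the count, which are the two pieces the question had already computed.

```python
import pandas as pd

# Sample data copied from the question (built by hand here as an assumption).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

# Select the income column of each industry group once and reuse it.
grouped = df.groupby("industry")["income"]

# Direct route: mean per industry.
means = grouped.mean()

# Manual route: per-industry sum divided by per-industry row count.
manual = grouped.sum() / grouped.size()

print(means)
print(manual)
```

Both Series are indexed by industry, so the division lines up label by label without any extra merging.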

CodePudding user response:

If you want to keep all the other columns, use groupby with transform:

df['mean'] = df.groupby('industry')['income'].transform('mean')



  id       name           industry  income   mean
0   1      apple  telecommunication     100  110.0
1   2        oil                gas     100  100.0
2   3    samsung  telecommunication     200  110.0
3   4   coinbase             crypto     100  100.0
4   5  microsoft  telecommunication      30  110.0

If you need a summarised frame:

df.groupby('industry')['income'].mean().to_frame('mean_income')

   

                     mean_income
industry                      
crypto                   100.0
gas                      100.0
telecommunication        110.0
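
To illustrate the difference between the two shapes, here is a rough sketch, again assuming the sample data from the question is built by hand. `transform` returns one value per original row, so it can be combined directly with other columns; the column name `diff_from_mean` is made up for this example.

```python
import pandas as pd

# Sample data copied from the question (built by hand here as an assumption).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

# transform keeps the original index: one mean per row, not one per group.
industry_mean = df.groupby("industry")["income"].transform("mean")

# Because the result aligns with df, row-wise arithmetic works directly.
# "diff_from_mean" is a hypothetical column name used only for illustration.
df["diff_from_mean"] = df["income"] - industry_mean

print(df)
```

The summarised frame, by contrast, has one row per industry and would need a merge or map to be brought back onto the original rows.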

CodePudding user response:

You could use agg with named aggregation to compute the count, mean and sum in a single call:

out = df.groupby('industry', sort=False).agg(size=('income', 'size'), 
                                             mean=('income', 'mean'), 
                                             sum=('income', 'sum')).reset_index()
print(out)

# Output:
            industry  size   mean  sum
0  telecommunication     3  110.0  330
1                gas     1  100.0  100
2             crypto     1  100.0  100