Home > database >  Setting title after aggregation
Setting title after aggregation

Time:02-19

I am trying to aggregate and count values together. Below you can see my dataset

data = {'id':['1','2','3','4','5'],
        'name': ['Company1', 'Company1', 'Company3', 'Company3', 'Company5'], 
        'sales': [10, 3, 5, 1, 0], 
        'income': [10, 3, 5, 1, 0], 
       }
df = pd.DataFrame(data, columns = ['id','name', 'sales','income'])

conditions = [
    (df['sales'] < 1),
    (df['sales'] >= 1) & (df['sales'] < 3),
    (df['sales'] >= 3) & (df['sales'] < 5),
    (df['sales'] >= 5)
    ]


values = ['<1', '1-3', '3-5', '>= 5']

df['range'] = np.select(conditions, values)

df=df.groupby('range')['sales','income'].agg(['count','sum']).reset_index()   

This code gives me the next table

enter image description here

But I am not satisfied with the appearance of this table because 'count' is duplicated two times. So can anybody help me with this table in order to have separate columns 'range', 'count', 'income' and 'sales'.

CodePudding user response:

You could try named aggregation:

df.groupby('range', as_index=False).agg(count=('range','count'), sales=('sales','sum'), income=('income','sum'))

Output:

  range  count  sales  income
0   1-3      1      1       1
1   3-5      1      3       3
2    <1      1      0       0
3  >= 5      2     15      15

P.S. You probably want to make "range" a categorical variable, so that the output is sorted in the correct order:

df['range'] = pd.Categorical(np.select(conditions, values), categories=values, ordered=True)

Then the above code outputs:

  range  count  sales  income
0    <1      1      0       0
1   1-3      1      1       1
2   3-5      1      3       3
3  >= 5      2     15      15
  • Related