Summarise smaller part as remainder

Time:08-18

I have a problem. I would like to show how much each customer has saved. I don't want to show every customer individually; the small ones should be summarised as "remaining". To do this, I want to group by customer, calculate the total per customer, and then build the chart from those totals.

The problem is that the chart displays percentages rather than the total sums.

Edit (the following error has been fixed): I also got the error AttributeError: 'DataFrameGroupBy' object has no attribute 'reset_index'.

Dataframe

    customerId     costs
0            1  1.054722
1            1  3.287335
2            1  1.920475
3            2  0.502692
4            2  4.900304
5            2  2.676288
6            3  0.455319
7            3  3.261040
8            3  2.049914
9            4  2.293546
10           4  1.353868
11           4  0.018763
12           4  4.371444
13           4  3.082480
14           4  3.056038

Code

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

d = {
    "customerId": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "costs": 5 * np.random.random_sample((15,)),
}
df = pd.DataFrame(data=d)
print(df)

threshold = 0.1
# Total costs per customer
df_calc_typ = df[['costs', 'customerId']].groupby(['customerId']).sum()
df_calc_typ = df_calc_typ.reset_index()
df_calc_typ.columns = ['type', 'count']
df_calc_typ['percentage'] = df_calc_typ['count']/df_calc_typ['count'].sum()

# Collapse every customer below the threshold into a single 'remaining' row
remaining = df_calc_typ.loc[df_calc_typ['percentage'] < threshold].sum(axis = 0)
remaining.loc['type'] = 'remaining'
df_calc_typ = df_calc_typ[df_calc_typ['percentage'] >= threshold]

# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df_calc_typ = pd.concat([df_calc_typ, remaining.to_frame().T], ignore_index = True)
df_calc_typ['count'] = df_calc_typ['count'].astype(int)

colors = sns.color_palette('GnBu_r')
#explode = [0.0,0.01,0.01,0.4]
# autopct with a format string prints each wedge's percentage, not its sum
plt.pie(df_calc_typ['count'],
        labels = df_calc_typ['type'], colors = colors, autopct = '%0.0f%%')
plt.show()

What I want, for example:

                costs
customerId           
1            6.262532
2            8.079285
remaining    5.766273
4           14.176138


CodePudding user response:

You can try:

sums = df.groupby('customerId').costs.sum()
plt.pie(sums, labels = ['remaining' if i == sums.idxmin() else i for i in sums.index ])

CodePudding user response:

You can make a double groupby / sum: first on customerId, then on the grouped costs per customer where they are less than or equal to a threshold sum.
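The double groupby described above could look roughly like the sketch below. It uses the values printed in the question. Two things are assumptions made for the illustration: the cutoff of 0.17 (with the question's 0.1, no customer in this sample falls below the threshold) and the conversion of the customer ids to strings so that all labels have the same type.

```python
import pandas as pd

# Sample data copied from the DataFrame printed in the question.
df = pd.DataFrame({
    "customerId": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "costs": [1.054722, 3.287335, 1.920475, 0.502692, 4.900304, 2.676288,
              0.455319, 3.261040, 2.049914, 2.293546, 1.353868, 0.018763,
              4.371444, 3.082480, 3.056038],
})

threshold = 0.17  # assumed cutoff, chosen so that one customer falls below it

# First groupby: total costs per customer.
sums = df.groupby("customerId")["costs"].sum()

# Share of each customer in the grand total.
share = sums / sums.sum()

# Label customers below the threshold as "remaining"; ids become strings
# so the resulting index holds a single type.
labels = pd.Series(
    [str(cid) if s >= threshold else "remaining" for cid, s in share.items()],
    index=sums.index,
)

# Second groupby: sum the per-customer totals by the new labels.
summary = sums.groupby(labels, sort=False).sum()
summary.index.name = "customerId"
print(summary)
```

The resulting Series can be passed straight to the plot, for example plt.pie(summary, labels=summary.index).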


CodePudding user response:

You can supply a callable to autopct, instead of the string formatting you have here. For example, in your code you could add this function:

def values(val):
    return np.round(val)

The function doesn't need to do much except return the value supplied to it, which Matplotlib passes in as the wedge's percentage. Here the value has just been rounded to the nearest whole number.

Then, in your code where you plot the pie chart, make autopct reference your function:

plt.pie(df_calc_typ['count'], labels = df_calc_typ['type'], colors = colors, autopct = values)
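Because the callable receives a percentage, showing the actual sums means scaling that percentage back by the total. A minimal sketch of this idea follows; the helper name make_autopct is hypothetical and the numbers are the ones from the question's desired output.

```python
def make_autopct(values):
    """Build an autopct callable that shows absolute values, not percentages."""
    total = sum(values)

    def autopct(pct):
        # Matplotlib passes the wedge's share in percent (0 to 100),
        # so multiply by total / 100 to get back the original amount.
        return f"{pct * total / 100:.2f}"

    return autopct


# Totals taken from the question's desired output.
sums = [6.262532, 8.079285, 5.766273, 14.176138]
fmt = make_autopct(sums)

# Simulate what Matplotlib would pass for the first wedge.
pct_first = 100 * sums[0] / sum(sums)
print(fmt(pct_first))  # prints 6.26
```

In the plot it would be used as plt.pie(df_calc_typ['count'], labels = df_calc_typ['type'], colors = colors, autopct = make_autopct(df_calc_typ['count'])).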
