Sum up the small slices as 'remaining' so that the pie chart is more readable

Time:05-09

I have a problem. I want to plot a pie chart, but unfortunately only three ids are readable; the others have slices that are just too small. Is there an option to combine all the small ones and sum them up under the name 'remaining'? Is there also an option to do that automatically? I could set a limit such as 100 or 1000 by hand, but is there a way to sum them up automatically? In my real dataframe I am using df.value_counts().

Dataframe

   id  count
0   1   4521
1   2   1247
2   3    962
3   4     12
4   5      6
5   6      5
6   7      4
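Since the real counts come from value_counts(), a table in the id/count shape above can be built like this. This is a minimal sketch; the raw column values below are hypothetical.

```python
import pandas as pd

# Hypothetical raw data: one row per observation, with an 'id' column
raw = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3]})

# value_counts() returns a Series sorted by frequency (largest first);
# rename_axis/reset_index turn it into an 'id' / 'count' DataFrame
df = raw['id'].value_counts().rename_axis('id').reset_index(name='count')
print(df)
```

Using rename_axis plus reset_index(name=...) gives the same column names whether the installed pandas names the value_counts result after the column or calls it 'count'.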

Code

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
d = {'id': [1, 2, 3, 4, 5, 6, 7],
     'count': [4521, 1247, 962, 12, 6, 5, 4],
    }
df = pd.DataFrame(data=d)
print(df)

colors = sns.color_palette('GnBu_r')
plt.pie(df['count'], 
        labels = df['id'], colors = colors)
plt.show()


Answer

You can combine rows in your data with a condition: if a row's share of the total (its 'percentage') is below a threshold, sum all such rows into a single 'remaining' row:

threshold = 0.1
df['percentage'] = df['count']/df['count'].sum()

small = df['percentage'] < threshold
remaining = pd.DataFrame({'id': ['remaining'],
                          'count': [df.loc[small, 'count'].sum()],
                          'percentage': [df.loc[small, 'percentage'].sum()]})
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df[~small], remaining], ignore_index = True)

So you get:

          id  count  percentage
0          1   4521    0.669084
1          2   1247    0.184549
2          3    962    0.142371
3  remaining     27    0.003996
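Because the question builds its counts with value_counts(), the same threshold rule can also be applied directly to that Series, so nothing has to be chosen per dataset except the threshold. A minimal sketch, assuming a hypothetical helper named group_small_slices (it is not a pandas function):

```python
import pandas as pd


def group_small_slices(counts: pd.Series, threshold: float = 0.1,
                       label: str = 'remaining') -> pd.Series:
    """Sum every category whose share of the total is below `threshold`
    into one `label` entry. Hypothetical helper, not part of pandas."""
    share = counts / counts.sum()      # fraction of the total per category
    small = share < threshold
    if not small.any():                # nothing to merge, return unchanged
        return counts
    merged = pd.Series({label: counts[small].sum()})
    return pd.concat([counts[~small], merged])


# Shaped like df['id'].value_counts() for the question's data
counts = pd.Series([4521, 1247, 962, 12, 6, 5, 4], index=[1, 2, 3, 4, 5, 6, 7])
grouped = group_small_slices(counts, threshold=0.1)
print(grouped)
```

The result can be passed straight to matplotlib, e.g. plt.pie(grouped, labels=grouped.index).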

Complete code

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

threshold = 0.1

d = {'id': [1, 2, 3, 4, 5, 6, 7],
     'count': [4521, 1247, 962, 12, 6, 5, 4]}
df = pd.DataFrame(data = d)
df['percentage'] = df['count']/df['count'].sum()

small = df['percentage'] < threshold
remaining = pd.DataFrame({'id': ['remaining'],
                          'count': [df.loc[small, 'count'].sum()],
                          'percentage': [df.loc[small, 'percentage'].sum()]})
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df[~small], remaining], ignore_index = True)

colors = sns.color_palette('GnBu_r')
plt.pie(df['count'],
        labels = df['id'], colors = colors)
plt.show()

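If you would rather not pick a percentage at all, another option (not from the answer above) is to always keep the N largest categories and sum everything else into 'remaining'. A minimal sketch, assuming a hypothetical helper named keep_top_n:

```python
import pandas as pd


def keep_top_n(counts: pd.Series, n: int = 3,
               label: str = 'remaining') -> pd.Series:
    """Keep the n largest categories and sum the rest into `label`.
    Hypothetical helper, an alternative to a percentage threshold."""
    ordered = counts.sort_values(ascending=False)
    top, rest = ordered.iloc[:n], ordered.iloc[n:]
    if rest.empty:                     # fewer than n categories, nothing to merge
        return top
    return pd.concat([top, pd.Series({label: rest.sum()})])


counts = pd.Series([4521, 1247, 962, 12, 6, 5, 4], index=[1, 2, 3, 4, 5, 6, 7])
top = keep_top_n(counts, n=3)
print(top)
```

A fixed N guarantees a fixed number of slices, while a threshold adapts to how skewed the data is; which one reads better depends on the dataset.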
