I want to plot a pie chart, but only three ids are readable because the others make up too small a share of the total. Is there a way to combine all the small ones into a single slice named "remaining"? And can that be done automatically? I could set a fixed limit such as 100 or 1000, but I would prefer not to pick the limit by hand. In my real dataframe I am using df.value_counts().
Dataframe
   id  count
0   1   4521
1   2   1247
2   3    962
3   4     12
4   5      6
5   6      5
6   7      4
Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.pie and plt.show
d = {'id': [1, 2, 3, 4, 5, 6, 7],
     'count': [4521, 1247, 962, 12, 6, 5, 4],
     }
df = pd.DataFrame(data=d)
print(df)
colors = sns.color_palette('GnBu_r')
plt.pie(df['count'],
        labels=df['id'], colors=colors)
plt.show()
CodePudding user response:
You can combine rows in your data with a condition: if the 'percentage'
is less than a threshold, then sum up those rows:
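threshold = 0.1
df['percentage'] = df['count'] / df['count'].sum()
# Rows whose share of the total is below the threshold
small = df['percentage'] < threshold
# One summary row holding the summed count and share of the small rows
remaining = pd.DataFrame({'id': ['remaining'],
                          'count': [df.loc[small, 'count'].sum()],
                          'percentage': [df.loc[small, 'percentage'].sum()]})
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df = pd.concat([df[~small], remaining], ignore_index=True)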
So you get:
          id  count  percentage
0          1   4521    0.669084
1          2   1247    0.184549
2          3    962    0.142371
3  remaining     27    0.003996
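Since the question mentions value_counts(), here is a minimal sketch of the same grouping applied directly to the Series that value_counts() returns. The helper name group_small_counts and its parameters are hypothetical, and the raw ids are made up only so that the counts match the question's data.

```python
import pandas as pd


def group_small_counts(counts, threshold=0.1, label="remaining"):
    """Merge entries whose share of the total is below `threshold`.

    Hypothetical helper: `counts` is a Series of counts indexed by category,
    for example the result of Series.value_counts().
    """
    share = counts / counts.sum()
    small = share < threshold
    if not small.any():
        return counts
    # Append one extra entry that holds the sum of all small entries.
    summary = pd.Series({label: counts[small].sum()})
    return pd.concat([counts[~small], summary])


# Made-up raw data: each id repeated as often as its count in the question.
ids = pd.Series([1] * 4521 + [2] * 1247 + [3] * 962
                + [4] * 12 + [5] * 6 + [6] * 5 + [7] * 4)
counts = ids.value_counts()
grouped = group_small_counts(counts, threshold=0.1)
print(grouped)
```

The result can be passed to plt.pie(grouped, labels=grouped.index) in the same way as the dataframe columns.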
Complete code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
threshold = 0.1
d = {'id': [1, 2, 3, 4, 5, 6, 7],
     'count': [4521, 1247, 962, 12, 6, 5, 4]}
df = pd.DataFrame(data=d)
df['percentage'] = df['count'] / df['count'].sum()
# Rows whose share of the total is below the threshold
small = df['percentage'] < threshold
# One summary row holding the summed count and share of the small rows
remaining = pd.DataFrame({'id': ['remaining'],
                          'count': [df.loc[small, 'count'].sum()],
                          'percentage': [df.loc[small, 'percentage'].sum()]})
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df = pd.concat([df[~small], remaining], ignore_index=True)
colors = sns.color_palette('GnBu_r')
plt.pie(df['count'],
        labels=df['id'], colors=colors)
plt.show()
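The threshold above still has to be chosen by hand. One possible alternative, which is not part of the answer above, is to keep a fixed number of the largest slices and sum everything else. This is only a sketch: keep_top_n and its parameters are hypothetical names, and choosing n is still a judgment call.

```python
import pandas as pd


def keep_top_n(counts, n=3, label="remaining"):
    """Keep the n largest entries and sum all others into one entry.

    Hypothetical helper: `counts` is a Series of counts indexed by category.
    """
    ordered = counts.sort_values(ascending=False)
    top = ordered.iloc[:n]
    rest = ordered.iloc[n:]
    if rest.empty:
        # Nothing to merge, so no extra entry is added.
        return top
    return pd.concat([top, pd.Series({label: rest.sum()})])


# Counts from the question, indexed by id.
counts = pd.Series([4521, 1247, 962, 12, 6, 5, 4],
                   index=[1, 2, 3, 4, 5, 6, 7])
top3 = keep_top_n(counts, n=3)
print(top3)
```

With the question's data this gives the same four slices as the 0.1 threshold, because exactly three ids are above it.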