I am working on a chat analyzer, and trying to make a pie chart for most active users. Suppose I have the following data.
Name | Number of messages |
---|---|
A | 234 |
B | 128 |
C | 112 |
D | 97 |
E | 86 |
F | 43 |
G | 32 |
H | 24 |
I | 22 |
J | 9 |
I want to convert it into the following
Name | Number of messages |
---|---|
A | 234 |
B | 128 |
C | 112 |
D | 97 |
E | 86 |
Other | 130 |
I am using pandas.value_counts() method for getting all the data for the first table, but am not able to use pandas.group_by() to get the second table.
Please can anyone help me out?
CodePudding user response:
Use df.where
to convert names outside of a defined list to Other
:
# If name is A, B, C, D or E, keep it. Otherwise, change it to Other
name = df["Name"].where(df["Name"].isin(["A", "B", "C", "D" ,"E"]), "Other")
df.groupby(name)["Number of messages"].sum()
CodePudding user response:
Try this:
# Find rows that have Name==["A", "B", "C", "D" ,"E"]
m = df["Name"].isin(["A", "B", "C", "D" ,"E"])
# Get rows that have Not Name == ["A", "B", "C", "D" ,"E"]
df_tmp = df[~m]
# Drop rows that have indexes like df_tmp from original 'df'
# Append row with Name=='Other' and sum 'Number of messages' from 'df_tmp'
df = df.drop(df_tmp.index
).append({'Name' : 'Other',
'Number of messages': df_tmp['Number of messages'].sum()
}, ignore_index=True)
print(df)
Output:
Name Number of messages
0 A 234
1 B 128
2 C 112
3 D 97
4 E 86
5 Other 130
CodePudding user response:
Let us try rank
with clip
s = df['Number of messages'].rank().clip(5)
out = df.groupby(s).agg({'Name':'first','Number of messages':'sum'}).sort_index(ascending=False).replace({'F':'other'})
Out[295]:
Name Number of messages
Number of messages
10.0 A 234
9.0 B 128
8.0 C 112
7.0 D 97
6.0 E 86
5.0 other 130