I would like to group C and D in my DataFrame, going from
  Category  count
0        A    327
1        B     20
2        C     30
3        D    302
to
      Category  count
0            A    327
1            B     20
2  NOT A and B    332
I can replace the values C and D with a common label and then group, but is there a better way to do this?
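For reference, a minimal sketch of the replace-then-group approach I mean, assuming the frame above is built by hand (the variable name out is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                   'count': [327, 20, 30, 302]})

# Relabel C and D with one shared name, then sum the counts per label.
df['Category'] = df['Category'].replace({'C': 'NOT A and B', 'D': 'NOT A and B'})
out = df.groupby('Category', as_index=False)['count'].sum()
print(out)
```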
CodePudding user response:
You can use pd.concat after boolean masking:
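import pandas as pd

# m is True for every row whose Category is neither 'A' nor 'B'
m = df['Category'].ne('A') & df['Category'].ne('B')

# keep the A and B rows, then append one row with the summed count of the rest
df = pd.concat([
    df[~m],
    pd.DataFrame({
        'Category': ['NOT A and B'],
        'count': [df.loc[m, 'count'].sum()]
    })
], ignore_index=True)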
Output of print(df):
      Category  count
0            A    327
1            B     20
2  NOT A and B    332
CodePudding user response:
Other options are:
import pandas as pd

d = pd.DataFrame({'Category': list('ABCD'), 'count': [327, 20, 30, 302]})

# map keeps A and B and turns every other category into NaN, which fillna relabels
d['Category'] = d['Category'].map({x: x for x in ['A', 'B']}).fillna('NOT A and B')
d.groupby('Category').agg({'count': 'sum'})
Or:
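import numpy as np

# keep the original label where it is A or B, otherwise use the shared label
d['Category'] = np.where(d['Category'].isin(['A', 'B']), d['Category'], 'NOT A and B')
d.groupby('Category').agg('sum')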
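If you would rather not overwrite the Category column, a sketch along the same lines is to build the labels with Series.where and group by that Series directly. The names keep, labels and result are only illustrative, and the frame is assumed to match the one in the question:

```python
import pandas as pd

df = pd.DataFrame({'Category': list('ABCD'), 'count': [327, 20, 30, 302]})

# True for the categories that should keep their own label
keep = df['Category'].isin(['A', 'B'])

# where() keeps values where the condition holds and substitutes the label elsewhere
labels = df['Category'].where(keep, 'NOT A and B')

# group by the relabelled Series; df itself is left unchanged
result = df.groupby(labels)['count'].sum().reset_index()
print(result)
```

reset_index() turns the grouped Series back into a two-column frame with Category and count, which is the shape asked for in the question.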