I would like to collapse my dataset using groupby and agg. However, after collapsing, I want the new column to show a string value only for the rows that were actually grouped.
For example, the initial data is:
import pandas as pd
df = pd.DataFrame([["a", 1], ["a", 3], ["b", 2]], columns=['category', 'value'])
  category  value
0        a      1
1        a      3
2        b      2
Desired output:
  category    value
0        a  grouped
1        b        2
How should I modify my code so that it shows "grouped" instead of 3?
df=df.groupby(['category'], as_index=False).agg({'value':'max'})
CodePudding user response:
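You can use a lambda with a conditional expression. Returning x.iloc[0] rather than x makes the function return a scalar for single-row groups, which is what agg expects:
(df.groupby("category", as_index=False)
   .agg({"value": lambda x: "grouped" if len(x) > 1 else x.iloc[0]}))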
This outputs:
  category    value
0        a  grouped
1        b        2
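A self-contained version of this approach might look like the sketch below. It assumes the sample data from the question with values 1, 3 and 2, and the helper name collapse_group is made up for illustration. The helper returns a scalar in both branches, so the resulting column has object dtype and mixes strings with integers.

```python
import pandas as pd

# Sample data as shown in the question (assumed values: 1, 3, 2).
df = pd.DataFrame(
    [["a", 1], ["a", 3], ["b", 2]],
    columns=["category", "value"],
)


def collapse_group(values: pd.Series):
    """Hypothetical helper: label multi-row groups, keep a lone value as-is."""
    if len(values) > 1:
        return "grouped"
    # Single-row group: return the scalar, not the one-element Series.
    return values.iloc[0]


result = df.groupby("category", as_index=False).agg({"value": collapse_group})
print(result)
```

A named function is easier to extend than a lambda if, say, the label should later depend on the group's contents.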
CodePudding user response:
Another possible solution:
import numpy as np
# Mark every row whose category occurs more than once, then drop the repeats.
(df.assign(value=np.where(
        df.duplicated(subset=['category'], keep=False), 'grouped', df['value']))
   .drop_duplicates())
Output (the original index labels are kept, hence 0 and 2):
  category    value
0        a  grouped
2        b        2
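If the collapsed frame should have a fresh 0..n-1 index, as in the desired output from the question, one possible variant is sketched below. It assumes the question's sample data. The cast to object before np.where and the call to reset_index(drop=True) are additions for illustration, not part of the answer above. Casting to object first avoids relying on how NumPy combines the string "grouped" with an integer column.

```python
import numpy as np
import pandas as pd

# Sample data as shown in the question (assumed values: 1, 3, 2).
df = pd.DataFrame(
    [["a", 1], ["a", 3], ["b", 2]],
    columns=["category", "value"],
)

# True for every row whose category occurs more than once.
is_grouped = df.duplicated(subset=["category"], keep=False)

result = (
    # Object dtype lets the column hold both the label and the original integers.
    df.assign(value=np.where(is_grouped, "grouped", df["value"].astype(object)))
      .drop_duplicates()          # the "grouped" rows are now identical, keep one
      .reset_index(drop=True)     # renumber the rows 0, 1, ...
)
print(result)
```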