Defining an aggregation function with groupby in pandas

Time:01-12

I would like to collapse my dataset using groupby and agg. After collapsing, the new column should show the string "grouped" for any category that had more than one row, and keep the original value for categories with a single row. For example, the initial data is:

df = pd.DataFrame([["a",1],["a",3],["b",2]], columns=['category','value'])

    category    value
0      a         1
1      a         3
2      b         2

Desired output:

   category   value
0     a      grouped
1     b         2

How should I modify my code so that it shows "grouped" instead of 3?

df=df.groupby(['category'], as_index=False).agg({'value':'max'})

CodePudding user response:

You can use a lambda with a ternary:

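# Parentheses let the chain span two lines; x.iloc[0] returns the lone value as a scalar.
(df.groupby("category", as_index=False)
   .agg({"value": lambda x: "grouped" if len(x) > 1 else x.iloc[0]}))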

This outputs:

  category    value
0        a  grouped
1        b        2
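For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The helper name label_if_grouped is made up for illustration, and the sample data assumes the second "a" row holds 3, as in the question's printed table.

```python
import pandas as pd

# Sample data from the question (second "a" row assumed to be 3, as in the printed table).
df = pd.DataFrame([["a", 1], ["a", 3], ["b", 2]], columns=["category", "value"])


def label_if_grouped(values: pd.Series):
    """Hypothetical helper: "grouped" for multi-row groups, else the single original value."""
    if len(values) > 1:
        return "grouped"
    return values.iloc[0]


# Apply the helper to the "value" column of each category group.
result = df.groupby("category", as_index=False).agg({"value": label_if_grouped})
print(result)
```

Because the result mixes strings and numbers, the value column ends up with object dtype, which is worth keeping in mind if you plan to do arithmetic on it later.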

CodePudding user response:

Another possible solution:

import numpy as np

(df.assign(value = np.where(
    df.duplicated(subset=['category'], keep=False), 'grouped', df['value']))
 .drop_duplicates())

Output:

  category    value
0        a  grouped
2        b        2
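To see why this works, it may help to split it into steps. The sketch below is a runnable variant under a few assumptions of mine: the names in_group and labelled are illustrative, the untouched values are converted to strings explicitly so the column holds a single type, and the index is reset at the end (the original keeps index labels 0 and 2).

```python
import numpy as np
import pandas as pd

# Sample data from the question (second "a" row assumed to be 3, as in the printed table).
df = pd.DataFrame([["a", 1], ["a", 3], ["b", 2]], columns=["category", "value"])

# keep=False flags every row whose category occurs more than once, not just the repeats.
in_group = df.duplicated(subset=["category"], keep=False)
print(in_group.tolist())  # [True, True, False]

# Replace the value with "grouped" on flagged rows; other values are kept as strings.
labelled = df.assign(value=np.where(in_group, "grouped", df["value"].astype(str)))

# Flagged rows are now identical within a category, so drop_duplicates collapses them.
result = labelled.drop_duplicates().reset_index(drop=True)
print(result)
```

Note that in this variant the surviving value for "b" is the string "2" rather than the integer 2.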