Change the value of a pandas DataFrame column based on a condition, also depending on other columns

Time:07-09

    Category              DishName   Id 
0   a                     Pistachio  621f4884e48bc60012364b13   
1   a                     Pistachio  621f4884e48bc60012364b13   
2   a                     Pistachio  621f4884e48bc60012364b13   
3   a                     achar      621f4884e48bc60012364b13   
4   b                     achar      621f4884e48bc60012364b13   
5   b                     achar      621f4884e48bc60012364b13   
6   a                     chicken    621f4884e48bc60012364b13   
7   b                     chicken    621f4884e48bc60012364b13   
8   c                     chicken    621f4884e48bc60012364b13 

My dataframe has three columns: Category, DishName and Id. For each combination of Id and DishName, I have to assign a single Category to all of its rows, using these rules:

Assign "a" if all the category values in the group are "a"

Assign "b" if the category values in the group are "a" and "b"

Assign "c" if the category values in the group are "a", "b" and "c"

Expected output is

    Category              DishName   Id 
0   a                     Pistachio  621f4884e48bc60012364b13   
1   a                     Pistachio  621f4884e48bc60012364b13   
2   a                     Pistachio  621f4884e48bc60012364b13   
3   b                     achar      621f4884e48bc60012364b13   
4   b                     achar      621f4884e48bc60012364b13   
5   b                     achar      621f4884e48bc60012364b13   
6   c                     chicken    621f4884e48bc60012364b13   
7   c                     chicken    621f4884e48bc60012364b13   
8   c                     chicken    621f4884e48bc60012364b13 

CodePudding user response:

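You can convert the column to an ordered Categorical and take the max per group:

import pandas as pd

# Declare the order a < b < c, then broadcast each DishName group's max to its rows
df['Category'] = (pd
                  .Series(pd.Categorical(df['Category'],
                                         categories=['a', 'b', 'c'], ordered=True),
                          index=df.index)
                  .groupby(df['DishName'])
                  .transform('max')
                  )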

NB. You wouldn't need the Categorical for plain a, b, c, since those three already sort lexicographically in the desired order, but I imagine a real-life case wouldn't necessarily do so. For example, low < medium < high is a logical order but not a lexicographic one.
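To illustrate the point about ordering, here is a minimal sketch comparing a plain string max with an ordered Categorical max. The column names Level and Item and the sample values are made up for this example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: 'Level' and 'Item' are illustrative names only
df = pd.DataFrame({
    "Level": ["low", "medium", "low", "high", "low", "low"],
    "Item":  ["x", "x", "y", "y", "z", "z"],
})

# Plain strings compare lexicographically, so "medium" > "low" > "high"
lexicographic = df.groupby("Item")["Level"].transform("max")

# An ordered Categorical compares by the declared order low < medium < high
levels = pd.Categorical(df["Level"],
                        categories=["low", "medium", "high"], ordered=True)
logical = (pd.Series(levels, index=df.index)
           .groupby(df["Item"])
           .transform("max"))

print(pd.DataFrame({"Item": df["Item"],
                    "string_max": lexicographic,
                    "ordered_max": logical.astype(str)}))
```

For item y, the string max gives "low" (because "low" sorts after "high"), while the ordered Categorical gives "high", which is presumably what is wanted in such a case.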

Output:

  Category   DishName                        Id
0        a  Pistachio  621f4884e48bc60012364b13
1        a  Pistachio  621f4884e48bc60012364b13
2        a  Pistachio  621f4884e48bc60012364b13
3        b      achar  621f4884e48bc60012364b13
4        b      achar  621f4884e48bc60012364b13
5        b      achar  621f4884e48bc60012364b13
6        c    chicken  621f4884e48bc60012364b13
7        c    chicken  621f4884e48bc60012364b13
8        c    chicken  621f4884e48bc60012364b13

CodePudding user response:

df['Category'] = df.groupby('DishName')['Category'].transform('max')

Output:

  Category   DishName                        Id
0        a  Pistachio  621f4884e48bc60012364b13
1        a  Pistachio  621f4884e48bc60012364b13
2        a  Pistachio  621f4884e48bc60012364b13
3        b      achar  621f4884e48bc60012364b13
4        b      achar  621f4884e48bc60012364b13
5        b      achar  621f4884e48bc60012364b13
6        c    chicken  621f4884e48bc60012364b13
7        c    chicken  621f4884e48bc60012364b13
8        c    chicken  621f4884e48bc60012364b13
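Both answers group on DishName only, which is enough for the sample data because every row shares one Id. The question mentions considering the Id as well, so if the same dish can appear under several Ids, grouping on both columns may be closer to the intent. A minimal sketch, assuming lexicographic order is acceptable and using made-up Id values:

```python
import pandas as pd

# Hypothetical data: the same dish appears under two different Ids
df = pd.DataFrame({
    "Category": ["a", "a", "b", "a"],
    "DishName": ["achar", "achar", "achar", "achar"],
    "Id":       ["id1", "id1", "id2", "id2"],  # illustrative ids only
})

# Grouping on DishName alone mixes both Ids into one group
by_dish = df.groupby("DishName")["Category"].transform("max")

# Grouping on Id and DishName keeps each Id's dishes separate
by_id_and_dish = df.groupby(["Id", "DishName"])["Category"].transform("max")

print(by_dish.tolist())
print(by_id_and_dish.tolist())
```

With this data, grouping on DishName alone assigns "b" to every row, while grouping on both columns leaves the id1 rows at "a".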