Home > OS >  compare non-numeric values in two rows in a column pandas
compare non-numeric values in two rows in a column pandas

Time:07-14

I asked this question and the answer provided worked for me very well.

compare value in two rows in a column pandas

However, now I have a similar task of comparing values from two rows of the same column but this time, column is non-numeric. So in a nutshell, my df is something like this

                color          char               days                text
       1         red          light                 7                 good
       2         red          light                30                 good
       3         red         medium                60                  bad
       4         blue          dark                 7                  bad
       5         blue         light                30                 good
       6         blue         light                60                  bad
       7        yellow        light                 7                 good
       8        yellow        medium               30                  bad
       9        yellow        dark                 60                  bad

So basically, for each color, I have char values for 7 days, 30 days and 60 days. Please note that these are not always in correct order as I gave in example above. My task now is to compare value in char for each color between the consecutive days values and if the value is different than before, then write in column "text" as "NA". Text in days 7 category is default and cannot be overwritten.

Desired result:

                color          char               days                text
       1         red          light                 7                 good
       2         red          light                30                 good
       3         red         medium                60                  NA
       4         blue          dark                 7                  bad
       5         blue         light                30                  NA
       6         blue         light                60                  bad
       7        yellow        light                 7                 good
       8        yellow        medium               30                  NA
       9        yellow        dark                 60                  NA

I tried the solution from the other question with several variations, but it doesnt work for me. Can someone please help me with this? I dont necessarily need only a variation of the given solution, but any efficient way is appreciated.

CodePudding user response:

You can try groupby color then mask text column in each group

df['text'] = (df.sort_values(['color', 'days'])
              .groupby('color', as_index=False, group_keys=False)
              .apply(lambda g: g['text'].mask(g['char'].ne(g['char'].shift().bfill()), np.nan)))
print(df)

    color    char  days  text
1     red   light     7  good
2     red   light    30  good
3     red  medium    60   NaN
4    blue    dark     7   bad
5    blue   light    30   NaN
6    blue   light    60   bad
7  yellow   light     7  good
8  yellow  medium    30   NaN
9  yellow    dark    60   NaN
  • Related