I asked this question and the answer provided worked for me very well.
compare value in two rows in a column pandas
However, now I have a similar task of comparing values from two rows of the same column but this time, column is non-numeric. So in a nutshell, my df is something like this
color char days text
1 red light 7 good
2 red light 30 good
3 red medium 60 bad
4 blue dark 7 bad
5 blue light 30 good
6 blue light 60 bad
7 yellow light 7 good
8 yellow medium 30 bad
9 yellow dark 60 bad
So basically, for each color, I have char values for 7 days, 30 days and 60 days. Please note that these are not always in correct order as I gave in example above. My task now is to compare value in char for each color between the consecutive days values and if the value is different than before, then write in column "text" as "NA". Text in days 7 category is default and cannot be overwritten.
Desired result:
color char days text
1 red light 7 good
2 red light 30 good
3 red medium 60 NA
4 blue dark 7 bad
5 blue light 30 NA
6 blue light 60 bad
7 yellow light 7 good
8 yellow medium 30 NA
9 yellow dark 60 NA
I tried the solution from the other question with several variations, but it doesnt work for me. Can someone please help me with this? I dont necessarily need only a variation of the given solution, but any efficient way is appreciated.
CodePudding user response:
You can try groupby color
then mask text
column in each group
df['text'] = (df.sort_values(['color', 'days'])
.groupby('color', as_index=False, group_keys=False)
.apply(lambda g: g['text'].mask(g['char'].ne(g['char'].shift().bfill()), np.nan)))
print(df)
color char days text
1 red light 7 good
2 red light 30 good
3 red medium 60 NaN
4 blue dark 7 bad
5 blue light 30 NaN
6 blue light 60 bad
7 yellow light 7 good
8 yellow medium 30 NaN
9 yellow dark 60 NaN