Closest non equal row in a column in Pandas dataframe

Time:09-13

I have this DataFrame:

import pandas as pd

d={}
d['id']=['1','1','1','1','1','1','1','1','2','2','2','2','2','2','2','2']
d['qty']=[5,5,5,5,5,6,5,5,1,1,2,2,2,3,5,8]
df = pd.DataFrame(d)

I would like to create a column, qty2, that holds for each row the next value of qty that differs from the current one. For example, if qty is 5 and the next row is also 5, I skip that row and keep looking until I find a value other than 5; in my case that is 6. If no later value in the group differs, the result should be NaN. All of this should be done per id group.

Here is the desired dataframe.

d['id']=['1','1','1','1','1','1','1','1','2','2','2','2','2','2','2','2']
d['qty']=[5,5,5,5,5,6,5,5,1,1,2,2,2,3,5,8]
d['qty2']=[6,6,6,6,6,5,'NAN','NAN',2,2,3,3,3,5,8,'NAN']

Any help is very much appreciated

Answer:

You can use groupby.shift to get the next value within each group, mask the values that are identical to the current row, and then fill the gaps with groupby.bfill:

# shift up per group
s = df.groupby('id')['qty'].shift(-1)

# keep only the different values and bfill per group
df['qty2'] = s.where(df['qty'].ne(s)).groupby(df['id']).bfill()

output:

   id  qty  qty2
0   1    5   6.0
1   1    5   6.0
2   1    5   6.0
3   1    5   6.0
4   1    5   6.0
5   1    6   5.0
6   1    5   NaN
7   1    5   NaN
8   2    1   2.0
9   2    1   2.0
10  2    2   3.0
11  2    2   3.0
12  2    2   3.0
13  2    3   5.0
14  2    5   8.0
15  2    8   NaN
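
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same shift / mask / bfill idea. The helper names next_different and next_different_loop are hypothetical and only used for illustration; the loop version is a slow reference that spells out the rule from the question so the vectorized result can be checked against it. The Int64 conversion at the end is optional and assumes the qty values are whole numbers.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": ["1"] * 8 + ["2"] * 8,
    "qty": [5, 5, 5, 5, 5, 6, 5, 5, 1, 1, 2, 2, 2, 3, 5, 8],
})


def next_different(frame, key="id", col="qty"):
    """Hypothetical helper: for each row, the next value of `col` within the
    same `key` group that differs from the current row's value."""
    # Next row's value within each group (NaN on the last row of a group)
    nxt = frame.groupby(key)[col].shift(-1)
    # Keep the next value only where it differs from the current one
    changed = nxt.where(frame[col].ne(nxt))
    # Pull the next kept value backwards, without crossing group boundaries
    return changed.groupby(frame[key]).bfill()


def next_different_loop(values):
    """Slow reference version for a single group, used only as a cross-check."""
    return [
        next((w for w in values[i + 1:] if w != v), None)
        for i, v in enumerate(values)
    ]


df["qty2"] = next_different(df)

# Optional: the NaNs force a float column; a nullable integer dtype
# keeps whole numbers while still allowing missing values.
df["qty2_int"] = df["qty2"].astype("Int64")

print(df)
```

Rows at the end of a group whose remaining values never change stay NaN, because shift(-1) has nothing to offer on the last row and bfill is applied per group, so values from the next id are never pulled in.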