How to write the first n records and last n records of a matching type (based on row)

Time:10-13

I have the following dataframe

DATETIME,TYPE
2021-10-13 18:04:52,NaN
2021-10-13 18:04:53,NaN
2021-10-13 18:04:54,NaN
2021-10-13 18:04:55,NaN
2021-10-13 18:04:56,NaN
2021-10-13 18:04:57,NaN
2021-10-13 18:04:58,Defect
2021-10-13 18:04:59,NaN
2021-10-13 18:05:00,NaN
2021-10-13 18:05:01,NaN
2021-10-13 18:05:02,NaN
2021-10-13 18:05:03,NaN
2021-10-13 18:05:04,NaN
2021-10-13 18:05:05,NaN
2021-10-13 18:05:06,NaN
2021-10-13 18:05:07,NaN
2021-10-13 18:05:08,NaN
2021-10-13 18:05:09,NaN
2021-10-13 18:05:10,Defect
2021-10-13 18:05:11,NaN
2021-10-13 18:05:12,NaN
2021-10-13 18:05:13,NaN
2021-10-13 18:05:14,NaN
2021-10-13 18:05:15,NaN
2021-10-13 18:05:16,NaN
2021-10-13 18:05:17,NaN
2021-10-13 18:05:18,NaN
2021-10-13 18:05:19,NaN
2021-10-13 18:05:20,NaN
2021-10-13 18:05:21,NaN

As you can see, the rows at 18:04:58 and 18:05:10 are marked Defect. How can I also mark the previous row (18:04:57) and the next row (18:04:59) as Defect? The same goes for 18:05:09 and 18:05:11.

I tried defining a function and calling it with apply, but that does not work because each value is passed to the function as a string instead of an array.

Desired output:

2021-10-13 18:04:52,NaN
2021-10-13 18:04:53,NaN
2021-10-13 18:04:54,NaN
2021-10-13 18:04:55,NaN
2021-10-13 18:04:56,NaN
2021-10-13 18:04:57,Defect
2021-10-13 18:04:58,Defect
2021-10-13 18:04:59,Defect
2021-10-13 18:05:00,NaN
2021-10-13 18:05:01,NaN
2021-10-13 18:05:02,NaN
2021-10-13 18:05:03,NaN
2021-10-13 18:05:04,NaN
2021-10-13 18:05:05,NaN
2021-10-13 18:05:06,NaN
2021-10-13 18:05:07,NaN
2021-10-13 18:05:08,NaN
2021-10-13 18:05:09,Defect
2021-10-13 18:05:10,Defect
2021-10-13 18:05:11,Defect
2021-10-13 18:05:12,NaN
2021-10-13 18:05:13,NaN
2021-10-13 18:05:14,NaN
2021-10-13 18:05:15,NaN
2021-10-13 18:05:16,NaN
2021-10-13 18:05:17,NaN
2021-10-13 18:05:18,NaN
2021-10-13 18:05:19,NaN
2021-10-13 18:05:20,NaN
2021-10-13 18:05:21,NaN

CodePudding user response:

Try shift and loc assignment:

df.loc[df['TYPE'].shift().eq('Defect') | df['TYPE'].shift(-1).eq('Defect'), 'TYPE'] = 'Defect'
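To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same shift/loc idea. The way the frame is constructed (a list of NaN with 'Defect' placed at positions 6 and 18) is an assumption for illustration, since the question does not show how the data is loaded.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data: 30 rows, one per second.
start = pd.Timestamp("2021-10-13 18:04:52")
types = [np.nan] * 30
types[6] = "Defect"   # 18:04:58
types[18] = "Defect"  # 18:05:10
df = pd.DataFrame({
    "DATETIME": start + pd.to_timedelta(np.arange(30), unit="s"),
    "TYPE": types,
})

# True where the row above or the row below is a Defect.
# The mask is built from the original column before anything is assigned.
neighbor_is_defect = (
    df["TYPE"].shift().eq("Defect") | df["TYPE"].shift(-1).eq("Defect")
)
df.loc[neighbor_is_defect, "TYPE"] = "Defect"

print(df[df["TYPE"].eq("Defect")])
```

Because the mask only tests for 'Defect', other labels in TYPE (if there were any) would not be spread to their neighbors, although a non-Defect label sitting next to a Defect row would be overwritten.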

CodePudding user response:

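You can use combine_first twice. Each call fills the NaN values from the neighboring row, so this copies any non-null label to its neighbors, not only Defect: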

df['TYPE'] = df['TYPE'].combine_first(df['TYPE'].shift()) \
                       .combine_first(df['TYPE'].shift(-1))
print(df)

# Output:

               DATETIME    TYPE
0   2021-10-13 18:04:52     NaN
1   2021-10-13 18:04:53     NaN
2   2021-10-13 18:04:54     NaN
3   2021-10-13 18:04:55     NaN
4   2021-10-13 18:04:56     NaN
5   2021-10-13 18:04:57  Defect
6   2021-10-13 18:04:58  Defect
7   2021-10-13 18:04:59  Defect
8   2021-10-13 18:05:00     NaN
9   2021-10-13 18:05:01     NaN
10  2021-10-13 18:05:02     NaN
11  2021-10-13 18:05:03     NaN
12  2021-10-13 18:05:04     NaN
13  2021-10-13 18:05:05     NaN
14  2021-10-13 18:05:06     NaN
15  2021-10-13 18:05:07     NaN
16  2021-10-13 18:05:08     NaN
17  2021-10-13 18:05:09  Defect
18  2021-10-13 18:05:10  Defect
19  2021-10-13 18:05:11  Defect
20  2021-10-13 18:05:12     NaN
21  2021-10-13 18:05:13     NaN
22  2021-10-13 18:05:14     NaN
23  2021-10-13 18:05:15     NaN
24  2021-10-13 18:05:16     NaN
25  2021-10-13 18:05:17     NaN
26  2021-10-13 18:05:18     NaN
27  2021-10-13 18:05:19     NaN
28  2021-10-13 18:05:20     NaN
29  2021-10-13 18:05:21     NaN
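The title asks about n records before and after a match, while both answers cover n = 1. One possible way to generalize is to OR together shifted copies of the match mask for every offset from 1 to n. The helper name spread_label and its parameters are hypothetical, not part of pandas or of the answers above.

```python
import numpy as np
import pandas as pd

def spread_label(s, label="Defect", n=1):
    """Return a copy of s where the n rows before and after each
    occurrence of label are also set to label (hypothetical helper)."""
    hit = s.eq(label)
    mask = hit.copy()
    for k in range(1, n + 1):
        # shift(k) marks rows k below a hit, shift(-k) rows k above it.
        mask = mask | hit.shift(k, fill_value=False) | hit.shift(-k, fill_value=False)
    # Replace values where mask is True, keep everything else unchanged.
    return s.mask(mask, label)

# Small demo: one Defect at position 4, spread two rows each way.
demo = pd.Series([np.nan] * 10, dtype=object)
demo[4] = "Defect"
result = spread_label(demo, n=2)
print(result)
```

With the question's frame this would be used as df['TYPE'] = spread_label(df['TYPE'], n=1). Rows near the start or end of the frame are handled by fill_value=False, so the window is simply cut off at the edges.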
