Home > database >  How to mark 2 rows before and after a value change in a specific column?
How to mark 2 rows before and after a value change in a specific column?

Time:06-24

I want to create a new column 'mark' in the dataframe. I already set the value of 'mark' on True when a change of value occurs in column 'Numbers'. I would like to set the values on 'True' two row before and after a value change in column 'Numbers'.

I would like to see:

Numbers mark
10 False
10 False
10 False
10 False
10 True
10 True
20 True
20 True
20 False
20 False
20 True
20 True
30 True
40 True
40 True
40 False
40 False
40 False

What in found so far:

import pandas as pd
data = [10,10,10,10,10,10,20,20,20,20,20,20,20,30,40,40,40,40,40] 
df = pd.DataFrame(data, columns=['Numbers'])
check=10
def detect_changes (row):
    global check 
    if row['Numbers'] == check :           
        return False
    else :
        check=check 10
        return True
df['mark']=df.apply (lambda row: detect_changes(row), axis=1)

CodePudding user response:

You can use a rolling max:

# better than the used function
df['mark'] = df['Numbers'].diff().eq(10)

N = 2 # number of rows before/after 
df['mark2'] = df['mark'].rolling(2*N 1, min_periods=1, center=True).max().astype(bool)

output:

    Numbers   mark  mark2
0        10  False  False
1        10  False  False
2        10  False  False
3        10  False  False
4        10  False   True
5        10  False   True
6        20   True   True
7        20  False   True
8        20  False   True
9        20  False  False
10       20  False  False
11       20  False   True
12       20  False   True
13       30   True   True
14       40   True   True
15       40  False   True
16       40  False   True
17       40  False  False
18       40  False  False

1 before, 2 after

# better than the used function
df['mark'] = df['Numbers'].diff().eq(10)

# 1 before
m1 = df.loc[::-1, 'mark'].rolling(1 1, min_periods=1).max().astype(bool)

# 2 after
m2 = df['mark'].rolling(2 1, min_periods=1).max().astype(bool)

df['mark2'] = m1|m2

output:

    Numbers   mark  mark2
0        10  False  False
1        10  False  False
2        10  False  False
3        10  False  False
4        10  False  False
5        10  False   True
6        20   True   True
7        20  False   True
8        20  False   True
9        20  False  False
10       20  False  False
11       20  False   True
12       30   True   True
13       40   True   True
14       40  False   True
15       40  False   True
16       40  False  False
17       40  False  False
  • Related