Home > Enterprise >  Pandas - changing rows where less than n subsequent values are equal
Pandas - changing rows where less than n subsequent values are equal

Time:11-16

I have the following dataframe:

df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,1,1,0,0,1,1,1,0,1,1,1,1,0,0,0]})

Now I would like to set all the rows equal to zero where less than four 1's appear "in a row", i.e. I would like to have the following resulting DataFrame:

df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0]})

I was not able to find a way to achieve this nicely...

CodePudding user response:

Try with groupby and where:

streaks = df.groupby(df["col"].ne(df["col"].shift()).cumsum()).transform("sum")
output = df.where(streaks.ge(4), 0)

>>> output
    col
0     0
1     0
2     1
3     1
4     1
5     1
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    1
17    1
18    1
19    1
20    0
21    0
22    0

CodePudding user response:

We can do

df.loc[df.groupby(df.col.eq(0).cumsum()).transform('count')['col']<5,'col'] = 0
df
Out[77]: 
    col
0     0
1     0
2     1
3     1
4     1
5     1
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    1
17    1
18    1
19    1
20    0
21    0
22    0
  • Related