Replace value after continuous same value in column

I have a data frame like this:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,
                         0, 0, 0, 0, 0, 0, 1, 1]})

Suppose there is a run of n consecutive ones (in this case n = 8), followed by a gap of 4 zeros, followed by another run of n consecutive ones. I would like to set up a rule, e.g. the gap between the two runs has length m <= 4. How can I replace those 4 zeros with 1?

My ideal output would be like this:

df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,
                         0, 0, 0, 0, 0, 0, 1, 1],
                   'Fill_Gap': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,
                                0, 0, 0, 0, 0, 0, 1, 1]})

Only the four zeros at index 13-16 are replaced by 1, because they have 8 consecutive ones both before and after them.

Any advice would be much appreciated!

CodePudding user response：

You can use a regex if you join the column into a string. The pattern 0{,4} matches a run of up to 4 zeros, and the lookbehind and lookahead (?<=1{8})...(?=1{8}) require 8 ones immediately before and after that run. I don't think this is an efficient solution.
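To see what the pattern matches on its own, here is a minimal sketch that runs it on the question's data without pandas. The names values, text, pattern and spans are chosen for illustration only and are not part of the answer.

```python
import re

# Column A from the question, written out as a plain list
values = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1,
          1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# One character per row, so a match position equals a row position
text = ''.join(str(v) for v in values)

# Up to 4 zeros with 8 ones directly before and directly after
pattern = re.compile(r'(?<=1{8})0{,4}(?=1{8})')

# Each span is a half-open (start, end) pair of row positions
spans = [m.span() for m in pattern.finditer(text)]
print(spans)
```

For this data the only span is (13, 17), i.e. rows 13 to 16. The single zero at row 25 is not matched because only 3 ones follow it, and the 9 zeros at rows 29-37 are too long a gap.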

import re

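df['fill_gap'] = df['A']
col = df.columns.get_loc('fill_gap')
# Each match span covers one gap of zeros; fill those rows by position.
# Assigning through df.iloc avoids chained assignment (df.fill_gap.iloc[...] = 1).
for i in re.finditer(r'(?<=1{8})0{,4}(?=1{8})', ''.join(df['A'].astype(str))):
    df.iloc[slice(*i.span()), col] = 1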
df

Output

    A  fill_gap
0   1         1
1   1         1
2   1         1
3   0         0
4   0         0
5   1         1
6   1         1
7   1         1
8   1         1
9   1         1
10  1         1
11  1         1
12  1         1
13  0         1
14  0         1
15  0         1
16  0         1
17  1         1
18  1         1
19  1         1
20  1         1
21  1         1
22  1         1
23  1         1
24  1         1
25  0         0
26  1         1
27  1         1
28  1         1
29  0         0
30  0         0
31  0         0
32  0         0
33  0         0
34  0         0
35  0         0
36  0         0
37  0         0
38  1         1
39  1         1
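If efficiency is a concern, a vectorized alternative is possible. The following is only a sketch, not part of the answer above: it labels each run of equal values, measures the runs, and fills a zero run when it is at most m long and both neighbouring runs are ones of length at least n. The function name fill_gaps and its parameters n and m are assumptions made for illustration, and the column is assumed to contain only 0 and 1.

```python
import pandas as pd


def fill_gaps(s: pd.Series, n: int = 8, m: int = 4) -> pd.Series:
    """Hypothetical helper: fill zero runs of length <= m that sit
    between two runs of ones, each of length >= n."""
    # Give every run of identical consecutive values its own id
    run_id = (s != s.shift()).cumsum()
    # One row per run: the value it holds and how long it is
    runs = s.groupby(run_id).agg(value='first', length='size')

    # Neighbouring runs must be ones and long enough (NaN at the edges compares False)
    prev_ok = (runs['value'].shift() == 1) & (runs['length'].shift() >= n)
    next_ok = (runs['value'].shift(-1) == 1) & (runs['length'].shift(-1) >= n)
    fill = (runs['value'] == 0) & (runs['length'] <= m) & prev_ok & next_ok

    # Map the per-run decision back onto the rows and overwrite with 1
    row_mask = run_id.map(fill).astype(bool)
    return s.mask(row_mask, 1)


df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1,
                         1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})
df['fill_gap'] = fill_gaps(df['A'], n=8, m=4)
```

Like the regex, this decides every gap from the original column, so a gap that was just filled does not help a later gap qualify.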

CodePudding user response：

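This works for a series of any length. Note that it overwrites column A in place, and that it fills the whole stretch between the ends of two runs of 8 ones whenever a run of at least 4 zeros lies between them, so it does not enforce the m <= 4 limit and does not fill gaps shorter than 4: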

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,
                         0, 0, 0, 0, 0, 0, 1, 1]})

# Check for runs of 8 (1's): True at the last row of every window of 8 ones
lst1 = (df.A == 1)
for x in range(1, 8):
    lst1 = lst1 & (df.A.shift(periods=x) == 1)

# Check for runs of 4 (0's): True at the last row of every window of 4 zeros
lst0 = (df.A == 0)
for x in range(1, 4):
    lst0 = lst0 & (df.A.shift(periods=x) == 0)

# Get the positions where a window of 8 ones ends
ones = np.flatnonzero(lst1)

# Fill gaps: if a window of 4 zeros ends between two such positions,
# mark every row in between
for x in range(1, len(ones)):
    if lst0.iloc[ones[x-1]:ones[x]].any():
        lst1.iloc[ones[x-1]:ones[x]] = True

# Apply to data frame
df.loc[lst1, 'A'] = 1

Output: