Changing row values until a particular row appears in a dataframe

I am looking for a way to change row values until a particular row appears.

For example, here is a dataframe:

import pandas as pd

df = pd.DataFrame({'ts_val' : [0,0,-1,-1,-1,-1,0,0,0,0,0],
                   'on&off' : [0,0,1,0,0,0,1,0,1,0,1]
    })
           
Out[11] df
   ts_val  on&off
0       0       0
1       0       0
2      -1       1 #starting point
3      -1       0
4      -1       0
5      -1       0
6       0       1 #end point
7       0       0
8       0       1 #starting point 2
9       0       0
10      0       1 #end point 2

In the dataframe above, I want to set the values of df['on&off'] to 1 from one 1 until another 1 appears in df['on&off'].

In more detail: in the example, a 1 appears at df['on&off'][2], which is the starting point, and another 1, the end point, appears at df['on&off'][6]. I need to set the values in between, df['on&off'][2:6], to 1.

The result should look like this dataframe:

   ts_val  on&off
0       0       0
1       0       0
2      -1       1
3      -1       1
4      -1       1
5      -1       1
6       0       1
7       0       0
8       0       1
9       0       1
10      0       1

CodePudding user response：

Try the following snippet: create group ids with a cumulative sum (Series.cumsum), select the rows where the original column is 1 and the cumulative sum is odd (these are the starting points), store their group ids in the variable groups, and set only the rows belonging to those groups to 1:

s = df['on&off'].cumsum()
m = (s % 2).eq(1) & df['on&off'].eq(1)

groups = s[m]

df.loc[s.isin(groups), 'on&off'] = 1
print (df)

# Output:

    ts_val  on&off
0        0       0
1        0       0
2       -1       1
3       -1       1
4       -1       1
5       -1       1
6        0       1
7        0       0
8        0       1
9        0       1
10       0       1
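
To see why this works, it can help to look at the intermediate values. The following is only a sketch on the same example data as the question; the helper frame debug and its column names cumsum and is_start are made up for illustration. Every 1 bumps the cumulative sum, so rows after a starting point share an odd group id and rows after an end point share an even one.

```python
import pandas as pd

df = pd.DataFrame({'ts_val': [0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0],
                   'on&off': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1]})

s = df['on&off'].cumsum()                # group id grows by 1 at every 1
m = (s % 2).eq(1) & df['on&off'].eq(1)   # True only on starting points

# illustrative helper frame, not part of the solution itself
debug = pd.DataFrame({'on&off': df['on&off'], 'cumsum': s, 'is_start': m})
print(debug)

start_groups = s[m].tolist()             # ids of the groups opened by a starting 1
print(start_groups)                      # [1, 3]
```

Rows whose cumulative sum is 1 or 3 are exactly the rows from each starting point up to, but not including, the matching end point.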

EDIT:

If the last 1 is missing, the last group is incomplete and should not be filled. In that case, keep only the groups that have both a starting 1 and an ending 1:

# test case: every starting `1` has a matching ending `1`
df = pd.DataFrame({'ts_val' : [0,0,-1,-1,-1,-1,0,0,0,0,0],
                   'on&off' : [0,0,1,0,0,0,1,0,1,0,1]
    })  
           
s = df['on&off'].cumsum()
m1 = (s % 2).eq(1) & df['on&off'].eq(1)
m2 = (s % 2).eq(0) & df['on&off'].eq(1)

groups1 = s[m1]
groups2 = s.shift()[m2]

groups = set(groups1) & set(groups2)
print (groups)
{1.0, 3.0}

df.loc[s.isin(groups), 'on&off'] = 1
print (df)
    ts_val  on&off
0        0       0
1        0       0
2       -1       1
3       -1       1
4       -1       1
5       -1       1
6        0       1
7        0       0
8        0       1
9        0       1
10       0       1

# the last 1 is removed, so only one complete group remains
df = pd.DataFrame({'ts_val' : [0,0,-1,-1,-1,-1,0,0,0,0,0],
                   'on&off' : [0,0,1,0,0,0,1,0,1,0,0]
    })  
           
s = df['on&off'].cumsum()
m1 = (s % 2).eq(1) & df['on&off'].eq(1)
m2 = (s % 2).eq(0) & df['on&off'].eq(1)

groups1 = s[m1]
groups2 = s.shift()[m2]

groups = set(groups1) & set(groups2)
print (groups)
{1.0}

df.loc[s.isin(groups), 'on&off'] = 1
print (df)
    ts_val  on&off
0        0       0
1        0       0
2       -1       1
3       -1       1
4       -1       1
5       -1       1
6        0       1
7        0       0
8        0       1
9        0       0
10       0       0
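
If this is needed in more than one place, the steps can be wrapped in a small function. This is a minimal sketch, assuming the column contains only 0 and 1; the function name fill_complete_pairs and its parameters are hypothetical and chosen for this example only.

```python
import pandas as pd

def fill_complete_pairs(df, col='on&off'):
    """Return a copy of df with 1s filled between each start 1 and its matching end 1.

    Hypothetical helper; assumes df[col] holds only 0 and 1.
    """
    out = df.copy()
    s = out[col].cumsum()
    is_one = out[col].eq(1)
    starts = s[(s % 2).eq(1) & is_one]          # group ids opened by a starting 1
    ends = s.shift()[(s % 2).eq(0) & is_one]    # id of the group closed by each ending 1
    # keep only groups that have both a start and an end; cast floats from shift() back to int
    complete = sorted(int(g) for g in set(starts) & set(ends))
    out.loc[s.isin(complete), col] = 1
    return out

paired = pd.DataFrame({'ts_val': [0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0],
                       'on&off': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1]})
unpaired = pd.DataFrame({'ts_val': [0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0],
                         'on&off': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]})

paired_result = fill_complete_pairs(paired)['on&off'].tolist()
unpaired_result = fill_complete_pairs(unpaired)['on&off'].tolist()
print(paired_result)
print(unpaired_result)
```

Working on a copy leaves the input frame untouched, which makes it easier to compare the result against the original column.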

CodePudding user response：

I used a flag to track whether the loop is currently between a starting point and an end point. df.shape[0] gives the number of rows.

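col = df.columns.get_loc('on&off')  # integer position of the column
flag = 1  # 1 = outside a start/end pair, 0 = inside one
for i in range(df.shape[0]):
  if df.iloc[i, col] == 1 and flag == 1:
    flag = 0  # starting point found
  elif df.iloc[i, col] == 0 and flag == 0:
    df.iloc[i, col] = 1  # fill the rows between start and end
  elif df.iloc[i, col] == 1 and flag == 0:
    flag = 1  # end point found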
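
On larger frames, reading and writing single cells inside a Python loop can be slow. A possible variant of the same flag idea, sketched here on the example data from the question, walks a plain list and writes the column back once; the names inside and filled are illustrative only. Like the loop above, it also fills the rows after a starting 1 that never gets a matching end.

```python
import pandas as pd

df = pd.DataFrame({'ts_val': [0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0],
                   'on&off': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1]})

inside = False   # True while we are between a start and its end
filled = []
for value in df['on&off'].tolist():
    if value == 1:
        inside = not inside          # every 1 toggles: start, end, start, ...
        filled.append(1)
    else:
        filled.append(1 if inside else 0)

df['on&off'] = filled                # single assignment, no per-cell writes
print(df['on&off'].tolist())         # [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]
```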