I am looking for a way to change row values until a particular row appears.
For example, here is a dataframe:
import pandas as pd
df = pd.DataFrame({'ts_val' : [0,0,-1,-1,-1,-1,0,0,0,0,0],
                   'on&off' : [0,0,1,0,0,0,1,0,1,0,1]
                   })
Out[11] df
ts_val on&off
0 0 0
1 0 0
2 -1 1 #starting point
3 -1 0
4 -1 0
5 -1 0
6 0 1 #end point
7 0 0
8 0 1 #starting point 2
9 0 0
10 0 1 #end point 2
In the dataframe above, I want to set the values of df['on&off'] to 1 from each 1 until the next 1 appears in df['on&off'].
In more detail, in the example a 1 appears at df['on&off'][2], which is the starting point, and another 1, the end point, appears at df['on&off'][6]. I need to set the values in between, df['on&off'][2:6], to 1.
The expected result is this dataframe:
ts_val on&off
0 0 0
1 0 0
2 -1 1
3 -1 1
4 -1 1
5 -1 1
6 0 1
7 0 0
8 0 1
9 0 1
10 0 1
CodePudding user response:
Try the following snippet: build groups from the cumulative sum with Series.cumsum, find the groups that begin at a 1 in the original column and have an odd cumulative value, store those values in the variable groups, and set only those groups to 1:
s = df['on&off'].cumsum()
m = (s % 2).eq(1) & df['on&off'].eq(1)
groups = s[m]
df.loc[s.isin(groups), 'on&off'] = 1
print (df)
# Output:
ts_val on&off
0 0 0
1 0 0
2 -1 1
3 -1 1
4 -1 1
5 -1 1
6 0 1
7 0 0
8 0 1
9 0 1
10 0 1
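To see why the cumulative sum works here, it can help to print the intermediate values side by side. The following is only an illustrative sketch: the helper frame steps and its column names (cumsum, is_start, in_group) are made up for this walkthrough and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'ts_val': [0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0],
                   'on&off': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1]})

# Running count of 1s seen so far; it increases by one at every marker row.
# s -> 0 0 1 1 1 1 2 2 3 3 4
s = df['on&off'].cumsum()

# True only on rows that open an interval (the 1st, 3rd, ... marker),
# because the running count is odd exactly at those markers.
m = (s % 2).eq(1) & df['on&off'].eq(1)

# Hypothetical helper frame, used only to display the steps together.
steps = pd.DataFrame({'on&off': df['on&off'],
                      'cumsum': s,
                      'is_start': m,
                      'in_group': s.isin(s[m])})
print(steps)
```

Rows where in_group is True are the ones the snippet sets to 1. The end markers (rows 6 and 10) are not in a selected group, but they already hold a 1, so the final column is filled from each start to its end.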
EDIT:
If the last 1 is missing, the last group is incomplete and should not be processed. In that case the solution is:
#tested solution with paired `1`s
df = pd.DataFrame({'ts_val' : [0,0,-1,-1,-1,-1,0,0,0,0,0],
'on&off' : [0,0,1,0,0,0,1,0,1,0,1]
})
s = df['on&off'].cumsum()
m1 = (s % 2).eq(1) & df['on&off'].eq(1)
m2 = (s % 2).eq(0) & df['on&off'].eq(1)
groups1 = s[m1]
groups2 = s.shift()[m2]
groups = set(groups1) & set(groups2)
print (groups)
{1.0, 3.0}
df.loc[s.isin(groups), 'on&off'] = 1
print (df)
ts_val on&off
0 0 0
1 0 0
2 -1 1
3 -1 1
4 -1 1
5 -1 1
6 0 1
7 0 0
8 0 1
9 0 1
10 0 1
#last 1 is removed, so only one group
df = pd.DataFrame({'ts_val' : [0,0,-1,-1,-1,-1,0,0,0,0,0],
'on&off' : [0,0,1,0,0,0,1,0,1,0,0]
})
s = df['on&off'].cumsum()
m1 = (s % 2).eq(1) & df['on&off'].eq(1)
m2 = (s % 2).eq(0) & df['on&off'].eq(1)
groups1 = s[m1]
groups2 = s.shift()[m2]
groups = set(groups1) & set(groups2)
print (groups)
{1.0}
df.loc[s.isin(groups), 'on&off'] = 1
print (df)
ts_val on&off
0 0 0
1 0 0
2 -1 1
3 -1 1
4 -1 1
5 -1 1
6 0 1
7 0 0
8 0 1
9 0 0
10 0 0
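The paired-only logic above could also be wrapped in a small function so that both cases are checked with the same code. This is a minimal sketch, assuming the column contains only 0s and 1s; the name fill_paired_intervals is hypothetical, and the function returns a new Series instead of modifying the dataframe in place.

```python
import pandas as pd

def fill_paired_intervals(col: pd.Series) -> pd.Series:
    """Return a copy of col with 1s filled between each complete start/end pair."""
    s = col.cumsum()
    is_marker = col.eq(1)
    # Group ids opened by a start marker (odd running count).
    starts = s[(s % 2).eq(1) & is_marker]
    # Group ids closed by an end marker: the running count just before it.
    ends = s.shift()[(s % 2).eq(0) & is_marker]
    # Keep only groups that have both a start and an end.
    complete = set(starts) & set(ends)
    out = col.copy()
    out[s.isin(complete)] = 1
    return out

paired = pd.Series([0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1])
unpaired = pd.Series([0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0])

print(fill_paired_intervals(paired).tolist())
# [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]
print(fill_paired_intervals(unpaired).tolist())
# [0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]
```

It could be applied to the dataframe with df['on&off'] = fill_paired_intervals(df['on&off']).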
CodePudding user response:
I have basically used a flag to track the start and end points. df.shape gives the row and column counts, so df.shape[0] is the number of rows.
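flag = 1  # 1: waiting for a starting point, 0: inside an interval
col = df.columns.get_loc('on&off')  # positional index of the column
for i in range(df.shape[0]):
    value = df.iloc[i, col]
    if value == 1 and flag == 1:
        flag = 0  # starting point found
    elif value == 0 and flag == 0:
        df.iloc[i, col] = 1  # single-step assignment avoids chained indexing
    elif value == 1 and flag == 0:
        flag = 1  # end point found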
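The same flag idea can also be run on a plain Python list and written back to the column in one assignment, which avoids indexing into the dataframe on every iteration. This is a sketch rather than a drop-in replacement; the function name fill_with_flag is made up for illustration, and it assumes the column contains only 0s and 1s.

```python
import pandas as pd

def fill_with_flag(values):
    """Toggle an 'inside' flag at every 1 and fill the 0s seen while inside."""
    inside = False
    filled = []
    for v in values:
        if v == 1:
            inside = not inside  # a 1 either opens or closes an interval
            filled.append(1)
        else:
            filled.append(1 if inside else 0)
    return filled

df = pd.DataFrame({'ts_val': [0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0],
                   'on&off': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1]})

df['on&off'] = fill_with_flag(df['on&off'].tolist())
print(df['on&off'].tolist())
# [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]
```

Note that, like the loop above, this fills every row after an unpaired final 1 through to the end of the column, so it behaves differently from the EDIT variant of the cumsum answer in that case.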