I'm trying to fill in a column with numbers -5000 to 5004, stepping by 4, between a condition in one column and a condition in another. The count starts when start==1. The count won't always get to 5004, so it needs to stop when end==1.
Here is an example of the input:
start end
1 0
0 0
0 0
0 0
0 0
0 1
0 0
0 0
1 0
0 0
I have tried np.arange:
df['time'] = df['start'].apply(lambda x: np.arange(-5000,5004,4) if x==1 else 0)
This obviously doesn't work - I ended up with a series in one cell. I also messed around with cycle from itertools, but that doesn't work because the distances between the start and end aren't always equal. I also feel there might be a way to do this with ffill:
rise = df[df.start.where(df.start==1).ffill(limit=1250).notnull()]
Not sure how to edit this to stop at the correct place though.
I'd love to have a lambda function that achieves this, but I'm not sure where to go from here.
Here is my expected output:
start end time
1 0 -5000
0 0 -4996
0 0 -4992
0 0 -4988
0 0 -4984
0 1 -4980
0 0 nan
0 0 nan
1 0 -5000
0 0 -4996
CodePudding user response:
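# Start a new group at each start==1 row and on the row right after each end==1 row.
grouping = df['start'].add(df['end'].shift(1).fillna(0)).cumsum()
# Number the rows within each group from 0, then scale to steps of 4 starting at -5000.
df['time'] = df.groupby(grouping).cumcount() * 4 - 5000
# Groups with neither a start nor an end lie between runs, so blank them out.
df.loc[df.groupby(grouping).filter(lambda x: x[['start', 'end']].sum().sum() == 0).index, 'time'] = np.nan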
Output:
>>> df
start end time
0 1 0 -5000.0
1 0 0 -4996.0
2 0 0 -4992.0
3 0 0 -4988.0
4 0 0 -4984.0
5 0 1 -4980.0
6 0 0 NaN
7 0 0 NaN
8 1 0 -5000.0
9 0 0 -4996.0
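For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea written with a boolean mask instead of filter. The function name fill_time and its parameters first, step and last are made up for illustration, and capping the count at 5004 is an assumption taken from the range stated in the question (the snippet above does not cap). It also assumes every end==1 row closes the most recent start==1 row.

```python
import numpy as np
import pandas as pd

def fill_time(df, first=-5000, step=4, last=5004):
    """Hypothetical helper: count up from `first` between start==1 and end==1."""
    # Number of runs opened so far (increments on every start==1 row).
    started = df["start"].cumsum()
    # Number of runs closed before this row; shifting by one keeps the
    # end==1 row itself inside its run.
    ended = df["end"].shift(1, fill_value=0).cumsum()
    active = started > ended
    # Position of each row within the run opened by the latest start.
    position = df.groupby(started).cumcount()
    time = first + step * position
    # NaN outside a run, and (assumption) NaN once the count passes `last`.
    return time.where(active & (time <= last))

# Sample input from the question.
df = pd.DataFrame({
    "start": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "end":   [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})
df["time"] = fill_time(df)
print(df)
```

The mask version avoids calling a Python lambda once per group, which may matter on a long frame, but both approaches rest on the same cumsum trick for labelling runs.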