Filling pandas column with range of numbers based on two conditions

Time:11-24

I'm trying to fill a column with the numbers -5000 to 5004 in steps of 4, between a row that meets a condition in one column and a row that meets a condition in another. The count starts at the row where start==1. It won't always reach 5004, so it needs to stop at the row where end==1.

Here is an example of the input:

start  end
1      0
0      0
0      0
0      0
0      0
0      1
0      0
0      0
1      0
0      0
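
To experiment with the attempts and the answer that follow, the sample input can be rebuilt as a DataFrame. This is a minimal sketch; the column names `start` and `end` come from the table above, and the default integer index is an assumption.

```python
import pandas as pd

# Sample input from the question: one window opens at row 0 and closes at row 5,
# a second window opens at row 8 and is still open when the data ends
df = pd.DataFrame({
    "start": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "end":   [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})
print(df)
```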

I have tried np.arange:

df['time'] = df['start'].apply(lambda x: np.arange(-5000,5004,4) if x==1 else 0)

This obviously doesn't work: I end up with an entire array in a single cell. I also experimented with cycle from itertools, but that doesn't work either, because the distance between start and end isn't always the same. I also suspect there might be a way to do this with ffill:
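
Two details explain why the apply attempt misbehaves. `Series.apply` stores whatever the lambda returns in that row's cell, so returning an array puts the whole array into one cell. Also, `np.arange` excludes its stop value, so `np.arange(-5000, 5004, 4)` actually ends at 5000. What each row needs is a single element of that range, picked by the row's offset from its start row. A small sketch, where `time_for_offset` is a hypothetical helper introduced only for illustration:

```python
import numpy as np

# np.arange excludes its stop value, so this range runs -5000, -4996, ..., 5000
values = np.arange(-5000, 5004, 4)
print(values[:3], values[-1], len(values))  # [-5000 -4996 -4992] 5000 2501

# Each row inside a window needs one element of that range: the row k positions
# after the start row (k = 0, 1, 2, ...) gets -5000 + 4 * k
def time_for_offset(k: int) -> int:
    """Hypothetical helper: value for the row k positions after a start row."""
    return -5000 + 4 * k

# The helper reproduces the arange values element by element
assert all(time_for_offset(k) == values[k] for k in range(len(values)))
```

So the problem reduces to computing, for every row, its offset from the most recent start row and whether it still lies inside a start..end window.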

rise = df[df.start.where(df.start==1).ffill(limit=1250).notnull()]

I'm not sure how to modify this so it stops at the correct row, though.
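
One way to make the ffill idea stop at the right row is to forward-fill an explicit on/off state instead of relying on `limit`: mark each start row with 1.0 and the row after each end row with 0.0, then ffill. This is a sketch, not the asker's code. It assumes the end row itself belongs to the window (as in the expected output) and that a start falling on the same row as a close takes priority; the names `state`, `in_window`, `window_id` and `offset` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "start": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "end":   [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})

# NaN means "no change"; 1.0 opens a window, 0.0 closes it
state = pd.Series(np.nan, index=df.index)
# Close on the row *after* each end==1 row, so the end row itself stays inside
state[df["end"].shift(1, fill_value=0) == 1] = 0.0
# Open on start rows (assigned last, so a start wins if both land on one row)
state[df["start"] == 1] = 1.0
# Carry the last open/close decision forward; rows before any start are outside
in_window = state.ffill().fillna(0).astype(bool)

# Number the rows after each start and map the offset onto -5000, -4996, ...
window_id = (df["start"] == 1).cumsum()
offset = df.groupby(window_id).cumcount()
df["time"] = (offset * 4 - 5000).where(in_window)
print(df)
```

The `where` call turns every row outside a window into NaN, which also converts the column to float, matching the NaN rows in the expected output.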

I'd love to have a lambda function that achieves this, but I'm not sure where to go from here.

Here is my expected output:

start  end  time
1      0    -5000
0      0    -4996
0      0    -4992
0      0    -4988
0      0    -4984
0      1    -4980
0      0     nan
0      0     nan
1      0    -5000
0      0    -4996

Answer:

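import numpy as np
# A new group starts on every start==1 row and on the row right after every end==1 row
grouping = df['start'].add(df['end'].shift(1).fillna(0)).cumsum()
df['time'] = df.groupby(grouping).cumcount() * 4 - 5000
# Groups with no start or end flag at all lie outside every window, so blank them out
df.loc[df.groupby(grouping).filter(lambda x: x[['start', 'end']].sum().sum() == 0).index, 'time'] = np.nan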
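
To see why this works, it helps to print the grouping key for the sample input. A new group begins on every start row and on the row right after every end row, so the gap between an end and the next start becomes a group of its own, and that group is the one the last line sets to NaN. This is an illustrative sketch that rebuilds the sample data; the variable `has_flag` is hypothetical and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "start": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "end":   [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})

# Same key as in the answer: +1 on each start row and on each row following an end row
grouping = df["start"].add(df["end"].shift(1).fillna(0)).cumsum()
print(grouping.tolist())  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]

# Per group, does any start or end flag occur? Groups without one are gaps between windows
has_flag = df.groupby(grouping)[["start", "end"]].sum().sum(axis=1) > 0
print(has_flag.tolist())  # [True, False, True]
```

Group 2 (rows 6 and 7) is the only group without a flag, which is why exactly those rows end up as NaN in the output.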

Output:

>>> df
   start  end    time
0      1    0 -5000.0
1      0    0 -4996.0
2      0    0 -4992.0
3      0    0 -4988.0
4      0    0 -4984.0
5      0    1 -4980.0
6      0    0     NaN
7      0    0     NaN
8      1    0 -5000.0
9      0    0 -4996.0