I have a dataframe
pd.DataFrame([1,2,3,4,1,2,3])
0
0 1
1 2
2 3
3 4
4 1
5 2
6 3
I want to create another column, where it records the most recent index the value "1" occurred
d={'data':[1,2,3,4,1,2,3], 'desired_new_col': [0,0,0,0,4,4,4]}
pd.DataFrame(d)
data desired_new_col
0 1 0
1 2 0
2 3 0
3 4 0
4 1 4
5 2 4
6 3 4
I have some idea of using df.expand().apply(func), but not sure what would be an appropriate function to write for this.
Thanks
CodePudding user response:
You can do cumsum
with sub-group by key then we can groupby
with transform
idxmax
s = df['data'].eq(1)
df['out'] = s.groupby(s.cumsum())['data'].transform('idxmax')
Out[293]:
0 0
1 0
2 0
3 0
4 4
5 4
6 4
Name: data, dtype: int64
CodePudding user response:
Using a mask on the index and ffill
:
df = pd.DataFrame({'data': [1,2,3,4,1,2,3]})
df['new'] = (df.index.to_series()
.where(df['data'].eq(1))
.ffill(downcast='infer')
)
Output:
data new
0 1 0
1 2 0
2 3 0
3 4 0
4 1 4
5 2 4
6 3 4
CodePudding user response:
You can do this just by using list comprehension
. :)
idx = [i for i in df.index if df[0][i] == 1][-1]
df['desired_new_col'] = [idx if idx <= df.index[i] else 0 for i in df.index]
Output:
df
0 desired_new_col
0 1 0
1 2 0
2 3 0
3 4 0
4 1 4
5 2 4
6 3 4