Rolling most recent index where a value occurred

Time:02-22

I have a dataframe

pd.DataFrame([1,2,3,4,1,2,3])
    0
0   1
1   2
2   3
3   4
4   1
5   2
6   3

I want to create another column that records, for each row, the most recent index at which the value 1 occurred:

d={'data':[1,2,3,4,1,2,3], 'desired_new_col': [0,0,0,0,4,4,4]}
pd.DataFrame(d)

   data  desired_new_col
0     1                0
1     2                0
2     3                0
3     4                0
4     1                4
5     2                4
6     3                4

I have some idea of using df.expanding().apply(func), but I'm not sure what an appropriate function for this would be.

Thanks

CodePudding user response:

You can build a sub-group key with cumsum on the boolean mask, then groupby that key and transform with idxmax:

s = df['data'].eq(1)
df['out'] = s.groupby(s.cumsum()).transform('idxmax')
Out[293]: 
0    0
1    0
2    0
3    0
4    4
5    4
6    4
Name: data, dtype: int64
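
To make the grouping step easier to follow, here is a minimal sketch that exposes the intermediate key. The names is_one and group_key are only illustrative, and the cast to int is an extra precaution of this sketch, not part of the answer above. Note that any rows before the first 1 would land in group 0, where idxmax simply returns that group's first index label, so this approach assumes the data starts with a 1 (as the sample does).

```python
import pandas as pd

df = pd.DataFrame({'data': [1, 2, 3, 4, 1, 2, 3]})

# True wherever the target value occurs
is_one = df['data'].eq(1)

# Each True starts a new group: the running count of hits so far
group_key = is_one.cumsum()

# Within each group the single 1 is the maximum, so idxmax returns
# the index label of the row where the target value occurred
out = is_one.astype(int).groupby(group_key).transform('idxmax')

print(group_key.tolist())
print(out.tolist())
```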

CodePudding user response:

Using a mask on the index and ffill:

df = pd.DataFrame({'data': [1,2,3,4,1,2,3]})

df['new'] = (df.index.to_series()
               .where(df['data'].eq(1))
               .ffill(downcast='infer')
             )

Output:

   data  new
0     1    0
1     2    0
2     3    0
3     4    0
4     1    4
5     2    4
6     3    4
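
Recent pandas releases deprecate the downcast keyword of ffill, so a variant that avoids it may be useful. The following is a sketch, not a drop-in replacement: the helper name last_index_of is hypothetical, and casting to the nullable Int64 dtype is one possible way to keep integers while leaving rows before the first match as missing.

```python
import pandas as pd

def last_index_of(series, value):
    """Return, per row, the most recent index label where `value` occurred.

    Hypothetical helper; assumes an integer index. Rows before the first
    occurrence are left as <NA>.
    """
    idx = pd.Series(series.index, index=series.index)
    # Keep the index label only on matching rows, then carry it forward
    last_seen = idx.where(series.eq(value)).ffill()
    # Nullable integer dtype keeps whole numbers and allows missing values
    return last_seen.astype('Int64')

df = pd.DataFrame({'data': [1, 2, 3, 4, 1, 2, 3]})
df['new'] = last_index_of(df['data'], 1)
print(df)
```

If the data might not start with the target value, the leading rows stay missing instead of being filled with a misleading 0.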

CodePudding user response:

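You can do this with list comprehensions. Note that this only fills in the last occurrence of 1, so it matches the desired output for the sample data but not in general. :)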

idx = [i for i in df.index if df[0][i] == 1][-1]
df['desired_new_col'] = [idx if idx <= df.index[i] else 0 for i in df.index]

Output:

df

    0   desired_new_col
0   1   0
1   2   0
2   3   0
3   4   0
4   1   4
5   2   4
6   3   4
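
For data where 1 occurs more than twice, a single pass that remembers the last matching index is a possible pure-Python alternative. This is a sketch under assumptions: the function name running_last_index and the default of 0 for rows before the first match are choices made here to mirror the desired output in the question.

```python
import pandas as pd

def running_last_index(series, value, default=0):
    """Hypothetical helper: walk the rows once, remembering the most
    recent index label at which `value` was seen."""
    last = default  # assumed placeholder before the first match
    result = []
    for i, current in zip(series.index, series):
        if current == value:
            last = i
        result.append(last)
    return result

df = pd.DataFrame({'data': [1, 2, 3, 4, 1, 2, 3]})
df['desired_new_col'] = running_last_index(df['data'], 1)
print(df)
```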