Vectorized way to create a column based on indexes stored in another column

I have a column that stores, for each row, the index of the last valid row of another column within a rolling window. This was done based on this answer.

So e.g. we had

import pandas as pd

d = {'col': [True, False, True, True, False, False]}
df = pd.DataFrame(data=d)

and then we got the last valid index in a rolling window with

df['new'] = df.index
df['new'] = df['new'].where(df.col).ffill().rolling(3).max()

0    NaN
1    NaN
2    2.0
3    3.0
4    3.0
5    3.0
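
To make the intermediate steps explicit, the chained call above can be split into its three stages. This is only a minimal sketch on the same sample data; the names masked and filled are made up for illustration and are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'col': [True, False, True, True, False, False]})

# Step 1: keep the index only where col is True, NaN elsewhere
masked = df.index.to_series().where(df['col'])   # 0, NaN, 2, 3, NaN, NaN

# Step 2: carry the last valid index forward over the NaN rows
filled = masked.ffill()                          # 0, 0, 2, 3, 3, 3

# Step 3: take the maximum over a window of 3 rows;
# the first two rows have no full window and stay NaN
new = filled.rolling(3).max()                    # NaN, NaN, 2, 3, 3, 3

print(pd.DataFrame({'masked': masked, 'filled': filled, 'new': new}))
```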

How can I use those indexes to fill a new column new_col with the values that a different column col_b of the same dataframe holds at the recorded indexes?

e.g. if a different column col_b was

'col_b': [100, 200, 300, 400, 500, 600]

then the expected outcome of new_col based on the indexes above would be

0      NaN
1      NaN
2    300.0
3    400.0
4    400.0
5    400.0

PS. Let me know if it is easier to use the initial col directly for this purpose.

CodePudding user response：

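One idea is to set col_b as the index and then call Series.idxmax in each window, which returns the index label (the col_b value) of the window maximum. Note that this picks the largest forward-filled col_b value in the window, so it relies on col_b being increasing, as it is in the sample data: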

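# Make col_b the index so that idxmax returns col_b values
df = df.set_index('col_b')
df['new'] = df.index.to_series().where(df.col).ffill().rolling(3).apply(lambda x: x.idxmax())
# Drop the col_b index again, as in the output below
df = df.reset_index(drop=True)

print(df)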
     col    new
0   True    NaN
1  False    NaN
2   True  300.0
3   True  400.0
4  False  400.0
5  False  400.0
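
A lookup keyed on the stored indexes is another option, and it does not depend on the values in col_b being sorted. The following is a minimal sketch rather than a definitive solution: it assumes the dataframe still has its default integer index and that new was computed as in the question. Series.map with a Series argument looks each value up in that Series' index, and NaN finds no match.

```python
import pandas as pd

df = pd.DataFrame({'col': [True, False, True, True, False, False],
                   'col_b': [100, 200, 300, 400, 500, 600]})

# Last valid index in a rolling window, as in the question
df['new'] = df.index.to_series().where(df['col']).ffill().rolling(3).max()

# Look up each stored index in the index of col_b;
# rows where new is NaN have no match and stay NaN
df['new_col'] = df['new'].map(df['col_b'])

print(df)
```

Because the lookup goes through index labels, it keeps col_b in the dataframe and needs no set_index/reset_index round trip.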

CodePudding user response：

Does this work?

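import numpy as np

# Replace NaN with a placeholder position 0 so the values can be cast to int
new_v2 = df['new'].copy()
new_v2[np.isnan(new_v2)] = 0
new_v2 = new_v2.astype(int)
# Positional lookup of col_b at the stored indexes
new_b = df['col_b'].to_numpy()[new_v2]
# Cast to float so the placeholder rows can be set back to NaN
new_b = new_b.astype('float')
new_b[np.isnan(df['new'])] = np.nan
df['new_b'] = new_b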
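
The same positional lookup can also be written with a validity mask instead of the temporary zero fill. This is a sketch under the assumption that the dataframe has the default RangeIndex, so the stored index labels equal row positions; the names idx and valid are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [True, False, True, True, False, False],
                   'col_b': [100, 200, 300, 400, 500, 600]})

# Last valid index in a rolling window, as in the question
df['new'] = df.index.to_series().where(df['col']).ffill().rolling(3).max()

idx = df['new'].to_numpy()
valid = ~np.isnan(idx)              # rows that actually hold a stored index

new_b = np.full(len(df), np.nan)    # start with NaN everywhere
# Fill only the valid rows, indexing col_b by position
new_b[valid] = df['col_b'].to_numpy()[idx[valid].astype(int)]
df['new_b'] = new_b

print(df)
```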