Vectorized way to create a column based on indexes stored in another column

Time:09-05

I have a column that stores, for each row, the last valid index of another column within a rolling window. This was done based on this answer.

So e.g. we had

import pandas as pd

d = {'col': [True, False, True, True, False, False]}
df = pd.DataFrame(data=d)

and then we got the last valid index in a rolling window with

df['new'] = df.index
df['new'] = df['new'].where(df.col).ffill().rolling(3).max()

0    NaN
1    NaN
2    2.0
3    3.0
4    3.0
5    3.0

How can I use those indexes to build a new column new_col that holds the values of a different column col_b (in the same dataframe) at the indexes recorded above?

e.g. if a different column col_b was

'col_b': [100, 200, 300, 400, 500, 600]

then the expected outcome of new_col based on the indexes above would be

0    NaN
1    NaN
2    300
3    400
4    400
5    400

PS. Let me know if it's easier to use the initial col directly for this purpose.

CodePudding user response:

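One idea is to keep the original index as the values of new, make col_b the index, and then call Series.idxmax inside the rolling window, so that each window returns the col_b label that belongs to the largest original index:

# keep the original index as values before it is replaced
df['new'] = df.index
# col_b becomes the index, so idxmax returns col_b labels
df = df.set_index('col_b')
df['new'] = df['new'].where(df['col']).ffill().rolling(3).apply(lambda x: x.idxmax())
df = df.reset_index(drop=True)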

print (df)
     col    new
0   True    NaN
1  False    NaN
2   True  300.0
3   True  400.0
4  False  400.0
5  False  400.0
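
For comparison, here is a minimal sketch that skips the index swap. Since new already holds labels of the original index, Series.map with col_b as the lookup table should return the value stored at each label and leave NaN where new is NaN. It assumes the default integer index from the question; new_col is the column name the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    'col': [True, False, True, True, False, False],
    'col_b': [100, 200, 300, 400, 500, 600],
})

# Last valid index within a rolling window of 3, as in the question
df['new'] = df.index
df['new'] = df['new'].where(df['col']).ffill().rolling(3).max()

# Use each stored index as a key into col_b; NaN keys stay NaN
df['new_col'] = df['new'].map(df['col_b'])
print(df)
```

The result is a float column, because the NaN rows force col_b's integers to be upcast.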

CodePudding user response:

Does this work?

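import numpy as np

# replace NaN with a placeholder position so the column can be cast to int
new_v2 = df['new'].copy()
new_v2[np.isnan(new_v2)] = 0
new_v2 = new_v2.astype(int)
# positional lookup into col_b (assumes the default 0..n-1 index)
new_b = df['col_b'].to_numpy()[new_v2]
new_b = new_b.astype('float')
# restore NaN where no index was recorded
new_b[np.isnan(df['new'])] = np.nan
df['new_b'] = new_b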
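
On the PS in the question, a possible shortcut is sketched below, assuming the default increasing integer index and the backward-looking rolling(3) window used above. The forward-filled index never decreases, so its rolling maximum is just its current value. The wanted result is therefore col_b forward-filled from the True rows, blanked wherever the window is not yet full. The names filled and window are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col': [True, False, True, True, False, False],
    'col_b': [100, 200, 300, 400, 500, 600],
})

window = 3  # same window size as rolling(3) in the question

# Carry the col_b value of the last True row forward
filled = df['col_b'].where(df['col']).ffill()

# Keep a value only where the whole window holds valid entries,
# which mirrors the NaN rows produced by rolling(window).max()
full_window = filled.notna().astype(float).rolling(window).sum() == window
df['new_col'] = filled.where(full_window)
print(df)
```

This avoids computing the intermediate new column at all, which may be convenient if the indexes themselves are not needed elsewhere.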