I have a dataframe of the following structure:
import pandas as pd
df = pd.DataFrame({'x': [1, 5, 8, 103, 105, 112],
                   'date': pd.DatetimeIndex(('2022-02-01', '2022-02-03', '2022-02-06',
                                             '2022-02-05', '2022-02-05', '2022-02-07'))})
     x       date
0    1 2022-02-01
1    5 2022-02-03
2    8 2022-02-06
3  103 2022-02-05
4  105 2022-02-05
5  112 2022-02-07
How can I add a new column y that contains x if x < 100, and otherwise the x value of the row with the closest earlier date, taken from the subset where x < 100?
What I currently have is this code. It works, but doesn't look very efficient:
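df['y'] = df.x
# reference rows (x < 100), indexed by x so that idxmin() returns an x value
df_ref = df.loc[df.x < 100].sort_values('date').copy()
df_ref.set_index('x', inplace=True)
for ix, row in df.iterrows():
    if row.x >= 100:
        # time elapsed since each reference row; keep strictly earlier dates only
        delta = row.date - df_ref.date
        delta_gt = delta.loc[delta > pd.Timedelta(0)]
        if delta_gt.size > 0:
            # smallest positive gap = closest earlier date
            df.loc[ix, 'y'] = delta_gt.idxmin()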
     x       date  y
0    1 2022-02-01  1
1    5 2022-02-03  5
2    8 2022-02-06  8
3  103 2022-02-05  5
4  105 2022-02-05  5
5  112 2022-02-07  8
CodePudding user response:
Sort by date, mask the values of 100 or more, ffill, then sort by index again:
(df.sort_values(by='date')
   .assign(y=df['x'].mask(df['x'].ge(100)))
   .assign(y=lambda d: d['y'].ffill())
   .sort_index()
)
Output:
     x       date    y
0    1 2022-02-01  1.0
1    5 2022-02-03  5.0
2    8 2022-02-06  8.0
3  103 2022-02-05  5.0
4  105 2022-02-05  5.0
5  112 2022-02-07  8.0
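For completeness, here is a self-contained sketch of the same sort / mask / ffill idea wrapped in a function. The function name fill_from_previous_small and the threshold parameter are made up for illustration. It assumes that rows with no earlier small value should keep their own x (as the loop in the question does) and casts y back to the dtype of x. One caveat: unlike the loop's strict delta > 0 check, a row with x < 100 that shares a date with a large row and sorts before it would be used as the reference.

```python
import pandas as pd

def fill_from_previous_small(df, threshold=100):
    """Return y: x where x < threshold, else the last small x in date order."""
    # Stable sort so that rows sharing a date keep their original relative order.
    ordered = df.sort_values('date', kind='stable')
    # Hide the large values, then carry the last small value forward in date order.
    y = ordered['x'].where(ordered['x'] < threshold).ffill()
    # Rows with no earlier small value fall back to their own x (assumption).
    y = y.fillna(ordered['x']).astype(df['x'].dtype)
    # Restore the original row order.
    return y.reindex(df.index)

df = pd.DataFrame({
    'x': [1, 5, 8, 103, 105, 112],
    'date': pd.to_datetime(['2022-02-01', '2022-02-03', '2022-02-06',
                            '2022-02-05', '2022-02-05', '2022-02-07']),
})
df['y'] = fill_from_previous_small(df)
print(df)
```

Because the function returns a Series aligned on the original index, it can be assigned directly without re-sorting the frame.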
CodePudding user response:
You can use pd.merge_asof:
#df.date = pd.to_datetime(df.date)
df = df.sort_values('date')
out = pd.merge_asof(df,
                    df[df['x']<100].rename(columns={'x':'y'}),
                    on = 'date',
                    direction = 'backward').sort_values('x')
out
Out[160]:
     x       date  y
0    1 2022-02-01  1
1    5 2022-02-03  5
4    8 2022-02-06  8
2  103 2022-02-05  5
3  105 2022-02-05  5
5  112 2022-02-07  8
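If the date has to be strictly smaller (as in the question's delta > 0 check) and the original row order should be kept, a variant along these lines might work. It is only a sketch: the helper column y_ref is a made-up name, and it assumes rows without any earlier reference date should keep their own x. It relies on the allow_exact_matches=False option of merge_asof to exclude same-date matches.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 5, 8, 103, 105, 112],
    'date': pd.to_datetime(['2022-02-01', '2022-02-03', '2022-02-06',
                            '2022-02-05', '2022-02-05', '2022-02-07']),
})

# merge_asof needs both sides sorted by the key; keep the old index as a column.
left = df.reset_index().sort_values('date', kind='stable')
right = (df.loc[df['x'] < 100, ['date', 'x']]
           .sort_values('date')
           .rename(columns={'x': 'y_ref'}))  # y_ref: hypothetical helper column

# For each row, find the reference row with the closest strictly earlier date.
merged = pd.merge_asof(left, right, on='date',
                       direction='backward', allow_exact_matches=False)

# Small rows keep their own x; large rows take the reference value,
# falling back to x when there is no earlier reference date (assumption).
merged['y'] = (merged['x'].where(merged['x'] < 100, merged['y_ref'])
                          .fillna(merged['x'])
                          .astype('int64'))

# Restore the original index and row order.
result = (merged.set_index('index')
                .rename_axis(None)
                .sort_index()
                .drop(columns='y_ref'))
print(result)
```

The where step is needed because, with exact matches disabled, the small rows no longer match themselves and would otherwise pick up the previous small value.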