Find row with nearest value in a subset of a pandas DataFrame

Time:02-17

I have a dataframe of the following structure:

import pandas as pd

df = pd.DataFrame({'x': [1,5,8,103,105,112],
                   'date': pd.DatetimeIndex(('2022-02-01', '2022-02-03', '2022-02-06',
                                             '2022-02-05', '2022-02-05', '2022-02-07'))})
     x       date
0    1 2022-02-01
1    5 2022-02-03
2    8 2022-02-06
3  103 2022-02-05
4  105 2022-02-05
5  112 2022-02-07

How can I add a new column y that contains x if x < 100, and otherwise the x value of the row with the closest earlier date, taken from the subset of rows where x < 100?

What I currently have is this code. It works, but doesn't look very efficient:

df['y'] = df.x
df_ref = df.loc[df.x < 100].sort_values('date').copy()
df_ref.set_index('x', inplace=True)
for ix, row in df.iterrows():
    if row.x >= 100:
        delta = row.date - df_ref.date
        delta_gt = delta.loc[delta > pd.Timedelta(0)]
        if delta_gt.size > 0:
            df.loc[ix, 'y'] = delta_gt.idxmin()

Result:

     x       date  y
0    1 2022-02-01  1
1    5 2022-02-03  5
2    8 2022-02-06  8
3  103 2022-02-05  5
4  105 2022-02-05  5
5  112 2022-02-07  8

CodePudding user response:

Sort by date, mask the values of 100 or more, forward-fill, then sort by index again:

(df.sort_values(by='date')
   .assign(y=df['x'].mask(df['x'].ge(100)))
   .assign(y=lambda d: d['y'].ffill())
   .sort_index()
 )

Output:

     x       date  y
0    1 2022-02-01  1
1    5 2022-02-03  5
2    8 2022-02-06  8
3  103 2022-02-05  5
4  105 2022-02-05  5
5  112 2022-02-07  8
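
For reference, here is a self-contained sketch of the same sort, mask and ffill idea. The helper name `fill_from_earlier`, its `threshold` parameter and the temporary `_small` column are made up for illustration. It assumes the index is unique, and it adds two things the chain above does not do: on equal dates it orders the large rows first, so they only pick up values from strictly earlier dates, and it falls back to the row's own x when there is no earlier small row (as the loop in the question does).

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 5, 8, 103, 105, 112],
    'date': pd.to_datetime(['2022-02-01', '2022-02-03', '2022-02-06',
                            '2022-02-05', '2022-02-05', '2022-02-07']),
})


def fill_from_earlier(frame, threshold=100):
    """Hypothetical helper: y = x below the threshold, else x of the latest earlier small row."""
    is_large = frame['x'].ge(threshold)
    # Sort by date; on equal dates the large rows come first (False sorts before True),
    # so a large row never picks up a small row that shares its date.
    ordered = frame.assign(_small=~is_large).sort_values(['date', '_small'])
    # Blank out the large values, then carry the last small value forward.
    y = ordered['x'].mask(ordered['x'].ge(threshold)).ffill()
    # Large rows with no earlier small row keep their own x, as in the loop version.
    y = y.fillna(ordered['x']).astype('int64')
    # assign aligns on the index, so the original row order is preserved.
    return frame.assign(y=y)


result = fill_from_earlier(df)
print(result)
```

If a match on the same date should be allowed, drop the `_small` tie-breaker and sort by 'date' only.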

CodePudding user response:

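We can use pd.merge_asof, which matches each row to the latest row with x < 100 whose date is earlier than or equal to its own: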

#df.date = pd.to_datetime(df.date)
df = df.sort_values('date')
out = pd.merge_asof(df,
                    df[df['x']<100].rename(columns={'x':'y'}),
                    on = 'date',
                    direction = 'backward').sort_values('x')
out
Out[160]: 
     x       date  y
0    1 2022-02-01  1
1    5 2022-02-03  5
4    8 2022-02-06  8
2  103 2022-02-05  5
3  105 2022-02-05  5
5  112 2022-02-07  8
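
Note that merge_asof accepts equal dates by default. If the match has to be strictly earlier, one possible variation (a sketch only, with the names `small`, `large`, `matched` and the `row` label chosen for illustration) is to run merge_asof on the rows with x >= 100 only, pass allow_exact_matches=False, and write the result back by the original index. It assumes the index is unique.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 5, 8, 103, 105, 112],
    'date': pd.to_datetime(['2022-02-01', '2022-02-03', '2022-02-06',
                            '2022-02-05', '2022-02-05', '2022-02-07']),
})

# Reference rows (x < 100), sorted by date as merge_asof requires.
small = (df.loc[df['x'] < 100, ['date', 'x']]
           .sort_values('date')
           .rename(columns={'x': 'y'}))

# Rows that need a lookup; keep the original index in a 'row' column.
large = (df.loc[df['x'] >= 100]
           .rename_axis('row')
           .reset_index()
           .sort_values('date'))

# Match each large row to the latest small row with a strictly earlier date.
matched = pd.merge_asof(large, small, on='date',
                        direction='backward',
                        allow_exact_matches=False).set_index('row')

# Small rows, and large rows without a match, fall back to their own x.
y = matched['y'].reindex(df.index).fillna(df['x']).astype('int64')
result = df.assign(y=y)
print(result)
```

This keeps the original row order and index, so no final sort is needed.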