Trying to get the nearest value in columns of datetime dtype

Time:08-17

I have 3 columns, 2 of which are datetime. I wrote code that checks whether the difference between a row's value in the first datetime column and its value in the second is less than or equal to 30 seconds, and records the result as True or False in a column. Below is the code:


import pandas as pd

df = pd.DataFrame({'RT_Date': ['2021-02-19 17:59:00', '2021-02-19 17:59:00', '2021-02-19 17:59:00', '2021-02-19 17:59:00'],
               'ID': ['101', '101', '102', '102'],
               'DateCreated': ['2021-02-19 17:58:38', '2021-02-19 17:58:31', '2021-02-19 17:58:38', '2021-02-19 17:58:41']})

# Parse the string columns into datetime64
df['RT_Date'] = pd.to_datetime(df['RT_Date'])
df['DateCreated'] = pd.to_datetime(df['DateCreated'])

# Difference as a Timedelta, then converted to a number of seconds
df['Diff'] = df['RT_Date'] - df['DateCreated']
df['Diff'] = df['Diff'].apply(lambda x: x.total_seconds())

# Overwrite Diff with True/False: is the difference within 30 seconds?
df['Diff'] = df['Diff'].apply(lambda x: True if x <= 30 else False)
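As an aside, the two apply calls can be written with vectorized Series methods. A minimal sketch on the same sample data that should produce the same boolean Diff column (the intermediate name seconds is just an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({'RT_Date': ['2021-02-19 17:59:00'] * 4,
                   'ID': ['101', '101', '102', '102'],
                   'DateCreated': ['2021-02-19 17:58:38', '2021-02-19 17:58:31',
                                   '2021-02-19 17:58:38', '2021-02-19 17:58:41']})
df['RT_Date'] = pd.to_datetime(df['RT_Date'])
df['DateCreated'] = pd.to_datetime(df['DateCreated'])

# Subtracting two datetime64 columns gives a timedelta64 Series;
# .dt.total_seconds() converts the whole Series to float seconds at once.
seconds = (df['RT_Date'] - df['DateCreated']).dt.total_seconds()

# .le(30) is the vectorized form of `x <= 30`, yielding a boolean Series.
df['Diff'] = seconds.le(30)
print(df)
```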

Running the code above sets Diff to True for every row, since all four differences (22, 29, 22 and 19 seconds) are within 30 seconds.

I tried the code below, but it doesn't work:

df.loc[df['Diff'] <= pd.Timedelta(seconds=30)].groupby(['RT_Date','DateCreated']).min().reset_index()
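Two things stand out in this attempt: at this point Diff holds booleans rather than durations, and it groups by RT_Date and DateCreated instead of ID (and .min() then takes the minimum of each column independently rather than picking a whole row). A minimal sketch that keeps the attempt's pd.Timedelta filter, stores the duration in a separate column (the name Delta is a hypothetical choice), and picks the smallest delta per ID by sorting and taking the first row of each group:

```python
import pandas as pd

df = pd.DataFrame({'RT_Date': ['2021-02-19 17:59:00'] * 4,
                   'ID': ['101', '101', '102', '102'],
                   'DateCreated': ['2021-02-19 17:58:38', '2021-02-19 17:58:31',
                                   '2021-02-19 17:58:38', '2021-02-19 17:58:41']})
df['RT_Date'] = pd.to_datetime(df['RT_Date'])
df['DateCreated'] = pd.to_datetime(df['DateCreated'])

# Keep the raw duration in its own (hypothetical) Delta column
# instead of overwriting it with a boolean.
df['Delta'] = df['RT_Date'] - df['DateCreated']

nearest = (df.loc[df['Delta'] <= pd.Timedelta(seconds=30)]  # within 30 s
             .sort_values('Delta')                          # smallest first
             .groupby('ID')                                 # group by ID, not by the dates
             .head(1)                                       # first (smallest) row per ID
             .sort_index())                                 # restore original row order
print(nearest)
```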

There are two unique IDs, 101 and 102. For each ID, I'm trying to return the single row with the smallest difference. For example, for ID 101 the difference is 22 seconds in row 0, whereas in row 1 it is 29 seconds. Likewise, for ID 102 the difference is 22 seconds in row 2 and 19 seconds in row 3. Below is the expected output:

   RT_Date              ID   DateCreated          Diff  Nearest
0  2021-02-19 17:59:00  101  2021-02-19 17:58:38  True  22 seconds
1  2021-02-19 17:59:00  102  2021-02-19 17:58:41  True  19 seconds

Answer:

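IIUC (if I understand correctly), compute the difference in seconds, keep the rows within 30 seconds, then use groupby with idxmin to pick the row with the smallest difference for each ID: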

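df2 = (df
 # difference in seconds between RT_Date and DateCreated
 .assign(diff=df['RT_Date'].sub(df['DateCreated']).dt.total_seconds())
 # keep only the rows within 30 seconds
 .loc[lambda d: d['diff'].le(30)]
 # for each ID, keep the row with the smallest difference
 .loc[lambda d: d.groupby('ID')['diff'].idxmin()]
 )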

Output:

              RT_Date   ID         DateCreated  diff
0 2021-02-19 17:59:00  101 2021-02-19 17:58:38  22.0
3 2021-02-19 17:59:00  102 2021-02-19 17:58:41  19.0
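Note that idxmin returns only the first row when several rows of an ID tie for the smallest difference. If ties matter, a variant (a sketch, not part of the answer above) compares each row with its group's minimum via groupby(...).transform('min') and keeps every row that matches:

```python
import pandas as pd

df = pd.DataFrame({'RT_Date': ['2021-02-19 17:59:00'] * 4,
                   'ID': ['101', '101', '102', '102'],
                   'DateCreated': ['2021-02-19 17:58:38', '2021-02-19 17:58:31',
                                   '2021-02-19 17:58:38', '2021-02-19 17:58:41']})
df['RT_Date'] = pd.to_datetime(df['RT_Date'])
df['DateCreated'] = pd.to_datetime(df['DateCreated'])

# Same first two steps as the answer: seconds difference, then the 30 s filter.
d = df.assign(diff=df['RT_Date'].sub(df['DateCreated']).dt.total_seconds())
d = d.loc[d['diff'].le(30)]

# transform('min') broadcasts each ID's minimum back onto every row of that ID,
# so the mask keeps all rows that tie for the smallest difference.
df2 = d.loc[d['diff'].eq(d.groupby('ID')['diff'].transform('min'))]
print(df2)
```

With this sample data there are no ties within an ID, so the result matches the idxmin version.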

Or, keeping diff as a Timedelta:

df2 = (df
 .assign(diff=df['RT_Date']-df['DateCreated'])
 .loc[lambda d: d['diff'].le('30s')]
 .loc[lambda d: d.groupby('ID')['diff'].idxmin()]
 )

Output:

              RT_Date   ID         DateCreated            diff
0 2021-02-19 17:59:00  101 2021-02-19 17:58:38 0 days 00:00:22
3 2021-02-19 17:59:00  102 2021-02-19 17:58:41 0 days 00:00:19
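To get closer to the layout in the question's expected output (a boolean Diff plus a Nearest column), the same idxmin approach can be post-processed. A sketch, assuming Nearest should hold the Timedelta itself (so it prints as 0 days 00:00:22 rather than "22 seconds") and the index should be renumbered from 0:

```python
import pandas as pd

df = pd.DataFrame({'RT_Date': ['2021-02-19 17:59:00'] * 4,
                   'ID': ['101', '101', '102', '102'],
                   'DateCreated': ['2021-02-19 17:58:38', '2021-02-19 17:58:31',
                                   '2021-02-19 17:58:38', '2021-02-19 17:58:41']})
df['RT_Date'] = pd.to_datetime(df['RT_Date'])
df['DateCreated'] = pd.to_datetime(df['DateCreated'])

delta = df['RT_Date'] - df['DateCreated']  # timedelta64 per row

out = (df
       .assign(Diff=delta.le(pd.Timedelta(seconds=30)), Nearest=delta)
       .loc[lambda d: d['Diff']]                                # rows within 30 s
       .loc[lambda d: d.groupby('ID')['Nearest'].idxmin()]      # smallest per ID
       .reset_index(drop=True))                                 # index 0, 1, ...
print(out)
```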