I have three columns, two of which are datetimes. I wrote code that checks, row by row, whether the difference between the first datetime column and the second is less than or equal to 30 seconds, and stores the result in another column as True or False. Below is the code:
import pandas as pd

df = pd.DataFrame({'RT_Date': ['2021-02-19 17:59:00', '2021-02-19 17:59:00', '2021-02-19 17:59:00', '2021-02-19 17:59:00'],
                   'ID': ['101', '101', '102', '102'],
                   'DateCreated': ['2021-02-19 17:58:38', '2021-02-19 17:58:31', '2021-02-19 17:58:38', '2021-02-19 17:58:41']})
# Parse both columns as datetimes
df['RT_Date'] = pd.to_datetime(df['RT_Date'])
df['DateCreated'] = pd.to_datetime(df['DateCreated'])
# Difference between the two columns, in seconds
df['Diff'] = (df['RT_Date'] - df['DateCreated']).dt.total_seconds()
# True where the difference is at most 30 seconds
df['Diff'] = df['Diff'] <= 30
Below is the output of above code
I tried the code below, but it's not working:
df.loc[df['Diff'] <= pd.Timedelta(seconds=30)].groupby(['RT_Date','DateCreated']).min().reset_index()
There are two unique IDs, 101 and 102. For each unique ID, I'm trying to return the row with the smallest difference. For example, for ID 101 the difference is 22 seconds in row 0, whereas in row 1 it is 29 seconds. Likewise, for ID 102 the difference is 22 seconds in row 2 and 19 seconds in row 3. Below is the expected output:
   RT_Date              ID   DateCreated          Diff  Nearest
0  2021-02-19 17:59:00  101  2021-02-19 17:58:38  True  22 seconds
1  2021-02-19 17:59:00  102  2021-02-19 17:58:41  True  19 seconds
CodePudding user response:
IIUC, use:
df2 = (df
   .assign(diff=df['RT_Date'].sub(df['DateCreated']).dt.total_seconds())
   .loc[lambda d: d['diff'].le(30)]
   .loc[lambda d: d.groupby('ID')['diff'].idxmin()]
)
Output:
   RT_Date              ID   DateCreated          diff
0  2021-02-19 17:59:00  101  2021-02-19 17:58:38  22.0
3  2021-02-19 17:59:00  102  2021-02-19 17:58:41  19.0
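To make the chain above easier to follow, here is a minimal sketch that splits it into separate steps on the sample data from the question. The intermediate names `secs`, `within`, and `labels` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'RT_Date': ['2021-02-19 17:59:00'] * 4,
    'ID': ['101', '101', '102', '102'],
    'DateCreated': ['2021-02-19 17:58:38', '2021-02-19 17:58:31',
                    '2021-02-19 17:58:38', '2021-02-19 17:58:41'],
})
df['RT_Date'] = pd.to_datetime(df['RT_Date'])
df['DateCreated'] = pd.to_datetime(df['DateCreated'])

# Step 1: difference between the two datetime columns, in seconds
secs = (df['RT_Date'] - df['DateCreated']).dt.total_seconds()

# Step 2: keep only the rows within the 30-second window
within = df.assign(diff=secs).loc[lambda d: d['diff'].le(30)]

# Step 3: for each ID, idxmin returns the index label of the row
# holding the smallest diff (label 0 for '101', label 3 for '102' here)
labels = within.groupby('ID')['diff'].idxmin()

# Step 4: .loc with those labels selects exactly one row per ID
df2 = within.loc[labels]
print(df2)
```

Filtering before the `groupby` matters: an ID whose rows all exceed 30 seconds simply drops out of the result instead of contributing its least-bad row.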
Or keeping diff as Timedelta:
df2 = (df
   .assign(diff=df['RT_Date']-df['DateCreated'])
   .loc[lambda d: d['diff'].le('30s')]
   .loc[lambda d: d.groupby('ID')['diff'].idxmin()]
)
Output:
   RT_Date              ID   DateCreated          diff
0  2021-02-19 17:59:00  101  2021-02-19 17:58:38  0 days 00:00:22
3  2021-02-19 17:59:00  102  2021-02-19 17:58:41  0 days 00:00:19
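If you want the exact layout from the question (a boolean `Diff` column, a `Nearest` column formatted as text, and a fresh 0..n index), something along these lines should work. This is only a sketch: the names `delta` and `result` are illustrative, and it assumes `DateCreated` never falls after `RT_Date`. If it can, compare on `delta.abs()` instead, since a negative difference would otherwise pass the `<= 30` check and win `idxmin`.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'RT_Date': ['2021-02-19 17:59:00'] * 4,
    'ID': ['101', '101', '102', '102'],
    'DateCreated': ['2021-02-19 17:58:38', '2021-02-19 17:58:31',
                    '2021-02-19 17:58:38', '2021-02-19 17:58:41'],
})
df['RT_Date'] = pd.to_datetime(df['RT_Date'])
df['DateCreated'] = pd.to_datetime(df['DateCreated'])

# Timedelta between the two columns (assumed non-negative, see note above)
delta = df['RT_Date'] - df['DateCreated']

result = (df
    .assign(Diff=delta <= pd.Timedelta(seconds=30),   # True/False flag
            Nearest=delta.dt.total_seconds())         # difference in seconds
    .loc[lambda d: d['Diff']]                         # keep rows within 30 s
    .loc[lambda d: d.groupby('ID')['Nearest'].idxmin()]  # nearest row per ID
    .reset_index(drop=True)                           # renumber rows 0..n-1
)

# Turn the numeric seconds into text such as "22 seconds"
result['Nearest'] = result['Nearest'].astype(int).astype(str) + ' seconds'
print(result)
```

Keeping `Nearest` numeric until the last line means `idxmin` compares numbers rather than strings; the text formatting is purely for display.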