I have a df that looks like this; it contains frequencies recorded at specific times and places.
Time Latitude Longitude frequency
0 2022-07-07 00:47:49 31.404463 73.117654 -88.599998
1 2022-07-09 00:13:13 31.442087 73.051086 -88.400002
2 2022-07-13 14:25:45 31.433669 73.118194 -87.500000
3 2022-07-13 17:50:53 31.411087 73.094298 -90.199997
4 2022-07-13 17:50:55 31.411278 73.094554 -89.000000
5 2022-07-14 10:49:13 31.395443 73.108911 -88.000000
6 2022-07-14 10:49:15 31.395436 73.108902 -87.699997
7 2022-07-14 10:49:19 31.395379 73.108847 -87.300003
8 2022-07-14 10:50:29 31.393905 73.107315 -88.000000
9 2022-07-14 10:50:31 31.393879 73.107283 -89.000000
10 2022-07-14 10:50:33 31.393858 73.107265 -89.800003
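For anyone who wants to reproduce this, here is a minimal sketch that rebuilds a subset of the frame (rows 5-10 from the printout above); the only assumption is that Time is parsed as a datetime64 column:
import pandas as pd

# Subset of the sample data shown above (rows 5-10 of the printout)
df = pd.DataFrame({
    'Time': pd.to_datetime([
        '2022-07-14 10:49:13', '2022-07-14 10:49:15', '2022-07-14 10:49:19',
        '2022-07-14 10:50:29', '2022-07-14 10:50:31', '2022-07-14 10:50:33',
    ]),
    'Latitude': [31.395443, 31.395436, 31.395379, 31.393905, 31.393879, 31.393858],
    'Longitude': [73.108911, 73.108902, 73.108847, 73.107315, 73.107283, 73.107265],
    'frequency': [-88.0, -87.699997, -87.300003, -88.0, -89.0, -89.800003],
})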
I want to group all the rows that are just 2 seconds apart (for example, the 3 rows at index 5-7 have a time difference of just 2 seconds between consecutive rows). Similarly, index 8-10 have the same spacing; I want to place them in a separate group and keep only these unique groups.
So far I have tried this:
df.groupby([pd.Grouper(key='Time', freq='25S')]).frequency.count()
It helps a little, but I have to manually pick a time window in which to look for close timestamps. In my case there are no fixed intervals: there can be 50 or more consecutive rows, each about 2 seconds apart, stretching over the next two minutes, and I want all of those rows to end up in a single group.
CodePudding user response:
My solution is to create a column Group which groups together the rows for which the difference is small.
First sort the column Time (if necessary):
df = df.sort_values('Time')
Now create the groups:
n = 2  # maximum gap in seconds allowed within a group
# A new group starts whenever the gap to the previous row exceeds n seconds;
# total_seconds() measures the whole gap (dt.seconds would ignore full days).
df['Group'] = df.Time.diff().dt.total_seconds().gt(n).cumsum()
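To make the one-liner easier to follow, here is the same logic broken into steps (a sketch on the df above; the intermediate names gap and new_group are only for illustration):
gap = df.Time.diff().dt.total_seconds()  # seconds since the previous row (NaN for the first row)
new_group = gap.gt(n)                    # True where a gap larger than n seconds starts a new group
df['Group'] = new_group.cumsum()         # running count of group starts becomes the group label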
Now you can do
df.groupby('Group').frequency.count()
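If, as the question suggests, you only want to keep the groups that actually contain several close-together rows, one possible follow-up (the minimum group size of 2 is an assumption) is:
# Keep only rows whose group has at least 2 members, i.e. rows that have a
# neighbour within n seconds; the size threshold of 2 is an assumption.
close = df[df.groupby('Group').Time.transform('size') >= 2]
close.groupby('Group').frequency.count()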