Grouping all the rows with close timestamps in pandas dataframe


I have a DataFrame that looks like this; it contains frequencies recorded at specific times and places.

                Time    Latitude    Longitude    frequency
0   2022-07-07 00:47:49 31.404463   73.117654   -88.599998
1   2022-07-09 00:13:13 31.442087   73.051086   -88.400002
2   2022-07-13 14:25:45 31.433669   73.118194   -87.500000
3   2022-07-13 17:50:53 31.411087   73.094298   -90.199997
4   2022-07-13 17:50:55 31.411278   73.094554   -89.000000
5   2022-07-14 10:49:13 31.395443   73.108911   -88.000000
6   2022-07-14 10:49:15 31.395436   73.108902   -87.699997
7   2022-07-14 10:49:19 31.395379   73.108847   -87.300003
8   2022-07-14 10:50:29 31.393905   73.107315   -88.000000
9   2022-07-14 10:50:31 31.393879   73.107283   -89.000000
10  2022-07-14 10:50:33 31.393858   73.107265   -89.800003

I want to group all the rows that are only about 2 seconds apart (for example, rows 5-7 form such a run). Similarly, rows 8-10 have the same spacing, and I want to place them in a separate group, so that I end up with only these unique groups.
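For anyone who wants to reproduce this, here is a minimal construction of the frame above (this assumes Time should be a datetime column; adjust if yours is still strings):

import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime([
        '2022-07-07 00:47:49', '2022-07-09 00:13:13', '2022-07-13 14:25:45',
        '2022-07-13 17:50:53', '2022-07-13 17:50:55', '2022-07-14 10:49:13',
        '2022-07-14 10:49:15', '2022-07-14 10:49:19', '2022-07-14 10:50:29',
        '2022-07-14 10:50:31', '2022-07-14 10:50:33',
    ]),
    'Latitude': [31.404463, 31.442087, 31.433669, 31.411087, 31.411278,
                 31.395443, 31.395436, 31.395379, 31.393905, 31.393879, 31.393858],
    'Longitude': [73.117654, 73.051086, 73.118194, 73.094298, 73.094554,
                  73.108911, 73.108902, 73.108847, 73.107315, 73.107283, 73.107265],
    'frequency': [-88.599998, -88.400002, -87.5, -90.199997, -89.0,
                  -88.0, -87.699997, -87.300003, -88.0, -89.0, -89.800003],
})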

So far I have tried this:

df.groupby([pd.Grouper(key='Time', freq='25S')]).frequency.count()

It helps a little, but I have to choose the bin width manually, and fixed bins don't fit my case: there can be 50 or more consecutive rows, each 2 seconds apart, spanning the next two minutes, and I want all of them in a single group.

CodePudding user response:

My solution is to create a column Group which labels consecutive rows whose time difference is small.

First, sort by the Time column if necessary: df = df.sort_values('Time').

Now create the groups:

n = 2  # maximum gap, in seconds, within a group
# total_seconds() (rather than .dt.seconds) so that gaps spanning whole
# days are not truncated to their seconds component.
df['Group'] = df.Time.diff().dt.total_seconds().gt(n).cumsum()
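Broken into steps, the one-liner is equivalent to this (same logic, just unrolled):

gaps = df.Time.diff().dt.total_seconds()  # seconds since the previous row (NaN for row 0)
new_group = gaps.gt(n)                    # True wherever a gap exceeds n seconds
df['Group'] = new_group.cumsum()          # running count of breaks = group label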

Now you can count the rows in each group:

df.groupby('Group').frequency.count()
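Two small caveats. First, with n = 2 the 4-second gap between rows 6 and 7 (10:49:15 to 10:49:19) starts a new group, so raise n if such rows belong together. Second, if you want one summary row per group rather than just counts, a named aggregation along these lines should work (the choice of aggregations here is my assumption, not from the question):

summary = df.groupby('Group').agg(
    start=('Time', 'first'),        # first timestamp in the group
    end=('Time', 'last'),           # last timestamp in the group
    Latitude=('Latitude', 'mean'),  # average position of the group
    Longitude=('Longitude', 'mean'),
    frequency=('frequency', 'mean'),
)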