Group datetime series

I have a data frame with the columns "City" and "datetime". The data records the arrival of VIPs in the city.

City      datetime
New York  2022-12-06 10:37:25
New York  2022-12-06 10:42:34
New York  2022-12-06 10:47:12
New York  2022-12-06 10:52:10
New York  2022-12-06 02:37:25

As you can see, the last row stands out from the rest. The first four rows are each less than 10 minutes apart from their neighbours, while the last row is more than 10 minutes away from all the others.

Now I want to split the city into 2 groups: the first four rows as one group and the last row alone as another group.

Desired output

City      count  datetime
New York  4      ['2022-12-06 10:37:25', '2022-12-06 10:42:34', '2022-12-06 10:47:12', '2022-12-06 10:52:10']
New York  1      ['2022-12-06 02:37:25']

This is my first time using this forum. Any help is greatly appreciated.

I have tried groupby on the "City" column, but it just groups every row with the same city name together. I want to group the rows by city and by datetime.

CodePudding user response：

You can simply use groupby with Grouper:

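import pandas as pd

# create df
df = pd.DataFrame({
    'City': ['New York', 'New York', 'New York', 'New York', 'New York'],
    'datetime': ['2022-12-06 10:37:25', '2022-12-06 10:42:34', '2022-12-06 10:47:12', '2022-12-06 10:52:10',
                 '2022-12-06 02:37:25']
})

# parse the datetime column and set it as the index
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index('datetime', inplace=True)
# keep a copy of the timestamps as a regular column so they can be aggregated
df['Date'] = df.index

# group by city and by 15-minute bins (origin='start' anchors the bins at the first value of the time series)
grouped = df.groupby(['City', pd.Grouper(freq='15min', origin='start')])
new_df = grouped.count()  # the 'Date' column now holds the number of rows per group
new_df['Dates'] = grouped['Date'].apply(list)
# assign the result; reset_index/drop return a new frame rather than modifying in place
new_df = new_df.reset_index().drop('datetime', axis=1)
print(new_df)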

output:

Learn more: pandas.Grouper
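
Note that fixed 15-minute bins happen to fit this sample, but they are not quite the same rule as "start a new group when the gap to the previous arrival is more than 10 minutes". If that gap rule is what you actually need, one possible sketch is to sort the rows, take the difference to the previous row within each city, and number the groups with a cumulative sum. The column names `group`, `stamp`, `count` and `datetimes` below are just names picked for this example, and the 10-minute threshold is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York'] * 5,
    'datetime': pd.to_datetime([
        '2022-12-06 10:37:25', '2022-12-06 10:42:34', '2022-12-06 10:47:12',
        '2022-12-06 10:52:10', '2022-12-06 02:37:25',
    ]),
})

# sort so that each row is compared with the previous arrival in the same city
df = df.sort_values(['City', 'datetime']).reset_index(drop=True)

# time since the previous arrival in the same city (NaT for the first row of a city)
gap = df.groupby('City')['datetime'].diff()

# a new group starts at the first row of a city or when the gap exceeds 10 minutes
new_group = gap.isna() | (gap > pd.Timedelta(minutes=10))
df['group'] = new_group.cumsum()

# format the timestamps as strings so the lists are easy to read
df['stamp'] = df['datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')

result = (
    df.groupby(['City', 'group'])['stamp']
      .agg(count='count', datetimes=list)
      .reset_index()
      .drop(columns='group')
)
print(result)
```

Because the rows are sorted first, the groups come out in chronological order, so the lone 02:37:25 arrival is listed before the group of four.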