I have a data frame whose columns are "City" and "datetime". The data records the arrival of VIPs in the city.
City datetime
New York 2022-12-06 10:37:25
New York 2022-12-06 10:42:34
New York 2022-12-06 10:47:12
New York 2022-12-06 10:52:10
New York 2022-12-06 02:37:25
As you can see, the last row stands out from the rest in its datetime. Rows 2 to 4 each follow the row above them by less than 10 minutes, while the last row differs from the others by more than 10 minutes.
Now I want to split the city into 2 groups: the first 4 rows as one group and the last row alone as another.
Desired output:
City count datetime
New York 4 ['2022-12-06 10:37:25', '2022-12-06 10:42:34', '2022-12-06 10:47:12', '2022-12-06 10:52:10']
New York 1 ['2022-12-06 02:37:25']
This is my first time using this forum. Any help is greatly appreciated.
I have tried groupby on the "City" column, but it just groups every row with the same city name together. I want to group the rows for a city based on their datetime.
CodePudding user response:
You can simply use groupby with Grouper:
import pandas as pd
# create the example data frame
df = pd.DataFrame({
'City': ['New York', 'New York', 'New York', 'New York', 'New York'],
'datetime': ['2022-12-06 10:37:25', '2022-12-06 10:42:34', '2022-12-06 10:47:12', '2022-12-06 10:52:10',
'2022-12-06 02:37:25']
})
# set datetime col as index
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index('datetime', inplace=True)
df['Date'] = df.index
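# group by city and by 15-minute bins anchored at the earliest timestamp
grouped = df.groupby(['City', pd.Grouper(freq='15min', origin='start')])
# count the rows in each bin and collect their timestamps into a list
new_df = grouped.count()
new_df['Dates'] = grouped['Date'].apply(list)
# drop the bin-start column; assign the result so it is not discarded
new_df = new_df.reset_index().drop('datetime', axis=1)
print(new_df)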
Learn more: pandas.Grouper
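Note that Grouper uses fixed 15-minute bins, so two arrivals a few minutes apart can still land in different bins, and a chain of arrivals longer than 15 minutes gets split. If what you want is the rule described in the question (start a new group whenever the gap to the previous arrival exceeds 10 minutes), a minimal sketch with diff and cumsum could look like the following. It assumes the rows may be sorted by time within each city; the names gap, threshold, group_id and datetimes are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["New York"] * 5,
    "datetime": pd.to_datetime([
        "2022-12-06 10:37:25", "2022-12-06 10:42:34", "2022-12-06 10:47:12",
        "2022-12-06 10:52:10", "2022-12-06 02:37:25",
    ]),
})

# Sort so that each row is compared with its chronological predecessor.
df = df.sort_values(["City", "datetime"]).reset_index(drop=True)

# Gap to the previous arrival within the same city (NaT for a city's first row).
gap = df.groupby("City")["datetime"].diff()

# A gap above the threshold starts a new group; cumsum turns the flags into ids.
threshold = pd.Timedelta(minutes=10)  # assumed cut-off, taken from the question
df["group_id"] = (gap > threshold).cumsum()

# One output row per (city, group): number of arrivals and their timestamps.
result = (
    df.groupby(["City", "group_id"])
      .agg(count=("datetime", "count"), datetimes=("datetime", list))
      .reset_index()
      .drop(columns="group_id")
)
print(result)
```

Because the rows are sorted first, the 02:37:25 arrival comes out as the first group and the four arrivals between 10:37 and 10:52 as the second.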