print(df.groupby(['run_time', 'match_datetime']).size())
run_time match_datetime
2021-08-07 00:04:36.326391 2021-08-05 00:00:00 1
2021-08-06 06:00:00 1
2021-08-06 07:00:00 1
2021-08-06 08:00:00 1
2021-08-06 08:30:00 2
..
2022-02-25 20:30:22.157029 2022-03-03 19:45:00 1
2022-03-03 20:00:00 1
2022-03-03 20:15:00 2
2022-03-03 21:00:00 1
2022-03-03 23:30:00 1
Length: 44544, dtype: int64
Desired new df: for every run_time, I want the top 2 dates from match_datetime, e.g.
run_time match_datetime
2021-08-07 00:04:36.326391 2021-08-05
2021-08-06
2022-02-25 20:30:22.157029 2022-03-03
2022-03-02
How can I do it?
Along the same lines, I also tried to convert the datetime to a date, but I was unable to find a solution for it.
CodePudding user response:
Use Series.dt.normalize or Series.dt.date to remove the times, then SeriesGroupBy.value_counts to get the counts sorted in descending order within each group. For the top 2 rows per group, use GroupBy.head:
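import pandas as pd

# Make sure the column is a real datetime before using the .dt accessor
df['match_datetime'] = pd.to_datetime(df['match_datetime'])
# Strip the time part, count dates per run_time, keep the 2 most frequent
s = (df['match_datetime'].dt.normalize()
       .groupby(df['run_time'])
       .value_counts()
       .groupby(level=0)
       .head(2))
print(s)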
run_time match_datetime
2021-08-07 00:04:36.326391 2021-08-06 5
2021-08-05 1
2022-02-25 20:30:22.157029 2022-03-03 6
Name: match_datetime, dtype: int64
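Note that value_counts ranks dates by how often they occur, so the result above is the 2 most frequent dates per run_time. If "top 2" is instead meant as the 2 most recent dates (the desired output in the question could be read either way), a variant along the following lines might work. This is only a sketch: the sample data is made up, and the match_date column and the latest variable are illustrative names, not part of the original df.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "run_time": pd.to_datetime([
        "2021-08-07 00:04:36", "2021-08-07 00:04:36",
        "2021-08-07 00:04:36", "2021-08-07 00:04:36",
        "2022-02-25 20:30:22", "2022-02-25 20:30:22",
        "2022-02-25 20:30:22",
    ]),
    "match_datetime": pd.to_datetime([
        "2021-08-05 00:00", "2021-08-06 06:00",
        "2021-08-06 07:00", "2021-08-04 09:00",
        "2022-03-03 19:45", "2022-03-02 20:00",
        "2022-03-01 21:00",
    ]),
})

# dt.normalize() sets the time to midnight and keeps a datetime64 column;
# dt.date would give Python date objects in an object column instead.
df["match_date"] = df["match_datetime"].dt.normalize()

# One row per (run_time, date), newest dates first within each run_time,
# then keep the first 2 rows of every group.
latest = (
    df[["run_time", "match_date"]]
    .drop_duplicates()
    .sort_values(["run_time", "match_date"], ascending=[True, False])
    .groupby("run_time")
    .head(2)
    .reset_index(drop=True)
)
print(latest)
```

Keeping the column as datetime64 via normalize means later sorting and date arithmetic still work; use dt.date only if plain Python date objects are really needed.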