I have date_time_df
dataframe with date_time
column as like this one:
import datetime
date_time_df = pd.DataFrame({'date_time':[datetime.datetime(2020, 7, 10, 9, 15, 3),
datetime.datetime(2020, 7, 9, 9, 50, 0),
datetime.datetime(2020, 7, 9, 10, 50, 3),
datetime.datetime(2020, 7, 9, 16, 50, 0),
datetime.datetime(2020, 7, 9, 20, 30, 0),
datetime.datetime(2020, 7, 8, 9, 50, 0),
datetime.datetime(2020, 7, 8, 10, 50, 3),
datetime.datetime(2020, 7, 8, 16, 50, 0),
datetime.datetime(2020, 7, 8, 20, 30, 0),
datetime.datetime(2020, 7, 7, 10, 50, 3),
datetime.datetime(2020, 7, 7, 16, 50, 0),
datetime.datetime(2020, 7, 6, 16, 50, 0),
datetime.datetime(2020, 7, 6, 20, 30, 0),
datetime.datetime(2020, 7, 5, 9, 50, 0),
datetime.datetime(2020, 7, 5, 20, 30, 0),
datetime.datetime(2020, 7, 4, 16, 50, 0),
datetime.datetime(2020, 7, 3, 9, 50, 0),
datetime.datetime(2020, 7, 3, 10, 50, 3),
datetime.datetime(2020, 7, 3, 16, 50, 0),
datetime.datetime(2020, 7, 2, 9, 50, 0),
datetime.datetime(2020, 7, 2, 17, 0, 0),
datetime.datetime(2020, 7, 1, 10, 45, 3),
datetime.datetime(2020, 7, 1, 17, 0, 0),
datetime.datetime(2020, 7, 1, 20, 30, 0)]})
I need to sort it by the following conditions:
- for each date select only one time-slot around 10 hours (
time >= 10
hours); - in case if for some date there is only one timeslot -> keep it in case if it after 10 h (i.e
time >=10
hours); - if date doesn't have timeslot around 10 h, include the next one (it might be equals
16:50
or20:30
or17:00
in the above example)
I tried to solve it by the following steps:
- split
date_time_df
data frame to date, time columns. So,date_time_df
has 3 columns as like['date_time', 'date', 'time']
- create new
gr_df
as a result of:
gr_df = date_time_df.groupby('date')['time'].nunique().reset_index()
gr_df.rename(columns={'time':'count_timeslots'}, inplace=True)
- filter
gr_df
and select rows which have only one timeslot:
dates_with_one_timeslot = list(gr_df[gr_df['count_timeslots']==1].date_time.unique())
- filter
gr_df
and select rows which have slot around 10 hours:
req_timeslots = list(date_time_df[date_time_df['time'].dt.strftime('%H')=='10'].date_time.unique())
- I don't have idea of How to cover condition for dates which doesn't have times lot around 10 hours, but it has next one after 10h (time might be equals
16:50
or20:30
or17:00
)
Correct output:
result_lst = [
datetime.datetime(2020, 7, 9, 10, 50, 3),
datetime.datetime(2020, 7, 8, 10, 50, 3),
datetime.datetime(2020, 7, 7, 10, 50, 3),
datetime.datetime(2020, 7, 6, 16, 50, 0),
datetime.datetime(2020, 7, 5, 20, 30, 0),
datetime.datetime(2020, 7, 4, 16, 50, 0),
datetime.datetime(2020, 7, 3, 10, 50, 3),
datetime.datetime(2020, 7, 2, 17, 0, 0),
datetime.datetime(2020, 7, 1, 10, 45, 3)
]
I'd appreciate for any idea, Thanks!
CodePudding user response:
Filter the rows where hour
is greater than or equal to 10
then sort the values by date_time
in ascending order and drop duplicates by date
component
m = df['date_time'].dt.hour >= 10
df['date'] = df['date_time'].dt.date
df[m].sort_values('date_time').drop_duplicates('date')
date_time date
21 2020-07-01 10:45:03 2020-07-01
20 2020-07-02 17:00:00 2020-07-02
17 2020-07-03 10:50:03 2020-07-03
15 2020-07-04 16:50:00 2020-07-04
14 2020-07-05 20:30:00 2020-07-05
11 2020-07-06 16:50:00 2020-07-06
9 2020-07-07 10:50:03 2020-07-07
6 2020-07-08 10:50:03 2020-07-08
2 2020-07-09 10:50:03 2020-07-09