I would like to create a new column called time_of_day for a dataframe such as below so that 06-12:59 is morning, 13-17:59 is afternoon, 18-22:59 is evening and rest of the time is night.
id | TIME |
---|---|
25 | 00:01 |
25 | 02:01 |
25 | 06:55 |
18 | 22:03 |
18 | 23:33 |
18 | 00:33 |
What I expect
id | TIME | TIME_OF_DAY |
---|---|---|
25 | 00:01 | Night |
25 | 02:01 | Night |
25 | 06:55 | Morning |
18 | 22:03 | Evening |
18 | 23:33 | Night |
18 | 00:33 | Night |
I tried the numpy vectorization with choices and conditions, but it failed. Then I tried the following:
def conditions(s):
if (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('06:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('13:00:00')):
return "Morning"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('13:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('18:00:00')):
return "Afternoon"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('18:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('23:00:00')):
return "Evening"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('23:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('06:00:00')):
return "Night"
df['TIME_OF_DAY'] = df.apply(conditions, axis=1)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I overcome this issue? PS: I added .all() for the enf od each if condition, but still got the same error.
CodePudding user response:
Use pandas.cut
, it will be more efficient than a function with apply
:
bins = ['0', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00']
labels = ['Night', 'Morning', 'Afternoon', 'Evening', 'Night']
df['TIME_OF_DAY'] = pd.cut(
pd.to_timedelta(df['TIME'] ':00'),
bins=list(map(pd.Timedelta, bins)),
labels=labels, right=False, ordered=False
)
output:
id TIME TIME_OF_DAY
0 25 00:01 Night
1 25 02:01 Night
2 25 06:55 Morning
3 18 22:03 Evening
4 18 23:33 Night
5 18 00:33 Night
If you need a function:
def to_day_period(s):
bins = ['0', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00']
labels = ['Night', 'Morning', 'Afternoon', 'Evening', 'Night']
return pd.cut(
pd.to_timedelta(s ':00'),
bins=list(map(pd.Timedelta, bins)),
labels=labels, right=False, ordered=False
)
df['TIME_OF_DAY'] = to_day_period(df['TIME'])
CodePudding user response:
Assuming the sample DataFrame you mentioned in the question, you can make the following modifications to make the code work:
def conditions(t: pd.Timedelta):
if pd.Timestamp('06:00:00') <= t < pd.Timestamp('13:00:00'):
return "Morning"
elif pd.Timestamp('13:00:00') <= t < pd.Timestamp('18:00:00'):
return "Afternoon"
elif pd.Timestamp('18:00:00') <= t < pd.Timestamp('23:00:00'):
return "Evening"
else:
return "Night"
df['TIME_OF_DAY'] = df["TIME"].apply(lambda s: conditions(pd.Timestamp(s)))
Output (df
):
id TIME TIME_OF_DAY
0 25 00:01 Night
1 25 02:01 Night
2 25 06:55 Morning
3 18 22:03 Evening
4 18 23:33 Night
5 18 00:33 Night