I have the following dataframe and would like to create a new column based on a condition. The new column should contain 'Night' for the hours between 20:00 and 06:00, 'Morning' for the time between 06:00 and 14:30 and 'Afternoon' for the time between 14:30 and 20:00. How can I formulate and apply such a condition best way?
import pandas as pd
df = {'A' : ['test', '2222', '1111', '3333', '1111'],
'B' : ['aaa', 'aaa', 'bbbb', 'ccccc', 'aaa'],
'Date' : ['15.07.2018 06:23:56', '15.07.2018 01:23:56', '15.07.2018 06:40:06', '15.07.2018 11:38:27', '15.07.2018 21:38:27'],
'Defect': [0, 1, 0, 1, 0]
}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'])
CodePudding user response:
You can use np.select
:
from datetime import time
condlist = [df['Date'].dt.time.between(time(6), time(14, 30)),
df['Date'].dt.time.between(time(14,30), time(20))]
df['Time'] = np.select(condlist, ['Morning', 'Afternoon'], default='Night')
Output:
>>> df
A B Date Defect Time
0 test aaa 2018-07-15 06:23:56 0 Morning
1 2222 aaa 2018-07-15 01:23:56 1 Night
2 1111 bbbb 2018-07-15 06:40:06 0 Morning
3 3333 ccccc 2018-07-15 11:38:27 1 Morning
4 1111 aaa 2018-07-15 21:38:27 0 Night
Note, you don't need the condition for 'Night':
df['Date'].dt.time.between(time(20), time(23,59,59)) \
| df['Date'].dt.time.between(time(0), time(6))
because np.select
can take a default
value as argument.
CodePudding user response:
You can create an index of the date field and then use indexer_between_time
.
idx = pd.DatetimeIndex(df["Date"])
conditions = [
("20:00", "06:00", "Night"),
("06:00", "14:30", "Morning"),
("14:30", "20:00", "Afternoon"),
]
for cond in conditions:
start, end, val = cond
df.loc[idx.indexer_between_time(start, end, include_end=False), "Time_of_Day"] = val
A B Date Defect Time_of_Day
0 test aaa 2018-07-15 06:23:56 0 Morning
1 2222 aaa 2018-07-15 01:23:56 1 Night
2 1111 bbbb 2018-07-15 06:40:06 0 Morning
3 3333 ccccc 2018-07-15 11:38:27 1 Morning
4 1111 aaa 2018-07-15 21:38:27 0 Night