I have a dataset that contains a column of datetime of a month, and I need to divide it into two blocks (day and night or am\pm) and then discretize the time in each block into 10mins bins. I could add another column of 0 and 1 to show it is am or pm, but I cannot discretize it! Can you please help me with it?
df['started_at'] = pd.to_datetime(df['started_at'])
df['start hour'] = df['started_at'].dt.hour.astype('int')
df['mor/aft'] = np.where(df['start hour'] < 12, 1, 0)
df['started_at']
0 16:05:36
2 06:22:40
3 16:08:10
4 12:28:57
6 15:47:30
...
3084526 15:24:24
3084527 16:33:07
3084532 14:08:12
3084535 09:43:46
3084536 17:02:26
CodePudding user response:
If I understood correctly you are trying to add a column for every interval of ten minutes to indicate if an observation is from that interval of time.
You can use lambda expressions
to loop through each observation from the series.
Dividing by 10 and making this an integer gives the first digit of the minutes, based on which you can add indicator columns.
I also included how to extract the day indicator column with a lambda expression
for you to compare. It achieves the same as your np.where()
.
import pandas as pd
from datetime import datetime
# make dataframe
df = pd.DataFrame({
'started_at': ['14:20:56',
'00:13:24',
'16:01:33']
})
# convert column to datetime
df['started_at'] = pd.to_datetime(df['started_at'])
# make day indicator column
df['day'] = df['started_at'].apply(lambda ts: 1 if ts.hour > 12 else 0)
# make indicator column for every ten minutes
for i in range(24):
for j in range(6):
col = 'hour_' str(i) '_min_' str(j) '0'
df[col] = df['started_at'].apply(lambda ts: 1 if int(ts.minute/10) == j and ts.hour == i else 0)
print(df)
Output first columns:
started_at day hour_0_min_00 hour_0_min_10 hour_0_min_20
0 2021-11-21 14:20:56 1 0 0 0
1 2021-11-21 00:13:24 0 0 1 0
2 2021-11-21 16:01:33 1 0 0 0
...
...
...