Home > other >  How to discretize a datetime column?
How to discretize a datetime column?

Time:11-24

I have a dataset that contains a column of datetime of a month, and I need to divide it into two blocks (day and night or am\pm) and then discretize the time in each block into 10mins bins. I could add another column of 0 and 1 to show it is am or pm, but I cannot discretize it! Can you please help me with it?

df['started_at'] = pd.to_datetime(df['started_at'])
df['start hour'] = df['started_at'].dt.hour.astype('int')
df['mor/aft'] = np.where(df['start hour'] < 12, 1, 0)
df['started_at']

0          16:05:36
2          06:22:40
3          16:08:10
4          12:28:57
6          15:47:30
             ...   
3084526    15:24:24
3084527    16:33:07
3084532    14:08:12
3084535    09:43:46
3084536    17:02:26

CodePudding user response:

If I understood correctly you are trying to add a column for every interval of ten minutes to indicate if an observation is from that interval of time.

You can use lambda expressions to loop through each observation from the series.

Dividing by 10 and making this an integer gives the first digit of the minutes, based on which you can add indicator columns.

I also included how to extract the day indicator column with a lambda expression for you to compare. It achieves the same as your np.where().

import pandas as pd
from datetime import datetime

# make dataframe
df = pd.DataFrame({
    'started_at': ['14:20:56', 
                   '00:13:24', 
                   '16:01:33']
})

# convert column to datetime
df['started_at'] = pd.to_datetime(df['started_at'])

# make day indicator column
df['day'] = df['started_at'].apply(lambda ts: 1 if ts.hour > 12 else 0)

# make indicator column for every ten minutes
for i in range(24):
    for j in range(6):
        col = 'hour_'   str(i)   '_min_'   str(j)   '0'
        df[col] = df['started_at'].apply(lambda ts: 1 if int(ts.minute/10) == j and ts.hour == i else 0)

print(df)

Output first columns:

           started_at  day  hour_0_min_00  hour_0_min_10  hour_0_min_20  
0 2021-11-21 14:20:56    1              0              0              0   
1 2021-11-21 00:13:24    0              0              1              0   
2 2021-11-21 16:01:33    1              0              0              0   
...
...
...
 
  • Related