Pandas: Splitting datetime into weekday, month, hour columns

I have a dataset with date-time values like this,

           datetime
0          2012-04-01 07:00:00 
.          .
.          .

I would like to create separate columns of weekday, hour, month like,

           datetime             weekday_1 ... weekday_7  hour_1 ... hour_7 ... hour_24  month_1 ... month_4 ... month_12
0          2012-04-01 07:00:00  0             1          0          1          0        0           1           0

(Taking Monday as weekday_1, the example date is a Sunday: weekday_7.)

The only way I know how to extract from datetime is this,

df['month'] = df['datetime'].dt.month

But I cannot seem to apply this to fit my problem.

Sorry if this sounds repetitive; I am fairly new to this, and the answers to similar questions were not helpful enough. Thanks in advance.

CodePudding user response：

Create a custom function:

import pandas as pd

# Use {i:02} instead of {i} to get zero-padded, two-digit suffixes
cols = [f'weekday_{i}' for i in range(1, 8)] \
     + [f'hour_{i}' for i in range(1, 25)] \
     + [f'month_{i}' for i in range(1, 13)]

def get_dummy(dt):
    l = [0] * (7 + 24 + 12)
    l[dt.weekday()] = 1              # Monday=0 -> weekday_1
    l[7 + (dt.hour - 1) % 24] = 1    # 01:00 -> hour_1; 00:00 -> hour_24 (assumed)
    l[dt.month + 30] = 1             # January -> month_1
    return pd.Series(dict(zip(cols, l)))

df = df.join(df['datetime'].apply(get_dummy))

Output:

>>> df.iloc[0]
datetime    2012-04-01 07:00:00
weekday_1                     0
weekday_2                     0
weekday_3                     0
weekday_4                     0
weekday_5                     0
weekday_6                     0
weekday_7                     1  # <- Sunday
hour_1                        0
hour_2                        0
hour_3                        0
hour_4                        0
hour_5                        0
hour_6                        0
hour_7                        1  # <- 07:00
hour_8                        0
hour_9                        0
hour_10                       0
hour_11                       0
hour_12                       0
hour_13                       0
hour_14                       0
hour_15                       0
hour_16                       0
hour_17                       0
hour_18                       0
hour_19                       0
hour_20                       0
hour_21                       0
hour_22                       0
hour_23                       0
hour_24                       0
month_1                       0
month_2                       0
month_3                       0
month_4                       1  # <- April
month_5                       0
month_6                       0
month_7                       0
month_8                       0
month_9                       0
month_10                      0
month_11                      0
month_12                      0
Name: 0, dtype: object
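
Since apply calls the function once per row, it can be slow on large frames. Below is a minimal vectorized sketch of the same idea using NumPy index assignment. The sample frame and the names onehot and encoded are made up for illustration, and mapping midnight to hour_24 is an assumption, since the question does not say where 00:00 belongs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the second row exercises the midnight case
df = pd.DataFrame({"datetime": pd.to_datetime(["2012-04-01 07:00:00",
                                               "2012-12-01 00:00:00"])})

cols = ([f"weekday_{i}" for i in range(1, 8)]
        + [f"hour_{i}" for i in range(1, 25)]
        + [f"month_{i}" for i in range(1, 13)])

dt = df["datetime"].dt
rows = np.arange(len(df))

# One row per timestamp, one column per weekday/hour/month flag
onehot = np.zeros((len(df), len(cols)), dtype=int)
onehot[rows, dt.dayofweek.to_numpy()] = 1              # Monday=0 -> weekday_1
onehot[rows, 7 + (dt.hour.to_numpy() - 1) % 24] = 1    # assumption: 00:00 -> hour_24
onehot[rows, 31 + dt.month.to_numpy() - 1] = 1         # January -> month_1

encoded = df.join(pd.DataFrame(onehot, columns=cols, index=df.index))
print(encoded.loc[0, ["weekday_7", "hour_7", "month_4"]])
```

Each row ends up with exactly three flags set, one per group.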

CodePudding user response：

You can create columns for the weekday, hour, and month, and then call pd.get_dummies on them. For background on Python datetime handling, see https://www.w3schools.com/python/python_datetime.asp

Below is my sample code for your question:

# Assume df is your DataFrame and df["datetime"] has a datetime64 dtype

df["weekday"] = df["datetime"].dt.dayofweek + 1  # Monday=1 ... Sunday=7
df["hour"] = df["datetime"].dt.hour
df["month"] = df["datetime"].dt.month
# columns= forces one-hot encoding of these integer columns
df = pd.get_dummies(df, columns=["weekday", "hour", "month"])
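
One caveat: pd.get_dummies only creates columns for values that actually occur in the data. If you need all 7 + 24 + 12 columns regardless, one option is to declare the full set of categories up front. This is only a sketch: the sample frame and the names parts, dummies and result are hypothetical, and treating midnight as hour 24 is an assumption.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"datetime": pd.to_datetime(["2012-04-01 07:00:00",
                                               "2012-12-01 00:00:00"])})

# Fixed category lists make get_dummies emit a column for every level,
# including levels that never appear in the data
parts = pd.DataFrame({
    "weekday": pd.Categorical(df["datetime"].dt.dayofweek + 1,
                              categories=list(range(1, 8))),
    # assumption: 00:00 is reported as hour 24
    "hour": pd.Categorical(df["datetime"].dt.hour.replace(0, 24),
                           categories=list(range(1, 25))),
    "month": pd.Categorical(df["datetime"].dt.month,
                            categories=list(range(1, 13))),
}, index=df.index)

dummies = pd.get_dummies(parts, dtype=int)
result = df.join(dummies)
print(result.shape)
```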

CodePudding user response：

You can use:

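from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'datetime':[datetime(2012,4,1,7,0,0),
                                    datetime(2012,12,1,8,0,0)]})

df['datetime'] = pd.to_datetime(df['datetime'])
df['month'] = df['datetime'].dt.month
# dayofweek is 0-based (Monday=0 ... Sunday=6); add 1 to match the question's weekday_1..weekday_7
df['weekday'] = df['datetime'].dt.dayofweek
df['hour'] = df['datetime'].dt.hour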

for column in ['month', 'weekday', 'hour']:
    index = [col for col in df.columns if col!=column]
    df = df.pivot_table(index=index, columns=[column], aggfunc=np.count_nonzero).fillna(0).astype(bool).add_prefix(f'{column}_').reset_index()

#print(df)
#Here is the output as of now
# hour            datetime  month_4  month_12  weekday_5  weekday_6  hour_7  hour_8
# 0    2012-04-01 07:00:00     True     False      False       True    True   False
# 1    2012-12-01 08:00:00    False      True       True      False   False    True

other_cols = [f'weekday_{i}' for i in range(1, 8)] + [f'hour_{i}' for i in range(1, 25)] + [f'month_{i}' for i in range(1, 13)]
df_base = pd.DataFrame(columns= ['datetime'] + other_cols)
df_base = pd.concat([df_base, df]).fillna(0)
df_base[df_base.columns[1:]] = df_base[df_base.columns[1:]].fillna(0).astype(int)
print(df_base)

OUTPUT

             datetime  weekday_1  weekday_2  weekday_3  weekday_4  weekday_5  weekday_6  weekday_7  hour_1  hour_2  ...  month_3  month_4  month_5  month_6  month_7  month_8  month_9  month_10  month_11  month_12
0 2012-04-01 07:00:00          0          0          0          0          0          1          0       0       0  ...        0        1        0        0        0        0        0         0         0         0
1 2012-12-01 08:00:00          0          0          0          0          1          0          0       0       0  ...        0        0        0        0        0        0        0         0         0         1
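
If the goal is just to pad the one-hot columns out to the full weekday/hour/month set, reindex can do that directly. Here is a rough sketch using the same two sample dates. The names parts, all_cols, dummies and result are illustrative only, the weekday is shifted by 1 so that Monday is weekday_1 as in the question, and midnight is not handled (an hour of 0 would produce an hour_0 column that the reindex drops).

```python
import pandas as pd

df = pd.DataFrame({"datetime": pd.to_datetime(["2012-04-01 07:00:00",
                                               "2012-12-01 08:00:00"])})

parts = pd.DataFrame({
    "weekday": df["datetime"].dt.dayofweek + 1,  # Monday=1 ... Sunday=7
    "hour": df["datetime"].dt.hour,              # note: 0 is not mapped to 24 here
    "month": df["datetime"].dt.month,
})

all_cols = ([f"weekday_{i}" for i in range(1, 8)]
            + [f"hour_{i}" for i in range(1, 25)]
            + [f"month_{i}" for i in range(1, 13)])

# Encode only the values that occur, then add the missing columns as zeros
dummies = pd.get_dummies(parts, columns=["weekday", "hour", "month"], dtype=int)
dummies = dummies.reindex(columns=all_cols, fill_value=0)

result = df.join(dummies)
print(result.loc[:, ["datetime", "weekday_6", "weekday_7", "hour_7", "hour_8"]])
```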