I have a dataset with date-time values like this,
datetime
0 2012-04-01 07:00:00
. .
. .
I would like to create separate one-hot columns for weekday, hour, and month, like this:
              datetime  weekday_1 ... weekday_7  hour_1 ... hour_7 ... hour_24  month_1 ... month_4 ... month_12
0  2012-04-01 07:00:00          0 ...         1       0 ...      1 ...       0        0 ...       1 ...        0
(Taking Monday as weekday_1, the example date is a Sunday, hence weekday_7.)
The only way I know how to extract from datetime is this,
df['month'] = df['datetime'].dt.month
But I cannot work out how to apply this to my problem.
Sorry if this sounds repetitive; I am fairly new to this, and the answers to similar questions were not helpful enough. Thanks in advance.
CodePudding user response:
Create a custom function:
import pandas as pd

# Tip: use {i:02} in the f-strings to get zero-padded, two-digit suffixes
# Column order: weekday_1..weekday_7, hour_1..hour_24, month_1..month_12
cols = [f'weekday_{i}' for i in range(1, 8)] + \
       [f'hour_{i}' for i in range(1, 25)] + \
       [f'month_{i}' for i in range(1, 13)]

def get_dummy(dt):
    l = [0] * (7 + 24 + 12)
    l[dt.weekday()] = 1          # weekday() is 0-based, Monday = 0
    l[(dt.hour or 24) + 6] = 1   # assumption: midnight (hour 0) counts as hour_24
    l[dt.month + 30] = 1
    return pd.Series(dict(zip(cols, l)))

df = df.join(df['datetime'].apply(get_dummy))
Output:
>>> df.iloc[0]
datetime 2012-04-01 07:00:00
weekday_1 0
weekday_2 0
weekday_3 0
weekday_4 0
weekday_5 0
weekday_6 0
weekday_7 1 # <- Sunday
hour_1 0
hour_2 0
hour_3 0
hour_4 0
hour_5 0
hour_6 0
hour_7 1 # <- 07:00
hour_8 0
hour_9 0
hour_10 0
hour_11 0
hour_12 0
hour_13 0
hour_14 0
hour_15 0
hour_16 0
hour_17 0
hour_18 0
hour_19 0
hour_20 0
hour_21 0
hour_22 0
hour_23 0
hour_24 0
month_1 0
month_2 0
month_3 0
month_4 1 # <- April
month_5 0
month_6 0
month_7 0
month_8 0
month_9 0
month_10 0
month_11 0
month_12 0
Name: 0, dtype: object
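The index arithmetic in get_dummy is easy to get wrong, so here is a small check of where the example timestamp lands. This is only a sketch: the names weekday_index, hour_slot, and month_slot are made up for illustration, and the column list is rebuilt here with the weekday_ spelling from the question.

```python
from datetime import datetime

# Same column layout as in the answer: 7 weekdays, 24 hours, 12 months
cols = ([f'weekday_{i}' for i in range(1, 8)]
        + [f'hour_{i}' for i in range(1, 25)]
        + [f'month_{i}' for i in range(1, 13)])

dt = datetime(2012, 4, 1, 7, 0, 0)  # the example row, a Sunday

weekday_index = dt.weekday()    # 0-based: Monday = 0 ... Sunday = 6
iso_weekday = dt.isoweekday()   # 1-based: Monday = 1 ... Sunday = 7

hour_slot = dt.hour + 6         # hour_1 sits at index 7
month_slot = dt.month + 30      # month_1 sits at index 31 (7 + 24)

print(cols[weekday_index], cols[hour_slot], cols[month_slot])
# prints: weekday_7 hour_7 month_4
```

Note that dt.hour is 0 at midnight, so an offset of 6 would point at index 6, which is the weekday_7 slot. Midnight therefore needs separate handling, for example by treating it as hour 24.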
CodePudding user response:
You can create columns for the weekday, hour, and month, and then call pd.get_dummies on them. The link below covers the individual datetime syntax: https://www.w3schools.com/python/python_datetime.asp
Below is sample code for your question:
# Assume df is your DataFrame and df["datetime"] has a datetime dtype
df["weekday"] = df["datetime"].dt.dayofweek + 1  # Monday = 1 ... Sunday = 7
df["hour"] = df["datetime"].dt.hour
df["month"] = df["datetime"].dt.month
df = pd.get_dummies(df, columns=["weekday", "hour", "month"])
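One caveat with pd.get_dummies is that it only creates columns for values that actually occur in the data, so a small sample will not produce all 7 + 24 + 12 columns. A possible workaround, sketched below, is to convert each part to a categorical with a fixed category list first. The two-row frame reuses the timestamps from this thread; the names parts and sizes are illustrative, and labelling midnight as hour 24 is an assumption made to fit the hour_1..hour_24 layout.

```python
import pandas as pd

df = pd.DataFrame({"datetime": pd.to_datetime(["2012-04-01 07:00:00",
                                               "2012-12-01 08:00:00"])})

parts = pd.DataFrame({
    # dayofweek is 0-based (Monday = 0), so shift to Monday = 1 ... Sunday = 7
    "weekday": df["datetime"].dt.dayofweek + 1,
    # Assumption: midnight (hour 0) is labelled 24 to fit hour_1..hour_24
    "hour": df["datetime"].dt.hour.replace(0, 24),
    "month": df["datetime"].dt.month,
})

# Fixing the categories up front makes get_dummies emit every column,
# including those for values that never appear in the data
sizes = {"weekday": 7, "hour": 24, "month": 12}
for name, n in sizes.items():
    parts[name] = pd.Categorical(parts[name], categories=list(range(1, n + 1)))

dummies = pd.get_dummies(parts, dtype=int)
result = df.join(dummies)
```

Here result keeps the original datetime column and adds 43 integer columns, with exactly one 1 per group in each row.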
CodePudding user response:
You can use:
from datetime import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'datetime': [datetime(2012, 4, 1, 7, 0, 0),
                                     datetime(2012, 12, 1, 8, 0, 0)]})
df['datetime'] = pd.to_datetime(df['datetime'])
df['month'] = df['datetime'].dt.month
df['weekday'] = df['datetime'].dt.dayofweek  # 0-based: Monday = 0 ... Sunday = 6
df['hour'] = df['datetime'].dt.hour
# Pivot each extracted column in turn into boolean indicator columns
for column in ['month', 'weekday', 'hour']:
    index = [col for col in df.columns if col != column]
    df = df.pivot_table(index=index, columns=[column], aggfunc=np.count_nonzero).fillna(0).astype(bool).add_prefix(f'{column}_').reset_index()
#print(df)
#Here is the output as of now
# hour datetime month_4 month_12 weekday_5 weekday_6 hour_7 hour_8
# 0 2012-04-01 07:00:00 True False False True True False
# 1 2012-12-01 08:00:00 False True True False False True
# Build an empty frame with the full set of expected columns, then append the pivoted data
other_cols = [f'weekday_{i}' for i in range(1, 8)] + [f'hour_{i}' for i in range(1, 25)] + [f'month_{i}' for i in range(1, 13)]
df_base = pd.DataFrame(columns=['datetime'] + other_cols)
df_base = pd.concat([df_base, df]).fillna(0)
df_base[df_base.columns[1:]] = df_base[df_base.columns[1:]].fillna(0).astype(int)
print(df_base)
OUTPUT
datetime weekday_1 weekday_2 weekday_3 weekday_4 weekday_5 weekday_6 weekday_7 hour_1 hour_2 ... month_3 month_4 month_5 month_6 month_7 month_8 month_9 month_10 month_11 month_12
0 2012-04-01 07:00:00 0 0 0 0 0 1 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
1 2012-12-01 08:00:00 0 0 0 0 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1