In my project I am trying to create a new column to categorize records by range of hours, let me explain, I have a column in the dataframe called 'TowedTime' with time series data, I want another column to categorize by full hour without minutes, for example if the value in the 'TowedTime' column is 09:32:10 I want it to be categorized as 9 AM, if says 12:45:10 it should be categorized as 12 PM and so on with all the other values. I've read about the .cut and bins function but I can't get the result I want.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
df = pd.read_excel("Baltimore Towing Division.xlsx",sheet_name="TowingData")
df['Month'] = pd.DatetimeIndex(df['TowedDate']).strftime("%b")
df['Week day'] = pd.DatetimeIndex(df['TowedDate']).strftime("%a")
monthOrder = ['Jan', 'Feb', 'Mar', 'Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
dayOrder = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
pivotHours = pd.pivot_table(df, values='TowedDate',index='TowedTime',
columns='Week day',
fill_value=0,
aggfunc= 'count',
margins = False, margins_name='Total').reindex(dayOrder,axis=1)
print(pivotHours)
CodePudding user response:
First, make sure the type of the column 'TowedTime' is datetime. Second, you can easily extract the hour from this data type.
df['TowedTime'] = pd.to_datetime(df['TowedTime'],format='%H:%M:%S')
df['hour'] = df['TowedTime'].dt.hour
hope it answers your question
CodePudding user response:
With the help of @Fabien C I was able to solve the problem.
First, I had to check the data type of values in the 'TowedTime' column with dtypes function. I found that were a Object.
I proceed to try convert 'TowedTime' to datetime:
df['TowedTime'] = pd.to_datetime(df['TowedTime'],format='%H:%M:%S').dt.time
Then to create a new column in the df, for only the hours:
df['Hour'] = pd.to_datetime(df['TowedTime'],format='%H:%M:%S').dt.hour
You can notice in the image that 'TowedTime' column remains as an object, but the new 'Hour' column correctly returns the hour value.
Originally, the dataset already had the date and time separated into different columns, I think they used some method to separate date and time in excel and this created the time ('TowedTime') to be an object, I could not convert it, Or at least that's what the dtypes function shows me.
I tried all this Pandas methods for converting the Object to Datetime :
df['TowedTime'] = pd.to_datetime(df['TowedTime'])
df['TowedTime'] = pd.to_datetime(df['TowedTime'])
df['TowedTime'] = df['TowedTime'].astype('datetime64[ns]')
df['TowedTime'] = pd.to_datetime(df['TowedTime'], format='%H:%M:%S')
df['TowedTime'] = pd.to_datetime(df['TowedTime'], format='%H:%M:%S')