I have a large dataset (more than 1 million rows) that contains datetime timestamps, and I want to look at trends that occur throughout the day. To start, if I run print(df['timestamp'])
it shows my data as:
0 2014-01-01 13:11:50.3
1 2011-02-13 04:12:45.0
Name: timestamp, Length: 1000000, dtype: datetime64[ns]
However, I do not want the date there, as I only want to plot trends throughout the day, without caring what day it is. So I do this line of code:
df['timestamp'] = pd.Series([val.time() for val in df['timestamp']])
This gives me the desired time-only data, but the dtype becomes 'object', which I cannot plot. For example, when I try using Seaborn with sns.lineplot(df['timestamp'], df['Task_Length'])
I get the error "TypeError: Invalid object type at position 0".
But if I run the exact same sns.lineplot(df['timestamp'], df['Task_Length'])
without the intermediate step of cutting off the date, leaving the column as datetime64[ns] rather than the generic 'object' dtype, it plots fine. However, this results in a plot spanning multiple years, whereas I only want to see time-of-day trends.
For clarity, this is a pandas DataFrame in which each row is a task. One column holds the task type (call it 'TaskName'), and each row has a 'timestamp' as described above. I want to use any kind of Seaborn plot to analyze daily trends, such as different task types happening at different times of day, regardless of the day of the year. Thanks for any help.
Edit: another thing I tried. Using the original datetime64[ns] column, which does plot, I tried sns.lineplot(df['timestamp'].dt.time, df['Task_Length'])
which gave the same error as when I add the line of code that cuts off the date. I can't figure out why Seaborn doesn't like just the time component.
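A quick way to see what both attempts have in common is to inspect the dtype before and after taking the time component. The sketch below uses only the two sample timestamps shown above; the variable names `ts` and `times` are made up for illustration. It shows that `.dt.time` produces an object-dtype Series of plain `datetime.time` values, which matches the dtype reported in the question.

```python
import datetime

import pandas as pd

# Two sample timestamps copied from the question's printed output
ts = pd.Series(pd.to_datetime(['2014-01-01 13:11:50.3', '2011-02-13 04:12:45.0']))
print(ts.dtype)  # a datetime64 dtype, which plots without trouble

# Taking only the time component leaves plain Python datetime.time objects
times = ts.dt.time
print(times.dtype)           # object
print(type(times.iloc[0]))   # datetime.time
```

Both the list comprehension with `val.time()` and `.dt.time` end up with this same object dtype, so it is not surprising that they fail in the same way.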
Answer:
This works for me. The difference is that the "timestamp" column is converted to a time-of-day string with strftime instead of to datetime.time objects.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
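df = pd.DataFrame([['2014-01-01 13:11:50.3', 10], ['2011-02-13 04:12:45.0', 15]], columns=['timestamp', 'Task_Length'])
# Format each timestamp as an 'HH:MM:SS' string, which drops the date
df['timestamp'] = pd.to_datetime(df['timestamp']).dt.strftime('%H:%M:%S')
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.lineplot(x=df['timestamp'], y=df['Task_Length'])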
plt.show()
Refer to this question for further details: Plot datetime.time in seaborn
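Because the strings produced by strftime are treated as categories, a dataset with a million distinct times may be awkward to plot that way. One possible alternative, sketched here as an assumption rather than as part of the answer above, is to convert the time of day to a number (hours since midnight) so the x axis stays continuous. The column name 'hour_of_day' and the variable `since_midnight` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [['2014-01-01 13:11:50.3', 10], ['2011-02-13 04:12:45.0', 15]],
    columns=['timestamp', 'Task_Length'],
)
ts = pd.to_datetime(df['timestamp'])

# Time elapsed since midnight of each row's own day, as a timedelta
since_midnight = ts - ts.dt.normalize()

# 'hour_of_day' is a hypothetical column name: fractional hours in [0, 24)
df['hour_of_day'] = since_midnight.dt.total_seconds() / 3600

# Sort so a line plot runs from early to late in the day
df = df.sort_values('hour_of_day')
print(df[['hour_of_day', 'Task_Length']])
```

The numeric column can then be passed to Seaborn in the usual way, for example sns.lineplot(data=df, x='hour_of_day', y='Task_Length'), and a task-type column such as 'TaskName' could be supplied as hue to compare task types across the day.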