Extracting the hour:minute value from a Pandas Timestamp and leaving it in that format for plotting

I am attempting to create a probability density function (pdf) of times from an array of Timestamp tuples. I do not care about the date part of each timestamp; I would like to use only the hour and minute fields. I do not mind switching to R or Julia if need be, since the time data types in Python seem to be restricting this. I tried setting all the dates to 00:00, but that did not work. In the end I want a pdf of the first value of each tuple, and then a pdf of the difference between the 2nd and 1st values of each tuple. Can someone please give direction or a solution?

Snapshot of array

CodePudding user response：

If I understood your question correctly, you're looking for a time object instead of a datetime.

With pandas, you can modify a column or create a new one using the dt.time accessor.

Reproducible Example:

import pandas as pd
df = pd.DataFrame({'foo': pd.date_range("2018-01-01", periods=5, freq="H")})

# assign returns a new DataFrame, so keep the result
df = df.assign(bar=df['foo'].dt.time)
print(df)
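
Note that dt.time yields plain Python datetime.time objects, which are not numeric, so most plotting and density routines cannot use them directly. A minimal sketch of one possible workaround, assuming a numeric "hours since midnight" value is acceptable for the plot (the sample timestamps and the column name hours are made up for illustration):

```python
import pandas as pd

# Explicit timestamps on different dates; only the time of day matters here.
df = pd.DataFrame({"foo": pd.to_datetime([
    "2018-01-01 11:30:00",
    "2018-01-02 16:40:00",
    "2018-01-05 01:15:00",
])})

# Hours since midnight as a float, e.g. 11:30 -> 11.5 (hypothetical column name).
df["hours"] = df["foo"].dt.hour + df["foo"].dt.minute / 60

print(df["hours"].tolist())
```

The date is discarded because only the hour and minute fields enter the calculation.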

CodePudding user response：

import pandas as pd
from pandas import Timestamp
from datetime import datetime as dt

test = ([[Timestamp('2010-01-01 11:30:00'), Timestamp('2010-01-01 13:30:00')], 
[Timestamp('2010-01-02 11:30:00'), Timestamp('2010-01-02 12:10:00')], 
[Timestamp('2010-01-05 16:40:00'), Timestamp('2010-01-05 18:10:00')],
[Timestamp('2010-12-30 14:30:00'), Timestamp('2010-12-30 15:20:00')], 
[Timestamp('2010-12-31 01:40:00'), Timestamp('2010-12-31 02:40:00')], 
[Timestamp('2010-12-31 14:40:00'), Timestamp('2010-12-31 15:40:00')]]) 

# assuming both timestamps in a pair share the same date, just subtract the first from the second
pd.Series([v2 - v1 for v1, v2 in test])

0   0 days 02:00:00
1   0 days 00:40:00
2   0 days 01:30:00
3   0 days 00:50:00
4   0 days 01:00:00
5   0 days 01:00:00
dtype: timedelta64[ns]

# if we only care about the time difference and wish to ignore the date, set the date to a common value, 1900-01-01
v1 = pd.Series([dt.strptime('1900-01-01 ' + str(v1.time()), '%Y-%m-%d %H:%M:%S') for v1, v2 in test])
v2 = pd.Series([dt.strptime('1900-01-01 ' + str(v2.time()), '%Y-%m-%d %H:%M:%S') for v1, v2 in test])
# now subtract the first timestamp in each tuple from the second to find the elapsed time
v2 - v1

0   0 days 02:00:00
1   0 days 00:40:00
2   0 days 01:30:00
3   0 days 00:50:00
4   0 days 01:00:00
5   0 days 01:00:00
dtype: timedelta64[ns]
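
The same time-of-day difference can probably also be computed without going through strings. A minimal sketch, assuming the pairs are first loaded into a DataFrame (the column names start and end are made up here, and only two of the sample pairs are shown): subtracting each timestamp's own midnight, obtained with dt.normalize(), leaves only the time of day as a timedelta.

```python
import pandas as pd

# Two of the sample pairs, in hypothetical "start" and "end" columns.
pairs = pd.DataFrame(
    {
        "start": pd.to_datetime(["2010-01-01 11:30:00", "2010-12-31 01:40:00"]),
        "end": pd.to_datetime(["2010-01-01 13:30:00", "2010-12-31 02:40:00"]),
    }
)

# Time elapsed since midnight of the same day, as timedelta values.
start_tod = pairs["start"] - pairs["start"].dt.normalize()
end_tod = pairs["end"] - pairs["end"].dt.normalize()

# Difference between the two times of day, ignoring the dates.
elapsed = end_tod - start_tod
print(elapsed)
```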

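With the sample data you provided, the outcome is the same whether or not you use a common baseline date. The two approaches would differ only for a pair that spans midnight, where the common-date version yields a negative difference.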
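
To get from these values to the densities asked about in the question, one option is a normalized histogram. This is only a sketch under some assumptions: times are expressed as hours since midnight, durations as minutes, and the bin counts (24 and 6) are arbitrary choices rather than anything taken from the question.

```python
import numpy as np
import pandas as pd
from pandas import Timestamp

test = [
    [Timestamp("2010-01-01 11:30:00"), Timestamp("2010-01-01 13:30:00")],
    [Timestamp("2010-01-02 11:30:00"), Timestamp("2010-01-02 12:10:00")],
    [Timestamp("2010-01-05 16:40:00"), Timestamp("2010-01-05 18:10:00")],
    [Timestamp("2010-12-30 14:30:00"), Timestamp("2010-12-30 15:20:00")],
    [Timestamp("2010-12-31 01:40:00"), Timestamp("2010-12-31 02:40:00")],
    [Timestamp("2010-12-31 14:40:00"), Timestamp("2010-12-31 15:40:00")],
]

starts = pd.Series([v1 for v1, v2 in test])
durations = pd.Series([v2 - v1 for v1, v2 in test])

# Numeric representations that histogram and density routines can handle.
start_hours = starts.dt.hour + starts.dt.minute / 60
duration_minutes = durations.dt.total_seconds() / 60

# density=True scales the counts so that the bar areas sum to 1.
start_density, start_edges = np.histogram(
    start_hours, bins=24, range=(0, 24), density=True
)
dur_density, dur_edges = np.histogram(duration_minutes, bins=6, density=True)

print(start_density)
print(dur_density)
```

The density and edge arrays can then be drawn as a bar chart. For a smooth curve instead of bars, the numeric series could be passed to a kernel density estimator such as Series.plot.kde, which needs matplotlib and SciPy installed.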