I am attempting to create a probability density function (PDF) of times from an array of Timestamp tuples. I do not care about the date part of the timestamp; I would like to use only the hour and minute fields. I do not mind switching to R or Julia if need be, since the time data types in Python seem to be restricting this. I attempted setting all the dates to 00:00, but that did not work. In the end I want a PDF of the first value of each tuple, and then a PDF of the difference between the second and first values of each tuple. Can someone please give direction or a solution?
Answer:
If I understood your question correctly, you're looking for the `time` structure instead of `datetime`. With pandas, you can modify an existing column or create a new one with `dt.time`.
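Reproducible example:

```python
import pandas as pd

# Five hourly timestamps ('h' is the hourly frequency alias)
df = pd.DataFrame({'foo': pd.date_range("2018-01-01", periods=5, freq="h")})
# assign() returns a new DataFrame, so reassign it to keep the 'bar' column
df = df.assign(bar=df['foo'].dt.time)
print(df)
```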
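A `datetime.time` column is convenient for display, but density estimators such as a histogram need numbers. A minimal sketch, assuming only the hour and minute matter, maps each timestamp to minutes after midnight with the `dt.hour` and `dt.minute` accessors (the column name `minutes` is illustrative):

```python
import pandas as pd

# Same kind of column as above: five hourly timestamps starting 2018-01-01 00:00
df = pd.DataFrame({"foo": pd.date_range("2018-01-01", periods=5, freq="h")})

# Encode only the time of day as a number: minutes after midnight.
# 'minutes' is a hypothetical column name; seconds are ignored.
df["minutes"] = df["foo"].dt.hour * 60 + df["foo"].dt.minute
print(df)
```

The resulting integer column can be passed directly to NumPy or SciPy density tools.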
Answer:
```python
import pandas as pd
from pandas import Timestamp
from datetime import datetime as dt

test = [[Timestamp('2010-01-01 11:30:00'), Timestamp('2010-01-01 13:30:00')],
        [Timestamp('2010-01-02 11:30:00'), Timestamp('2010-01-02 12:10:00')],
        [Timestamp('2010-01-05 16:40:00'), Timestamp('2010-01-05 18:10:00')],
        [Timestamp('2010-12-30 14:30:00'), Timestamp('2010-12-30 15:20:00')],
        [Timestamp('2010-12-31 01:40:00'), Timestamp('2010-12-31 02:40:00')],
        [Timestamp('2010-12-31 14:40:00'), Timestamp('2010-12-31 15:40:00')]]

# Assuming both timestamps in a pair fall on the same date, just subtract them
print(pd.Series([v2 - v1 for v1, v2 in test]))
```

Output:

```
0   0 days 02:00:00
1   0 days 00:40:00
2   0 days 01:30:00
3   0 days 00:50:00
4   0 days 01:00:00
5   0 days 01:00:00
dtype: timedelta64[ns]
```
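For a density estimate, the timedeltas are easier to work with as plain numbers. A minimal sketch, reusing the sample pairs above, converts them to minutes with `dt.total_seconds()`:

```python
import pandas as pd
from pandas import Timestamp

# Sample pairs from the answer above: (start, end)
test = [[Timestamp('2010-01-01 11:30:00'), Timestamp('2010-01-01 13:30:00')],
        [Timestamp('2010-01-02 11:30:00'), Timestamp('2010-01-02 12:10:00')],
        [Timestamp('2010-01-05 16:40:00'), Timestamp('2010-01-05 18:10:00')],
        [Timestamp('2010-12-30 14:30:00'), Timestamp('2010-12-30 15:20:00')],
        [Timestamp('2010-12-31 01:40:00'), Timestamp('2010-12-31 02:40:00')],
        [Timestamp('2010-12-31 14:40:00'), Timestamp('2010-12-31 15:40:00')]]

durations = pd.Series([v2 - v1 for v1, v2 in test])

# timedelta64 -> float minutes, ready for histogram/KDE functions
minutes = durations.dt.total_seconds() / 60
print(minutes.tolist())
```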
```python
# If we only care about the time of day and want to ignore the date,
# set every date to the common value 1900-01-01
v1 = pd.Series([dt.strptime('1900-01-01 ' + str(start.time()), '%Y-%m-%d %H:%M:%S') for start, _ in test])
v2 = pd.Series([dt.strptime('1900-01-01 ' + str(end.time()), '%Y-%m-%d %H:%M:%S') for _, end in test])

# Now subtract the first timestamp in each pair from the second to get the elapsed time
print(v2 - v1)
```

Output:

```
0   0 days 02:00:00
1   0 days 00:40:00
2   0 days 01:30:00
3   0 days 00:50:00
4   0 days 01:00:00
5   0 days 01:00:00
dtype: timedelta64[ns]
```
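With the sample data you provided, the outcome is the same whether or not you use a common baseline date. It would differ only for a pair that spans midnight (for example 23:30 to 00:30), where ignoring the date gives a negative difference.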
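Neither answer builds the PDFs themselves. One possible sketch, assuming an empirical histogram is an acceptable PDF estimate, drops the date by subtracting each timestamp's own midnight (`dt.normalize()`) and then calls `numpy.histogram` with `density=True`. The bin choices (hourly bins for start times, six bins for durations) are arbitrary placeholders, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Sample pairs from the question as (start, end) strings
pairs = [
    ("2010-01-01 11:30:00", "2010-01-01 13:30:00"),
    ("2010-01-02 11:30:00", "2010-01-02 12:10:00"),
    ("2010-01-05 16:40:00", "2010-01-05 18:10:00"),
    ("2010-12-30 14:30:00", "2010-12-30 15:20:00"),
    ("2010-12-31 01:40:00", "2010-12-31 02:40:00"),
    ("2010-12-31 14:40:00", "2010-12-31 15:40:00"),
]
start = pd.to_datetime(pd.Series([s for s, _ in pairs]))
end = pd.to_datetime(pd.Series([e for _, e in pairs]))

# Time of day of each start as minutes after midnight:
# subtracting midnight of the same day removes the date part.
start_minutes = (start - start.dt.normalize()).dt.total_seconds() / 60

# Elapsed minutes per pair, computed from the full timestamps
# so that pairs crossing midnight stay positive.
duration_minutes = (end - start).dt.total_seconds() / 60

# Empirical PDFs: with density=True the bar areas sum to 1.
start_pdf, start_edges = np.histogram(
    start_minutes, bins=np.arange(0, 24 * 60 + 1, 60), density=True
)
dur_pdf, dur_edges = np.histogram(duration_minutes, bins=6, density=True)

print(start_pdf)
print(dur_pdf)
```

A histogram treats time of day as linear, so 23:50 and 00:10 land far apart; for a smoother curve, a kernel density estimate (for example `scipy.stats.gaussian_kde`) could be applied to the same numeric arrays instead.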