I have a months time series data that I am trying calculate total hours, minutes, seconds in the dataset as well as for a unique Boolean column for when the column is True or a 1. And for some reason I am doing something wrong where the total time calculations don't appear correct. The code (runs) below goes through calculating the time delta between each index time stamp:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/bbartling/Data/master/hvac_random_fake_data/testdf2_fc5.csv',
index_col='Date',
parse_dates=True)
print(df)
df["timedelta_alldata"] = df.index.to_series().diff()
seconds_alldata = df.timedelta_alldata.sum().seconds
print('SECONDS ALL DATA: ',seconds_alldata)
days_alldata = df.timedelta_alldata.sum().days
print('DAYS ALL DATA: ',days_alldata)
hours_alldata = round(seconds_alldata/3600, 2)
print('HOURS ALL DATA: ',hours_alldata)
minutes_alldata = round((seconds_alldata/60) % 60, 2)
total_hours_calc = days_alldata * 24.0 hours_alldata
print('TOTAL HOURS CALC: ',total_hours_calc)
# fault flag 5 true time delta calc
df["timedelta_fddflag_fc5"] = df.index.to_series(
).diff().where(df["fc5_flag"] == 1)
seconds_fc5_mode = df.timedelta_fddflag_fc5.sum().seconds
print('FALT FLAG TRUE TOTAL SECONDS: ',seconds_fc5_mode)
hours_fc5_mode = round(seconds_fc5_mode/3600, 2)
print('FALT FLAG TRUE TOTAL HOURS: ',hours_fc5_mode)
percent_true_fc5 = round(df.fc5_flag.mean() * 100, 2)
print('PERCENT TIME WHEN FLAG 5 TRUE: ',percent_true_fc5,'%')
percent_false_fc5 = round((100 - percent_true_fc5), 2)
print('PERCENT TIME WHEN FLAG 5 FALSE: ',percent_false_fc5,'%')
returns:
SECONDS ALL DATA: 85500 <--- I think NOT correct
DAYS ALL DATA: 30
HOURS ALL DATA: 23.75 <--- I think NOT correct
TOTAL HOURS CALC: 743.75
FALT FLAG TRUE TOTAL SECONDS: 1800 <--- I think NOT correct
FALT FLAG TRUE TOTAL HOURS: 0.5 <--- I think NOT correct
PERCENT TIME WHEN FLAG 5 TRUE: 74.29 %
PERCENT TIME WHEN FLAG 5 FALSE: 25.71 %
30 days is correct (DAYS ALL DATA: 30
) and the percent of time when a Boolean column (fc5_flag
) is True or False but the total seconds and hours seems way off...? Would anyone have any tips to write this better?
CodePudding user response:
For a pd.Timedelta
object, .seconds
and .days
do not represent the total duration in the respective time units. It's just the way that the object stores a timespan internally. For example, 25 hours is stored as 1 day 3600 seconds
. This is consistent with Python's built-in datetime.timedelta
.
The correct way to convert it to different time units is to divide the object by another Timedelta
object of the desired unit:
delta = df.index.to_series().diff()
total_hours = delta.sum() / pd.Timedelta(hours=1)
hours_fc5_mode = (delta * df["fc5_flag"]).sum() / pd.Timedelta(hours=1)
print(f"{total_hours=}") # 743.75
print(f"{hours_fc5_mode=}") # 552.5