I have a pandas DataFrame like this:
datetime fault
2021-08-06T09:04:00 No
2021-08-06T09:05:00 No
2021-08-06T09:06:00 Yes
2021-08-06T09:07:00 Yes
2021-08-06T09:08:00 No
I would like to obtain the total duration, in minutes, for which the fault column was Yes.
So the output should be 2 minutes, which covers the rows 2021-08-06T09:06:00 and 2021-08-06T09:07:00.
CodePudding user response:
First, you need to convert your datetime column to datetime objects:
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%dT%H:%M:%S")
Next, find the indices of the rows with "Yes" in the fault column:
indices = df.index[df['fault'] == "Yes"].tolist()
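To make the first two steps easy to try, here is a minimal sketch that rebuilds the sample data from the question and runs the conversion and the lookup. The DataFrame construction is my own reconstruction of the question's table, not something the asker posted.

```python
import pandas as pd

# Sample data copied from the question (reconstructed by hand).
df = pd.DataFrame(
    {
        "datetime": [
            "2021-08-06T09:04:00",
            "2021-08-06T09:05:00",
            "2021-08-06T09:06:00",
            "2021-08-06T09:07:00",
            "2021-08-06T09:08:00",
        ],
        "fault": ["No", "No", "Yes", "Yes", "No"],
    }
)

# Parse the ISO-style strings into datetime64 values.
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%dT%H:%M:%S")

# Boolean mask -> index labels of the rows where fault == "Yes".
indices = df.index[df["fault"] == "Yes"].tolist()
print(indices)  # [2, 3]
```

With the default 0-based index these labels are also row positions, which is what the iloc calls further down rely on.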
Now we can iterate over indices to find each row with "Yes" and the row before it.
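Note: I am not going to check for errors here. When index is 0 the previous row does not exist, and df.iloc[index - 1] becomes df.iloc[-1], which does not raise an error but silently returns the last row, so the result would be wrong. Also, passing these index values to iloc assumes the DataFrame still has its default 0-based integer index.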
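On the index-0 case from the note: iloc accepts negative positions, so position -1 is the last row. A small sketch of that behaviour and one possible guard follows. The three-row frame with a fault in the first row is hypothetical data made up for illustration, and skipping the first row is just one way to handle it.

```python
import pandas as pd

# Hypothetical data: the very first row is a fault.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2021-08-06T09:04:00", "2021-08-06T09:05:00", "2021-08-06T09:06:00"]
        ),
        "fault": ["Yes", "No", "Yes"],
    }
)

# Position 0 - 1 = -1 wraps around to the LAST row instead of raising.
wraps_to_last = bool(df.iloc[0 - 1]["datetime"] == df.iloc[len(df) - 1]["datetime"])

indices = df.index[df["fault"] == "Yes"].tolist()

# Guard: keep only positions that actually have a previous row.
safe_indices = [i for i in indices if i > 0]
```

Whether a fault in the first row should count as zero minutes or as something else depends on what the data means, so the guard above only avoids the wrap-around.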
For each index, take that row and the previous one, compute the time difference, and convert it to minutes:
for index in indices:
    prev = df.iloc[index - 1]
    current = df.iloc[index]
    # total_seconds() covers the whole difference; .seconds drops full days
    print((current["datetime"] - prev["datetime"]).total_seconds() / 60)
You can also get rid of prev and current and write the lookups directly on the calculation line:
for index in indices:
    print((df.iloc[index]["datetime"] - df.iloc[index - 1]["datetime"]).total_seconds() / 60)
Now that we can calculate each value, we can sum them using a list comprehension:
total_diffs = sum(
    [
        (df.iloc[index]["datetime"] - df.iloc[index - 1]["datetime"]).total_seconds() / 60
        for index in indices
    ]
)
Finally, the whole code would be:
import pandas as pd

df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%dT%H:%M:%S")
indices = df.index[df['fault'] == "Yes"].tolist()
# Sum, in minutes, the gap between each "Yes" row and the row before it
total_diffs = sum(
    [
        (df.iloc[index]["datetime"] - df.iloc[index - 1]["datetime"]).total_seconds() / 60
        for index in indices
    ]
)
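As an alternative to the loop, the same total can probably be computed without iterating, using Series.diff and a boolean mask. This is a sketch of a different technique, not part of the approach above. It keeps the same convention (each "Yes" row contributes the time elapsed since the previous row) and assumes the rows are sorted by time.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "datetime": [
            "2021-08-06T09:04:00",
            "2021-08-06T09:05:00",
            "2021-08-06T09:06:00",
            "2021-08-06T09:07:00",
            "2021-08-06T09:08:00",
        ],
        "fault": ["No", "No", "Yes", "Yes", "No"],
    }
)
df["datetime"] = pd.to_datetime(df["datetime"])

# Time elapsed since the previous row; the first row has no predecessor (NaT).
step = df["datetime"].diff()

# Sum only the steps that end on a "Yes" row, then convert to minutes.
fault_minutes = step[df["fault"] == "Yes"].sum().total_seconds() / 60
print(fault_minutes)  # 2.0
```

Because diff leaves NaT in the first row and sum skips missing values, a fault in the first row contributes nothing here instead of wrapping around to the last row.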