Time difference in minutes from a datetime column, based on values in another pandas column

I have a pandas DataFrame:

datetime               fault
2021-08-06T09:04:00    No
2021-08-06T09:05:00    No
2021-08-06T09:06:00    Yes
2021-08-06T09:07:00    Yes
2021-08-06T09:08:00    No
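To reproduce this, the sample frame can be built directly in pandas. This sketch assumes the datetime values start out as strings, as they would after being read from a CSV file:

```python
import pandas as pd

# Sample data from the table above; "datetime" is still a column of plain strings
df = pd.DataFrame(
    {
        "datetime": [
            "2021-08-06T09:04:00",
            "2021-08-06T09:05:00",
            "2021-08-06T09:06:00",
            "2021-08-06T09:07:00",
            "2021-08-06T09:08:00",
        ],
        "fault": ["No", "No", "Yes", "Yes", "No"],
    }
)
print(df)
```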

I would like to obtain the duration, in minutes, during which the fault column was "Yes".

So the output should be 2 minutes, covering the rows at 2021-08-06T09:06:00 and 2021-08-06T09:07:00.

Answer:

First, convert the datetime column from strings to datetime objects:

df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%dT%H:%M:%S")

Next, find the indices of the rows where the fault column is "Yes":

indices = df.index[df['fault'] == "Yes"].tolist()
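On the sample data this gives the labels 2 and 3. A minimal check, assuming the default index (0, 1, 2, ...), which is what lets these labels be reused as positions with iloc later on:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2021-08-06T09:04:00", "2021-08-06T09:05:00", "2021-08-06T09:06:00",
             "2021-08-06T09:07:00", "2021-08-06T09:08:00"],
            format="%Y-%m-%dT%H:%M:%S",
        ),
        "fault": ["No", "No", "Yes", "Yes", "No"],
    }
)

# Index labels of the rows flagged as faulty
indices = df.index[df["fault"] == "Yes"].tolist()
print(indices)  # [2, 3]
```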

Then iterate over these indices, taking each "Yes" row together with the row before it.

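Note: I am not handling edge cases here. If the first row (index 0) is "Yes", there is no previous row: index - 1 is -1, and df.iloc[-1] does not raise an error but silently returns the last row, so that term will be wrong. The code also assumes the default index (0, 1, 2, ...), because it passes index labels to iloc, which expects positions.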
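One way to handle the first-row case is to skip a "Yes" at position 0, since it has no predecessor to measure from. This is a sketch under that assumption (you might instead prefer to count it as a fixed interval), and guarded_minutes is a hypothetical helper name:

```python
import pandas as pd

def guarded_minutes(df):
    """Sum minutes between each "Yes" row and its predecessor, skipping position 0.

    Assumes the default RangeIndex, so labels equal positions.
    """
    total = 0.0
    for index in df.index[df["fault"] == "Yes"]:
        if index == 0:
            # No previous row: iloc[-1] would silently return the last row instead
            continue
        gap = df["datetime"].iloc[index] - df["datetime"].iloc[index - 1]
        total += gap.total_seconds() / 60
    return total

df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2021-08-06T09:04:00", "2021-08-06T09:05:00", "2021-08-06T09:06:00"]
        ),
        "fault": ["Yes", "No", "Yes"],
    }
)
print(guarded_minutes(df))  # 1.0
```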

For each index, get that row and the previous one, take the time difference, and convert it to minutes:

for index in indices:
    prev = df.iloc[index - 1]
    current = df.iloc[index]
    # Subtracting two Timestamps already gives a Timedelta; total_seconds() keeps days and sign
    print((current["datetime"] - prev["datetime"]).total_seconds() / 60)
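Timedelta.seconds is only the seconds component of a duration (0 to 86399), ignoring whole days and the sign, while total_seconds() returns the full duration as a float. The difference shows up for gaps longer than a day or for negative gaps; the timestamps below are made up for illustration:

```python
import pandas as pd

# A gap longer than one day: .seconds drops the whole-day part
long_gap = pd.Timestamp("2021-08-07T09:06:00") - pd.Timestamp("2021-08-06T09:05:00")
print(long_gap.seconds / 60)          # 1.0 (days ignored)
print(long_gap.total_seconds() / 60)  # 1441.0

# A negative gap, e.g. from iloc[-1] wrapping around to the last row
neg_gap = pd.Timestamp("2021-08-06T09:04:00") - pd.Timestamp("2021-08-06T09:08:00")
print(neg_gap.seconds / 60)           # 1436.0 (stored as -1 day + 23:56:00)
print(neg_gap.total_seconds() / 60)   # -4.0
```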

You can also drop the prev and current variables and do the calculation inline:

for index in indices:
    print((df.iloc[index]["datetime"] - df.iloc[index - 1]["datetime"]).total_seconds() / 60)

To get the total, add the values up with a list comprehension:

total_diffs = sum(
    [
        (df.iloc[index]["datetime"] - df.iloc[index - 1]["datetime"]).total_seconds() / 60
        for index in indices
    ]
)
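The loop can also be replaced with a vectorized alternative (a different technique from the one above): Series.diff() computes every row's gap from the previous row in one pass, and the "Yes" mask selects which gaps to add up. Unlike the loop, this does not rely on index labels being positions, and a "Yes" in the first row contributes nothing (its gap is NaT, which sum() skips) instead of wrapping around to the last row:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2021-08-06T09:04:00", "2021-08-06T09:05:00", "2021-08-06T09:06:00",
             "2021-08-06T09:07:00", "2021-08-06T09:08:00"],
            format="%Y-%m-%dT%H:%M:%S",
        ),
        "fault": ["No", "No", "Yes", "Yes", "No"],
    }
)

# Gap between each row and the one before it (NaT for the first row)
gaps = df["datetime"].diff()

# Keep only the gaps that end on a "Yes" row and total them in minutes
total_minutes = gaps[df["fault"] == "Yes"].sum().total_seconds() / 60
print(total_minutes)  # 2.0
```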

Putting it all together:

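import pandas as pd

# Parse the ISO 8601 strings into datetime values
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%dT%H:%M:%S")
# Labels of the "Yes" rows (assumes the default 0, 1, 2, ... index)
indices = df.index[df['fault'] == "Yes"].tolist()

# Add up the gap, in minutes, between each "Yes" row and the row before it
total_diffs = sum(
    [
        (df.iloc[index]["datetime"] - df.iloc[index - 1]["datetime"]).total_seconds() / 60
        for index in indices
    ]
)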
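If you need the duration of each separate fault episode rather than one total, one option is to label runs of consecutive equal values with the shift/cumsum idiom and group by that label. This sketch keeps the same counting rule (one gap per "Yes" row), uses made-up sample data with two episodes, and the run name is illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2021-08-06T09:04:00", "2021-08-06T09:05:00", "2021-08-06T09:06:00",
             "2021-08-06T09:07:00", "2021-08-06T09:08:00", "2021-08-06T09:09:00"]
        ),
        "fault": ["No", "Yes", "No", "Yes", "Yes", "No"],
    }
)

# Minutes since the previous row (NaN for the first row)
minutes = df["datetime"].diff().dt.total_seconds() / 60

# A new run label starts whenever the fault value changes
run = (df["fault"] != df["fault"].shift()).cumsum()

# Sum the minutes within each run of consecutive "Yes" rows
is_fault = df["fault"] == "Yes"
per_episode = minutes[is_fault].groupby(run[is_fault]).sum()
print(per_episode.tolist())  # [1.0, 2.0]
```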