Home > front end >  Pandas : Dataframe Output System Down Time
Pandas : Dataframe Output System Down Time

Time:06-22

I am a beginner of Python. These readings are extracted from sensors which report to system in every 20 mins interval. Now, I would like to find out the total downtime from the start time until end time recovered.

Original Data:

date,           Quality Sensor Reading
1/1/2022 9:00   0
1/1/2022 9:20   0
1/1/2022 9:40   0
1/1/2022 10:00  0
1/1/2022 10:20  0
1/1/2022 10:40  0
1/1/2022 12:40  0
1/1/2022 13:00  0
1/1/2022 13:20  0
1/3/2022 1:20   0
1/3/2022 1:40   0
1/3/2022 2:00   0
1/4/2022 14:40  0
1/4/2022 15:00  0
1/4/2022 15:20  0
1/4/2022 17:20  0
1/4/2022 17:40  0
1/4/2022 18:00  0
1/4/2022 18:20  0
1/4/2022 18:40  0

The expected output are as below:

Quality Sensor = 0

Start_Time             End_Time              Total_Down_Time

2022-01-01 09:00:00    2022-01-01 10:40:00     100 minutes 
   
2022-01-01 12:40:00    2022-01-01 13:20:00      40 minutes 

2022-01-03 01:20:00    2022-01-03 02:00:00      40 minutes 

2022-01-04 14:40:00    2022-01-04 15:20:00      40 minutes

2022-01-04 17:20:00    2022-01-04 18:40:00      80 minutes

CodePudding user response:

Let's say the dates are listed in the dataframe df under column date. You can use shift() to create a second column with the subsequent date/time, then create a third that has your duration by subtracting them. Something like:

df['date2'] = df['date'].shift(-1)
df['difference'] = df['date2'] - df['date']

You'll obviously have one row at the end that doesn't have a following value, and therefore doesn't have a difference.

CodePudding user response:

First, let's break them into groups:

df.loc[df.date.diff().gt('00:20:00'), 'group'] = 1
df.group = df.group.cumsum().ffill().fillna(0)

Then, we can extract what we want from each group, and rename:

df2 = df.groupby('group')['date'].agg(['min', 'max']).reset_index(drop=True)
df2.columns = ['start_time', 'end_time']

Finally, we'll add the interval column and format it to minutes:

df2['down_time'] = df2.end_time.sub(df2.start_time)
# Optional, I wouldn't do this here:
df2.down_time = df2.down_time.dt.seconds/60

Output:

           start_time            end_time  down_time
0 2022-01-01 09:00:00 2022-01-01 10:40:00      100.0
1 2022-01-01 12:40:00 2022-01-01 13:20:00       40.0
2 2022-01-03 01:20:00 2022-01-03 02:00:00       40.0
3 2022-01-04 14:40:00 2022-01-04 15:20:00       40.0
4 2022-01-04 17:20:00 2022-01-04 18:40:00       80.0
  • Related