Calculate length of a cycle, average etc. in a Dataframe with Datetime tag-CodePudding

I have a Dataframe file that looks like below (imagine so many other rows):

    DateTime               hours   uptime   Start_stop  machine_started     Voltage
2022-02-22  07:50:58;269    18,7    43269      221            0               800
2022-02-22  07:50:59;614    18,7    44609      221            0               800
2022-02-22  07:50:59;614    18,7    44609      221            1               800
2022-02-22  07:51:01;915    18,7    46904      221            1               700
2022-02-22  07:51:01;915    18,7    46904      221            1               500
2022-02-22  08:16:20;343    18,7    48523      221            0               800
2022-02-22  08:16:20;638    18,7    48523      221            0               800
2022-02-23  07:51:01;915    18,7    46904      221            1               800
2022-02-23  07:51:03;543    18,7    48523      221            1               1000
2022-02-23  08:16:20;343    18,7    48523      221            0               800
2022-02-23  08:16:20;638    18,7    48523      221            0               800

I need a logic to say when the machine is working, from the moment the (machine_started==1) start taking the rows until it ends (machine_started==0) and calculate the average of those rows, standard deviation and the length of that cycle.

I would like to have the Datetime tag that shows the process start time. As you can see for this example I have two times that the process starts.

So in the final file I will see a number of rows which corresponds to the number of cycles in that file.

Unfortunately I can just think of doing one task at a time and also after finding the first cycle I get stuck and cannot count the rows or perform these calculation.

So at the I would like to see something like this:

DateTime               Average_Voltage      standard_deviation    length_of_the_cycle  
2022-02-22  07:50:59;614       666                xxx                        YYY
2022-02-23  07:51:01;915       900                xxx                        YYY

How can this be acheived?

CodePudding user response：

First identify the groups that you are interested in and remove all other rows. To get the groups, we can use shift and cumsum. For the filtering, we need to consider whether the first row starts with machine_started 0 or 1. This determines if we want the odd or even numbered groups.

Code for this part:

df['grp'] = df['machine_started'].ne(df['machine_started'].shift(1)).cumsum()
start = df['machine_started'].iloc[0]
df = df.loc[(df['grp']   start) % 2 == 0]

Intermediate result:

                  DateTime hours  uptime  Start_stop  machine_started  Voltage  grp
2  2022-02-22 07:50:59;614  18,7   44609         221                1      800    2
3  2022-02-22 07:51:01;915  18,7   46904         221                1      700    2
4  2022-02-22 07:51:01;915  18,7   46904         221                1      500    2
7  2022-02-23 07:51:01;915  18,7   46904         221                1      800    4
8  2022-02-23 07:51:03;543  18,7   48523         221                1     1000    4

Now, we can groupby and aggregate the information needed. Finally, rename the obtained columns:

df = df.groupby('grp').agg({'DateTime': ['first'], 'Voltage': ['mean', 'std', 'count']}).reset_index(drop=True)
df.columns = ['DateTime', 'Average_Voltage', 'standard_deviation', 'length_of_the_cycle']

Result:

                  DateTime  Average_Voltage  standard_deviation  length_of_the_cycle
0  2022-02-22 07:50:59;614       666.666667          152.752523                    3
1  2022-02-23 07:51:01;915       900.000000          141.421356                    2