Python: How to compare values of a row with a threshold to determine cycles-CodePudding

I have the following code I made that gets data from a machine in CSV format:

import pandas as pd
import numpy as np

header_list = ['Time']
df = pd.read_csv('S8-1.csv' , skiprows=6 , names = header_list)

#splits the data into proper columns
df[['Date/Time','Pressure']] = df.Time.str.split(",,", expand=True)

#deletes orginal messy column
df.pop('Time')

#convert Pressure from object to numeric
df['Pressure'] = pd.to_numeric(df['Pressure'], errors = 'coerce')

#converts to a time
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format = '%m/%d/%y %H:%M:%S.%f' , errors = 'coerce')

#calculates rolling and rolling center of pressure values
df['Moving Average'] = df['Pressure'].rolling(window=5).mean()
df['Rolling Average Center']= df['Pressure'].rolling(window=5, center=True).mean()

#sets threshold for machine being on or off, if rolling center average is greater than 115 psi, machine is considered on
df['Machine On/Off'] = ['1' if x >= 115 else '0' for x in df['Rolling Average Center'] ]

df

The following DF is created:

Throughout the rows in column "Machine On/Off" there will be values of 1 or 0 based on the threshold i set. I need to write a code that will go through these rows and indicate if a cycle has started. The problem is due to the data being slightly off, during a "on" cycle, there will be around 20 rows saying (1) with a couple of rows saying 0 due to poor data recieved.

I need to have a code that compares the values through the data in order to determine the amount of cycles the machine is on or off for. I was thinking that setting a threshold of around would work, so that if the value is (1) for more than 6 rows then it will indicate a cycle and ignore the incorrect 0's that are scattered throughout the column.

What would be the best way program this so I can get a total count of cycles the machine is on or off for throughout the 20,000 rows of data I have.

Edit: Here is a example Df that is similar, in this example we can see there are 3 cycles of the machine (1 values) and mixed into the on cycles is 0 values (bad data). I need a code that will count the total number of cycles and ignore the bad data that may be in the middle of a 'on cycle'.

import pandas as pd
Machine = [0,0,0,0,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0]
df2 = pd.DataFrame(Machine)

CodePudding user response：

You can create groups of consecutive rows of on/off using cumsum:

machine = [0,0,0,0,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0]

df = pd.DataFrame(machine, columns=['Machine On/Off'])

df['group'] = df['Machine On/Off'].ne(df['Machine On/Off'].shift()).cumsum()

df['group_size'] = df.groupby('group')['group'].transform('size')

# Output

    Machine On/Off  group  group_size
0                0      1           6
1                0      1           6
2                0      1           6
3                0      1           6
4                0      1           6
5                0      1           6
6                1      2           5
7                1      2           5
8                1      2           5
9                1      2           5
10               1      2           5

I'm not sure I got your intention on how you would like to filter/alter the values, but probably this can serve as a guide:

threshold = 6

# Replace 0 for 1 if group_size < threshold. This will make the groupings invalid.
df.loc[(df['Machine On/Off'].eq(0)) & (df.group_size.lt(threshold)), 'Machine On/Off'] = 1

# Output df['Machine On/Off'].values

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], dtype=int64)