How to determine cycles with Pandas

Time:10-08

Based on this sample dataframe:

import pandas as pd
Machine = [0,0,0,0,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0]
df2 = pd.DataFrame(Machine)

This mocks a machine being switched on and off: 0 means the machine is off and 1 means it is on during that period. However, due to poor data, the machine will sometimes report that it is off in the middle of an on cycle, as in (1,1,1,1,1,0,1,1,1). The machine is really on during this whole period and the 0 is an error. Does anyone know of an easy way to calculate the total number of on cycles while ignoring instances of bad data?

The sample data above has 3 on cycles and 4 off cycles. What would be the best way to calculate this while ignoring random data errors in an on cycle?

CodePudding user response:

It's maybe not the best way, but you can build something like:

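# assumes df2 from the question; name its single column 'Machine'
df = df2.rename(columns={0: 'Machine'})

# compare each value with its neighbours up to 2 steps away
# (ne rather than eq, so that nb_diff counts the neighbours that differ)
df['error_n-2'] = df['Machine'].ne(df['Machine'].shift(-2))
df['error_n-1'] = df['Machine'].ne(df['Machine'].shift(-1))
df['error_n+1'] = df['Machine'].ne(df['Machine'].shift(1))
df['error_n+2'] = df['Machine'].ne(df['Machine'].shift(2))

# number of neighbours (out of 4) that differ from the current value
error_cols = ['error_n-2', 'error_n-1', 'error_n+1', 'error_n+2']
df['nb_diff'] = df[error_cols].sum(axis=1)

# apply a "3 out of 4" rule
df['potential_error'] = df['nb_diff'] >= 3

# clean up the helper columns
df.drop(columns=error_cols + ['nb_diff'], inplace=True)

# exclude potential errors
df[~df['potential_error']]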
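
To get from the filtered data to an actual cycle count, one option is to count where each run of identical values starts. The following is only a sketch of that idea, not a tested general solution: it applies the same "3 out of 4 neighbours" rule to a plain Series (the names `s`, `clean` and `run_starts` are made up for illustration) and assumes that glitches are at most two samples long.

```python
import pandas as pd

Machine = [0,0,0,0,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0]
s = pd.Series(Machine, name="Machine")

# For each sample, count how many of its 4 nearest neighbours differ from it.
# Missing neighbours at the edges (NaN after shift) count as "different".
nb_diff = sum(s.ne(s.shift(k)).astype(int) for k in (-2, -1, 1, 2))

# "3 out of 4" rule: a sample that disagrees with most neighbours is treated as noise.
clean = s[nb_diff < 3]

# A run starts wherever a value differs from the previous remaining value.
run_starts = clean.ne(clean.shift())
on_cycles = int((run_starts & clean.eq(1)).sum())
off_cycles = int((run_starts & clean.eq(0)).sum())
print(on_cycles, off_cycles)  # 3 4
```

With this window size, a dropout of three or more samples is no longer flagged, so longer glitches would need more neighbours in the vote.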

CodePudding user response:

EDIT: This answer feels a bit better, still using the same regex approach:

import re

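# three or more consecutive zeros count as a real 'off' state
# (Machine is the list from the question)
patt = re.compile(r'000+')
off_states = patt.findall(''.join([str(i) for i in Machine]))

for s in off_states:
    print(s)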

Output:

> 000000
> 0000000
> 00000000
> 0000000

We can then split the machine on these off states, and count the number of resulting 'on' states:

on_states = patt.split(''.join([str(i) for i in Machine]))
print(on_states)

Output:

> ['', '111110111', '1110011111111', '111011111111', '']

state_changes = len(off_states) + len([i for i in on_states if i != ''])

This gives the desired total states of 7, and we can subtract 1 if needed to get the number of changes.
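
Pulled together, the regex steps could look something like the sketch below. The function name `count_states` and the `min_off` parameter are hypothetical, introduced only for illustration; the sketch assumes that any run of fewer than `min_off` zeros is a data error. It keeps only segments that contain a 1, which also drops the empty strings produced by the split.

```python
import re

def count_states(machine, min_off=3):
    """Return (on_cycles, off_cycles); zero runs shorter than min_off are treated as noise."""
    states = ''.join(str(i) for i in machine)
    # min_off or more consecutive zeros make up one real 'off' state
    patt = re.compile(r'0{%d,}' % min_off)
    off_states = patt.findall(states)
    # what remains between the 'off' states is an 'on' cycle if it contains a 1
    on_states = [seg for seg in patt.split(states) if '1' in seg]
    return len(on_states), len(off_states)

Machine = [0,0,0,0,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0]
print(count_states(Machine))  # (3, 4)
```

The right `min_off` depends on how long a genuine off period can be compared with a glitch, which the sample data alone cannot tell you.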


Original:

Ok, so this feels a little hack-ish, but in my testing on your sample data it works.

We use regex to find the occurrences of more than 2 sequential zeros (i.e. 'off' state), then divide that into the length of the overall list to find the number of state changes.

import re
pattern = re.compile(r'[0]{3}')
# Machine is the list from the question
machine = [str(i) for i in Machine]
matches = len(pattern.findall(''.join(machine)))
print(len(machine) // matches)

This gives me 7, which by my count is the correct number of state changes. If someone could help me with the reasoning of why this works, that would be great...it makes sense to me intuitively but I can't put it into words.
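
On the reasoning: as far as I can tell, the 7 depends on the length of this particular sample rather than on the number of cycles. The sketch below spells out the arithmetic and then repeats it on the same data with 30 extra off samples appended (an arbitrary number picked for illustration). The real number of states stays at 7, but the quotient does not.

```python
import re

Machine = [0,0,0,0,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0]
states = ''.join(str(i) for i in Machine)

# findall returns non-overlapping matches, so each long run of zeros
# contributes (run length // 3) matches rather than a single one.
matches = len(re.findall(r'[0]{3}', states))
print(matches, len(states), len(states) // matches)  # 8 62 7

# Same machine, but with 30 extra 'off' samples recorded at the end:
# there are still 7 states, yet the quotient changes.
longer = states + '0' * 30
longer_matches = len(re.findall(r'[0]{3}', longer))
print(longer_matches, len(longer), len(longer) // longer_matches)  # 18 92 5
```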
