I currently have a loop in a script that processes a raw test data file and performs a bunch of calculations on the sanitised data. During the script, I need to figure out exactly how many cycles there are in each test. A cycle boundary is defined as the point where the step value at position i is greater than the step value at the next position, i+1. For example, the step count reaches 4 and the next step is 1, so that next step is the beginning of a new cycle. So far I am calculating this with this simple loop:
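raw_data = {'Step': [1,1,2,2,2,3,3,4,4,4,1,2,2,3,3,3,4,4,4,4,1,2,2,3,3,4,4,4]}
steps = raw_data['Step']
raw_data['CyclesTest'] = [None] * len(steps)  # pre-allocate the output column
cycle_test = 1
for i in range(len(steps) - 1):
    if steps[i] > steps[i + 1]:
        # last row of the current cycle: label it, then move on to the next cycle
        raw_data['CyclesTest'][i] = cycle_test
        cycle_test += 1
    else:
        raw_data['CyclesTest'][i] = cycle_test
raw_data['CyclesTest'][-1] = cycle_test  # the final row belongs to the current cycle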
This works fine, but the raw_data being provided is very large, and my script is taking forever on this calculation. I've used rolling to do max and min comparisons before, but is it possible to use that to replace this for loop? I'm just getting back into programming, so every day is a school day again! Any help would be greatly appreciated.
CodePudding user response:
You can do it like this:
import pandas as pd
raw_data = {'Step':[1,1,2,2,2,3,3,4,4,4,1,2,2,3,3,3,4,4,4,4,1,2,2,3,3,4,4,4]}
df = pd.DataFrame(raw_data)
df['CycleTest'] = (df['Step'].diff() < 0).cumsum() + 1
print(df)
    Step  CycleTest
0      1          1
1      1          1
2      2          1
3      2          1
4      2          1
5      3          1
6      3          1
7      4          1
8      4          1
9      4          1
10     1          2
11     2          2
12     2          2
13     3          2
14     3          2
15     3          2
16     4          2
17     4          2
18     4          2
19     4          2
20     1          3
21     2          3
22     2          3
23     3          3
24     3          3
25     4          3
26     4          3
27     4          3
Check where the value gets smaller with diff, and use cumsum to cumulatively count those occurrences. Adding 1 makes the numbering start at cycle 1 instead of 0.
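If it helps to see what each step is doing, here is a minimal sketch that splits the one-liner into its intermediate pieces and then reads off the number of cycles. The names delta, is_new_cycle, n_cycles and rows_per_cycle are just illustrative, and it assumes the data is in row order with a cycle starting wherever Step drops.

```python
import pandas as pd

steps = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4,
         1, 2, 2, 3, 3, 3, 4, 4, 4, 4,
         1, 2, 2, 3, 3, 4, 4, 4]
df = pd.DataFrame({'Step': steps})

# diff() gives Step[i] - Step[i-1]; the first row has no predecessor, so it is NaN
delta = df['Step'].diff()

# True only on rows where the step value dropped, i.e. the first row of a new cycle
# (NaN < 0 evaluates to False, so the first row is not flagged)
is_new_cycle = delta < 0

# Running count of drops seen so far, plus 1 so the numbering starts at cycle 1
df['CycleTest'] = is_new_cycle.cumsum() + 1

# Total number of cycles is the label on the last row
n_cycles = int(df['CycleTest'].iloc[-1])

# Number of rows that fall in each cycle
rows_per_cycle = df.groupby('CycleTest').size()

print(n_cycles)                  # 3
print(rows_per_cycle.to_dict())  # {1: 10, 2: 10, 3: 8}
```

Because diff, the comparison and cumsum all operate on the whole column at once, there is no Python-level loop, which is where the speed-up on a large file should come from.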