For loop on pandas dataframe causing slow performance - can rolling be used?

Time:05-13

I currently have a loop in a script that processes a raw test data file and performs a number of calculations on the sanitised data. Part of the script needs to work out how many cycles there are in each test. A new cycle starts when the step value at position i is greater than the step value at position i+1. For example, the step count reaches 4 and the next step is 1, so that next row is the beginning of a new cycle. So far I am calculating this with this simple loop:

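import pandas as pd

raw_data = pd.DataFrame({'Step':[1,1,2,2,2,3,3,4,4,4,1,2,2,3,3,3,4,4,4,4,1,2,2,3,3,4,4,4]})
raw_data['CyclesTest'] = 0

cycle_test = 1

# A new cycle starts at row i+1 whenever Step drops from row i to row i+1
for i in range(len(raw_data) - 1):
    raw_data.loc[i, 'CyclesTest'] = cycle_test
    if raw_data.loc[i, 'Step'] > raw_data.loc[i + 1, 'Step']:
        cycle_test += 1

# The loop stops one row early, so label the last row as well
raw_data.loc[len(raw_data) - 1, 'CyclesTest'] = cycle_test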

This works, but the raw_data being provided is very large, and my script takes a very long time on this calculation. I've used rolling before for max and min comparisons; is it possible to use it to replace this for loop? I'm just getting back into programming, so every day is a school day again! Any help would be greatly appreciated.
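On the rolling part of the question: a window of two rows can express the same "did the step drop?" test. The sketch below assumes the sample data above; the column names CycleTestRolling and CycleTestShift are illustrative only. It also shows the same comparison written with shift, which avoids calling a Python function once per window and is therefore typically much faster on a large frame.

```python
import pandas as pd

df = pd.DataFrame({'Step': [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 1, 2, 2, 3, 3, 3,
                            4, 4, 4, 4, 1, 2, 2, 3, 3, 4, 4, 4]})

# rolling(2) pairs each row with the row before it; with raw=True the window
# arrives as a NumPy array [previous, current]. The lambda returns 1.0 when the
# step drops (a new cycle starts) and 0.0 otherwise. Row 0 has no full window,
# so it is NaN and is filled with 0 ("no drop").
drops = (
    df['Step']
    .rolling(2)
    .apply(lambda w: float(w[0] > w[1]), raw=True)
    .fillna(0)
)
df['CycleTestRolling'] = drops.cumsum().astype('int64') + 1

# The same test without a per-window Python callback: compare each row with
# the previous row directly (row 0 compares against NaN, which gives False).
df['CycleTestShift'] = (df['Step'] < df['Step'].shift()).cumsum() + 1

print((df['CycleTestRolling'] == df['CycleTestShift']).all())
```

Both columns number the cycles 1, 2, 3 in the same places; the rolling version works, but the vectorised comparison is the better fit for a very large frame.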

Answer:

You can do it like this:

import pandas as pd

raw_data = {'Step':[1,1,2,2,2,3,3,4,4,4,1,2,2,3,3,3,4,4,4,4,1,2,2,3,3,4,4,4]}

df = pd.DataFrame(raw_data)
df['CycleTest'] = (df['Step'].diff() < 0).cumsum() + 1

print(df)
    Step  CycleTest
0      1          1
1      1          1
2      2          1
3      2          1
4      2          1
5      3          1
6      3          1
7      4          1
8      4          1
9      4          1
10     1          2
11     2          2
12     2          2
13     3          2
14     3          2
15     3          2
16     4          2
17     4          2
18     4          2
19     4          2
20     1          3
21     2          3
22     2          3
23     3          3
24     3          3
25     4          3
26     4          3
27     4          3

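Use diff to detect where the step value is smaller than in the previous row (a negative difference marks the first row of a new cycle), then use cumsum to count those drops cumulatively. Adding 1 makes the first cycle number 1 rather than 0. Both operations are vectorised, so this removes the Python-level loop entirely and no rolling window is needed.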
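To make each stage visible, here is a sketch that keeps the intermediate results as extra columns (Diff and NewCycle are illustrative names, not needed in practice) and cross-checks the result against a plain loop over the same "step drops means new cycle" definition.

```python
import pandas as pd

df = pd.DataFrame({'Step': [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 1, 2, 2, 3, 3, 3,
                            4, 4, 4, 4, 1, 2, 2, 3, 3, 4, 4, 4]})

df['Diff'] = df['Step'].diff()                 # NaN in row 0, then Step[i] - Step[i-1]
df['NewCycle'] = df['Diff'] < 0                # True on the first row of each new cycle
df['CycleTest'] = df['NewCycle'].cumsum() + 1  # running count of drops, starting at 1

# Rows around the first cycle boundary (the step goes 4 -> 1 at row 10)
print(df.iloc[8:12])

# Cross-check against a plain Python loop over the same definition
expected = []
cycle = 1
steps = df['Step'].tolist()
for i, step in enumerate(steps):
    if i > 0 and steps[i - 1] > step:
        cycle += 1
    expected.append(cycle)

print(expected == df['CycleTest'].tolist())  # prints True
```

The NaN produced by diff in row 0 compares as False, so the first row never counts as a drop, which is why the numbering starts cleanly at 1.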
