Home > front end >  for loop with if statement runs slow
for loop with if statement runs slow

Time:03-22

I have a long data frame with approx 150k rows and two columns (A and B). I'm writing a code that in column "B" for a row "i", if the next 6 rows in column A == 1 then, B=1.

So far I managed to write a working code (see below). However, it is very slow to run, about 5min. Do you have any suggestions about how to make the code more efficient? Thank you!

df[j]['B'] = 0

for i in np.arange(0,len(df[j]['A'])-1, dtype=np.float64):
    try:
        if (df[j]['A'][i 1] == 1) & (df[j]['A'][i 2]==1) & (df[j]['A'][i 3]==1) & (df[j]['A'][i 4]==1) & (df[j]['A'][i 5]==1) & (df[j]['A'][i 6]==1) == True:
            df[j]['B'][i] = 1
    except:
        pass

I've tried to remove the try/except block but still no improvement, actually it is 1 second slower.

for i  in np.arange(0,len(WT_df[j]['ID10'])-7, dtype=np.float64):
    if (WT_df[j]['ID10'][i 1] == 1) & (WT_df[j]['ID10'][i 2]==1) & (WT_df[j]['ID10'][i 3]==1) & (WT_df[j]['ID10'][i 4]==1) & (WT_df[j]['ID10'][i 5]==1) & (WT_df[j]['ID10'][i 6]==1) == True:
        WT_df[j]['ID_on'][i] = 1

Best regards, Luis

CodePudding user response:

Pandas direct indexing is slow (at least in loops). You can vectorize the loop to speed it up. Here is an untested example:

column = WT_df[j]['ID10'].values
size = column.size-7
result = np.ones(size, dtype=bool)

for k in range(1, 7):
    result &= column[k:size k] == 1

WT_df[j].loc[np.where(result)[0], 'ID_on'] = 1
  • Related