Python: speed up nested for loops over two DataFrames

Time:12-31

ServicePop contains x and y coordinates, and I want to add a grid-square ID (gid) to each row. I wrote a nested for loop to assign it, but ServicePop is so large that the loop takes several hours. Is there a faster, more efficient way to do this? Search results suggest that DataFrame.apply or vectorization would help, but I could not work out how to adapt my code to use them.
I need your help, please.

import pandas
import datetime
TotPopCenter = pandas.read_csv('TotalPopulationCurrentCenterShapeCoordinate_UTF8.csv', encoding='euckr')
ServicePop = pandas.read_csv('202101_Final.csv', encoding='euckr')
ServicePop.insert(9,'gid','')
Service_gid = ['' for _ in range(len(ServicePop))]
for j in range(len(ServicePop)):
    for i in range(len(TotPopCenter)):
        # A point belongs to a cell when it lies inside the 250 x 250 square
        # centered on (xcoord, ycoord).
        if (ServicePop['X_COORD'][j] >= TotPopCenter['xcoord'][i] - 125) and \
           (ServicePop['X_COORD'][j] < TotPopCenter['xcoord'][i] + 125) and \
           (ServicePop['Y_COORD'][j] >= TotPopCenter['ycoord'][i] - 125) and \
           (ServicePop['Y_COORD'][j] < TotPopCenter['ycoord'][i] + 125):
            Service_gid[j] = TotPopCenter['gid'][i]
ServicePop['gid'] = Service_gid

TotPopCenter

          gid  lbl  val   xcoord   ycoord
0  LM87ab60ba  NaN  NaN  1087375  1760625

ServicePop

      STD_YMD       X_COORD       Y_COORD    HCODE WKDY_CD  TIME   HPOP  WPOP  VPOP
0  2021-01-01  1.087484e+06  1.760579e+06  2207061     FRI     0  27.97  0.82  7.24

CodePudding user response:

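If you're looking to tidy up the nested loop specifically, you might want to use itertools.product, which flattens the two loops into one (it still visits every (j, i) pair, so any speedup is likely to be small). Use: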

import itertools
for j, i in itertools.product(range(len(ServicePop)), range(len(TotPopCenter))):

rather than:

for j in range(len(ServicePop)):
    for i in range(len(TotPopCenter)):
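
As a quick check of what this substitution does, here is a small sketch with made-up loop sizes (3 and 2 are arbitrary and not taken from the question). It compares the (j, i) pairs produced by the nested loops with those produced by itertools.product, which should be the same pairs in the same order, so the total number of comparisons stays the same.

```python
import itertools

n_service, n_centers = 3, 2  # hypothetical sizes, for illustration only

# Pairs visited by the original nested loops.
nested = []
for j in range(n_service):
    for i in range(n_centers):
        nested.append((j, i))

# Pairs visited by the flattened loop.
flat = list(itertools.product(range(n_service), range(n_centers)))

print(nested == flat)  # same pairs, same order
print(len(flat))       # n_service * n_centers iterations either way
```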

CodePudding user response:

I would store the values in local variables instead of looking them up repeatedly:

for j in range(len(ServicePop)):

    # Look up the point's coordinates once per outer iteration.
    serviceX = ServicePop['X_COORD'][j]
    serviceY = ServicePop['Y_COORD'][j]

    for i in range(len(TotPopCenter)):

        totX = TotPopCenter['xcoord'][i]
        totY = TotPopCenter['ycoord'][i]

        if (serviceX >= totX - 125) and \
           (serviceX < totX + 125) and \
           (serviceY >= totY - 125) and \
           (serviceY < totY + 125):
            Service_gid[j] = TotPopCenter['gid'][i]

You may even be able to break out of the inner loop early if you know the squares won't overlap, since each point can then match at most one square. Sorting TotPopCenter beforehand might help as well.
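
If the squares really do not overlap, the inner loop could in principle be avoided altogether. The following is a minimal sketch of a different technique from the loops above (a grid index plus a pandas merge), not something taken from the question. It assumes that every center in TotPopCenter sits on one regular 250-unit lattice, which the ±125 test and the sample row suggest but do not prove, and that the coordinate columns contain no NaN values. The function name assign_gid is made up for illustration.

```python
import numpy as np
import pandas as pd

CELL = 250  # cell width implied by the +/-125 test in the question


def assign_gid(points, centers, cell=CELL):
    """Return a Series of gid values aligned with points.index.

    Assumption: all centers lie on one regular lattice with spacing `cell`.
    """
    # Lower-left corner of the lattice, derived from the smallest center.
    x0 = centers['xcoord'].min() - cell / 2
    y0 = centers['ycoord'].min() - cell / 2

    # Integer cell index of every center. If two centers share a cell,
    # keep the last one, as the original loop would.
    keyed = centers.assign(
        cx=np.floor((centers['xcoord'] - x0) / cell).astype('int64'),
        cy=np.floor((centers['ycoord'] - y0) / cell).astype('int64'),
    )[['cx', 'cy', 'gid']].drop_duplicates(subset=['cx', 'cy'], keep='last')

    # Integer cell index of every point, computed the same way.
    pts = pd.DataFrame({
        'cx': np.floor((points['X_COORD'] - x0) / cell).astype('int64'),
        'cy': np.floor((points['Y_COORD'] - y0) / cell).astype('int64'),
    })

    # A left merge keeps the row order of pts; unmatched points get NaN.
    merged = pts.merge(keyed, on=['cx', 'cy'], how='left')
    return pd.Series(merged['gid'].to_numpy(), index=points.index)
```

With the DataFrames from the question, this would be called as ServicePop['gid'] = assign_gid(ServicePop, TotPopCenter).fillna(''), where fillna('') reproduces the empty string the original loop leaves for points that fall in no square. It is worth comparing the result against the loop on a small slice of the data before trusting it on the full file.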
