I have a data frame with close to 2.5M rows. The structure of the data frame is as follows:
X | Y |
---|---|
3256772 | 54745 |
3256778 | 54779 |
I have to apply a PyProj function such that the following result is obtained:
X | Y | X2 | Y2 |
---|---|---|---|
3256772 | 54745 | 23.45 | -49.23 |
3256778 | 54779 | 23.50 | -51.24 |
Is there any way to optimize this piece of code? The data frame I'm working on has close to 2.5 million rows, so the optimization matters.
I have written the following code to apply the function, but it is taking forever to process the results.
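import pandas as pd
from pyproj import Proj, transform

def convert(x1, y1):
    # Both Proj objects are rebuilt on every call, i.e. once per row
    inProj = Proj('epsg:3857')
    outProj = Proj('epsg:4326')
    x2, y2 = transform(inProj, outProj, x1, y1, always_xy=True)
    return x2, y2

# Row-wise apply: convert() is called once for each of the ~2.5M rows
final[['X2', 'Y2']] = final.apply(lambda row: pd.Series(convert(row['X'], row['Y'])), axis=1)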
CodePudding user response:
Based on the suggestions in the comments, I passed the x1 and y1 input values as NumPy arrays and got the results. The code executed in 4.1 s, which works for me.
For future reference, here's the code I used:
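# input_epsg and output_epsg are presumably the Proj objects from the question (inProj and outProj)
final['X2'], final['Y2'] = transform(input_epsg, output_epsg, final['X'].to_numpy(), final['Y'].to_numpy(), always_xy=True)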
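As a possible variation on the same idea: pyproj.transform has been deprecated since pyproj 2.6.1, and the library's Transformer class also accepts whole arrays. Below is a minimal sketch, assuming the same EPSG:3857 to EPSG:4326 conversion as in the question. The two-row frame is only a stand-in for the real data, and the timing on 2.5M rows has not been measured here.

```python
import pandas as pd
from pyproj import Transformer

# Build the transformer once and reuse it. always_xy=True keeps the
# (x, y) = (easting, northing) input order and returns (lon, lat).
transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)

# Stand-in for the real 2.5M-row frame (assumption: columns named X and Y).
final = pd.DataFrame({"X": [3256772, 3256778], "Y": [54745, 54779]})

# One vectorized call over 1-D float arrays instead of one call per row.
lon, lat = transformer.transform(
    final["X"].to_numpy(dtype="float64"),
    final["Y"].to_numpy(dtype="float64"),
)
final["X2"] = lon
final["Y2"] = lat
```

Selecting the columns with single brackets (final["X"]) yields 1-D arrays, which map directly onto new columns.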