I am wondering if there is a way to accelerate a double for loop in Python. Currently this is my code:
for i in range(len(newdata1)):
    for j in range(len(dataset)):
        if str(dataset['date'].values[j]) == str(newdata1['SALEDATE'].values[i]):
            newdata1['QUANTITY'].values[i], newdata1['PRICEWONKG'].values[i] = dataset['apple(kg)'].values[j], dataset['apple($/kg)'].values[j]
The code works correctly, but it takes a lot of time since the dataframes are really big. Is there any way I can reduce the execution time of this double loop?
Thanks
CodePudding user response:
I am not sure if it works well for your use case, but you could try the functools library to cache repeated lookups.
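A minimal sketch of that idea, assuming dates in dataset are unique (the sample frames below are made up to stand in for the question's data). Note that the cache only pays off when the same date appears many times; most of the speedup here actually comes from replacing the inner loop with a single indexed lookup:

```python
from functools import lru_cache

import pandas as pd

# Hypothetical stand-ins for the question's `dataset` and `newdata1`.
dataset = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02'],
    'apple(kg)': [10, 20],
    'apple($/kg)': [1.5, 1.6],
})
newdata1 = pd.DataFrame({
    'SALEDATE': ['2021-01-02', '2021-01-03'],
    'QUANTITY': [0, 0],
    'PRICEWONKG': [0.0, 0.0],
})

# Build the index once; then each lookup is O(1) instead of a scan.
lookup = dataset.set_index('date')

@lru_cache(maxsize=None)
def find_row(sale_date):
    # Memoized: repeated dates hit the cache instead of pandas.
    if sale_date in lookup.index:
        row = lookup.loc[sale_date]
        return row['apple(kg)'], row['apple($/kg)']
    return None

for i in range(len(newdata1)):
    match = find_row(str(newdata1['SALEDATE'].values[i]))
    if match is not None:
        newdata1.loc[i, ['QUANTITY', 'PRICEWONKG']] = match
```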
CodePudding user response:
You can create a mask for the matching dates, merge on the dates, and assign the values from the merged dataframe to the relevant rows in newdata1:
mask = newdata1['SALEDATE'].isin(dataset['date'])
newdata1.loc[mask, ['QUANTITY', 'PRICEWONKG', 'SALEDATE']] = (
    dataset.merge(newdata1.loc[mask, 'SALEDATE'],
                  left_on='date', right_on='SALEDATE')
           .drop('date', axis=1)
           .values
)
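To illustrate, here is a self-contained run of that approach on toy frames (column names taken from the question; the data itself is made up). One caveat to be aware of: the assignment via .values is positional, so it assumes the matching dates appear in the same relative order in both frames and that dates are unique in dataset:

```python
import pandas as pd

# Made-up stand-ins for the question's `dataset` and `newdata1`.
dataset = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-03'],
    'apple(kg)': [10, 20, 30],
    'apple($/kg)': [1.5, 1.6, 1.7],
})
newdata1 = pd.DataFrame({
    'SALEDATE': ['2021-01-02', '2021-01-03', '2021-01-04'],
    'QUANTITY': [0, 0, 0],
    'PRICEWONKG': [0.0, 0.0, 0.0],
})

# Mask of rows in newdata1 whose SALEDATE exists in dataset.
mask = newdata1['SALEDATE'].isin(dataset['date'])

# Merge only the matching SALEDATEs against dataset, then write the
# merged columns back into the masked rows in one vectorized step.
merged = dataset.merge(newdata1.loc[mask, 'SALEDATE'],
                       left_on='date', right_on='SALEDATE')
newdata1.loc[mask, ['QUANTITY', 'PRICEWONKG', 'SALEDATE']] = (
    merged.drop('date', axis=1).values
)
```

Here rows with SALEDATE '2021-01-02' and '2021-01-03' pick up quantity/price from dataset, while '2021-01-04' (no match) keeps its original zeros.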