Assign dataframe to variable outside for loop or use it directly inside for loop in Python

option 1:

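import numpy as np

a = np.unique(df.values)  # unique values computed once, before the loop
for i in df2.index:  # range() with no argument raises TypeError; iterate over the row labels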
  if df2.loc[i,'col1'] in a:
    df2.loc[i,'col2'] = 'Ok'
  else:
    df2.loc[i,'col2'] = 'No'

option 2:

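for i in df2.index:  # range() with no argument raises TypeError; iterate over the row labels
  if df2.loc[i,'col1'] in np.unique(df.values):  # unique values recomputed on every iteration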
    df2.loc[i,'col2'] = 'Ok'
  else:
    df2.loc[i,'col2'] = 'No'

Which is better in terms of memory and speed in Python?

Edited for clarity on the operation inside the for loop.

CodePudding user response：

Both are inefficient. The second is worse because it recomputes the unique values on every iteration.

Use vectorized code instead:

df2['col2'] = df2['col1'].isin(np.unique(df.values)).map({True: 'Ok', False: 'No'})
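To make the one-liner above concrete, here is a minimal sketch on made-up data. The contents of df and df2 are assumptions, since the question does not show them, and the np.where variant is just one alternative way to turn the same boolean mask into labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frames are not shown in the question.
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})
df2 = pd.DataFrame({"col1": [1, 6, 4, 9]})

# Compute the lookup values once.
allowed = np.unique(df.values)

# isin checks every row of col1 in a single pass;
# map turns the resulting booleans into the two labels.
df2["col2"] = df2["col1"].isin(allowed).map({True: "Ok", False: "No"})

# Alternative: np.where applied to the same boolean mask.
labels_where = np.where(df2["col1"].isin(allowed), "Ok", "No")

print(df2)
```

No explicit Python loop over the rows is needed, so the work is done inside pandas and NumPy.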

CodePudding user response：

In terms of memory, option 2 avoids keeping a named variable around, but it still allocates a temporary array of unique values on every iteration. In terms of speed, option 1 is faster, because np.unique(df.values) runs once instead of once per row; each call returns a new array, so a and a fresh np.unique(df.values) are not the same object. You can check whether two names refer to the same object with the is keyword: var1 is var2. However, we don't know what you are doing with the data.
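A quick way to see what the is check reports in this situation is sketched below. The sample frame is an assumption made for illustration; the variable names other than a are hypothetical too.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})

a = np.unique(df.values)
b = np.unique(df.values)  # a second call builds a second array

same_object = a is b                # identity: are these the same array object?
same_values = np.array_equal(a, b)  # equality: do they hold the same values?

alias = a                 # plain assignment copies nothing
alias_is_a = alias is a   # both names point at one array

print(same_object, same_values, alias_is_a)
```

Each call to np.unique produces its own array, so calling it inside the loop repeats both the computation and the allocation, whereas assigning the result to a name once only adds a reference to an array that already exists.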