Home > Software engineering >  Do loc and iloc methods behave differently in assignment?
Do loc and iloc methods behave differently in assignment?

Time:03-21

I need to add a new column to a dataframe based on an ID in the other one.

I created a small snippet of what I'm trying to do:

import pandas as pd
import numpy as np

a = pd.DataFrame([['ass-123-43', ['123', '456']],['ass-123-44', ['123', '457']]], columns=['customer_id', 'order_id']) 
b = pd.DataFrame([['ass-123-43'], ['ass-123-44']], columns=['customer_id'])

dict_a = a.set_index('customer_id').order_id.to_dict()

b['order_id'] = np.nan

for customer_id, order_id in dict_a.items():
    if customer_id in b.customer_id.values:
        b.iloc[b.customer_id == customer_id, 1] = pd.Series([order_id])

print(b)

When I use the iloc method, the code works as expected:

  customer_id    order_id
0  ass-123-43  [123, 456]
1  ass-123-44  [123, 457]

But when I use loc method it doesn't work as expected:

import pandas as pd
import numpy as np

a = pd.DataFrame([['ass-123-43', ['123', '456']],['ass-123-44', ['123', '457']]], columns=['customer_id', 'order_id'])
b = pd.DataFrame([['ass-123-43'], ['ass-123-44']], columns=['customer_id'])

dict_a = a.set_index('customer_id').order_id.to_dict()

b['order_id'] = np.nan

for customer_id, order_id in dict_a.items():
    if customer_id in b.customer_id.values:
        b.loc[b.customer_id == customer_id, 'order_id'] = pd.Series([order_id])

print(b)

I got this result:

  customer_id    order_id
0  ass-123-43  [123, 456]
1  ass-123-44         NaN

Beyond loc using labels to assign the row and iloc a number, is there something else I missed?

CodePudding user response:

You can read more about the differences between .iloc and .loc here, but for your particular case, the reason you're getting NaN is because of what you're assigning. With .iloc, it completely ignores the index of the value that you're assigning (which is pd.Series([order_id])), so it works fine and doesn't produce NaN.

With .loc, however, it does respect the index. In your example, pd.Series([order_id]) has an index of [0], as you can see:

>>> order_id = '123'
>>> pd.Series([order_id])
0    123
dtype: object

Now look at the index of the row where the NaN is occuring. It's 1. But the index of the value you're trying to assign to it is 0, as shown above. Mismatched indexes! What happens? The missing value - NaN.


If you want use .loc instead of .iloc, you can avoid this mismatched-index problem by converting the Series object to a numpy array (using .to_numpy()) before assigning:

b.loc[b.customer_id == customer_id, 'order_id'] = pd.Series([order_id]).to_numpy()

That will work as expected.

  • Related