Pandas SettingWithCopyWarning on basic np.where statement

Time:03-18

I am getting a SettingWithCopyWarning from the command below. The command works fine, but it emits this warning. Is there an alternative way of doing this that does not trigger the warning?

# Grab DataFrame rows where column has certain values and create 'signature' string
test['signature'] = np.where(test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature'])


<command-1510344588518966>:43: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test['signature'] = np.where(test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature'])

CodePudding user response:

This is probably not due to the np.where function call, but related to your assignment to test['signature'].

If your code looks like this:

df = pd.read_...
test = df.loc[...]

test['signature'] = np.where(
    test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature']
)

pandas is warning you that test was created by slicing df, so your assignment of the np.where output may be landing on a copy of that data rather than on df itself.

The warning is raised because pandas can't be certain whether you expect this change to propagate back to the original df, as it would if you assigned through df directly:

df.loc[...] = ...
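
To make the distinction concrete, here is a minimal sketch with made-up data. The `score` column, the row values, and the contents of `list_of_bad_ids` are assumptions for illustration, since the question does not show its DataFrame. It shows that assigning to `test` leaves `df` untouched, which is the situation the warning is pointing at.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data; the original question does not show its DataFrame.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "score": [0.9, 0.2, 0.7, 0.8],
    "signature": ["a", "b", "c", "d"],
})
list_of_bad_ids = [1, 4]

# `test` is derived from `df` by a boolean-mask selection.
test = df.loc[df["score"] > 0.5]

with warnings.catch_warnings():
    # Silence warnings for this demo only; some pandas versions emit
    # SettingWithCopyWarning on the assignment below.
    warnings.simplefilter("ignore")
    test["signature"] = np.where(
        test["id"].isin(list_of_bad_ids), "id has a bad value in it", test["signature"]
    )

print(test["signature"].tolist())  # modified values
print(df["signature"].tolist())    # original values, df was not changed
```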

If you don't want this output to propagate back to df and instead want your test DataFrame to be its own entity separate from df, you can create test as an explicit copy of df:

df = pd.read_...

# We're explicitly telling pandas that test is NOT a view into `df`
test = df.loc[...].copy() 

test['signature'] = np.where(
    test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature']
)
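
A runnable version of the explicit-copy approach might look like the sketch below. The data, the `score` column, and `list_of_bad_ids` are hypothetical stand-ins. It also shows an equivalent `.loc` assignment, the form the warning message itself suggests, which touches only the matching rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the DataFrame from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "score": [0.9, 0.2, 0.7, 0.8],
    "signature": ["a", "b", "c", "d"],
})
list_of_bad_ids = [1, 4]

# Option 1: explicit copy, then np.where as in the question.
test = df.loc[df["score"] > 0.5].copy()
test["signature"] = np.where(
    test["id"].isin(list_of_bad_ids), "id has a bad value in it", test["signature"]
)

# Option 2: explicit copy, then a .loc assignment restricted to the bad rows.
test_loc = df.loc[df["score"] > 0.5].copy()
bad_rows = test_loc["id"].isin(list_of_bad_ids)
test_loc.loc[bad_rows, "signature"] = "id has a bad value in it"

print(test["signature"].tolist())
print(test_loc["signature"].tolist())
```

Both options produce the same `signature` column, and `df` is left as it was.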

If you do want this result to propagate back to df, you can keep your row mask separate from the data and assign through df.loc with that mask when desired:

df = pd.read_...
row_mask = df['column'] > 0.5
test = df.loc[row_mask]

df.loc[row_mask, 'signature'] = np.where(
    test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature']
)
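
As a self-contained sketch of this write-back pattern (again with hypothetical data, a hypothetical `score` column, and a made-up `list_of_bad_ids`), the assignment goes through `df.loc`, so `df` itself is updated. The second variant folds the `isin` check into the mask and skips `np.where` entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the DataFrame from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "score": [0.9, 0.2, 0.7, 0.8],
    "signature": ["a", "b", "c", "d"],
})
df_alt = df.copy()  # untouched copy for the second variant
list_of_bad_ids = [1, 4]

row_mask = df["score"] > 0.5
test = df.loc[row_mask]

# Variant 1: compute values from `test`, write them back into `df`.
df.loc[row_mask, "signature"] = np.where(
    test["id"].isin(list_of_bad_ids), "id has a bad value in it", test["signature"]
)

# Variant 2: combine both conditions into one mask and assign a scalar.
alt_mask = (df_alt["score"] > 0.5) & df_alt["id"].isin(list_of_bad_ids)
df_alt.loc[alt_mask, "signature"] = "id has a bad value in it"

print(df["signature"].tolist())
print(df_alt["signature"].tolist())
```

If `test` is needed afterwards with the updated values, re-create it from `df` with the same mask.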