I am getting a SettingWithCopyWarning on this command. The command works fine, but it emits this warning. Is there an alternative way of doing this without triggering the warning?
# Grab DataFrame rows where column has certain values and create 'signature' string
test['signature'] = np.where(test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature'])
<command-1510344588518966>:43: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
test['signature'] = np.where(test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature'])
CodePudding user response:
This is probably not due to the np.where function call, but related to your assignment to test['signature'].
If your code looks like this:
df = pd.read_...
test = df.loc[...]
test['signature'] = np.where(
test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature']
)
pandas is warning you that test was created by slicing df, so it cannot tell whether test is a view of df or a copy, and the output of np.where may be getting assigned to a copy.
This results in a warning because pandas isn't certain whether you expect this change to propagate back to the original df, as it would if you were assigning through df directly:
df.loc[...] = ...
If you don't want this output to propagate back to df and instead want your test DataFrame to be its own entity separate from df, you can create test as an explicit copy of df:
df = pd.read_...
# We're explicitly telling pandas that test is NOT a view into `df`
test = df.loc[...].copy()
test['signature'] = np.where(
test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature']
)
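To make the explicit-copy approach concrete, here is a minimal runnable sketch. The sample data, the 'column' filter, and the contents of list_of_bad_ids are all made up for illustration, since the original df comes from an elided pd.read_... call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real df is loaded elsewhere.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "column": [0.9, 0.2, 0.7, 0.8],
    "signature": ["a", "b", "c", "d"],
})
list_of_bad_ids = [1, 3]  # assumed values, for illustration only

# .copy() makes test an independent DataFrame, so there is no
# view-versus-copy ambiguity left for pandas to warn about.
test = df.loc[df["column"] > 0.5].copy()
test["signature"] = np.where(
    test["id"].isin(list_of_bad_ids), "id has a bad value in it", test["signature"]
)

print(test)  # rows with id 1 and 3 carry the new signature
print(df)    # df keeps its original signature values
```

Only test changes here; df is left exactly as it was loaded.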
If you do want this result to propagate back to df, you can keep your masking operation separate from the data and simply perform a slice and assignment when desired:
df = pd.read_...
row_mask = df['column'] > 0.5
test = df.loc[row_mask]
df.loc[row_mask, 'signature'] = np.where(
test['id'].isin(list_of_bad_ids), 'id has a bad value in it', test['signature']
)
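A variation on the same idea, sketched under the same made-up sample data as assumptions: instead of building test and calling np.where, the row filter and the bad-id check can be combined into one boolean mask and assigned through df.loc in a single step. This is not the code above, just an equivalent way to write the update when only the flagged rows need to change.

```python
import pandas as pd

# Hypothetical stand-in data; the real df is loaded elsewhere.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "column": [0.9, 0.2, 0.7, 0.8],
    "signature": ["a", "b", "c", "d"],
})
list_of_bad_ids = [1, 2]  # assumed values, for illustration only

# Rows that pass the filter AND have a bad id.
row_mask = df["column"] > 0.5
bad_mask = row_mask & df["id"].isin(list_of_bad_ids)

# Assigning through df.loc writes to df itself, so nothing is set on a copy.
df.loc[bad_mask, "signature"] = "id has a bad value in it"

print(df)
```

In this sample, id 2 is in list_of_bad_ids but fails the 'column' filter, so its signature is left alone; only id 1 is rewritten.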