I have looked at the following post, but it did not help: How to deal with SettingWithCopyWarning in Pandas
My question:
I have this dataframe called sample
        PERMNO        date  SHRCD  EXCHCD  TICKER               COMNAM  FACPR    PRC  SHROUT  OPENPRC  marketcap
151421   10113  2010-07-21   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  24.70   100.0    25.10     2470.0
151422   10113  2010-07-22   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  25.26   100.0    25.42     2526.0
151423   10113  2010-07-23   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  25.28   100.0    25.54     2528.0
151424   10113  2010-07-26   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  25.37   100.0    25.40     2537.0
151425   10113  2010-07-27   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  25.29   100.0    25.25     2529.0
...        ...         ...    ...     ...     ...                  ...    ...    ...     ...      ...        ...
153292   10113  2017-12-22   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  58.93  2650.0    58.80   156164.5
153293   10113  2017-12-26   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  58.86  2650.0    58.69   155979.0
153294   10113  2017-12-27   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  58.83  2650.0    58.85   155899.5
153295   10113  2017-12-28   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  58.75  2650.0    59.07   155687.5
153296   10113  2017-12-29   73.0     4.0    AADR  ADVISORSHARES TRUST    0.0  58.85  2850.0    59.08   167722.5
1570 rows × 11 columns
I want to create a new column named 10113 that is a copy of marketcap. From row [1::] onward, I want it to hold the mean of the market cap.
But I get a warning:
C:\Users\waahm\AppData\Local\Temp/ipykernel_8464/2959211184.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
sample['marketcap'][1::] = marketcap_mean
My code is:
sample[10113] = sample['marketcap'].copy()
marketcap_mean = sample['marketcap'][1::].mean()
sample['marketcap'][1::] = marketcap_mean
How can I get rid of the warning? And what am I doing wrong?
Answer:
This is usually caused by chained assignment, that is, an assignment made through two indexing operations chained together. Here, sample['marketcap'][1::] = marketcap_mean chains a column lookup with a slice, which is what triggers the warning.
These two chained operations execute independently, one after another. The first is a get operation: sample['marketcap'] returns the column as a Series, which may be a view of the original data or a copy. The second is a set operation, called on that intermediate Series. If the intermediate object is a copy, the assignment never reaches the original DataFrame, and pandas warns because it cannot tell which case applies.
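A small sketch of those two steps, pulled apart. The three-row frame df and the names intermediate and tail are hypothetical stand-ins for illustration, not taken from the question's data file:

```python
import pandas as pd

# Hypothetical three-row stand-in for `sample`
df = pd.DataFrame(
    {"PRC": [24.70, 25.26, 25.28], "marketcap": [2470.0, 2526.0, 2528.0]},
    index=[151421, 151422, 151423],
)

# Step 1 (get): the column lookup returns an intermediate Series.
# Whether it is a view of df or a copy is up to pandas.
intermediate = df["marketcap"]

# Step 2 would be a set on that intermediate object, not on df.
# These are the rows the chained slice targets (everything after the first):
tail = intermediate.iloc[1:]

print(type(intermediate).__name__)
print(tail)
```

The assignment itself is left out on purpose: what happens to df after a chained set depends on the pandas version and on whether the intermediate object is a view or a copy.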
=> The solution is simple: combine the chained operations into a single indexing operation using loc, so that pandas sets the values directly on the original DataFrame. A single, unchained set operation through loc always acts on the original. (See this link for more info: https://www.dataquest.io/blog/settingwithcopywarning/)
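sample.loc[sample.index[1:], 'marketcap'] = marketcap_mean
Note that loc selects rows by label, not by position. The index here starts at 151421, so sample.index[1:] is used to skip the first row by position, matching the original [1::] slice.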
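For reference, here is a minimal self-contained sketch of the single-call approach. It assumes the goal is for the new 10113 column (rather than marketcap itself) to hold the mean from the second row onward, which is how the question reads. The three-row frame is a hypothetical stand-in for sample, built from the first rows shown in the question. Because loc selects by label and the index starts at 151421, the rows are picked with sample.index[1:]:

```python
import pandas as pd

# Hypothetical stand-in for `sample`, using the first three rows from the question
sample = pd.DataFrame(
    {"PRC": [24.70, 25.26, 25.28], "marketcap": [2470.0, 2526.0, 2528.0]},
    index=[151421, 151422, 151423],
)

# New column named 10113, starting out as a copy of marketcap
sample[10113] = sample["marketcap"].copy()

# Mean of marketcap over every row except the first (positional slice)
marketcap_mean = sample["marketcap"].iloc[1:].mean()

# One indexing call: rows by label (every label after the first), column by name
sample.loc[sample.index[1:], 10113] = marketcap_mean

# A purely positional alternative would be:
# sample.iloc[1:, sample.columns.get_loc(10113)] = marketcap_mean

print(sample)
```

In this sketch the marketcap column is left untouched and only the 10113 column changes. To overwrite marketcap instead, as in the original code, swap 10113 for 'marketcap' in the loc call.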