Suppose I have a data frame that looks like this:
import pandas as pd
import numpy as np
na = np.nan
df = pd.DataFrame({
'location' : ['a','a','a','a','a','b','b','b','b','b'],
'temp' : [11.6,12.2,na,12.4,12.9,27.9,27.6,na,27.2,26.8],
})
Suppose I want to interpolate the missing values only in location a, and I would like to do it like this:
df.loc[df['location']=='a'].interpolate(method = 'linear',inplace=True)
print(df)
But it gives me the following warning, and the data frame is left unchanged:
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:10709: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
return super().interpolate(
location temp
0 a 11.6
1 a 12.2
2 a NaN
3 a 12.4
4 a 12.9
5 b 27.9
6 b 27.6
7 b NaN
8 b 27.2
9 b 26.8
Any help or reference would be helpful. Thanks
CodePudding user response:
inplace=True doesn't work here, because it is applied to a filtered copy rather than to df itself. Assign the result back instead:
>>> df.loc[df['location'] == 'a'] = df.interpolate()
>>> df
location temp
0 a 11.6
1 a 12.2
2 a 12.3
3 a 12.4
4 a 12.9
5 b 27.9
6 b 27.6
7 b NaN
8 b 27.2
9 b 26.8
>>>
Or:
df.loc[df['location'] == 'a'] = df.loc[df['location'] == 'a'].interpolate()
I removed method='linear' because it is the default.
Or try df.mask, which returns a new DataFrame rather than modifying df (assign it back to df to keep the result):
>>> df.mask(df['location'] == 'a', df.interpolate())
location temp
0 a 11.6
1 a 12.2
2 a 12.3
3 a 12.4
4 a 12.9
5 b 27.9
6 b 27.6
7 b NaN
8 b 27.2
9 b 26.8
>>>
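One caveat worth checking for the variants that call df.interpolate() on the whole frame: the interpolation runs across all rows before the mask is applied, so a gap at the edge of a group can be filled using the neighbouring group's values. The sketch below uses a hypothetical variation of the question's data (the name edge and the moved NaN are assumptions for illustration, not part of the original post) and works on the temp column only:

```python
import numpy as np
import pandas as pd

# Hypothetical variation of the question's data:
# the gap is now the last row of location 'a'.
edge = pd.DataFrame({
    'location': ['a', 'a', 'a', 'b', 'b'],
    'temp': [12.2, 12.4, np.nan, 27.9, 27.6],
})
m = edge['location'] == 'a'

# Interpolating the whole column uses 27.9 from location 'b' to fill the gap.
whole = edge['temp'].interpolate().loc[2]

# Interpolating only the filtered rows never sees location 'b'.
filtered = edge.loc[m, 'temp'].interpolate().loc[2]

print(whole, filtered)
```

If the two printed values differ for your data, filtering on both sides of the assignment is probably the safer choice.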
CodePudding user response:
For performance, store the boolean mask in a helper variable and filter on both sides of the assignment.
The problem is that you cannot use inplace=True here, because df.loc[df['location']=='a'] creates a new filtered DataFrame that is a subset of the original df. With inplace=True, interpolate tries to modify that new DataFrame in place, and you don't keep a reference to it. That is why you get the warning and why, as the printed df in the question shows, the line has no effect on df. Assign the interpolated rows back instead:
m = df['location']=='a'
#linear is default, so omitted
df[m] = df[m].interpolate()
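As a variation on the same idea, here is a minimal, self-contained sketch that restricts the assignment to the numeric temp column, so the string column location is never passed to interpolate. Restricting to one column is an assumption about what is wanted here, not something the original answers do:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
    'temp': [11.6, 12.2, np.nan, 12.4, 12.9, 27.9, 27.6, np.nan, 27.2, 26.8],
})

# Helper mask, reused on both sides of the assignment.
m = df['location'] == 'a'

# Interpolate only the 'temp' values of location 'a' and write them back into df.
# method='linear' is the default, so it is omitted.
df.loc[m, 'temp'] = df.loc[m, 'temp'].interpolate()

print(df)
```

Row 2 should be filled from its neighbours in location a, while the NaN in row 7 (location b) should stay untouched.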