I am working with dataframes in Python. I have an original dataframe covering 10 days, which I have split into one dataframe per day, and I am trying to plot each one. Some columns (here y and z) contain strange values, so I am using the 'between' method to restrict them to the range (0, 100). The code works, but I get a warning. Can anyone help, please?
for df in listofDF:
    if len(df) != 0:
        f_df = df[df[' y'].between(0,100)]
        f_df = f_df[df[' z'].between(0,100)]
        maxTemp = f_df[' y']
        minTemp = f_df[' z']
        Time = f_df['x']
        plt.plot(Time, maxTemp)
        plt.plot(Time, minTemp)
The warning I am getting is:
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
f_df = f_df[df[' y'].between(0,100)]
CodePudding user response:
TL;DR Solution
Change f_df = f_df[df[' z'].between(0, 100)]
to f_df = f_df[f_df[' z'].between(0, 100)]
The warning you are getting is because of this line:
f_df = f_df[df[' z'].between(0,100)]
There's an issue with this line. Can you spot it?
You're using a boolean mask built from df to index f_df. The mask marks the rows of df where column z is between 0 and 100; let's say those are rows 2 and 4 of df.
However, f_df is a different dataframe, and in general its rows could be completely different: the rows of f_df where z is between 0 and 100 might be rows 3 and 10. Because the mask was computed on df but is applied to f_df, pandas warns you that it will reindex the mask to match f_df's index before deciding which rows to keep, which may not be what you want.
So when the filter on df selects rows 2 and 4, pandas keeps rows 2 and 4 of f_df, provided they exist there. To be more accurate, it keeps the rows whose index labels are 2 and 4, not the rows at positions 2 and 4.
In your case, this is what you want, because f_df retains the index labels of df when it is created, as you can see from the labels on the left when you print it out.
>>> df = pd.DataFrame([('a', 1, 51), ('b', 51, 31)], columns=['letter', 'x', 'y'])
>>> f_df = df[df.x.between(0, 50)]
>>> f_df
  letter  x   y
0      a  1  51
>>> f_df = f_df[df.y.between(0, 50)]
<stdin>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
>>> f_df
Empty DataFrame
Columns: [letter, x, y]
Index: []
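As a minimal sketch of the fix, the snippet below reuses the toy dataframe from the session above with one extra illustrative row ('c'), so that something survives both filters. It shows two ways to avoid the warning: building the second mask from f_df itself, or combining both conditions into a single mask on the original dataframe. The names mask, combined and caught are only for illustration.

```python
import warnings

import pandas as pd

# Toy data modelled on the example above; row 'c' is added for illustration.
df = pd.DataFrame(
    [("a", 1, 51), ("b", 51, 31), ("c", 20, 40)],
    columns=["letter", "x", "y"],
)

# Record any warnings raised while filtering so we can inspect them afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    # Option 1: build each mask from the frame it is applied to.
    f_df = df[df["x"].between(0, 50)]
    f_df = f_df[f_df["y"].between(0, 50)]

    # Option 2: combine both conditions into one mask on the original frame.
    mask = df["x"].between(0, 50) & df["y"].between(0, 50)
    combined = df[mask]

print(f_df)
print("Same result:", combined.equals(f_df))
print("UserWarnings raised:", sum(issubclass(w.category, UserWarning) for w in caught))
```

Both options keep only the row where x and y are each between 0 and 50. Option 2 is often easier to read when several columns share the same range, since every mask is aligned with df from the start.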