I have a dataframe of 51077 rows × 4 columns. I need to subset it to the rows whose value in the third column is either > 0.3 or < -0.3.
I used the following:
df_filtered = df[np.logical_and(df["third column"] > 0.3, df["third column"] < -0.3)]
But the result showed only the column names.
I also tried:
df_filtered = df.query("third column < -0.3 & third column > 0.3")
But the result was the same.
How do I resolve this?
CodePudding user response:
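You can also use between and negate the result. between includes both endpoints by default, so the negation keeps only values strictly below -0.3 or strictly above 0.3: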
df_filtered = df[~df['third_column'].between(-0.3, 0.3)]
Example:
>>> df
third_column
0 -0.190030
1 -0.205187
2 -0.066776
3 -0.264480
4 0.064962
5 0.024708
6 -0.354629 # Want to keep
7 -0.180228
8 0.261640
9 0.315986 # Want to keep
>>> df[~df['third_column'].between(-0.3, 0.3)]
third_column
6 -0.354629
9 0.315986
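To try this yourself, here is a small self-contained sketch that rebuilds the sample frame from the values shown above. The variable name inside is only for illustration.

```python
import pandas as pd

# Sample values copied from the example output above
df = pd.DataFrame({
    "third_column": [
        -0.190030, -0.205187, -0.066776, -0.264480, 0.064962,
        0.024708, -0.354629, -0.180228, 0.261640, 0.315986,
    ]
})

# True for values inside [-0.3, 0.3] (both endpoints included by default)
inside = df["third_column"].between(-0.3, 0.3)

# ~ flips the boolean mask, keeping only the rows outside the interval
df_filtered = df[~inside]
print(df_filtered)
```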
CodePudding user response:
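You almost got it. No single value can be both > 0.3 and < -0.3, so combining the two conditions with "and" always selects nothing, which is why you only saw the column names. Combine them with "or" (|) instead: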
df_filtered = df.loc[(df['third column'] > 0.3) | (df['third column'] < -0.3)]
or
df_filtered = df[(df['third column'] > 0.3) | (df['third column'] < -0.3)]
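A minimal sketch comparing the approaches on made-up data (the values below are invented for illustration, not taken from the question). It also shows a query version. This assumes a reasonably recent pandas, where a column name containing a space can be wrapped in backticks inside query. Since the two thresholds are symmetric around zero, comparing the absolute value is another option.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
df = pd.DataFrame({"third column": [-0.5, -0.1, 0.0, 0.2, 0.4]})
col = df["third column"]

# AND: no value satisfies both conditions, so the result has no rows
empty = df[np.logical_and(col > 0.3, col < -0.3)]

# OR: keeps rows above 0.3 or below -0.3
by_mask = df[(col > 0.3) | (col < -0.3)]

# query equivalent; backticks quote the column name with a space
by_query = df.query("`third column` > 0.3 or `third column` < -0.3")

# Symmetric thresholds, so the absolute value works too
by_abs = df[col.abs() > 0.3]

print(len(empty), len(by_mask))
```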