Filter 2 choices in a column-CodePudding

can anyone show me the script to filter 2 or more choices within a column ?

For example:

A  B  C  D  E  F   G
1  x  x  x  x  x  AAA
3  x  x  x  x  x  BBB
4  x  x  x  x  x  AAA
1  x  x  x  x  x  CCC
4  x  x  x  x  x  CCC
5  x  x  x  x  x  AAA

I would want to filter "AAA" and "BBB", in one go. I have tried with:

import pandas as pd 
import numpy as np 
import openpyxl 
from numpy.random import choice

df = pd.read_excel('filepath', sheet_name = 'Sheet1')

df_sample = df.loc[df['G'].str.contains("AAA", 'BBB")]

But it only returns column G with "AAA", but "BBB" is not in.

Please guide me on how to do that.

Thanks so much !

CodePudding user response：

You can use contains with the | separator :

df[df.G.str.contains("AAA|BBB")]

CodePudding user response：

You are passing two parameters to the contains function, from the documentation.

Series.str.contains(pat, case=True, flags=0, na=None, regex=True)

Your function is equivalent to

df['G'].str.contains(pat="AAA", case="BBB")

The string "BBB" is treated as True in python, so your code is equivalent to-

df['G'].str.contains(pat="AAA", case=True)

That's why you're only getting the results for "AAA" to get the correct solution you can follow @Nathan Furnal's solution

df[df.G.str.contains("AAA|BBB")]

CodePudding user response：

You could use isin or contains, but remember:

isin checks if each value in the column is contained in a list of arbitrary values.
isin works column-wise and is available for all data types.
contains checks if arbitrary values are contained in each value in the column.
contains works element-wise and makes sense only when dealing with strings (or values that can be represented as strings).

Here's an approach using isin :

choices_to_keep = ["AAA", 'BBB"]

filtered_df = df.G.isin(choices_to_keep)