Select string based on certain word's presence only and exclude everything else

I have a table like this:

Column1	Column2	Text
1	2	Apple Orange Car
2	5	Apple Tree
3	8	Apple Orange
4	7	Sun Orange
5	8	Orange
6	7	Apple Orange Apple

I want to filter this DataFrame on the Text column, keeping only the rows whose text consists of Apple and/or Orange and nothing else.

So the output should look like this:

Column1	Column2	Text
3	8	Apple Orange
5	8	Orange
6	7	Apple Orange Apple

What would be the way to achieve it?

CodePudding user response:

This splits each Text value into a list of words, converts that list into a set, and then uses a set comparison to ask, in effect:

"Is the Text set a subset of {'Apple', 'Orange'}?"

df[df.Text.str.split().apply(set).le({'Apple', 'Orange'})]

Output:

   Column1  Column2                Text
2        3        8        Apple Orange
4        5        8              Orange
5        6        7  Apple Orange Apple
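
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the table from the question and applies the same subset test through an explicit helper. The names allowed and only_allowed are illustrative, not part of the original answer, and the sketch assumes every Text value is a string (no NaN).

```python
import pandas as pd

# Rebuild the table from the question.
df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5, 6],
    "Column2": [2, 5, 8, 7, 8, 7],
    "Text": [
        "Apple Orange Car",
        "Apple Tree",
        "Apple Orange",
        "Sun Orange",
        "Orange",
        "Apple Orange Apple",
    ],
})

# Hypothetical name: the only words a row is allowed to contain.
allowed = {"Apple", "Orange"}


def only_allowed(text):
    # Split on whitespace, deduplicate into a set, then test whether
    # that set is a subset of the allowed words.
    return set(text.split()) <= allowed


# Boolean mask: True for rows whose words all belong to `allowed`.
mask = df["Text"].apply(only_allowed)
result = df[mask]

print(result)  # keeps the rows with Column1 equal to 3, 5 and 6
```

Note that an empty string would also pass this test, because the empty set is a subset of every set. If that is not wanted, the helper could additionally require that text.split() is non-empty.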