I have a table like this:
Column1 | Column2 | Text |
---|---|---|
1 | 2 | Apple Orange Car |
2 | 5 | Apple Tree |
3 | 8 | Apple Orange |
4 | 7 | Sun Orange |
5 | 8 | Orange |
6 | 7 | Apple Orange Apple |
Now I want to filter this DataFrame on the Text column, keeping only the rows whose text consists of Apple and/or Orange and nothing else.
So the output should look like this:
Column1 | Column2 | Text |
---|---|---|
3 | 8 | Apple Orange |
5 | 8 | Orange |
6 | 7 | Apple Orange Apple |
How can I achieve this?
CodePudding user response:
This splits each Text value into a list of words, turns that list into a set, and then uses a set comparison (le applies <= element-wise, and for Python sets <= is the subset test) to essentially ask:
- "Is the Text set a subset of {'Apple', 'Orange'}?"
df[df.Text.str.split().apply(set).le({'Apple', 'Orange'})]
Output:
   Column1  Column2                Text
2        3        8        Apple Orange
4        5        8              Orange
5        6        7  Apple Orange Apple
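
If you want something you can paste and run, here is a minimal sketch of the same idea. It rebuilds the DataFrame from the question and spells the subset test out with set.issubset instead of le. The names allowed, mask and result are just illustrative, not part of the original one-liner.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5, 6],
    "Column2": [2, 5, 8, 7, 8, 7],
    "Text": [
        "Apple Orange Car",
        "Apple Tree",
        "Apple Orange",
        "Sun Orange",
        "Orange",
        "Apple Orange Apple",
    ],
})

# Words a row is allowed to contain (illustrative name).
allowed = {"Apple", "Orange"}

# Split each Text value on whitespace, then keep the row only if
# every word it contains is in `allowed`.
mask = df["Text"].str.split().apply(lambda words: set(words).issubset(allowed))
result = df[mask]

print(result)
```

Two things to keep in mind with this approach: an empty string splits into an empty list, and the empty set is a subset of anything, so such a row would be kept; and a missing value (NaN) in Text would make set(...) raise a TypeError, so you would probably want to drop or fill those rows first.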