Select string based on certain word's presence only and exclude everything else

Time:06-15

I have a table like this:

Column1  Column2  Text
1        2        Apple Orange Car
2        5        Apple Tree
3        8        Apple Orange
4        7        Sun Orange
5        8        Orange
6        7        Apple Orange Apple

I want to filter this DataFrame on the Text column, keeping only the rows whose text consists of the words Apple and/or Orange and nothing else.

So the output should look like this:

Column1  Column2  Text
3        8        Apple Orange
5        8        Orange
6        7        Apple Orange Apple

How can I achieve this?

CodePudding user response:

This splits each Text value into a list of words, converts the list into a set, and then uses a set comparison to ask:

  • "Is the Text set a subset of {'Apple', 'Orange'}?"

df[df.Text.str.split().apply(set).le({'Apple', 'Orange'})]

Output:

   Column1  Column2                Text
2        3        8        Apple Orange
4        5        8              Orange
5        6        7  Apple Orange Apple
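
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same subset idea. It rebuilds the table from the question and spells the comparison out with an explicit lambda instead of .le(). The names allowed, mask and result are only illustrative, and the sketch assumes the Text column contains no missing values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5, 6],
    "Column2": [2, 5, 8, 7, 8, 7],
    "Text": ["Apple Orange Car", "Apple Tree", "Apple Orange",
             "Sun Orange", "Orange", "Apple Orange Apple"],
})

# Illustrative name for the set of permitted words.
allowed = {"Apple", "Orange"}

# Split each Text value on whitespace, then keep the row only if
# every word it contains is in `allowed` (a subset test).
mask = df["Text"].str.split().apply(lambda words: set(words) <= allowed)
result = df[mask]

print(result)
```

Note that an empty string would also pass this test, because the empty set is a subset of every set. If that matters, the mask can be combined with a check that Text is non-empty.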