Filter a data frame to include only the words from a list

I have a pandas data frame of words I web-scraped, along with their frequencies. The first column is the word and the second column is the integer frequency.

I have two lists of words that I want to look up in the data frame. I want to filter my data frame (or create a new one, whichever) so that it contains only the words found in either of the two lists.

Example:

list1 = ['Apple', 'Orange', 'Lemon']
list2 = ['Cucumber', 'Carrot']

Data Frame

Word     | Count
Apple    | 2
Cilantro | 5
Orange   | 9
Cupcake  | 10
Carrot   | 4

I would want to create a DF that has the word and count of only Apple, Orange, and Carrot, because they appear in the lists. My data frame is quite large, so I need to be able to do this efficiently. Any help is greatly appreciated!

Answer:

Use Series.isin and filter with boolean indexing. I also added Series.str.strip to remove possible leading or trailing whitespace:

df1 = df[df['Word'].str.strip().isin(list1 + list2)].reset_index(drop=True)
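A minimal self-contained sketch of the line above, rebuilt from the sample data in the question (the column names Word and Count are assumed from the example table):

```python
import pandas as pd

# Sample data from the question; column names assumed from the example table
df = pd.DataFrame({
    'Word': ['Apple', 'Cilantro', 'Orange', 'Cupcake', 'Carrot'],
    'Count': [2, 5, 9, 10, 4],
})
list1 = ['Apple', 'Orange', 'Lemon']
list2 = ['Cucumber', 'Carrot']

# Boolean mask: True where the whitespace-stripped word appears in either list
mask = df['Word'].str.strip().isin(list1 + list2)

# Keep only the matching rows and renumber the index from 0
df1 = df[mask].reset_index(drop=True)
print(df1)
```

This should leave the three rows Apple (2), Orange (9) and Carrot (4). Because isin is vectorized over the whole column instead of looping over rows in Python, it is a reasonable fit for a large frame.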

Answer:

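Use the isin method, lowercasing and stripping both the list words and the column so the match ignores case and surrounding whitespace: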

lst = [w.lower().strip() for w in list1 + list2]
out = df[df['Word'].str.lower().str.strip().isin(lst)].reset_index(drop=True)

Output:

     Word  Count
0   Apple      2
1  Orange      9
2  Carrot      4
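To show why the normalization in this answer matters, here is a sketch on hypothetical messy data (the stray spaces and capitalization below are invented for illustration, not taken from the question). It compares the case-sensitive match with the normalized one:

```python
import pandas as pd

# Hypothetical messy scraped data: inconsistent case and stray whitespace
df = pd.DataFrame({
    'Word': ['apple ', 'Cilantro', ' ORANGE', 'Cupcake', 'Carrot'],
    'Count': [2, 5, 9, 10, 4],
})
list1 = ['Apple', 'Orange', 'Lemon']
list2 = ['Cucumber', 'Carrot']

# Case-sensitive match (strip only): 'apple' and 'ORANGE' miss, only 'Carrot' matches
strict_matches = df['Word'].str.strip().isin(list1 + list2).sum()

# Normalize the lookup words once; isin accepts a set as well as a list
lookup = {w.lower().strip() for w in list1 + list2}

# Normalize the column the same way before testing membership
mask = df['Word'].str.lower().str.strip().isin(lookup)

# Only the comparison is normalized; the result keeps the original spellings
out = df[mask].reset_index(drop=True)
print(strict_matches)
print(out)
```

With this data the strict version finds one row, while the normalized version keeps 'apple ', ' ORANGE' and 'Carrot'. If you want the cleaned spelling in the output, assign the normalized column back to df before filtering.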