I have a pandas DataFrame of words I web-scraped and their frequencies. The first column is the word and the second column is the integer frequency.
I have two lists of words that I want to look up in the DataFrame. I want to filter my DataFrame (or create a new one, either is fine) so that it contains only the words found in the two lists.
Example:
list1 = ['Apple', 'Orange', 'Lemon']
list2 = ['Cucumber', 'Carrot']
Data Frame
Word | Count
Apple | 2
Cilantro | 5
Orange | 9
Cupcake | 10
Carrot | 4
I would like to create a DataFrame containing the word and count for only Apple, Orange, and Carrot, because they appear in the lists. My DataFrame is quite large, so I need to do this efficiently. Any help is greatly appreciated!
CodePudding user response:
Use Series.isin to build a mask and filter with boolean indexing. Series.str.strip is also added to remove possible leading or trailing whitespace:
df1 = df[df['Word'].str.strip().isin(list1 + list2)].reset_index(drop=True)
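For reference, here is a minimal self-contained sketch of this approach using the sample data from the question. The column names 'Word' and 'Count' are taken from the example table, and combining the two lists into a set (the name `wanted` is just illustrative) is an optional choice made here, not something the one-liner requires:

```python
import pandas as pd

# Sample data copied from the question's example table.
df = pd.DataFrame({
    "Word": ["Apple", "Cilantro", "Orange", "Cupcake", "Carrot"],
    "Count": [2, 5, 9, 10, 4],
})
list1 = ["Apple", "Orange", "Lemon"]
list2 = ["Cucumber", "Carrot"]

# Combine both lists into one collection of wanted words.
# A set also drops any word that appears in both lists.
wanted = set(list1) | set(list2)

# Boolean mask: True where the stripped word is one of the wanted words.
mask = df["Word"].str.strip().isin(wanted)

# Keep only matching rows and renumber the index from 0.
df1 = df[mask].reset_index(drop=True)
print(df1)
```

Because isin is vectorized, this avoids a Python-level loop over the rows, which should matter for a large DataFrame.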
CodePudding user response:
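Use the isin method, lowercasing and stripping both the lists and the column so that the match ignores case and surrounding whitespace: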
lst = [w.lower().strip() for w in list1 + list2]
out = df[df['Word'].str.lower().str.strip().isin(lst)].reset_index(drop=True)
Output:
Word | Count
0 Apple | 2
1 Orange | 9
2 Carrot | 4
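To illustrate why the normalization can matter, here is a small sketch under the assumption that the scraped words have inconsistent casing and stray spaces. The messy sample values below are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical messy input: mixed case and surrounding whitespace.
df = pd.DataFrame({
    "Word": [" apple", "Cilantro", "ORANGE ", "Cupcake", "Carrot"],
    "Count": [2, 5, 9, 10, 4],
})
list1 = ["Apple", "Orange", "Lemon"]
list2 = ["Cucumber", "Carrot"]

# Normalize the lookup words once.
lst = [w.lower().strip() for w in list1 + list2]

# Normalize the column only for the comparison; the stored values are untouched.
normalized = df["Word"].str.lower().str.strip()

out = df[normalized.isin(lst)].reset_index(drop=True)
print(out)
```

Note that the filtered rows keep their original spelling, since the normalized Series is used only to build the mask.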