Append a data frame to include only the words from a list - Python

Time:12-13

I have a pandas data frame of words I scraped from the web and their frequencies. The first column is the word and the second column is the frequency as an integer.

I have 2 lists of words that I want to find in the data frame. I want to filter my data frame (or create a new one, whichever) so that it contains only the words found in the two lists.

Example:

list1 = ['Apple', 'Orange', 'Lemon']
list2 = ['Cucumber', 'Carrot']

Data Frame

Word     | Count
Apple    | 2
Cilantro | 5
Orange   | 9
Cupcake  | 10
Carrot   | 4

I would want to create a DF that has the word and count for only Apple, Orange, and Carrot, because they are in the lists. My data frame is quite large, so I need to be able to do this efficiently. Any help is greatly appreciated!

CodePudding user response:

Use Series.isin to build a boolean mask and filter with boolean indexing. Series.str.strip is also added to remove any leading or trailing whitespace:

df1 = df[df['Word'].str.strip().isin(list1 + list2)].reset_index(drop=True)
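To try this end to end, here is a minimal sketch built on the sample data from the question. The trailing space in "Orange " and the names wanted and mask are my own additions for illustration, not part of the original post.

```python
import pandas as pd

list1 = ["Apple", "Orange", "Lemon"]
list2 = ["Cucumber", "Carrot"]

# Sample frame from the question; "Orange " has a trailing space
# (an assumption, added to show why str.strip() is useful).
df = pd.DataFrame({
    "Word": ["Apple", "Cilantro", "Orange ", "Cupcake", "Carrot"],
    "Count": [2, 5, 9, 10, 4],
})

# Combine both lists into one set of wanted words.
wanted = set(list1 + list2)

# Boolean mask: True where the stripped word is one of the wanted words.
mask = df["Word"].str.strip().isin(wanted)

# Keep only the matching rows and renumber the index from 0.
df1 = df[mask].reset_index(drop=True)
print(df1)
```

Note that only the comparison is stripped. The values kept in df1['Word'] are unchanged, so assign the stripped column back to df first if you also want clean values in the result.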

CodePudding user response:

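Use the isin method, lowercasing and stripping both the list entries and the column so the match ignores case and surrounding whitespace: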

lst = [w.lower().strip() for w in list1 + list2]
out = df[df['Word'].str.lower().str.strip().isin(lst)].reset_index(drop=True)

Output:

     Word  |  Count
0   Apple  |      2
1  Orange  |      9
2  Carrot  |      4
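The difference from a plain isin call only shows up when the scraped words vary in case or spacing. Below is a small sketch of the same approach on deliberately messy input. The messy values and the name key are assumptions made up for this example.

```python
import pandas as pd

list1 = ["Apple", "Orange", "Lemon"]
list2 = ["Cucumber", "Carrot"]

# Hypothetical messy data: mixed case and stray whitespace.
df = pd.DataFrame({
    "Word": ["apple", "Cilantro", " ORANGE", "Cupcake", "Carrot "],
    "Count": [2, 5, 9, 10, 4],
})

# Normalize the lookup words once.
lst = [w.lower().strip() for w in list1 + list2]

# Normalize the column the same way, for comparison only.
key = df["Word"].str.lower().str.strip()

# Rows whose normalized word appears in the normalized list.
out = df[key.isin(lst)].reset_index(drop=True)
print(out)
```

The rows for apple, ORANGE, and Carrot are kept even though none of them matches the list entries exactly as written. The Word values in out keep their original spelling and spacing.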