List of stopwords:
stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]
df:
words | frequency |
---|---|
the company | 10 |
green energy | 9 |
founded in | 8 |
gases for | 8 |
electricity | 5 |
I would like to remove an entire row if it contains ANY of the given stopwords. In this example the output should be:
words | frequency |
---|---|
green energy | 9 |
electricity | 5 |
CodePudding user response:
The | character has a special meaning in a regular expression: it means "or" (alternation). Because str.contains treats its pattern as a regex by default, you need to escape that meaning in order to use | literally in your stop words list. You escape it with a backslash: \|
Having said that, you can do:
# raw string (r"...") so the backslash reaches the regex engine unchanged
stop_w = ["in", "&", "the", r"\|", "and", "is", "of", "a", "an", "as", "for", "was"]
df.loc[~df['words'].str.contains('|'.join(stop_w))]
prints:
words frequency
1 green energy 9
4 electricity 5
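One caveat with the pattern above: it matches substrings, so a short stopword such as "a" would also hit a word like "solar". A minimal sketch of a stricter variant follows, assuming the sample data from the question. It uses re.escape instead of a hand-written backslash and only matches a stopword when it stands alone between whitespace. The names alternatives, pattern and filtered are illustrative, not from the original answer.

```python
import re
import pandas as pd

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

# Sample data copied from the question
df = pd.DataFrame({
    "words": ["the company", "green energy", "founded in", "gases for", "electricity"],
    "frequency": [10, 9, 8, 8, 5],
})

# re.escape neutralises regex metacharacters such as "|",
# so the list can hold the plain, unescaped strings
alternatives = "|".join(re.escape(w) for w in stop_w)

# (?<!\S) and (?!\S) require that the match is not glued to other
# non-space characters, i.e. the stopword is a whole token
pattern = rf"(?<!\S)(?:{alternatives})(?!\S)"

# Keep only the rows whose text contains no standalone stopword
filtered = df.loc[~df["words"].str.contains(pattern, regex=True)]
print(filtered)
```

On the sample data this keeps the same two rows (index 1 and 4) as the simpler pattern. The two approaches only differ on text where a stopword appears inside a longer word.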
CodePudding user response:
You can create sub_df like this:
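# split each phrase into tokens and keep rows where no token is a stopword
sub_df = df[~df.words.str.split().apply(lambda tokens: any(t in stop_w for t in tokens))]
Or get the indices of the rows to remove and drop them:
idx = df[df.words.str.split().apply(lambda tokens: any(t in stop_w for t in tokens))].index
df.drop(idx)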
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
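For reference, here is a self-contained sketch of the token-based idea on the question's sample data, showing that boolean indexing and DataFrame.drop give the same result. It assumes phrases are separated by whitespace and contain no missing values. The names stop_set and has_stop are hypothetical helpers introduced for this example.

```python
import pandas as pd

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

# Sample data copied from the question
df = pd.DataFrame({
    "words": ["the company", "green energy", "founded in", "gases for", "electricity"],
    "frequency": [10, 9, 8, 8, 5],
})

# A set gives fast membership checks; no regex escaping is involved
stop_set = set(stop_w)

# True for rows where at least one whitespace-separated token is a stopword
has_stop = df["words"].str.split().apply(
    lambda tokens: not stop_set.isdisjoint(tokens)
)

# Route 1: boolean indexing keeps the rows without stopwords
sub_df = df[~has_stop]

# Route 2: collect the index labels of offending rows and drop them
idx = df[has_stop].index
dropped = df.drop(idx)

print(sub_df)
```

Note that df.drop returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a name to be kept.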