remove entire rows from df if the word occurs

Time:11-30

List of stopwords:

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

df:

words frequency
the company 10
green energy 9
founded in 8
gases for 8
electricity 5

I would like to remove an entire row if it contains ANY of the given stopwords. In this example, the output should be:

words frequency
green energy 9
electricity 5

CodePudding user response:

The | character has a special meaning in a regular expression: it means "or" (alternation). str.contains treats its pattern as a regex by default, so you need to escape | with a backslash \ to match it literally when it appears in your stop words list.

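Having said that, you can do:

# raw string, so the backslash reaches the regex engine unchanged
stop_w = ["in", "&", "the", r"\|", "and", "is", "of", "a", "an", "as", "for", "was"]
# keep only the rows whose 'words' value matches none of the stop words
df.loc[~df['words'].str.contains('|'.join(stop_w))]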

prints:

          words  frequency
1  green energy          9
4   electricity          5
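
Note that str.contains matches substrings, so a one-letter stop word such as "a" also matches inside longer words (a hypothetical row like "solar panel" would be dropped too). The following is a minimal sketch of one way around that, assuming the entries in words are separated by whitespace: re.escape takes care of | and other metacharacters, and the lookarounds (?<!\S) and (?!\S) require each stop word to stand as a whole token. The names alternatives, pattern and result are only illustrative.

```python
import re

import pandas as pd

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

df = pd.DataFrame({
    "words": ["the company", "green energy", "founded in", "gases for", "electricity"],
    "frequency": [10, 9, 8, 8, 5],
})

# re.escape neutralises regex metacharacters such as "|", so no manual backslashes.
alternatives = "|".join(re.escape(w) for w in stop_w)

# (?<!\S) and (?!\S) demand whitespace or a string edge on both sides,
# so a stop word only matches as a whole whitespace-separated token.
pattern = rf"(?<!\S)(?:{alternatives})(?!\S)"

# Keep the rows that contain none of the stop words.
result = df.loc[~df["words"].str.contains(pattern, regex=True)]
print(result)
```

With this pattern, "gases for" is removed because of the token "for", not because "gases" happens to contain the letter "a".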

CodePudding user response:

You can create sub_df like this:

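# split each entry into tokens and keep the rows where no token is a stop word
sub_df = df[~df.words.str.split().apply(lambda ws: any(w in stop_w for w in ws))]

Or get the index of the rows you want to remove and drop them:

# index labels of the rows that contain at least one stop word
idx = df[df.words.str.split().apply(lambda ws: any(w in stop_w for w in ws))].index
df.drop(idx)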

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
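
For reference, here is a self-contained sketch of the token-based idea without any regex, assuming the entries are whitespace-separated and the comparison is case-sensitive. The stop words are stored in a set for fast membership tests; has_stop and dropped are illustrative names, not part of the original answer.

```python
import pandas as pd

stop_w = {"in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"}

df = pd.DataFrame({
    "words": ["the company", "green energy", "founded in", "gases for", "electricity"],
    "frequency": [10, 9, 8, 8, 5],
})

# True for rows that share at least one whitespace-separated token with stop_w.
has_stop = df["words"].str.split().apply(lambda tokens: not stop_w.isdisjoint(tokens))

# Variant 1: boolean indexing keeps the rows without stop words.
sub_df = df[~has_stop]

# Variant 2: collect the index labels to remove and drop them.
dropped = df.drop(df[has_stop].index)

print(sub_df)
```

Both variants should leave the same rows; if case should be ignored, one option is to call .str.lower() before .str.split().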
