Trying to remove rows containing backslash characters with pandas, but unwanted double quotes are added

Time:03-08

I need to find and remove the rows in my CSV file that contain a backslash. I tried this:

    df[df["query"].str.contains("\\")==False]

but this results in the error:

    sre_constants.error: bogus escape (end of line)

The only way I have found to avoid this error is:

    df[df["query"].str.contains("\\\\")==False]

but this adds an extra double quote to everything in the file and does not remove the row.

What is the expression to identify rows containing a backslash and then remove the row?

EDIT: This is an example of the CSV file I'm reading from:

    collection,label,groups,query
    Model,general,Mob,WHERE * SAYS ("trying out app"|| "trying out app"|| "trying out app's")
    Model,general,Bun,WHERE * SAYS ("bundle"|| "bundles"|| "bundled"|| ""tv package""|| ""internet package""|| ""tv and internet package""|| "internet 2 bundle"|| "internet 2 package"|| "tv 2 bundle"|| "tv 2 package"|| "phone 2 bundle"|| "internet 2 phone"|| "internet 2 tv") AND NOT * SAYS ("\"EEOS|| Internet|| TV & Phone Solutions\""|| "\"EOOS|| Internet|| TV\""|| "\"phone solutions\"")

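Following the answer below, I edited my code, and now the row is removed:

    import pandas as pd

    # read_csv already returns a DataFrame, so no pd.DataFrame() wrapper is needed
    df = pd.read_csv('so.csv')
    # keep only the rows whose "query" contains no literal backslash
    df = df[~df["query"].str.contains("\\", regex=False)]

    df.to_csv('sores.csv')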

However, double quotes are added in the result:

    ,collection,label,groups,query
    0,Model,general,Mob,"WHERE * SAYS (""trying out app""|| ""trying out app""|| ""trying out app's"")"

CodePudding user response:

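By default, pandas' .str.contains treats its pattern as a regular expression, and a single backslash on its own is not a valid regex. Pass regex=False to match it as a literal string instead:

    df[~df["query"].str.contains("\\", regex=False)]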
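To see why the unescaped pattern fails, here is a minimal sketch that uses only Python's re module, which is where the sre_constants error in the question originates. The sample rows mirror the example further down, and the list filtering is only a stand-in for the pandas mask, not how pandas implements it.

```python
import re

backslash = "\\"  # the Python literal "\\" is ONE character: a single backslash

# Compiled as a regex, a lone backslash is an unfinished escape sequence.
try:
    re.compile(backslash)
    compiled = True
except re.error as exc:
    compiled = False
    print("regex error:", exc)

# re.escape doubles it, giving a pattern that matches one literal backslash.
pattern = re.escape(backslash)

rows = ["positive: \\", "negative"]  # sample data, same as the answer's example

# Regex route: the escaped pattern finds the backslash.
kept_regex = [r for r in rows if not re.search(pattern, r)]

# Literal route: a plain substring test needs no escaping at all.
kept_plain = [r for r in rows if backslash not in r]

print(kept_regex, kept_plain)
```

The escaped pattern is the same string as the "\\\\" literal from the question, so both routes should select the same rows. regex=False simply avoids having to think about escaping.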

Also, rather than comparing the result to False, it is better to negate the boolean mask with a leading ~.

For example:

    > df = pd.DataFrame({"query": ['positive: \\', 'negative']})
    > df
             query
    0  positive: \
    1     negative

    > df[~df['query'].str.contains("\\", regex=False)]
          query
    1  negative
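As for the extra double quotes in the output file: they most likely come from the CSV writer rather than from the filtering. The sketch below uses Python's csv module, which pandas relies on when writing, to show the default "minimal" quoting doubling any quote character inside a field. The row is a shortened, made-up version of the sample data, and write_row is a hypothetical helper used only for this illustration.

```python
import csv
import io

# Shortened stand-in for one row of the question's file (illustrative only).
row = ["Model", "general", "Mob", 'WHERE * SAYS ("trying out app")']


def write_row(**writer_options):
    """Write the sample row to an in-memory buffer and return the CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **writer_options).writerow(row)
    return buf.getvalue()


# Default settings (QUOTE_MINIMAL, doublequote=True): a field containing the
# quote character is wrapped in quotes and each inner quote is doubled.
minimal = write_row()

# With quoting switched off and no quote character defined, the field is
# written exactly as it is.
no_quotes = write_row(quoting=csv.QUOTE_NONE, quotechar=None)

print(minimal, end="")
print(no_quotes, end="")
```

The doubled quotes are valid CSV and read back as single quotes, so the data itself is unchanged. If the raw form is required, passing quoting=csv.QUOTE_NONE to df.to_csv is the likely equivalent, though that is an assumption not tested against the asker's file, and it needs an escapechar as soon as a field contains a comma. Adding index=False would also drop the leading index column seen in the output.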