I need to find and then remove rows that contain a backslash in my csv file. I tried this:
df[df["query"].str.contains("\\")==False]
but this results in the error:
sre_constants.error: bogus escape (end of line)
The only way I can avoid this error is with:
df[df["query"].str.contains("\\\\")==False]
but this adds an extra double quote to everything in the file and does not remove the row.
What is the expression to identify rows containing a backslash and then remove the row?
EDIT: This is an example csv file I'm reading from:
collection,label,groups,query
Model,general,Mob,WHERE * SAYS ("trying out app"|| "trying out app"|| "trying out app's")
Model,general,Bun,WHERE * SAYS ("bundle"|| "bundles"|| "bundled"|| ""tv package""|| ""internet package""|| ""tv and internet package""|| "internet 2 bundle"|| "internet 2 package"|| "tv 2 bundle"|| "tv 2 package"|| "phone 2 bundle"|| "internet 2 phone"|| "internet 2 tv") AND NOT * SAYS ("\"EEOS|| Internet|| TV & Phone Solutions\""|| "\"EOOS|| Internet|| TV\""|| "\"phone solutions\"")
Per the answer below, I edited my code and now the row is removed.
import pandas as pd

data = pd.read_csv('so.csv')   # read_csv already returns a DataFrame
df = pd.DataFrame(data)
df = df[~df["query"].str.contains("\\", regex=False)]
df.to_csv('sores.csv')
However in the result, double quotes are added:
,collection,label,groups,query
0,Model,general,Mob,"WHERE * SAYS (""trying out app""|| ""trying out app""|| ""trying out app's"")"
CodePudding user response:
Pandas' .str.contains uses regular expressions by default. Add regex=False to the parameters:
df[~df["query"].str.contains("\\", regex=False)]
Also note that instead of comparing the result to False, it's better to negate it with ~ at the start.
E.g.:
>>> df = pd.DataFrame({"query": ['positive: \\', 'negative']})
>>> df
         query
0  positive: \
1     negative
>>> df[~df['query'].str.contains("\\", regex=False)]
      query
1  negative
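(If you'd rather keep regex matching, an equivalent sketch: escape the backslash for the regex engine, most readably with a raw string, which also avoids the "bogus escape" error. The sample frame here is made up for illustration.)

import pandas as pd

df = pd.DataFrame({"query": ['positive: \\', 'negative']})
# r"\\" is the two-character pattern \\ , which the regex engine reads as one literal backslash
print(df[~df["query"].str.contains(r"\\")])
# expected: only the 'negative' row remains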