pandas.read_csv(): how to exclude specific separator combinations

Time:03-19

I have a CSV file like this:

1;a;3;4
1;2;b;4
1;[a;b];3;4

Loading it with pd.read_csv(file, sep=';')

raises an error:

ParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5

because the ; inside [a;b] is treated as a separator. Is there a way to ignore ; when it appears inside [ ]?

Thanks

P.S. Changing the file is not an option, for reasons outside my control.

Answer:

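You can use ;(?![^\[]*\]) as a regex separator so that only semicolons outside square brackets are matched. The negative lookahead (?![^\[]*\]) rejects a semicolon when a ] follows it before any [, i.e. when the semicolon sits inside a [...] group. Regex separators are only supported by the Python parsing engine, hence engine='python':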

pd.read_csv(filename, sep=r';(?![^\[]*\])', engine='python')
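To see what the separator matches on its own, the pattern can be tried line by line with Python's re.split, outside pandas. This is only an illustration; the second line is a made-up input showing one assumption of the pattern: brackets must not be nested.

```python
import re

# Same pattern as the sep= argument above. A ';' counts as a separator unless
# the negative lookahead (?![^\[]*\]) finds a ']' ahead of it with no '['
# in between, which means the ';' sits inside a [...] group.
SEP = r';(?![^\[]*\])'

ok = re.split(SEP, '1;[a;b];3;4')
print(ok)  # ['1', '[a;b]', '3', '4']

# Hypothetical nested input: the pattern assumes brackets are NOT nested,
# so the ';' right before the inner '[' is (wrongly) treated as a separator.
nested = re.split(SEP, '1;[a;[b;c]];3')
print(nested)  # ['1', '[a', '[b;c]]', '3']
```

If the real data can contain nested brackets, a regex separator is probably not enough and the lines would need a small custom tokenizer instead.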

demo:

text = '''1;a;3;4
1;2;b;4
1;[a;b];3;4
'''

import io
import pandas as pd

pd.read_csv(io.StringIO(text), sep=r';(?![^\[]*\])', engine='python')

output:

   1      a  3  4
0  1      2  b  4
1  1  [a;b]  3  4
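Note that in the output above the first line (1;a;3;4) was consumed as the header. If the file has no header row (the sample suggests this, but the question does not say), passing header=None keeps that line as data. A sketch under that assumption:

```python
import io
import pandas as pd

text = '''1;a;3;4
1;2;b;4
1;[a;b];3;4
'''

# Same regex separator as above; header=None (assumption: the file has no
# header row) makes pandas number the columns 0..3 and keep line 1 as data.
df = pd.read_csv(io.StringIO(text), sep=r';(?![^\[]*\])',
                 engine='python', header=None)
print(df)
```

All three lines then appear as rows, and column 1 holds 'a', '2' and '[a;b]' as strings.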
