I have a CSV file like this:
1;a;3;4
1;2;b;4
1;[a;b];3;4
Loading it with pd.read_csv(file, sep=';')
raises an error:
ParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5
because the ; inside [a;b]
is treated as a separator. Is there a way to ignore ;
when it appears inside [ ]?
Thanks
p.s. changing the file is impossible due to reasons
CodePudding user response:
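You can use ;(?![^\[]*\])
as a regex separator. The negative lookahead rejects any semicolon that is followed by a ] with no [ in between, so only semicolons outside brackets are matched. Regex separators are only supported by the Python parsing engine, hence engine='python':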
pd.read_csv(filename, sep=r';(?![^\[]*\])', engine='python')
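To see what the pattern does on its own, here is a minimal sketch using the standard re module on the sample rows from the question. The helper name split_row is hypothetical and not part of pandas, and the pattern assumes brackets are balanced and not nested.

```python
import re

# A semicolon that is NOT followed by "any run of non-'[' characters, then ']'",
# i.e. a semicolon that does not sit inside a [...] group.
SEP = re.compile(r';(?![^\[]*\])')


def split_row(row):
    """Split one line on semicolons that are outside square brackets."""
    return SEP.split(row)


rows = ["1;a;3;4", "1;2;b;4", "1;[a;b];3;4"]
for row in rows:
    print(split_row(row))
```

Each row should come back as four fields, with [a;b] kept together as a single field.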
demo:
text = '''1;a;3;4
1;2;b;4
1;[a;b];3;4
'''
import io
import pandas as pd
pd.read_csv(io.StringIO(text), sep=r';(?![^\[]*\])', engine='python')
output:
   1      a  3  4
0  1      2  b  4
1  1  [a;b]  3  4
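Note that in this output the first line of the file was consumed as the column header. If the file has no header row (an assumption based on the sample in the question), a sketch like the following keeps all three lines as data by passing header=None:

```python
import io

import pandas as pd

text = "1;a;3;4\n1;2;b;4\n1;[a;b];3;4\n"

# header=None: treat the first line as data instead of column names,
# so the columns are labelled 0..3 and all three rows are kept.
df = pd.read_csv(
    io.StringIO(text),
    sep=r';(?![^\[]*\])',
    engine='python',
    header=None,
)
print(df)
```

With a real file, replace io.StringIO(text) with the file path.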