CSV parsing with univocity-parsers and backslash-escaped quotes

Time:12-13

I'm having trouble parsing CSV with backslash-escaped quotes (\"). Most lines in the source CSV don't contain escaped quotes, but for the ones that do I can't find settings that parse them correctly.

CSV example (each line with 4 columns):

1,,No quote escape,test
2,,"One quote escape\"",test
3,,"Two \"quote escapes\"",test
4,,"Two \"quote escapes\" 2",test

CSV parser settings:

CsvFormat:
        Comment character=#
        Field delimiter=,
        Line separator (normalized)=\n
        Line separator sequence=\r\n
        Quote character="
        Quote escape character=\
        Quote escape escape character=null

Code snippet:

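import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.nio.charset.StandardCharsets;

CsvParserSettings settings = new CsvParserSettings();

// Auto-detect the delimiter and line separator from the input.
settings.setDelimiterDetectionEnabled(true);
settings.setLineSeparatorDetectionEnabled(true);
// Fields are quoted with " and embedded quotes are escaped as \".
settings.getFormat().setQuote('"');
settings.getFormat().setQuoteEscape('\\');

CsvParser parser = new CsvParser(settings);

// file: the source CSV (e.g. a java.io.File).
parser.beginParsing(file, StandardCharsets.UTF_8);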
...

Lines are parsed correctly until two escaped quotes are present in one line. Expected parsed lines are:

- 1,null,No quote escape,test
- 2,null,One quote escape",test
- 3,null,Two "quote escapes",test
- 4,null,Two "quote escapes" 2,test
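
To make the expected values concrete, here is a minimal hand-written sketch of the rules I have in mind: inside a quoted field, \" yields a literal quote, and an empty unquoted field becomes null. This is not univocity-parsers code; the class name BackslashQuoteCsv and the method parseLine are hypothetical and exist only for illustration. It handles single lines only (no embedded line breaks).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative reference for the expected field values; not part of univocity-parsers. */
final class BackslashQuoteCsv {

    /** Splits one CSV line on commas, honoring "..." quoting and \" escapes inside quotes. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // currently inside a quoted field
        boolean wasQuoted = false;  // the current field started with a quote

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // \" becomes a literal quote
                    i++;               // skip the escaped quote
                } else if (c == '"') {
                    inQuotes = false;  // unescaped quote closes the field
                } else {
                    field.append(c);
                }
            } else if (c == '"' && field.length() == 0 && !wasQuoted) {
                inQuotes = true;       // opening quote at the start of a field
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(finish(field, wasQuoted));
                field.setLength(0);
                wasQuoted = false;
            } else {
                field.append(c);
            }
        }
        fields.add(finish(field, wasQuoted));
        return fields;
    }

    /** An empty unquoted field is reported as null, matching the expected output above. */
    private static String finish(StringBuilder field, boolean wasQuoted) {
        return (field.length() == 0 && !wasQuoted) ? null : field.toString();
    }

    public static void main(String[] args) {
        // Sample line 4 from the CSV example, written as a Java string literal.
        String line = "4,,\"Two \\\"quote escapes\\\" 2\",test";
        System.out.println(parseLine(line)); // prints [4, null, Two "quote escapes" 2, test]
    }
}
```

Whatever settings are correct for the real parser should produce the same four fields per line as this sketch.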

Answer:

Upon further inspection I found an existing issue for v2.9.1.
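
Until that is resolved, one possible workaround (my own assumption, not something taken from the issue) is to rewrite each \" as a doubled quote "" before the text reaches the parser, and then leave the quote escape at its default of ". The helper below is a hypothetical sketch, and the names QuoteEscapeRewriter and toDoubledQuotes are made up for illustration. It is a plain string replacement, so it would mishandle input where a literal backslash is immediately followed by a closing quote.

```java
/** Hypothetical pre-processing helper; not part of univocity-parsers. */
final class QuoteEscapeRewriter {

    /** Replaces every backslash-escaped quote (\") with a doubled quote (""). */
    static String toDoubledQuotes(String line) {
        return line.replace("\\\"", "\"\"");
    }

    public static void main(String[] args) {
        // Sample line 4 from the question, written as a Java string literal.
        String line = "4,,\"Two \\\"quote escapes\\\" 2\",test";
        System.out.println(toDoubledQuotes(line)); // prints 4,,"Two ""quote escapes"" 2",test
    }
}
```

The rewritten lines use the conventional doubled-quote escaping, so the setQuoteEscape('\\') call would be dropped when parsing them.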
