Java regular expression: escaping a multi-line comment containing the character $

Time:02-14


    final Pattern PATTERN = Pattern.compile("\"[^\"]*\"");

    @Test
    public void parseCsvTest() {
        StringBuffer result = new StringBuffer();
        Matcher m = null;
        String csv = "\"foo$\n" + "bar\"";
        try {
            
            m = PATTERN.matcher(csv);
            while (m.find()) {
                m.appendReplacement(result, m.group().replaceAll("\\R+", ""));
            }
            m.appendTail(result);
        } catch (Exception e) {
            e.printStackTrace();
        }
        String escaped_csv = result.toString();
        log.info(escaped_csv);
    }

With String csv = "\"foo\n" + "bar\""; I get the expected result: "foobar"

But with String csv = "\"foo$\n" + "bar\""; (note the $ character after foo), the pattern doesn't seem to identify the group. Note: here $ is a literal character in the input, not the end-of-line anchor, even though it can be followed by a line break.
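One way to check whether the pattern is really at fault is to separate matching from replacing. The minimal sketch below (class and method names are hypothetical) reuses the question's pattern and loop: find() does locate the quoted field containing foo$, and it is the appendReplacement call that throws an IllegalArgumentException. Because the question's code catches Exception, that failure only shows up as a stack trace plus an empty or partial result. The exact exception message may differ between JDK versions, so the sketch reports only the exception type plus whatever message it carries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarRepro {
    // Same pattern as in the question: a double-quoted field that may span lines.
    static final Pattern PATTERN = Pattern.compile("\"[^\"]*\"");

    // Returns the first quoted field found in csv, or null if nothing matched.
    static String firstMatch(String csv) {
        Matcher m = PATTERN.matcher(csv);
        return m.find() ? m.group() : null;
    }

    // Runs the question's appendReplacement loop and reports what happened.
    static String tryAppendReplacement(String csv) {
        StringBuffer result = new StringBuffer();
        Matcher m = PATTERN.matcher(csv);
        try {
            while (m.find()) {
                // The matched text is used as the *replacement string*, so a '$'
                // or '\' inside it is parsed as replacement syntax.
                m.appendReplacement(result, m.group().replaceAll("\\R+", ""));
            }
            m.appendTail(result);
            return result.toString();
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String csv = "\"foo$\n" + "bar\"";
        System.out.println(firstMatch(csv) != null);   // the pattern does match
        System.out.println(tryAppendReplacement(csv)); // the replacement step fails
    }
}
```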

I also tried PATTERN = Pattern.compile("\"[^\"]*^$?\""); without success: it returns foo and bar on two separate lines.

Any ideas?

Answer:

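I think the problem is in Matcher.appendReplacement (which internally calls appendExpandedReplacement), not in the pattern.

The matched text is passed as the replacement string, and in a replacement string $ starts a group reference ($1, $2, ...). Once the line break after foo$ is removed, the replacement contains $b, which is rejected as an illegal group reference.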
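If that is the cause, the usual fix is to escape the text before handing it to appendReplacement. A minimal sketch, assuming the goal is only to remove line breaks inside quoted fields (class and method names are hypothetical): Matcher.quoteReplacement escapes $ and \ so the joined text is copied verbatim instead of being parsed for group references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedFieldJoiner {
    // Same pattern as in the question: a double-quoted field, possibly multi-line.
    static final Pattern PATTERN = Pattern.compile("\"[^\"]*\"");

    // Removes line breaks inside quoted fields; '$' and '\' are kept literally.
    static String joinQuotedLines(String csv) {
        StringBuffer result = new StringBuffer();
        Matcher m = PATTERN.matcher(csv);
        while (m.find()) {
            String joined = m.group().replaceAll("\\R+", "");
            // quoteReplacement escapes '$' and '\' so appendReplacement
            // appends the text as-is rather than expanding $n references.
            m.appendReplacement(result, Matcher.quoteReplacement(joined));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinQuotedLines("\"foo$\n" + "bar\""));
    }
}
```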

Answer:

Got it working with: Pattern.compile("\"*[^$]|\"[^\"]*\"");

Results

csv = "\"foo\n" + "bar\n" + "doe\"" => foobardoe

csv = "\"foo$\n" + "bar\n" + "doe\"" => foo$bardoe

csv = "\"foo$\n" + "bar$\n" + "doe\"" => foo$bar$doe

csv = "\"foo$\n" + "bar$\n" + "doe$\"" => foo$bar$doe$
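This workaround appears to avoid the exception because the first alternative, \"*[^$], is tried first at every position and can never include a $, so no $ reaches appendReplacement; the $ characters are copied through as unmatched text. A side effect is that this alternative matches line breaks anywhere in the input, not only inside quoted fields. As a different technique (not the poster's), the output can be assembled by hand from m.start() and m.end(), which never parses replacement syntax at all. A sketch with hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualReplace {
    // A double-quoted field that may span several lines.
    static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Builds the result from match offsets, so the inserted text is never
    // interpreted as a replacement string ('$' and '\' stay literal).
    static String joinQuotedLines(String csv) {
        StringBuilder out = new StringBuilder();
        Matcher m = QUOTED.matcher(csv);
        int last = 0; // end offset of the previous match
        while (m.find()) {
            out.append(csv, last, m.start());             // text between fields, unchanged
            out.append(m.group().replaceAll("\\R+", "")); // quoted field without line breaks
            last = m.end();
        }
        out.append(csv, last, csv.length());              // text after the last field
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinQuotedLines("\"foo$\n" + "bar$\n" + "doe$\""));
    }
}
```

Unlike the workaround, line breaks outside quoted fields are left untouched here.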

Answer:

In a regular expression, $ is the end-of-line anchor and must be escaped (\\$ in a Java string literal) to match a literal dollar sign. Have a look at the excellent documentation that comes with your JDK; a decent IDE should take you to it when you look up the Javadoc for Pattern. Or LMGTFY: Pattern (Java 7).
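Escaping in the pattern matters where $ appears in the regular expression itself. In the question, the $ occurs only in the input and in the string handed to appendReplacement, where Matcher.quoteReplacement is the relevant escape. A minimal sketch (hypothetical class name) of the pattern-side difference:

```java
import java.util.regex.Pattern;

class DollarInPattern {
    // Unescaped: $ is an anchor (end of input, or just before a final line terminator).
    static boolean matchesAnchored(String s) {
        return Pattern.compile("foo$").matcher(s).find();
    }

    // Escaped: \$ (written "\\$" in Java source) matches a literal dollar sign.
    static boolean matchesLiteral(String s) {
        return Pattern.compile("foo\\$").matcher(s).find();
    }

    // Pattern.quote turns the whole string into a literal, $ included.
    static boolean matchesQuoted(String s) {
        return Pattern.compile(Pattern.quote("foo$")).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesAnchored("foo$")); // false: this $ is a character, not the end
        System.out.println(matchesLiteral("foo$"));  // true
        System.out.println(matchesQuoted("foo$"));   // true
    }
}
```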
