final Pattern PATTERN = Pattern.compile("\"[^\"]*\"");

@Test
public void parseCsvTest() {
    StringBuffer result = new StringBuffer();
    Matcher m = null;
    String csv = "\"foo$\n" + "bar\"";
    try {
        m = PATTERN.matcher(csv);
        while (m.find()) {
            // Strip line breaks inside each quoted field
            m.appendReplacement(result, m.group().replaceAll("\\R+", ""));
        }
        m.appendTail(result);
    } catch (Exception e) {
        e.printStackTrace();
    }
    String escaped_csv = result.toString();
    log.info(escaped_csv);
}
With String csv = "\"foo\n" + "bar\""; I get the expected result: "foobar"
But with String csv = "\"foo$\n" + "bar\""; (notice the $ character after foo), the pattern doesn't identify the group. Note: the $ here is a literal character, not the end-of-line anchor, even though it can be followed by a line terminator.
I tried PATTERN = Pattern.compile("\"[^\"]*^$?\""); without success: it returns foo and bar on two lines.
Any ideas?
CodePudding user response:
I think you have found a bug in Java's Matcher.append(Expanded)Replacement. A $ just before the line's end in the input will count as an illegal group ($1, $2, ...).
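For what it's worth, the Javadoc for Matcher.appendReplacement documents that a dollar sign in the replacement string starts a group reference, so this looks like intended behaviour rather than a bug. A minimal sketch of one way around it, assuming the goal is only to strip line breaks inside quoted fields, is to pass the replacement through Matcher.quoteReplacement first. The class and method names below are made up for illustration, and \R needs Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementSketch {
    // Same pattern as in the question: a double-quoted field without embedded quotes.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Hypothetical helper: removes line breaks inside each quoted field.
    static String stripLineBreaksInQuotes(String csv) {
        StringBuffer result = new StringBuffer();
        Matcher m = QUOTED.matcher(csv);
        while (m.find()) {
            String cleaned = m.group().replaceAll("\\R+", "");
            // quoteReplacement escapes '$' and '\' so that appendReplacement
            // inserts them literally instead of parsing group references.
            m.appendReplacement(result, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripLineBreaksInQuotes("\"foo$\n" + "bar\""));
    }
}
```

With this change the original pattern can stay as it is; only the replacement string is treated differently.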
CodePudding user response:
Got it working with: Pattern.compile("\"*[^$]|\"[^\"]*\"");
Results:
csv = "\"foo\n" + "bar\n" + "doe\"" => foobardoe
csv = "\"foo$\n" + "bar\n" + "doe\"" => foo$bardoe
csv = "\"foo$\n" + "bar$\n" + "doe\"" => foo$bar$doe
csv = "\"foo$\n" + "bar$\n" + "doe$\"" => foo$bar$doe$
CodePudding user response:
$ is end-of-line and must be escaped. You should have a look at the excellent documentation that your JDK comes with. If you have a decent IDE, it should guide you to it when you look up the Javadocs for Pattern. Or LMGTFY: Pattern (Java 7).
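As a rough illustration of what escaping means here, the sketch below shows the two places where $ is special in java.util.regex: in a pattern, where an unescaped $ is an anchor, and in a replacement string, where it starts a group reference. The class and method names are hypothetical and exist only for this example.

```java
import java.util.regex.Pattern;

class DollarEscapingSketch {
    // In a pattern, \$ matches a literal dollar sign;
    // an unescaped $ would be the end-of-input/line anchor instead.
    static boolean containsLiteralDollar(String s) {
        return Pattern.compile("\\$").matcher(s).find();
    }

    // In a replacement string, \$ inserts a literal dollar sign
    // instead of starting a group reference such as $1.
    static String appendDollarToFoo(String s) {
        return s.replaceAll("foo", "foo\\$");
    }

    public static void main(String[] args) {
        System.out.println(containsLiteralDollar("foo$"));
        System.out.println(appendDollarToFoo("foobar"));
    }
}
```

In the question's code the pattern itself contains no $, so it is the replacement-string case that applies.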