I am working on a Spring Batch project.
We pass a huge file of approximately 3 million lines as input to the method. We need to scan this file and keep only the lines whose SQL codes are not in the whitelistedVals list. My code takes too long to process such a huge file, although it works fine for smaller files.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyClass {
    private static final List<String> whitelistedVals = new ArrayList<>();
    static {
        whitelistedVals.add("SQL123N");
        whitelistedVals.add("SQL2340W");
        whitelistedVals.add("SQL3459W");
    }

    public String getSqlError(String inputFile) {
        // An SQL code, then the message text up to a blank line or the end of input
        Pattern r = Pattern.compile("(SQL\\d+[A-Z]*)(?s)(.+?)(\\n\\n|\\n\\Z)");
        Matcher m = r.matcher(inputFile);
        String error = "";
        while (m.find()) {
            String errorCode = m.group(1);
            String errorInGroup = errorCode + m.group(2).toUpperCase();
            // true when the code is NOT one of the whitelisted values
            boolean errorsFound = whitelistedVals
                    .stream()
                    .noneMatch(x -> x.equalsIgnoreCase(errorCode));
            if (errorsFound) {
                error += errorInGroup;
            }
        }
        return error;
    }
}
Any suggestions on how this can be handled to speed up the process?
CodePudding user response:
It seems the entire file is read first and its contents are then passed to the getSqlError method. It would be better to scan the file using a double line feed (\n\n) as the delimiter.
Also, whitelistedVals is streamed for every match, although the values could be integrated into the pattern.
So the method could look as follows:
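import java.io.File;
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.*;

// static, so that the static method below can use it
private static final List<String> whitelistedVals = Arrays.asList("123N", "2340W", "3459W");

public static String getSqlError(String inputFile) throws Exception {
    // The negative lookahead rejects whitelisted codes inside the pattern itself
    Pattern r = Pattern.compile("(SQL(?!("
            + whitelistedVals.stream().collect(Collectors.joining("|"))
            + ")\\b)(\\d+[A-Z]*))(?s)(.+)");
    // try-with-resources closes the scanner; onClose() alone only runs if the stream is closed
    try (Scanner scan = new Scanner(new File(inputFile)).useDelimiter("\\n\\n|\\n\\Z")) {
        // The number of blocks is not known up front, so no size is reported
        final Spliterator<String> splt = Spliterators.spliteratorUnknownSize(
                scan, Spliterator.ORDERED | Spliterator.NONNULL);
        return StreamSupport.stream(splt, false)
                .flatMap(s -> r.matcher(s)
                        .results() // Stream<MatchResult>, requires Java 9+
                        .map(mr -> mr.group(1) + mr.group(4)) // errorCode + error
                        .map(String::toUpperCase)
                ) // Stream<String>
                .collect(Collectors.joining("\n"));
    }
}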
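To see how the negative lookahead filters whitelisted codes without a separate list lookup, here is a minimal sketch that applies the same pattern to in-memory strings instead of a file. The class name LookaheadDemo, the method firstError and the sample inputs are hypothetical and only serve as illustration; List.of requires Java 9 or later.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the answer above.
class LookaheadDemo {
    // Numeric parts of the whitelisted codes, as in the answer.
    static final List<String> WHITELISTED = List.of("123N", "2340W", "3459W");

    // "SQL" must NOT be followed by a whitelisted code that ends at a word boundary.
    static final Pattern NON_WHITELISTED = Pattern.compile(
            "(SQL(?!(" + String.join("|", WHITELISTED) + ")\\b)(\\d+[A-Z]*))(?s)(.+)");

    // Returns the upper-cased code plus message of the first non-whitelisted
    // error in the block, or an empty string if there is none.
    static String firstError(String block) {
        Matcher m = NON_WHITELISTED.matcher(block);
        return m.find() ? (m.group(1) + m.group(4)).toUpperCase() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstError("SQL123N some warning"));       // whitelisted code
        System.out.println(firstError("SQL0911N deadlock detected")); // not whitelisted
    }
}
```

Group 2 is the alternation inside the lookahead and group 3 is the numeric part of the code, which is why the message text ends up in group 4.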
CodePudding user response:
Using a StringBuilder instead of string concatenation (+=) worked like a charm.
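A minimal sketch of what that change could look like when applied to the method from the question. The class name SqlErrorCollector is hypothetical, and the Set lookup and the precompiled pattern are extra assumptions of this sketch rather than part of the fix described above (Set.of requires Java 9 or later).

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; the logic mirrors getSqlError from the question.
class SqlErrorCollector {
    // Assumption: a Set replaces the streamed List for constant-time lookups.
    private static final Set<String> WHITELISTED = Set.of("SQL123N", "SQL2340W", "SQL3459W");

    // Assumption: the pattern is compiled once instead of on every call.
    private static final Pattern ERROR_BLOCK =
            Pattern.compile("(SQL\\d+[A-Z]*)(?s)(.+?)(\\n\\n|\\n\\Z)");

    static String getSqlError(String input) {
        // Appending to a StringBuilder avoids copying the whole result on every match,
        // which is what repeated String += does.
        StringBuilder errors = new StringBuilder();
        Matcher m = ERROR_BLOCK.matcher(input);
        while (m.find()) {
            String errorCode = m.group(1);
            if (!WHITELISTED.contains(errorCode.toUpperCase())) {
                errors.append(errorCode).append(m.group(2).toUpperCase());
            }
        }
        return errors.toString();
    }

    public static void main(String[] args) {
        System.out.println(getSqlError("SQL123N ok text\n\nSQL999E bad thing\n\n"));
    }
}
```

With String +=, every match copies the whole accumulated result, so the total cost grows roughly quadratically with the amount of collected text. That would explain why small files were fine and the 3 million line file was not.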