Java - Find string pattern in a huge file failing

Time:10-12

Working on a Spring Batch project.

As input to the method, we pass a huge file of approximately 3 million lines. We need to scan this file and collect the entries whose SQL codes are not in the whitelistedVals list. My code takes far too long on a file of this size, although it works fine for smaller files.

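import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyClass {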
    private static final List<String> whitelistedVals = new ArrayList<>();

    static {
        whitelistedVals.add("SQL123N");
        whitelistedVals.add("SQL2340W");
        whitelistedVals.add("SQL3459W");
    }

    public String getSqlError(String inputFile) {
        Pattern r = Pattern.compile("(SQL\\d+[A-Z]*)(?s)(.+?)(\\n\\n|\\n\\Z)");
        Matcher m = r.matcher(inputFile);
        String error = "";
        while (m.find()) {
            String errorCode = m.group(1);
            String errorInGroup = errorCode + m.group(2).toUpperCase();
            boolean errorsFound = whitelistedVals
                    .stream()
                    .noneMatch(x -> x.equalsIgnoreCase(errorCode));

            if (errorsFound) {
                error += errorInGroup;
            }
        }
        return error;
    }
}

Any suggestions on how this can be handled to speed up the process?

CodePudding user response:

It seems the entire file is read first and its contents are then passed to the getSqlError method. It would be better to scan the file block by block, using a double line feed (\n\n) as the delimiter.

Also, whitelistedVals is streamed for every match, although the whitelisted codes could be built into the pattern itself.

So the method could look as follows:

import java.io.File;
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.*;

// Only the part after "SQL"; static so that the static method below can use it
static final List<String> whitelistedVals = Arrays.asList("123N", "2340W", "3459W");

public static String getSqlError(String inputFile) throws Exception {
    // The negative lookahead rejects whitelisted codes inside the pattern itself
    Pattern r = Pattern.compile("(SQL(?!("
          + whitelistedVals.stream().collect(Collectors.joining("|"))
          + ")\\b)(\\d+[A-Z]*))(?s)(.+)");

    // try-with-resources closes the scanner; blocks are separated by an empty line
    try (Scanner scan = new Scanner(new File(inputFile))
            .useDelimiter("\\n\\n|\\n\\Z")) {
        final Spliterator<String> splt = Spliterators.spliteratorUnknownSize(scan, Spliterator.ORDERED | Spliterator.NONNULL);

        return StreamSupport.stream(splt, false)
            .flatMap(s -> r.matcher(s)
                .results() // Stream<MatchResult> (Java 9+)
                .map(mr -> mr.group(1) + mr.group(4)) // errorCode + error
                .map(String::toUpperCase)
            ) // Stream<String>
            .collect(Collectors.joining("\n"));
    }
}
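
To see how the negative lookahead filters whitelisted codes, the pattern can be tried on single blocks in isolation. The following is only a small sketch: the class name WhitelistPatternDemo, the helper check and the sample messages are made up for illustration and are not part of the code above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitelistPatternDemo {
    // Numeric parts of the whitelisted codes, as in the answer
    static final List<String> WHITELISTED = Arrays.asList("123N", "2340W", "3459W");

    // Same pattern as in the answer: "SQL" must not be followed by a whitelisted code
    static final Pattern ERROR = Pattern.compile(
            "(SQL(?!(" + String.join("|", WHITELISTED) + ")\\b)(\\d+[A-Z]*))(?s)(.+)");

    // Hypothetical helper: returns code + message in upper case, or "" if nothing matched
    static String check(String block) {
        Matcher m = ERROR.matcher(block);
        return m.find() ? (m.group(1) + m.group(4)).toUpperCase() : "";
    }

    public static void main(String[] args) {
        // Whitelisted code: the lookahead rejects it, so an empty line is printed
        System.out.println(check("SQL123N  whitelisted, ignored"));
        // Any other code is reported together with the rest of its block
        System.out.println(check("SQL0204N  name is undefined"));
    }
}
```

Because of the \b after the alternation, a code such as SQL123NX is not treated as whitelisted, since only an exact SQL123N is skipped.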

CodePudding user response:

Using a StringBuilder instead of string concatenation (+=) worked like a charm.
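
No code was posted with this response, so the following is only a sketch of what the change might look like when applied to the method from the question. Each += on a String copies everything accumulated so far, so the cost grows quadratically with the size of the result, while StringBuilder appends in place. The class name SqlErrorCollector, the HashSet lookup and the precompiled pattern are assumptions added here, not part of the original fix.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlErrorCollector {
    // Whitelisted codes from the question; the pattern only matches upper-case
    // codes, so an exact HashSet lookup behaves like the equalsIgnoreCase scan
    private static final Set<String> WHITELISTED =
            new HashSet<>(Arrays.asList("SQL123N", "SQL2340W", "SQL3459W"));

    // Compiled once instead of on every call (assumption, not in the question)
    private static final Pattern ERROR_BLOCK =
            Pattern.compile("(SQL\\d+[A-Z]*)(?s)(.+?)(\\n\\n|\\n\\Z)");

    static String getSqlError(String input) {
        Matcher m = ERROR_BLOCK.matcher(input);
        StringBuilder errors = new StringBuilder();
        while (m.find()) {
            String errorCode = m.group(1);
            if (!WHITELISTED.contains(errorCode)) {
                // Appends in place; no copy of the text collected so far
                errors.append(errorCode).append(m.group(2).toUpperCase());
            }
        }
        return errors.toString();
    }
}
```

As in the question, the blank line that ends each block is not appended, so the reported errors are joined without a separator.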
