Stream.forEach OutOfMemoryError reading large file

Time:11-14

I'm using Java 11 and reading a file of around 600 MB in which every line has the same length (274 chars).

This is the code I'm using:

Path tempFile;
try (final Stream<String> stream = Files.lines(largeFilePath, StandardCharsets.ISO_8859_1).sorted()) {
    tempFile = Files.createTempFile(null, null);
    stream.forEach(e -> {
        if (StringUtils.startsWith(e, "aa")) {
            try {
                Files.write(tempFile, (e + System.lineSeparator()).getBytes(), StandardOpenOption.APPEND);
            } catch (final IOException e1) {
                throw new RuntimeException(e1);
            }
        }
    });
} catch (final Exception e) {
    throw e;
}

This is the error:

java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringUTF16.compress(StringUTF16.java:160) ~[?:?]
    at java.lang.String.<init>(String.java:3214) ~[?:?]
    at java.lang.String.<init>(String.java:276) ~[?:?]
    at java.io.BufferedReader.readLine(BufferedReader.java:358) ~[?:?]
    at java.io.BufferedReader.readLine(BufferedReader.java:392) ~[?:?]
    at java.nio.file.FileChannelLinesSpliterator.readLine(FileChannelLinesSpliterator.java:171) ~[?:?]
    at java.nio.file.FileChannelLinesSpliterator.forEachRemaining(FileChannelLinesSpliterator.java:113) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
    at mypackage.MyClass.execute(MyClass.java:103) ~[classes/:?]

The line where it crashes is:

stream.forEach(e -> {

I don't know what I'm missing here... in theory that code should be memory safe, right? If I use a smaller file it works perfectly.

These are my memory settings:

-Xms512m
-Xmx512m
-XX:MaxMetaspaceSize=256m

CodePudding user response:

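You ask for the lines to be sorted. sorted() is a stateful operation: it cannot pass a single line on to forEach until it has read ALL of them into memory. The file alone is about 600 MB, which already exceeds the 512 MB heap you give the program (-Xmx512m), before counting the per-String object overhead.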
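Since only the lines starting with "aa" are written out, one possible workaround is to filter before sorting, so that sorted() buffers only the matching lines. The following is a minimal sketch, not a definitive fix: the class and method names are hypothetical, it writes the output as ISO-8859-1 (the original code uses the platform default charset), and it only helps if the matching lines themselves fit in the heap.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class FilterThenSort {

    // Hypothetical helper: writes the lines of `input` that start with `prefix`,
    // in sorted order, to a new temp file and returns that file.
    static Path writeSortedMatches(Path input, String prefix) throws IOException {
        Path out = Files.createTempFile("filtered-", ".txt");
        try (Stream<String> lines = Files.lines(input, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            lines.filter(line -> line.startsWith(prefix)) // discard non-matching lines first
                 .sorted()                                 // buffers only what survived the filter
                 .forEach(line -> {
                     try {
                         writer.write(line);
                         writer.newLine();
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
        return out;
    }
}
```

Calling FilterThenSort.writeSortedMatches(largeFilePath, "aa") would replace the try block in the question. Keeping one BufferedWriter open also avoids reopening the temp file for every line, which Files.write with APPEND does.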

Either give the JVM more heap (a larger -Xmx), or use an external sort, also called a file sort (https://en.wikipedia.org/wiki/External_sorting).
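To illustrate the external-sorting idea, here is a rough sketch under stated assumptions: the file is split into chunks that fit in memory, each chunk is sorted and written to its own temp file, and the sorted chunk files are then merged line by line with a priority queue. The class name, method names and the maxLinesInMemory parameter are all hypothetical, and a production version would need more care with cleanup and with the number of files open at once.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class ExternalLineSort {
    private static final Charset CS = StandardCharsets.ISO_8859_1;

    // Hypothetical entry point: sorts the lines of `input` into `output`,
    // holding at most `maxLinesInMemory` lines on the heap during the split phase.
    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, CS)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= maxLinesInMemory) {
                    runs.add(writeSortedRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(writeSortedRun(chunk));
            }
        }
        mergeRuns(runs, output);
        for (Path run : runs) {
            Files.deleteIfExists(run);
        }
    }

    // Sorts one in-memory chunk and writes it to its own temp file (a "run").
    private static Path writeSortedRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, chunk, CS);
        return run;
    }

    // One open reader per run, plus the line that reader is currently positioned on.
    private static final class Cursor {
        final BufferedReader reader;
        String current;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.current = reader.readLine();
        }
    }

    // K-way merge: repeatedly emit the smallest current line among all runs.
    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter writer = Files.newBufferedWriter(output, CS)) {
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run, CS);
                readers.add(r);
                Cursor c = new Cursor(r);
                if (c.current != null) {
                    heap.add(c);
                }
            }
            while (!heap.isEmpty()) {
                Cursor smallest = heap.poll();
                writer.write(smallest.current);
                writer.newLine();
                smallest.current = smallest.reader.readLine();
                if (smallest.current != null) {
                    heap.add(smallest); // re-insert with its next line as the key
                }
            }
        } finally {
            for (BufferedReader r : readers) {
                r.close();
            }
        }
    }
}
```

During the merge only one line per run is held in memory, so the heap requirement is governed by maxLinesInMemory rather than by the size of the file.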

CodePudding user response:

The file is too large to sort in memory; doing so will consume too many resources. I think it would be much easier to load the data into a database and let it do the sorting.
