How to calculate checksum with InputStream and then use it again

Time:01-17

I want to calculate the CRC32 checksum of a given InputStream and then read the same stream's contents as a string. Here's what I've tried so far:

private long calculateChecksum(InputStream stream) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buffer = new byte[8192];
    int length;
    while ((length = stream.read(buffer)) != -1) {
        crc.update(buffer, 0, length);
    }
    return crc.getValue();
}

and then

String text = IOUtils.toString(inputStream, UTF_8);

I also tried reversing the order, reading the string first and then calculating the checksum, but that didn't work either.

The problem seems to be that the read position moves to the end of the stream while the checksum is calculated and is never reset. How can I use the InputStream again after calculating the checksum?

CodePudding user response:

InputStream is a read-once stream. Once you've read it, you can't go back to start again. This is because InputStream is general-purpose: it could be the stream of bytes read from a keyboard, for example, or read from a real-time data feed.

If your input stream is in fact a FileInputStream, then you could use

inputStream.getChannel().position(0);

to reset it to the start of the file.
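As a minimal sketch of that rewind, assuming the stream really is a FileInputStream backed by a regular, seekable file: the class name RewindFileStream and the method checksumAndRewind are hypothetical names chosen for illustration, and the temporary file exists only for the demo.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

final class RewindFileStream {

    // Hypothetical helper: reads the stream to the end to compute its CRC-32,
    // then moves the underlying file channel back to position 0.
    static long checksumAndRewind(FileInputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        int length;
        while ((length = in.read(buffer)) != -1) {
            crc.update(buffer, 0, length);
        }
        in.getChannel().position(0); // rewind to the start of the file
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // Demo input: a throwaway temporary file.
        Path file = Files.createTempFile("rewind-demo", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            long crc = checksumAndRewind(in);
            // The stream can be read again because the channel was repositioned.
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.printf("CRC32: %x, text: %s%n", crc, text);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Note that readAllBytes() needs Java 9 or later, and that this approach does not carry over to other InputStream types.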

If it's a ByteArrayInputStream, then you already have a byte array so you might as well just use that instead.

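If you want to write a general-purpose function that doesn't know what kind of InputStream it is given, then you can wrap it in a BufferedInputStream, call its mark() method before reading, and call reset() afterwards to rewind. The buffer has to hold everything read between those two calls, so here it will use extra memory to buffer the whole of the stream.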

CodePudding user response:

Yes, an InputStream is consumed. You have a few options:

  1. mark

mark() / reset() are optional methods of InputStream; mark() records the current position (this does, by itself, nothing visible), and reset() 'rewinds back' to that position, replaying everything that was read since the last time you called mark().

However, your average inputstream either does not support it, or, if it does, supports it by storing in memory all the bytes that are received since setting the mark. Meaning, if you do this to an inputstream that contains a few GB worth of data, you're going to get an OutOfMemoryError.

If there isn't a lot of data, just use mark and reset. Wrap in a BufferedInputStream which is specced to support mark/reset:

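private void example(InputStream in) throws IOException {
  BufferedInputStream buffered = new BufferedInputStream(in);
  // mark() needs a read limit; Integer.MAX_VALUE lets the buffer grow as far as needed.
  buffered.mark(Integer.MAX_VALUE);
  long crc = calculateChecksum(buffered);
  // Rewind the wrapper (not the raw stream) so the same bytes can be read again.
  buffered.reset();
  String text = IOUtils.toString(buffered, UTF_8);
}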

  2. Duplicate

Your second option is to duplicate the inputstream, sending each retrieved byte both to IOUtils and to the CRC algorithm.

This is complicated and not recommended.

  3. Checksum the string instead.

You already have a string of data. Just checksum that:

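private void example(InputStream in) throws IOException {
  String text = IOUtils.toString(in, UTF_8);
  CRC32 crc = new CRC32();
  // Equals the checksum of the original bytes only if they were valid UTF-8;
  // malformed sequences are replaced during decoding and do not round-trip.
  crc.update(text.getBytes(UTF_8));
  long checksum = crc.getValue();
}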

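Or, ditching IOUtils (readAllBytes() requires Java 9 or later):

private void example(InputStream in) throws IOException {
  // Read everything once, then derive both the checksum and the text from the same bytes.
  byte[] data = in.readAllBytes();
  CRC32 crc = new CRC32();
  crc.update(data);
  long checksum = crc.getValue();
  String text = new String(data, UTF_8);
}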
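For the mark/reset option above, a small sketch of how one might check markSupported() first and only wrap the stream when it cannot rewind on its own. The names MarkResetDemo, ensureMarkSupported and crc32 are hypothetical, and the whole stream is assumed to fit in memory.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class MarkResetDemo {

    // Hypothetical helper: returns the stream itself if it already supports
    // mark/reset, otherwise a BufferedInputStream wrapper that does.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads the stream to the end and returns the CRC-32 of its bytes.
    static long crc32(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[4096];
        int length;
        while ((length = in.read(buffer)) != -1) {
            crc.update(buffer, 0, length);
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        InputStream in = ensureMarkSupported(raw);
        in.mark(Integer.MAX_VALUE);   // remember the start; everything read after this is retained
        long crc = crc32(in);         // first pass consumes the stream
        in.reset();                   // rewind to the marked position
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8); // second pass
        System.out.printf("CRC32: %x, text: %s%n", crc, text);
    }
}
```

From the point of the wrap onward, the caller has to read through the returned stream rather than the original one, since the wrapper may already hold buffered bytes.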

CodePudding user response:

As others said, a stream can be consumed only once. But you can consume it and calculate the CRC value at the same time by wrapping your InputStream with a java.util.zip.CheckedInputStream.

Here is a complete example, assuming the text file "test.txt" is in the current directory and contains only this one line: These are german umlauts: äöüÄÖÜß

import org.apache.commons.io.IOUtils;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class App {
  private static final String INPUT_FILE = "test.txt";

  public static void main( String[] args ) {
    final CRC32 crc32 = new CRC32();
    try(InputStream in = new CheckedInputStream(new BufferedInputStream(
                           new FileInputStream(INPUT_FILE)), crc32))  
    {
      final String text = IOUtils.toString(in, StandardCharsets.UTF_8);
      System.out.println(text);
      System.out.printf("CRC32: %x%n", crc32.getValue());
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

Output:

These are german umlauts: äöüÄÖÜß
CRC32: 84bcd851
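
The same single-pass idea can be sketched without Commons IO or a file on disk, assuming Java 9 or later for readAllBytes(). The class ChecksumWhileReading and its nested TextAndChecksum type are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

final class ChecksumWhileReading {

    // Hypothetical holder for the two results of the single pass.
    static final class TextAndChecksum {
        final String text;
        final long checksum;

        TextAndChecksum(String text, long checksum) {
            this.text = text;
            this.checksum = checksum;
        }
    }

    // Reads the stream once; the CheckedInputStream updates the CRC-32 as bytes pass through.
    // The caller keeps ownership of the stream and is responsible for closing it.
    static TextAndChecksum read(InputStream in) throws IOException {
        CheckedInputStream checked = new CheckedInputStream(in, new CRC32());
        byte[] data = checked.readAllBytes(); // must read to the end for a complete checksum
        String text = new String(data, StandardCharsets.UTF_8);
        return new TextAndChecksum(text, checked.getChecksum().getValue());
    }

    public static void main(String[] args) throws IOException {
        // "\u00e4\u00f6\u00fc" is "äöü", written with escapes to avoid source-encoding issues.
        byte[] bytes = "umlauts: \u00e4\u00f6\u00fc".getBytes(StandardCharsets.UTF_8);
        TextAndChecksum result = read(new ByteArrayInputStream(bytes));
        System.out.printf("CRC32: %x, text: %s%n", result.checksum, result.text);
    }
}
```

The checksum only covers bytes that were actually read through the wrapper, so it is complete only once the stream has been consumed to the end.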