DeflaterOutputStream/InputStream corrupting data

Time:04-04

I've got a simple test case that fails to round-trip a stream of data through compression. I generate a byte[] of random bytes, compress it via DeflaterOutputStream, flush() the stream, then reverse those operations to retrieve the original array. From byte 505 onward the reconstructed array consists entirely of 0x00 bytes, and I don't understand why:

        //
        // create some random bytes
        //
        Random rng = new Random();
        int len = 5000;
        byte[] data = new byte[len];
        for (int i = 0; i < len; i++)
            data[i] = (byte) rng.nextInt(0xff);

        //
        // write to byte[] via a deflater stream
        //
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DeflaterOutputStream os = new DeflaterOutputStream(baos, true);
        os.write(data);
        os.flush();

        //
        // read back into byte[] via an inflater stream
        //
        ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
        InflaterInputStream is = new InflaterInputStream(bais);
        byte[] readbytes = new byte[len];
        is.read(readbytes);

        //
        // check they match (they don't, at byte 505)
        //
        for (int i = 0; i < len; i++)
            if (data[i] != readbytes[i])
                throw new RuntimeException("Mismatch at position " + i);

It doesn't seem to matter what's in the source array; it always fails at position 505.

Here's what the two byte[] arrays look like around the region they differ:

?\m·g··gWNLErZ···,··-··=·;n=··F?···13·{·rw·······\`3···f····{/····t·1·WK$·······WZ······x
?\m·g··gWNLErZ···,··-····································································
                       ^byte 505

All those unprintable chars are 0x00 from that point on. Why is this happening? I feel like I must be misunderstanding something fundamental about how the Deflate/Inflate streams work. The real-world use case here is a stream over a network that I thought I could easily improve the performance of by inserting Deflate/Inflate streams into

CodePudding user response:

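When I test this, is.read(readbytes) returns 505, the number of bytes actually read. OutputStream.write(byte[]) returns void and guarantees that the entire array is written, but InputStream.read(byte[]) makes no such guarantee: it returns the number of bytes it actually read, which can be fewer than the array length, so you have to check the return value. The rest of readbytes is never filled in, which is why everything from byte 505 onward is still 0x00.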

    ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
    System.err.println( "bais size = " + bais.available() );
    InflaterInputStream is = new InflaterInputStream(bais);
    byte[] readbytes = new byte[len];
    System.err.println( "read = " + is.read(readbytes) );  // 505

This runs without throwing an error for me:

    ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
    System.err.println( "bais size = " + bais.available() );
    InflaterInputStream is = new InflaterInputStream(bais);
    byte[] readbytes = new byte[len];
    // keep reading until the array is full or the stream ends
    for( int total = 0, result = 0; (result = is.read(readbytes, total, len-total )) != -1; )
    {
       total += result;
       System.err.println( "reading : " + total );
       if( total == len ) break;
    }
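
If you'd rather not write the loop by hand, one possible alternative is to wrap the inflater stream in a DataInputStream and call readFully(), which keeps reading until the array is full (and throws EOFException if the stream ends first). The sketch below is only an illustration: the class name DeflateRoundTrip and the helpers compress/decompress are made up for this example, and it closes the deflater stream rather than relying on flush(), which is an assumption that suits an in-memory test but not necessarily a long-lived network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of the original question or answer.
public class DeflateRoundTrip {

    // Compress the whole array; closing the stream finishes the deflate data.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DeflaterOutputStream os = new DeflaterOutputStream(baos)) {
            os.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Decompress exactly len bytes. readFully() loops internally until the
    // array is full, unlike a single read(byte[]) call.
    static byte[] decompress(byte[] compressed, int len) {
        byte[] out = new byte[len];
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            in.readFully(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[5000];
        new Random().nextBytes(data);
        byte[] back = decompress(compress(data), data.length);
        System.out.println("round trip matches: " + Arrays.equals(data, back));
    }
}
```

On Java 9 or later, InputStream.readNBytes(byte[], int, int) should serve a similar purpose without the DataInputStream wrapper.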