How do I decode a zlib stream in Go?

What is the issue?

I cannot decode valid compressed chunks from a zlib stream using Go's zlib package.

I have prepared a GitHub repo which contains code and data illustrating the issue: https://github.com/andreyst/zlib-issue.

What are those chunks?

They are messages generated by a text game server (MUD). This game server sends a compressed stream of messages in multiple chunks; the first chunk contains the zlib header and the others do not.

I have captured two chunks (first and second) with a proxy called "mcclient", a sidecar that provides compression for MUD clients that do not support it. It is written in C and uses the C zlib library to decode compressed chunks.

The chunks are in the "chunks" directory and are numbered 0 and 1. *.in files contain compressed data, *.out files contain the uncompressed data captured from mcclient, and *.log files contain the status of zlib decompression (the return code of the inflate call).

A special all.in chunk is chunk 0 concatenated with chunk 1.

Why do I think they are valid?

mcclient successfully decompresses the input chunks with C's zlib without any issues. The *.log status shows 0, which is Z_OK, meaning no errors in zlib parlance.
zlib-flate -uncompress < chunks/all.in works without any errors under Linux and decompresses to the same content. Under macOS it also decompresses to the same content, but with the warning "zlib-flate: WARNING: zlib code -5, msg = input stream is complete but output may still be valid" (-5 is Z_BUF_ERROR). That looks expected, because the chunks do not contain an "official" stream end.
The Python code in decompress.py correctly decompresses both all.in and the 0/1 chunks without any issues.

What is the issue with Go's zlib?

See main.go: it tries to decompress those chunks, starting with all.in and then decompressing chunks 0 and 1 step by step.

An attempt to decode all.in (func all()) somewhat succeeds: the decompressed data is the same, but the zlib reader returns the error flate: corrupt input before offset 446.

In the real-life scenario of decompressing chunk by chunk (func stream()), the zlib reader decodes the first chunk to the expected data but returns the error flate: corrupt input before offset 32, and the subsequent attempt to decode chunk 1 fails completely.

The question

Is it possible to use Go's zlib package in some kind of "streaming" mode suited to a scenario like this? Maybe I am using it incorrectly?

If not, what is the workaround? It would also be interesting to know why that is so. Is it by design? Is it just not implemented yet? What am I missing?

CodePudding user response：

Notice that the error says the data at an offset after the end of your input is corrupt. That is because of the way you are reading from the files:

    buf := make([]byte, 100000)
    n, readErr := f.Read(buf)
    if readErr != nil {
        log.Fatalf("readErr=%v\n", readErr)
    }
    fmt.Printf("Read bytes, n=%v\n", n)

    buffer := bytes.NewBuffer(buf)
    zlibReader, zlibErr := zlib.NewReader(buffer)
    if zlibErr != nil {
        log.Fatalf("zlibErr=%v\n", zlibErr)
    }

buf := make([]byte, 100000) makes a slice of 100000 bytes, all of them 0. But you are only reading 443 bytes in the case of all.in. Since you never shorten the slice to buf[:n], the reader encounters the zero padding (almost 100,000 bytes of it) after the valid data and concludes that the input is corrupt. That is why you get both output and an error.

As for streaming: in the case of a TCP/UDP connection you should be able to pass the connection, which is an io.Reader, straight to zlib.NewReader. To simulate that I used an io.Pipe in the modified code:

package main

import (
    "bytes"
    "compress/zlib"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    all()
    stream()

    // Alas it hangs :(
    // (needs the import otherzlib "github.com/4kills/go-zlib", which is left
    // out here because Go rejects unused imports)
    // otherZlib()
}

func all() {
    fmt.Println("==== RUNNING DECOMPRESSION OF all.in")
    fmt.Println("")

    buf, readErr := os.ReadFile("./chunks/all.in")
    if readErr != nil {
        log.Fatalf("readErr=%v\n", readErr)
    }
    fmt.Printf("Read bytes, n=%v\n", len(buf))

    buffer := bytes.NewBuffer(buf)
    zlibReader, zlibErr := zlib.NewReader(buffer)
    if zlibErr != nil {
        log.Fatalf("zlibErr=%v\n", zlibErr)
    }

    out := new(bytes.Buffer)
    written, copyErr := io.Copy(out, zlibReader)
    if copyErr != nil {
        log.Printf("copyErr=%v\n", copyErr)
    }
    fmt.Printf("Written bytes, n=%v, out:\n%v\n", written, out.String())
    fmt.Println("")
}

func stream() {
    fmt.Println("==== RUNNING DECOMPRESSION OF SEPARATE CHUNKS")
    fmt.Println("")

    pRead, pWrite := io.Pipe()
    go func() {
        buf, readErr := os.ReadFile("./chunks/0.in")
        if readErr != nil {
            log.Fatalf("readErr=%v\n", readErr)
        }
        fmt.Printf("Read 0 bytes, n=%v\n", len(buf))

        written0, copy0Err := io.Copy(pWrite, bytes.NewBuffer(buf))
        if copy0Err != nil {
            log.Printf("copy0Err=%v\n", copy0Err)
        }
        fmt.Printf("Written compressed bytes, n0=%v\n", written0)

        buf, readErr = os.ReadFile("./chunks/1.in")
        if readErr != nil {
            log.Fatalf("read1Err=%v\n", readErr)
        }
        fmt.Printf("Read 1 bytes, n=%v\n", len(buf))

        written1, copy1Err := io.Copy(pWrite, bytes.NewBuffer(buf))
        if copy1Err != nil {
            log.Printf("copy1Err=%v\n", copy1Err)
        }
        fmt.Printf("Written compressed bytes, n1=%v\n", written1)

        pWrite.Close()
    }()

    zlibReader, zlibErr := zlib.NewReader(pRead)
    if zlibErr != nil {
        log.Fatalf("zlibErr=%v\n", zlibErr)
    }

    out := new(bytes.Buffer)
    written2, copy2Err := io.Copy(out, zlibReader)
    if copy2Err != nil {
        log.Printf("copy2Err=%v\n", copy2Err)
    }
    fmt.Printf("Written decompressed bytes, n=%v, out:\n%v\n", written2, out.String())

    fmt.Println("")
}

With this code I get no errors from stream(), but I still get copyErr=unexpected EOF from all(). It looks like all.in is missing the checksum data at the end, but I figure that is just an accident.
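For what it is worth, the same unexpected EOF can be reproduced with a stream that is flushed but never closed, which would match the question's remark that the chunks carry no "official" stream end. The sketch below is an illustration under that assumption; encode and decode are hypothetical helpers, and whether a truncated stream should be tolerated is a decision for the caller.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
)

// encode (hypothetical helper) compresses data. With terminate=false the
// stream is only flushed, so it ends without a final block and without the
// Adler-32 checksum that Close would append.
func encode(data string, terminate bool) []byte {
	var b bytes.Buffer
	zw := zlib.NewWriter(&b)
	if _, err := zw.Write([]byte(data)); err != nil {
		panic(err)
	}
	var err error
	if terminate {
		err = zw.Close()
	} else {
		err = zw.Flush()
	}
	if err != nil {
		panic(err)
	}
	return b.Bytes()
}

// decode (hypothetical helper) returns the decompressed text and reports
// separately whether the stream ended early, instead of treating that as fatal.
func decode(src []byte) (string, bool, error) {
	zr, err := zlib.NewReader(bytes.NewReader(src))
	if err != nil {
		return "", false, err
	}
	out, err := io.ReadAll(zr)
	if errors.Is(err, io.ErrUnexpectedEOF) {
		return string(out), true, nil
	}
	return string(out), false, err
}

func main() {
	for _, terminate := range []bool{true, false} {
		text, truncated, err := decode(encode("You are in a maze.\n", terminate))
		fmt.Printf("terminated=%v text=%q truncated=%v err=%v\n", terminate, text, truncated, err)
	}
}
```

In both cases the text comes back intact; only the unterminated stream is reported as truncated.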

CodePudding user response：

With careful debugging I was able to see that I had been passing buffer slices that were too large, so the decompressor was fed incorrect input.

Also, it is important not to use io.Copy: it keeps reading until it hits EOF on the input buffer, and that error stops everything. Instead, call zlibReader.Read() directly, which decompresses whatever is currently in the buffer.

I've updated the code so it now works as expected.