What is the issue?
I cannot decode valid compressed chunks from a zlib stream using go's zlib package.
I have prepared a github repo which contains code and data illustrating the issue I have: https://github.com/andreyst/zlib-issue.
What are those chunks?
They are messages generated by a text game server (MUD). The game server sends a compressed stream of messages in multiple chunks, the first of which contains the zlib header and the rest do not.
I have captured two chunks (first and second) with a proxy called "mcclient", a sidecar that provides compression for MUD clients that do not support it. It is written in C and uses the C zlib library to decode compressed chunks.
Chunks are contained in the "chunks" directory and are numbered 0 and 1. The *.in files contain compressed data. The *.out files contain uncompressed data captured from mcclient. The *.log files contain the status of zlib decompression (the return code of the inflate call). A special all.in chunk is chunk 0 concatenated with chunk 1.
Why do I think they are valid?
- mcclient successfully decompresses the input chunks with C's zlib without any issues. The *.log status shows 0, which means Z_OK, i.e. no errors in zlib parlance.
- zlib-flate -uncompress < chunks/all.in works without any errors under Linux and decompresses to the same content. Under Mac OS it also decompresses to the same content, but with a warning: "zlib-flate: WARNING: zlib code -5, msg = input stream is complete but output may still be valid", which looks expected because the chunks do not contain an "official" stream end.
- The Python code in decompress.py correctly decompresses both all.in and the 0/1 chunks without any issues.
What is the issue with go's zlib?
See main.go: it tries to decompress those chunks, starting with all.in and then decompressing chunks 0 and 1 step by step.
An attempt to decode all.in (func all()) partially succeeds: the decompressed data is correct, but the zlib reader returns the error flate: corrupt input before offset 446.
When trying the real-life scenario of decompressing chunk by chunk (func stream()), the zlib reader decodes the first chunk into the expected data, but returns the error flate: corrupt input before offset 32, and the subsequent attempt to decode chunk 1 fails completely.
The question
Is it possible to use go's zlib package in some kind of "streaming" mode suited for a scenario like this? Maybe I am using it incorrectly?
If not, what is the workaround? It would also be interesting to know why this is so: is it by design? Is it just not implemented yet? What am I missing?
CodePudding user response:
Notice that the error says the data at an offset after your input is corrupt. That is because of the way you are reading from the files:
buf := make([]byte, 100000)
n, readErr := f.Read(buf)
if readErr != nil {
    log.Fatalf("readErr=%v\n", readErr)
}
fmt.Printf("Read bytes, n=%v\n", n)

buffer := bytes.NewBuffer(buf)
zlibReader, zlibErr := zlib.NewReader(buffer)
if zlibErr != nil {
    log.Fatalf("zlibErr=%v\n", zlibErr)
}
buf := make([]byte, 100000) makes a slice of 100000 bytes, all of them 0. But you only read 443 bytes in the case of all.in. Since you never shorten the slice, the reader encounters a few thousand trailing zeros after the valid data and concludes the input is corrupt. That is why you get correct output and an error. The fix is to pass the reader only the portion that was actually read, buf[:n].
As for streaming: in the case of a TCP/UDP connection you should be able to simply pass the connection, which is an io.Reader, to zlib.NewReader. To simulate the same thing I used an io.Pipe in the modified code:
package main

import (
    "bytes"
    "compress/zlib"
    "fmt"
    "io"
    "log"
    "os"
    // otherzlib "github.com/4kills/go-zlib" // commented out together with otherZlib() below
)

func main() {
    all()
    stream()
    // Alas it hangs :(
    // otherZlib()
}

func all() {
    fmt.Println("==== RUNNING DECOMPRESSION OF all.in")
    fmt.Println("")
    buf, readErr := os.ReadFile("./chunks/all.in")
    if readErr != nil {
        log.Fatalf("readErr=%v\n", readErr)
    }
    fmt.Printf("Read bytes, n=%v\n", len(buf))
    buffer := bytes.NewBuffer(buf)
    zlibReader, zlibErr := zlib.NewReader(buffer)
    if zlibErr != nil {
        log.Fatalf("zlibErr=%v\n", zlibErr)
    }
    out := new(bytes.Buffer)
    written, copyErr := io.Copy(out, zlibReader)
    if copyErr != nil {
        log.Printf("copyErr=%v\n", copyErr)
    }
    fmt.Printf("Written bytes, n=%v, out:\n%v\n", written, out.String())
    fmt.Println("")
}

func stream() {
    fmt.Println("==== RUNNING DECOMPRESSION OF SEPARATE CHUNKS")
    fmt.Println("")
    pRead, pWrite := io.Pipe()
    go func() {
        buf, readErr := os.ReadFile("./chunks/0.in")
        if readErr != nil {
            log.Fatalf("readErr=%v\n", readErr)
        }
        fmt.Printf("Read 0 bytes, n=%v\n", len(buf))
        written0, copy0Err := io.Copy(pWrite, bytes.NewBuffer(buf))
        if copy0Err != nil {
            log.Printf("copy0Err=%v\n", copy0Err)
        }
        fmt.Printf("Written compressed bytes, n0=%v\n", written0)
        buf, readErr = os.ReadFile("./chunks/1.in")
        if readErr != nil {
            log.Fatalf("read1Err=%v\n", readErr)
        }
        fmt.Printf("Read 1 bytes, n=%v\n", len(buf))
        written1, copy1Err := io.Copy(pWrite, bytes.NewBuffer(buf))
        if copy1Err != nil {
            log.Printf("copy1Err=%v\n", copy1Err)
        }
        fmt.Printf("Written compressed bytes, n1=%v\n", written1)
        pWrite.Close()
    }()
    zlibReader, zlibErr := zlib.NewReader(pRead)
    if zlibErr != nil {
        log.Fatalf("zlibErr=%v\n", zlibErr)
    }
    out := new(bytes.Buffer)
    written2, copy2Err := io.Copy(out, zlibReader)
    if copy2Err != nil {
        log.Printf("copy2Err=%v\n", copy2Err)
    }
    fmt.Printf("Written decompressed bytes, n2=%v, out:\n%v\n", written2, out.String())
    fmt.Println("")
}
With this code I get no errors from stream(), but I still get a copyErr=unexpected EOF error from all(). It looks like all.in is missing the checksum data at the end, but I figure that is just an accident of how it was captured.
CodePudding user response:
With careful debugging I was able to see that I was incorrectly passing too-large buffer slices, which led to incorrect input buffers being fed to decompression.
Also, it is important not to use io.Copy, which hits EOF on the drained buffer and stops everything; instead, call zlibReader.Read() directly, which decompresses whatever is currently in the buffer.
I've updated the code so it now works as expected.