Add .gz files to tar.gz file but decode gz before adding. Output files getting clipped (corrupted)

Time:10-22

Below is a snippet of my code, which collects some gzip-compressed PDF files.

I want to add the PDFs to a tar.gz file, but they need to be decompressed (gunzipped) before they are added. I don't want to end up with a tar.gz full of pdf.gz files.

I need to decompress each file without reading it entirely into memory. The PDF files in the tar.gz are clipped and corrupted. When I compare the files from the tar.gz with the original PDF files, they look identical except that the ones from the tar.gz are clipped: the last part of each file is missing.

// Create new gz writer with compression level 1
gzw, _ := gzip.NewWriterLevel(w, 1)
defer gzw.Close()

// Create new tar writer
tw := tar.NewWriter(gzw)
defer tw.Close()

file_path := "path-to-file.pdf.gz"
file_name := "filename-shown-in-tar.pdf"

// Open file to add to tar
fp, err := os.Open(file_path)
if err != nil {
    log.Printf("Error: %v", err)
}
defer fp.Close()

// In the full code the name is built elsewhere (file and file_ext are not shown here):
// file_name = file[1] + file_ext

info, err := fp.Stat()
if err != nil {
    log.Printf("Error: %v", err)
}
header, err := tar.FileInfoHeader(info, file_name)
if err != nil {
    log.Printf("Error: %v", err)
}
header.Name = file_name

tw.WriteHeader(header)

// This part writes the *.pdf.gz files directly to the tar.gz file.
// It works: the tar.gz file can be opened, and afterwards the
// individual pdf.gz files can be opened too.
//io.Copy(tw, fp)

// This part decodes the gz before adding, but it clips the PDF files in
// the tar.gz file.
gzr, err := gzip.NewReader(fp)
if err != nil {
    log.Printf("Error: %v", err)
}
defer gzr.Close()
io.Copy(tw, gzr)

Update

I got a suggestion from a comment, but the PDF files inside the tar still can't be opened. The tar.gz file is created and can be opened, but the PDF files inside it are corrupted.

I have compared the output files from the tar.gz with the original PDFs. The corrupted file is missing the last part of the original.

In one example the original file has 498 lines and the corrupted one has only 425. Those 425 lines appear to be identical to the original; the end of the file is simply cut off.

CodePudding user response:

The issue appears to be that you're building the tar header from the file info of the original file, which is compressed. The size is what causes the problem: the header records the compressed size, but you then write the larger, decompressed data. If you attempt to write more than the Size value in the header, archive/tar.Writer.Write() returns ErrWriteTooLong (see https://github.com/golang/go/blob/d5efd0dd63a8beb5cc57ae7d25f9c60d5dea5c65/src/archive/tar/writer.go#L428-L429), so each entry is cut off at the compressed size. Because the return value of io.Copy is not checked, the error goes unnoticed.

Something like the following should work. It decompresses and reads the file first so that an accurate size can be set in the header. Note that io.ReadAll holds the whole decompressed file in memory:

// Open file to add to tar
fp, err := os.Open(file_path)
if err != nil {
    log.Printf("Error: %v", err)
}
defer fp.Close()

gzr, err := gzip.NewReader(fp)
if err != nil {
    panic(err)
}
defer gzr.Close()

data, err := io.ReadAll(gzr)
if err != nil {
    log.Printf("Error: %v", err)
}

// Create tar header for file
header := &tar.Header{
    Name: file_name,
    Mode: 0600,
    Size: int64(len(data)),
}

// Write header to the tar
if err = tw.WriteHeader(header); err != nil {
    log.Printf("Error: %v", err)
}

// Write the file content to the tar
if _, err = tw.Write(data); err != nil {
    log.Printf("Error: %v", err)
}