I have the following code snippet:
func main() {
	// Some text we want to compress.
	original := "bird and frog"
	// Open a file for writing.
	f, _ := os.Create("C:\\programs\\file.gz")
	// Create gzip writer.
	w := gzip.NewWriter(f)
	// Write bytes in compressed form to the file.
	for /* looping over database cursor */ {
		w.Write([]byte( /* the row from the database as obtained from cursor */ ))
	}
	// Close the file.
	w.Close()
	fmt.Println("DONE")
}
However, I would like to make a small modification: when the size of the file reaches a certain threshold, I want to close it and open a new file, also in compressed format.
For example:
Assume a database has 10 rows, and each row is 50 bytes.
Assume a compression factor of 2, i.e. one 50-byte row is compressed to 25 bytes.
Assume the file size limit is 50 bytes.
That means after every 2 records I should close the file and open a new one.
How do I keep track of the file size while it's still open and I'm still writing compressed data to it?
CodePudding user response:
You can use the os.File.Seek method to get your current position in the file, which, as you're writing the file, will be the current file size in bytes.
For example:
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

func main() {
	// Some text we want to compress.
	lines := []string{
		"this is a test",
		"the quick brown fox",
		"jumped over the lazy dog",
		"the end",
	}
	// Open a file for writing.
	f, err := os.Create("file.gz")
	if err != nil {
		panic(err)
	}
	// Create gzip writer.
	w := gzip.NewWriter(f)
	// Write bytes in compressed form to the file.
	for _, line := range lines {
		w.Write([]byte(line))
		w.Flush()
		pos, err := f.Seek(0, io.SeekCurrent)
		if err != nil {
			panic(err)
		}
		fmt.Printf("pos: %d\n", pos)
	}
	// Close the file.
	w.Close()
	// The call to w.Close() will write out any remaining data
	// and the final checksum.
	pos, err := f.Seek(0, io.SeekCurrent)
	if err != nil {
		panic(err)
	}
	fmt.Printf("pos: %d\n", pos)
	fmt.Println("DONE")
}
Which outputs:
pos: 30
pos: 55
pos: 83
pos: 94
pos: 107
DONE
And we can confirm with wc
:
$ wc -c file.gz
107 file.gz
CodePudding user response:
gzip.NewWriter takes an io.Writer. It is easy to implement a custom io.Writer that does what you want.
E.g. Playground
type MultiFileWriter struct {
	maxLimit      int
	currentSize   int
	currentWriter io.Writer
}

func (m *MultiFileWriter) Write(data []byte) (n int, err error) {
	if m.currentSize+len(data) > m.maxLimit {
		m.currentWriter = createNextFile()
		m.currentSize = 0
	}
	m.currentSize += len(data)
	return m.currentWriter.Write(data)
}
Note: You will need to handle a few edge cases, like what happens if len(data) is greater than maxLimit. And maybe you don't want to split a record across files.