Scanner.Buffer - max value has no effect on custom Split?

To reduce the default 64k scanner buffer (for a microcomputer with little memory), I am trying to use a custom buffer and a custom split function:

scanner.Buffer(make([]byte, 5120), 64)
scanner.Split(Scan64Bytes)

Here I noticed that the second argument of Buffer, "max", has no effect. Whether I pass e.g. 0, 1, 5120 or bufio.MaxScanTokenSize, I can't see any difference. Only the first argument, "buf", has consequences: if its capacity is too small the scan is incomplete, and if it's too large the B/op value reported by benchmem increases.

From the doc:

The maximum token size is the larger of max and cap(buf). If max <= cap(buf), Scan will use this buffer only and do no allocation.

I don't understand which max value is the correct one. Could you explain this to me, please?

Go Playground

package main

import (
    "bufio"
    "bytes"
    "fmt"
)

func Scan64Bytes(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if len(data) < 64 {
        return 0, data[0:], bufio.ErrFinalToken
    }
    return 64, data[0:64], nil
}

func main() {
    // improvised source of the same size:
    cmdstd := bytes.NewReader(make([]byte, 5120))
    scanner := bufio.NewScanner(cmdstd)

    // I guess 64 is the correct max arg:
    scanner.Buffer(make([]byte, 5120), 64)
    scanner.Split(Scan64Bytes)

    for i := 0; scanner.Scan(); i++ {
        fmt.Printf("%v: %v\r\n", i, scanner.Bytes())
    }

    if err := scanner.Err(); err != nil {
        fmt.Println(err)
    }
}

CodePudding user response：

max value has no effect on custom Split?

No, you get the same result without a custom split function. What you observe, however, is only possible because of your split function and its use of ErrFinalToken:

// your reader/input
cmdstd := bytes.NewReader(make([]byte, 5120))

// your scanner buffer size
scanner.Buffer(make([]byte, 5120), 64)

The scanner's buffer should be larger than the input. This is how I would set buf and max:

scanner.Buffer(make([]byte, 5121), 5120)