I need to write multiple files at the same time, but I'm seeing odd behavior: the time taken to write increases when too many concurrent goroutines are running. I can reproduce the behavior with this code sample:
package main
import (
"fmt"
"log"
"time"
"math/rand"
"sync"
"os"
"io/ioutil"
)
var wg sync.WaitGroup
var mu sync.Mutex
func WriteToFile(data []byte, fileName string) error {
mu.Lock()
defer mu.Unlock()
err := ioutil.WriteFile(fileName, data, 0666)
if err != nil {
return err
}
return nil
}
func GenerateFile(index int) error {
defer wg.Done()
start := time.Now()
elapsed := time.Since(start)
buf := make([]byte, 7500000)
rand.Read(buf) // generate random data
randomFileName := fmt.Sprintf("/tmp/gotest-%v", rand.Int())
err := WriteToFile(buf, randomFileName)
if err != nil {
return err
}
defer os.Remove(randomFileName)
elapsed = time.Since(start)
log.Printf("goroutine id: %v, generate file %s done in %s", index, randomFileName, elapsed)
return nil
}
func main() {
start := time.Now()
for i := 0; i < 30; i++ {
wg.Add(1)
go GenerateFile(i)
}
wg.Wait()
elapsed := time.Since(start)
log.Printf("done in %s", elapsed)
}
I get the following output:
2022/09/29 15:56:42 generate file /tmp/gotest-6971393788758345733 done in 11.900753ms
2022/09/29 15:56:42 generate file /tmp/gotest-4673598773814253679 done in 19.610377ms
2022/09/29 15:56:42 generate file /tmp/gotest-4521113305882394455 done in 30.778525ms
2022/09/29 15:56:42 generate file /tmp/gotest-8353995959628873274 done in 44.008548ms
2022/09/29 15:56:42 generate file /tmp/gotest-1525622757191165040 done in 54.167972ms
2022/09/29 15:56:42 generate file /tmp/gotest-5591269383853180110 done in 62.932048ms
2022/09/29 15:56:42 generate file /tmp/gotest-6764721932990560969 done in 69.798237ms
2022/09/29 15:56:42 generate file /tmp/gotest-2295272693261924206 done in 78.44878ms
2022/09/29 15:56:42 generate file /tmp/gotest-2676611775321001319 done in 87.014884ms
2022/09/29 15:56:42 generate file /tmp/gotest-3509686945670964049 done in 96.566297ms
2022/09/29 15:56:42 generate file /tmp/gotest-4836444971131413206 done in 105.83336ms
2022/09/29 15:56:42 generate file /tmp/gotest-1300813662850892446 done in 114.375852ms
2022/09/29 15:56:42 generate file /tmp/gotest-8820441258575272048 done in 123.49409ms
2022/09/29 15:56:42 generate file /tmp/gotest-4521621037307893446 done in 131.889565ms
2022/09/29 15:56:42 generate file /tmp/gotest-3672417531832062779 done in 141.114569ms
2022/09/29 15:56:42 generate file /tmp/gotest-1086595338191073308 done in 150.002702ms
2022/09/29 15:56:43 generate file /tmp/gotest-3614980454862600779 done in 159.816332ms
2022/09/29 15:56:43 generate file /tmp/gotest-8835923576403879976 done in 168.48278ms
2022/09/29 15:56:43 generate file /tmp/gotest-234153523891197266 done in 177.340838ms
2022/09/29 15:56:43 generate file /tmp/gotest-4694585587220869374 done in 185.675593ms
2022/09/29 15:56:43 generate file /tmp/gotest-790814872699550686 done in 194.672799ms
2022/09/29 15:56:43 generate file /tmp/gotest-1109155725008951319 done in 203.573936ms
2022/09/29 15:56:43 generate file /tmp/gotest-2107324352293381651 done in 212.619876ms
2022/09/29 15:56:43 generate file /tmp/gotest-2262414210446865118 done in 221.291951ms
2022/09/29 15:56:43 generate file /tmp/gotest-5425047562930316945 done in 230.636649ms
2022/09/29 15:56:43 generate file /tmp/gotest-5121293724555456542 done in 239.258288ms
2022/09/29 15:56:43 generate file /tmp/gotest-8761336775543022197 done in 247.899475ms
2022/09/29 15:56:43 generate file /tmp/gotest-3896102679035637543 done in 259.054695ms
2022/09/29 15:56:43 generate file /tmp/gotest-6219168972567815878 done in 266.248078ms
2022/09/29 15:56:43 generate file /tmp/gotest-8075914412323098705 done in 275.679818ms
2022/09/29 15:56:43 done in 276.64278ms
I would expect all the writes to finish at roughly the same time, since I'm starting the goroutines together, and the total to take about as long as the first goroutine alone. If I remove the data-generation and file-writing parts, the goroutines do all finish at the same time. I also experimented with a worker pool, but the overall time still increases if I add too many workers.
I don't understand this behavior. Could someone give me an explanation?
Edit: Also, is there a way to speed up parallel file writing in Go?
Edit 2: I tried pre-generating the data before writing it to disk, and using a very small byte slice (5 bytes), and the behavior is still the same.
Edit 3: I don't understand all the details, but I suppose it's related to the Go scheduler and the way it handles syscalls. Based on the worker pool and job queue described in this article, I updated my example and now get consistent file-write times.
package main
import (
"os"
"io/ioutil"
"fmt"
"time"
"log"
"math/rand"
"sync"
)
var (
MaxWorker = 2
wg sync.WaitGroup
)
// Job represents the job to be run
type Job struct {
FileIndex int
}
// A buffered channel that we can send work requests on.
var JobQueue chan Job
func GenerateFile(index int) error {
defer wg.Done()
start := time.Now()
elapsed := time.Since(start)
data := make([]byte, 7500000)
rand.Read(data) // generate random data
randomFileName := fmt.Sprintf("/tmp/gotest-%v", rand.Int())
err := ioutil.WriteFile(randomFileName, data, 0666)
if err != nil {
return err
}
defer os.Remove(randomFileName)
elapsed = time.Since(start)
log.Printf("goroutine id: %v, generate file %s done in %s", index, randomFileName, elapsed)
return nil
}
// Worker represents the worker that executes the job
type Worker struct {
WorkerPool chan chan Job
JobChannel chan Job
quit chan bool
}
func NewWorker(workerPool chan chan Job) Worker {
return Worker{
WorkerPool: workerPool,
JobChannel: make(chan Job),
quit: make(chan bool)}
}
// Start method starts the run loop for the worker, listening for a quit channel in
// case we need to stop it
func (w Worker) Start() {
go func() {
for {
// register the current worker into the worker queue.
w.WorkerPool <- w.JobChannel
select {
case job := <-w.JobChannel:
GenerateFile(job.FileIndex)
case <-w.quit:
// we have received a signal to stop
return
}
}
}()
}
// Stop signals the worker to stop listening for work requests.
func (w Worker) Stop() {
go func() {
w.quit <- true
}()
}
type Dispatcher struct {
// A pool of workers channels that are registered with the dispatcher
WorkerPool chan chan Job
MaxWorkers int
}
func NewDispatcher(maxWorkers int) *Dispatcher {
pool := make(chan chan Job, maxWorkers)
return &Dispatcher{WorkerPool: pool, MaxWorkers: maxWorkers}
}
func (d *Dispatcher) Run() {
// starting n number of workers
for i := 0; i < d.MaxWorkers; i++ {
worker := NewWorker(d.WorkerPool)
worker.Start()
}
go d.dispatch()
}
func (d *Dispatcher) dispatch() {
for {
select {
case job := <-JobQueue:
// a job request has been received
go func(job Job) {
// try to obtain a worker job channel that is available.
// this will block until a worker is idle
jobChannel := <-d.WorkerPool
// dispatch the job to the worker job channel
jobChannel <- job
}(job)
}
}
}
func main() {
dispatcher := NewDispatcher(MaxWorker)
dispatcher.Run()
JobQueue = make(chan Job)
start := time.Now()
for i := 0; i < 30; i++ {
wg.Add(1)
work := Job{FileIndex: i}
// Push the work onto the queue.
JobQueue <- work
}
wg.Wait()
elapsed := time.Since(start)
log.Printf("done in %s", elapsed)
}
Which gives me the following output:
2022/10/03 11:54:53 goroutine id: 0, write file done in 11.838574ms
2022/10/03 11:54:53 goroutine id: 2, write file done in 19.516134ms
2022/10/03 11:54:53 goroutine id: 1, write file done in 14.525345ms
2022/10/03 11:54:53 goroutine id: 3, write file done in 15.714428ms
2022/10/03 11:54:53 goroutine id: 4, write file done in 14.858648ms
2022/10/03 11:54:53 goroutine id: 6, write file done in 13.567686ms
2022/10/03 11:54:53 goroutine id: 5, write file done in 13.952953ms
2022/10/03 11:54:53 goroutine id: 8, write file done in 12.848465ms
2022/10/03 11:54:53 goroutine id: 7, write file done in 16.949355ms
2022/10/03 11:54:53 goroutine id: 10, write file done in 18.237936ms
2022/10/03 11:54:53 goroutine id: 9, write file done in 14.750701ms
2022/10/03 11:54:53 goroutine id: 11, write file done in 15.03977ms
2022/10/03 11:54:53 goroutine id: 12, write file done in 14.320434ms
2022/10/03 11:54:53 goroutine id: 14, write file done in 16.248813ms
2022/10/03 11:54:53 goroutine id: 13, write file done in 15.655994ms
2022/10/03 11:54:54 goroutine id: 16, write file done in 13.760309ms
2022/10/03 11:54:54 goroutine id: 15, write file done in 14.58592ms
2022/10/03 11:54:54 goroutine id: 18, write file done in 15.343521ms
2022/10/03 11:54:54 goroutine id: 17, write file done in 15.035341ms
2022/10/03 11:54:54 goroutine id: 20, write file done in 14.53919ms
2022/10/03 11:54:54 goroutine id: 19, write file done in 14.59754ms
2022/10/03 11:54:54 goroutine id: 22, write file done in 14.190191ms
2022/10/03 11:54:54 goroutine id: 21, write file done in 14.810599ms
2022/10/03 11:54:54 goroutine id: 24, write file done in 14.843376ms
2022/10/03 11:54:54 goroutine id: 23, write file done in 15.578197ms
2022/10/03 11:54:54 goroutine id: 26, write file done in 15.244726ms
2022/10/03 11:54:54 goroutine id: 25, write file done in 14.845854ms
2022/10/03 11:54:54 goroutine id: 27, write file done in 14.719713ms
2022/10/03 11:54:54 goroutine id: 28, write file done in 15.499384ms
2022/10/03 11:54:54 goroutine id: 29, write file done in 15.203875ms
2022/10/03 11:54:54 done in 241.901091ms
If someone can give a clear explanation, that would be great.
CodePudding user response:
Actually, Go does write to the disk in parallel. The reason for the sequential behavior in the question is math/rand. This package uses a thread-safe random number generator that is implemented with an internal mutex: see Read, globalRnd and lockedSource in the package source. That's why your GenerateFile goroutines run almost strictly one by one: they synchronize internally on the rand.globalRnd.lk mutex. (Note that the first code sample also serializes the writes themselves through the mutex in WriteToFile.)
There are two ways to improve performance: use an independent PRNG in each goroutine, or pre-generate the data before writing.
Here is a sample program that tries all three variants.
package main
import (
"fmt"
"io/ioutil"
"log"
"math/rand"
"os"
"sort"
"sync"
"time"
)
var wg sync.WaitGroup
const N = 30
var elapsed_g [N]time.Duration
func SortAndLogElapsed(prefix string) {
sort.Slice(elapsed_g[:], func(i, j int) bool { return elapsed_g[i].Nanoseconds() < int64(elapsed_g[j].Nanoseconds()) })
for _, elapsed := range elapsed_g {
fmt.Println(prefix, elapsed)
}
}
func GenerateFile(start time.Time, id int) error {
defer wg.Done()
elapsed := time.Since(start)
buf := make([]byte, 7500000)
rand.Read(buf) // generate random data
randomFileName := fmt.Sprintf("/tmp/gotest-%v", rand.Int())
err := ioutil.WriteFile(randomFileName, buf, 0666)
if err != nil {
return err
}
defer os.Remove(randomFileName)
elapsed = time.Since(start)
// log.Printf("generate file %s done in %s", randomFileName, elapsed)
elapsed_g[id] = elapsed
return nil
}
func RunWithCommonPrng() {
start := time.Now()
for i := 0; i < N; i++ {
wg.Add(1)
go GenerateFile(start, i)
}
wg.Wait()
elapsed := time.Since(start)
SortAndLogElapsed("common PRNG: ")
log.Printf("done in %s", elapsed)
}
func GenerateFilePrivatePrng(id int, prng rand.Source, start time.Time) error {
defer wg.Done()
elapsed := time.Since(start)
buf := make([]byte, 7500000)
rand.New(prng).Read(buf) // generate random data
randomFileName := fmt.Sprintf("/tmp/gotest-%v", prng.Int63())
err := ioutil.WriteFile(randomFileName, buf, 0666)
if err != nil {
return err
}
defer os.Remove(randomFileName)
elapsed = time.Since(start)
elapsed_g[id] = elapsed
// log.Printf("generate file %s with private source: done in %s", randomFileName, elapsed)
return nil
}
func RunWithPrivatePrng() {
start := time.Now()
for i := 0; i < N; i++ {
wg.Add(1)
go GenerateFilePrivatePrng(i, rand.NewSource(int64(i)), start)
}
wg.Wait()
elapsed := time.Since(start)
SortAndLogElapsed("Private PRNG: ")
log.Printf("done in %s", elapsed)
}
func GenerateFileWithGivenData(id int, buf []byte, start time.Time) error {
defer wg.Done()
randomFileName := fmt.Sprintf("/tmp/gotest-%v", rand.Int())
err := ioutil.WriteFile(randomFileName, buf, 0666)
if err != nil {
return err
}
defer os.Remove(randomFileName)
elapsed := time.Since(start)
elapsed_g[id] = elapsed
// log.Printf("generate file %s with data: done in %s", randomFileName, elapsed)
return nil
}
func RunWithCommonData() {
buf := make([]byte, 7500000)
rand.Read(buf) // generate random data
start := time.Now()
for i := 0; i < N; i++ {
wg.Add(1)
go GenerateFileWithGivenData(i, buf, start)
}
wg.Wait()
elapsed := time.Since(start)
SortAndLogElapsed("Common data: ")
log.Printf("done in %s", elapsed)
}
func main() {
log.Printf("Used CPUs / Max CPUs: %d/%d", runtime.GOMAXPROCS(0), runtime.NumCPU())
RunWithCommonPrng()
RunWithPrivatePrng()
RunWithCommonData()
}
The output on 8-core CPU with SSD is like this:
2022/10/02 00:00:08 Used CPUs / Max CPUs: 8/8
common PRNG: 9.943335ms
common PRNG: 15.12122ms
common PRNG: 20.856216ms
common PRNG: 26.636462ms
common PRNG: 32.041066ms
common PRNG: 37.450744ms
common PRNG: 43.286644ms
common PRNG: 48.695199ms
common PRNG: 54.518533ms
common PRNG: 59.858065ms
common PRNG: 65.620084ms
common PRNG: 71.111171ms
common PRNG: 76.388583ms
common PRNG: 81.609326ms
common PRNG: 87.465878ms
common PRNG: 92.623557ms
common PRNG: 98.35468ms
common PRNG: 103.606529ms
common PRNG: 109.28623ms
common PRNG: 114.981873ms
common PRNG: 120.26626ms
common PRNG: 125.530811ms
common PRNG: 131.222195ms
common PRNG: 136.399946ms
common PRNG: 142.305635ms
common PRNG: 147.687525ms
common PRNG: 153.002392ms
common PRNG: 158.769948ms
common PRNG: 164.241503ms
common PRNG: 169.531355ms
2022/10/02 00:00:08 done in 170.273377ms
Private PRNG: 16.255543ms
Private PRNG: 17.155624ms
Private PRNG: 17.477437ms
Private PRNG: 17.49527ms
Private PRNG: 17.521759ms
Private PRNG: 18.363554ms
Private PRNG: 19.800906ms
Private PRNG: 30.340522ms
Private PRNG: 31.551496ms
Private PRNG: 40.583626ms
Private PRNG: 54.682705ms
Private PRNG: 54.832006ms
Private PRNG: 54.983126ms
Private PRNG: 55.143073ms
Private PRNG: 56.517272ms
Private PRNG: 56.577967ms
Private PRNG: 57.718ms
Private PRNG: 58.770033ms
Private PRNG: 59.246808ms
Private PRNG: 59.608246ms
Private PRNG: 59.789123ms
Private PRNG: 60.028814ms
Private PRNG: 68.533662ms
Private PRNG: 69.606317ms
Private PRNG: 69.837988ms
Private PRNG: 71.488161ms
Private PRNG: 71.770842ms
Private PRNG: 72.036881ms
Private PRNG: 72.23509ms
Private PRNG: 73.037337ms
2022/10/02 00:00:08 done in 73.694825ms
Common data: 5.220506ms
Common data: 5.220523ms
Common data: 5.220524ms
Common data: 5.220526ms
Common data: 5.221125ms
Common data: 5.221169ms
Common data: 5.222472ms
Common data: 6.977304ms
Common data: 13.601358ms
Common data: 13.614532ms
Common data: 13.859067ms
Common data: 14.75378ms
Common data: 16.00253ms
Common data: 16.111086ms
Common data: 16.263291ms
Common data: 16.42076ms
Common data: 17.024946ms
Common data: 17.313631ms
Common data: 17.749351ms
Common data: 18.18497ms
Common data: 18.83511ms
Common data: 21.789867ms
Common data: 22.308659ms
Common data: 22.308701ms
Common data: 22.546815ms
Common data: 23.298865ms
Common data: 23.482138ms
Common data: 23.610855ms
Common data: 23.667347ms
Common data: 24.500486ms
2022/10/02 00:00:08 done in 25.205652ms
"Common data" is for pre-generated buffer. It shows that Golang does write to disk in parallel. It distributes the goroutines between threads and those goroutines occupy CPU cores until I/O finishes.
UPDATE
Here is code that prints Linux thread ID and CPU number.
package main
/*
#define _GNU_SOURCE
#include <sched.h>
*/
import "C"
import (
"fmt"
"io/ioutil"
"log"
"math/rand"
"os"
"runtime"
"sort"
"sync"
"syscall"
"time"
"github.com/pkg/profile"
)
func GetCpu() int {
var ret C.int = C.sched_getcpu()
return int(ret)
}
func GetThreadId() int {
return syscall.Gettid()
}
var wg sync.WaitGroup
const N = 30
var elapsed_g [N]time.Duration
func SortAndLogElapsed(prefix string) {
sort.Slice(elapsed_g[:], func(i, j int) bool { return elapsed_g[i].Nanoseconds() < int64(elapsed_g[j].Nanoseconds()) })
for _, elapsed := range elapsed_g {
fmt.Println(prefix, elapsed)
}
}
func GenerateFileWithGivenData(id int, buf []byte, start time.Time) error {
defer wg.Done()
randomFileName := fmt.Sprintf("/tmp/gotest-%v", rand.Int())
tid := GetThreadId()
cpu := GetCpu()
before := time.Now()
fmt.Printf("Before WriteFile:\t----\t%d\t%d\t%d\t%s\n", id, tid, cpu, before.String())
err := ioutil.WriteFile(randomFileName, buf, 0666)
after := time.Now()
tid = GetThreadId()
cpu = GetCpu()
fmt.Printf("After WriteFile:\t%d\t%d\t%d\t%d\t%s\n", after.Sub(before).Microseconds(), id, tid, cpu, after.String())
if err != nil {
return err
}
defer os.Remove(randomFileName)
elapsed := time.Since(start)
elapsed_g[id] = elapsed
// log.Printf("generate file %s with data: done in %s", randomFileName, elapsed)
return nil
}
func RunWithCommonData() {
buf := make([]byte, 7500000)
rand.Read(buf) // generate random data
fmt.Printf(" \tElapsed\tG\tTID\tCPU\ttime\n")
start := time.Now()
println("")
for i := 0; i < N; i++ {
wg.Add(1)
go GenerateFileWithGivenData(i, buf, start)
}
wg.Wait()
elapsed := time.Since(start)
SortAndLogElapsed("Common data: ")
log.Printf("done in %s", elapsed)
}
func main() {
log.Printf("Used CPUs / Max CPUs: %d/%d", runtime.GOMAXPROCS(0), runtime.NumCPU())
// RunWithCommonPrng()
// RunWithPrivatePrng()
defer profile.Start(profile.CPUProfile).Stop()
RunWithCommonData()
}
The output on my system, sorted by time (G is the internal goroutine ID, TID the Linux thread ID, CPU the CPU number; the last column is the timestamp):
Elapsed G TID CPU time
Before WriteFile: ---- 29 23379 0 2022-10-03 20:24:47.35247545 0900 KST m= 0.006016977
Before WriteFile: ---- 0 23380 1 2022-10-03 20:24:47.352475589 0900 KST m= 0.006017128
Before WriteFile: ---- 14 23383 7 2022-10-03 20:24:47.352506383 0900 KST m= 0.006047950
Before WriteFile: ---- 7 23381 2 2022-10-03 20:24:47.352572666 0900 KST m= 0.006114235
Before WriteFile: ---- 10 23377 6 2022-10-03 20:24:47.352634156 0900 KST m= 0.006175692
Before WriteFile: ---- 8 23384 4 2022-10-03 20:24:47.352727575 0900 KST m= 0.006269119
Before WriteFile: ---- 9 23385 5 2022-10-03 20:24:47.352766795 0900 KST m= 0.006308348
After WriteFile: 4133 14 23383 7 2022-10-03 20:24:47.356640341 0900 KST m= 0.010181880
After WriteFile: 4952 7 23381 2 2022-10-03 20:24:47.357525386 0900 KST m= 0.011066917
After WriteFile: 5049 29 23379 0 2022-10-03 20:24:47.357525403 0900 KST m= 0.011066934
After WriteFile: 4758 9 23385 5 2022-10-03 20:24:47.3575254 0900 KST m= 0.011066928
After WriteFile: 4892 10 23377 6 2022-10-03 20:24:47.357526773 0900 KST m= 0.011068303
After WriteFile: 5051 0 23380 1 2022-10-03 20:24:47.35752678 0900 KST m= 0.011068311
After WriteFile: 4801 8 23384 4 2022-10-03 20:24:47.357529101 0900 KST m= 0.011070629
Before WriteFile: ---- 12 23380 1 2022-10-03 20:24:47.357554923 0900 KST m= 0.011096462
Before WriteFile: ---- 13 23377 6 2022-10-03 20:24:47.357555161 0900 KST m= 0.011096695
Before WriteFile: ---- 1 23381 2 2022-10-03 20:24:47.357555163 0900 KST m= 0.011096697
Before WriteFile: ---- 2 23381 2 2022-10-03 20:24:47.35756292 0900 KST m= 0.011104452
Before WriteFile: ---- 11 23377 6 2022-10-03 20:24:47.3575642 0900 KST m= 0.011105730
Before WriteFile: ---- 21 23385 5 2022-10-03 20:24:47.357570038 0900 KST m= 0.011111568
Before WriteFile: ---- 25 23383 7 2022-10-03 20:24:47.357572217 0900 KST m= 0.011113747
Before WriteFile: ---- 26 23379 0 2022-10-03 20:24:47.358768915 0900 KST m= 0.012310542
Before WriteFile: ---- 27 23384 4 2022-10-03 20:24:47.361560776 0900 KST m= 0.015102306
After WriteFile: 4020 25 23383 7 2022-10-03 20:24:47.361593063 0900 KST m= 0.015134592
After WriteFile: 4873 12 23380 1 2022-10-03 20:24:47.362428015 0900 KST m= 0.015969540
After WriteFile: 4858 21 23385 5 2022-10-03 20:24:47.362428103 0900 KST m= 0.015969632
After WriteFile: 4865 2 23381 2 2022-10-03 20:24:47.362428238 0900 KST m= 0.015969769
After WriteFile: 4864 11 23377 6 2022-10-03 20:24:47.362428347 0900 KST m= 0.015969877
Before WriteFile: ---- 15 23385 5 2022-10-03 20:24:47.362454039 0900 KST m= 0.015995570
Before WriteFile: ---- 28 23380 1 2022-10-03 20:24:47.362454041 0900 KST m= 0.015995573
Before WriteFile: ---- 23 23377 6 2022-10-03 20:24:47.362454121 0900 KST m= 0.015995651
Before WriteFile: ---- 16 23385 5 2022-10-03 20:24:47.362462845 0900 KST m= 0.016004374
Before WriteFile: ---- 22 23377 6 2022-10-03 20:24:47.362479715 0900 KST m= 0.016021242
After WriteFile: 4902 26 23379 0 2022-10-03 20:24:47.363671623 0900 KST m= 0.017213150
Before WriteFile: ---- 18 23386 6 2022-10-03 20:24:47.365182522 0900 KST m= 0.018724057
After WriteFile: 8764 13 23383 7 2022-10-03 20:24:47.366320071 0900 KST m= 0.019861611
Before WriteFile: ---- 17 23379 0 2022-10-03 20:24:47.366374805 0900 KST m= 0.019916338
After WriteFile: 4902 27 23384 4 2022-10-03 20:24:47.366463028 0900 KST m= 0.020004556
After WriteFile: 4729 28 23380 1 2022-10-03 20:24:47.367183315 0900 KST m= 0.020724852
After WriteFile: 4720 16 23385 5 2022-10-03 20:24:47.367183317 0900 KST m= 0.020724850
Before WriteFile: ---- 19 23385 5 2022-10-03 20:24:47.367230069 0900 KST m= 0.020771602
Before WriteFile: ---- 20 23384 4 2022-10-03 20:24:47.367748633 0900 KST m= 0.021290163
Before WriteFile: ---- 3 23391 3 2022-10-03 20:24:47.368046383 0900 KST m= 0.021587923
Before WriteFile: ---- 5 23388 1 2022-10-03 20:24:47.36857915 0900 KST m= 0.022120682
Before WriteFile: ---- 4 23380 1 2022-10-03 20:24:47.368590097 0900 KST m= 0.022131628
Before WriteFile: ---- 6 23393 2 2022-10-03 20:24:47.370493582 0900 KST m= 0.024035118
After WriteFile: 10260 22 23377 6 2022-10-03 20:24:47.372740578 0900 KST m= 0.026282112
After WriteFile: 5326 20 23384 4 2022-10-03 20:24:47.37307519 0900 KST m= 0.026616720
After WriteFile: 10922 23 23387 0 2022-10-03 20:24:47.373376163 0900 KST m= 0.026917695
After WriteFile: 5613 3 23391 3 2022-10-03 20:24:47.373660058 0900 KST m= 0.027201605
After WriteFile: 5332 4 23380 1 2022-10-03 20:24:47.373922339 0900 KST m= 0.027463865
After WriteFile: 8871 18 23377 6 2022-10-03 20:24:47.374053982 0900 KST m= 0.027595513
After WriteFile: 7880 17 23384 4 2022-10-03 20:24:47.374255159 0900 KST m= 0.027796694
After WriteFile: 12127 15 23387 0 2022-10-03 20:24:47.37458126 0900 KST m= 0.028122790
After WriteFile: 7422 19 23391 3 2022-10-03 20:24:47.374652483 0900 KST m= 0.028194020
Before WriteFile: ---- 24 23377 6 2022-10-03 20:24:47.375338247 0900 KST m= 0.028879777
After WriteFile: 5111 6 23393 2 2022-10-03 20:24:47.375605341 0900 KST m= 0.029146871
After WriteFile: 19459 1 23392 5 2022-10-03 20:24:47.377014458 0900 KST m= 0.030555986
After WriteFile: 3847 24 23377 6 2022-10-03 20:24:47.379185393 0900 KST m= 0.032726920
After WriteFile: 10778 5 23388 0 2022-10-03 20:24:47.379358058 0900 KST m= 0.032899584
It shows that goroutines run on many different threads across all cores of my 8-core CPU. It looks like the fastest I/O happens in the goroutines that kept their thread and CPU, and that parking/unparking threads makes blocking I/O slower.
I ran the same code with 100 goroutines. The worst case was about 60 ms, but it wasn't the last write; it was one in the middle. Even at the end there were still fast writes taking around 5.5 ms.