I'm seeing occasionally slow performance in writing to a file (or socket) and in debugging, I ended up with a small test harness to reproduce it. I'm seeing strange behavior whereby the longer I sleep in between writes, the longer the write itself takes. I reproduced the test harness in both Go and Python and see similar results in both. I also see similar results on both my mac and linux machines.
The key confusion is that the timing code in both cases looks like
sleep(x)
startTime = now()
write(msg)
endTime = now()
writeTime = endTime - startTime
where writeTime
gets significantly longer as the sleep
increases, despite the sleep not being included in the timing at all!
Python:
import time
f = open("/dev/null", "w")
msg = "0" * 1024
for sleepMicros in [0, 1, 10, 100, 1000]:
totalSendTime = 0
cntOver10us = 0
for _ in range(10000):
time.sleep(sleepMicros / 1000 / 1000)
startSendTime = time.time()
f.write(msg)
sentTime = time.time()
timeInSend = sentTime - startSendTime
totalSendTime = timeInSend
if timeInSend > 10 / 1000 / 1000:
cntOver10us = 1
print(f"sleep: {sleepMicros}us\tTotal time in send: {totalSendTime * 1000}ms\tCount over 10us: {cntOver10us}")
Go:
package main
import (
"fmt"
"os"
"time"
)
func main() {
f, _ := os.Create("/dev/null")
msg := make([]byte, 1024)
for _, sleepMicros := range []time.Duration{0, 1, 10, 100, 1000} {
totalSendTime := time.Duration(0)
cntOver10us := 0
sleepDuration := sleepMicros * time.Microsecond
for i := 0; i < 10000; i {
time.Sleep(sleepDuration)
startSendTime := time.Now()
_, err := f.Write(msg)
sentTime := time.Now()
if err != nil {
fmt.Println("SendMessage got error", err)
}
timeInSend := sentTime.Sub(startSendTime)
totalSendTime = timeInSend
if timeInSend > 10*time.Microsecond {
cntOver10us
}
}
fmt.Println("sleep:", sleepDuration, "\tTotal time in send:", totalSendTime, "\tCount over 10us:", cntOver10us)
}
}
Results:
Python:
(venv) % python write_to_devnull_mini.py
sleep: 0us Total time in send: 2.109527587890625ms Count over 10us: 0
sleep: 1us Total time in send: 1.9390583038330078ms Count over 10us: 0
sleep: 10us Total time in send: 1.9574165344238281ms Count over 10us: 0
sleep: 100us Total time in send: 5.476236343383789ms Count over 10us: 9
sleep: 1000us Total time in send: 50.98319053649902ms Count over 10us: 1179
Go:
(venv) % go run write_to_devnull_mini.go
sleep: 0s Total time in send: 5.724469ms Count over 10us: 0
sleep: 1µs Total time in send: 6.404582ms Count over 10us: 1
sleep: 10µs Total time in send: 4.346918ms Count over 10us: 0
sleep: 100µs Total time in send: 10.909356ms Count over 10us: 8
sleep: 1ms Total time in send: 112.627213ms Count over 10us: 1307
My best guess is that this is some artifact of the scheduler, but would love to understand what's happening. How does sleeping for 1ms in between writes cause the total time spent between calls to time.Now()
to go up by 20x, and the number of "slow" (>10us) writes to increase 200-fold?
CodePudding user response:
You omitted details like core count, scheduler, and loading from other processes.
It could be that frequent sleeping is reducing your dynamic priority.
That's different from the user-set nice
value.
Your process is predicted to use only a few cycles per second,
so it's OK to pre-empt the cPython bytecode interpreter
with a higher priority process that has a tighter deadline.
But it could also be that frequent long sleeps are allowing
other processes to dirty the cache so you suffer more cache misses.
Or, similarly, you lack affinity and keep getting rescheduled
on different cores.
Instrument the cache, or look at PSR reported by ps
if you want to investigate that.
A simple thing to look at would be
import resource
...
r = resource.getrusage(resource.RUSAGE_SELF)
and note patterns in how voluntary and r.ru_nivcsw
involuntary
context switches increment.
Also, verify that r.ru_minflt
remains constant.
Hmmm, come to think of it, slowness due to L3 cache miss should show up as non-constant user and sys mode deltas reported by getrusage().
CodePudding user response:
Wait - this looks suspicious:
totalSendTime := time.Duration(0)
cntOver10us := 0
for _, sleepMicros := range []time.Duration{0, 1, 10, 100, 1000} {
sleepDuration := sleepMicros * time.Microsecond
for i := 0; i < 10000; i {
...
}
fmt.Println("sleep:", sleepDuration, "\tTotal time in send:", totalSendTime, "\tCount over 10us:", cntOver10us)
}
Don't you really want to reset totalSendTime and cntOver10us on each iteration? Otherwise, totalSleep and cntOver10us is going to keep going up no matter how long you sleep.
for _, sleepMicros := range []time.Duration{0, 1, 10, 100, 1000} {
totalSendTime := time.Duration(0)
cntOver10us := 0
sleepDuration := sleepMicros * time.Microsecond
for i := 0; i < 10000; i {
...
}
fmt.Println("sleep:", sleepDuration, "\tTotal time in send:", totalSendTime, "\tCount over 10us:", cntOver10us)
}
Otherwise, your totalSendTime for each outer loop is only going to go up since you never seem to reset it