Setup

With standard C code (= no platform specific code), I have written a program to do the following:

1. Get the starting clock().
2. Open a file.
3. Write a ~250 MB string to it using one of the modes listed below.
4. Close the file.
5. Repeat steps 2 to 4 10000 times, as fast as possible (rip storage unit).
6. Get the ending clock().
7. Do some time calculations and print the result.

A) Bulk mode: write everything at once (= one call to fwrite).
B) Chunk mode: write the string in chunks of slightly more than 1 MB each (= multiple calls to fwrite, about 250).
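To make the two modes concrete, here is a minimal sketch in standard C. The names write_bulk and write_chunked are hypothetical and are not taken from the linked repository; the chunk size is a parameter, so the "slightly more than 1 MB" value is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mode A (bulk): hand the whole buffer to a single fwrite call.
 * Returns the number of bytes actually written. */
size_t write_bulk(FILE *f, const char *buf, size_t len)
{
    return fwrite(buf, 1, len, f);
}

/* Mode B (chunked): write the same buffer in pieces of at most
 * `chunk` bytes, one fwrite call per piece.
 * Returns the number of bytes actually written. */
size_t write_chunked(FILE *f, const char *buf, size_t len, size_t chunk)
{
    assert(chunk > 0); /* a zero chunk size would never make progress */

    size_t done = 0;
    while (done < len) {
        size_t remaining = len - done;
        size_t n = remaining < chunk ? remaining : chunk;
        size_t written = fwrite(buf + done, 1, n, f);
        done += written;
        if (written < n) {
            break; /* short write: stop and let the caller check ferror() */
        }
    }
    return done;
}
```

Both functions should leave the same bytes in the file; the only intended difference is how many times fwrite is called.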

Then, I let the program run on two different computers.

Expectation

I expect A) to be faster than B).

Results

The first run was on my beefy PC with a Samsung 970 EVO M.2 SSD (CPU: AMD Ryzen 2700X, 8 cores / 16 threads). The output on this one is labeled slightly wrong: it should have been Ns/file, not Ns/write.

The second run was on my laptop. I don't know what type of SSD is installed (and I haven't bothered to check). If it matters, or if anyone wants to look it up, the laptop is a Surface Book 3.

Conclusion

Beefy PC: B) is faster than A), against expectations.
Laptop: A) is faster than B), within expectations.

My best guess is that some sort of hidden parallelization is at work. Either the CPU does smart things, the SSD does very smart things, or they work together to do incredibly smart things. Anything more specific would be pure speculation on my part, so I'll leave it at that.

What explains the difference between my expectation and the results?

The benchmark

Check out https://github.com/rphii/Rlib, under examples/writecomp.c

More Text

I noticed this effect while working on my beefy PC with a string of length ~25 MB. Since B) was marginally but consistently faster than A) (by ~4 ms), I increased the string length and did a more thorough test.

Answer

Since no one else is going to, I'll answer my own question based on the comment I got.

clock() measures CPU time, not wall-clock time. Please read this post.
Reads and writes are generally buffered.
Operating systems generally use an in-memory cache (especially for HDDs).
SSD reads can be faster in parallel (and often are on recent drives), while HDDs are almost never faster in parallel (this quite recent post provides some information about caching and buffering).
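To illustrate the first point, here is a minimal sketch of how both kinds of time could be sampled in standard C. It assumes a C11 library that provides timespec_get; the helper names wall_seconds and cpu_seconds are made up for this example and are not part of the benchmark.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds, taken from the C11 calendar clock.
 * Only differences between two calls are meaningful here. */
double wall_seconds(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC); /* timespec_get returns 0 on failure */
    (void)base;               /* silence unused warning under NDEBUG */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Processor time used by the program so far, in seconds.
 * This is what clock() reports according to the C standard. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}
```

Sampling both functions before and after the write loop and subtracting gives two durations. If the wall-clock duration is much larger than the CPU duration, the program probably spent most of its time waiting (for example on I/O) rather than computing, and that waiting is not captured by clock() alone.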