I love using splat to build arrays and hashes:
- they are array and hash literals, so you don't have to follow some computation to see what kind of value you get; the syntax is very clear
- they make it easy to build quite complex values in a single expression, instead of using a more imperative style (which, yes, you can turn into a single assignment with things like tap, but it's less readable).
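For instance, this kind of thing reads very naturally to me (a tiny illustration; all the names here are made up):
head = [1, 2]
tail = [8, 9]
defaults  = { timeout: 5, retries: 3 }
overrides = { retries: 10 }
combined_array = [*head, 0, *tail]            # => [1, 2, 0, 8, 9]
combined_hash  = { **defaults, **overrides }  # => {:timeout=>5, :retries=>10}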
However, splatting is costly.
require 'benchmark'
$array = (0...100).to_a
n = 100_000
Benchmark.bm do |x|
  x.report('add   ') {n.times{$array + $array + $array}}
  x.report('splat ') {n.times{[*$array, *$array, *$array]}}
end
On Machine A (MRI 3.1.3) I have:
             user     system      total        real
add       0.031583   0.001421   0.033004 (  0.033006)
splat     0.050174   0.001397   0.051571 (  0.051584)
On Machine B (MRI 2.7.4):
             user     system      total        real
add       0.278377   0.000000   0.278377 (  0.278316)
splat     0.780735   0.043730   0.824465 (  0.824377)
How come splat-based array construction is so slow? I expect splat-based construction to be no slower than plain addition (after all, the AST could even turn one into the other), and I actually expect it to be more efficient: since the language can see everything, it can avoid the intermediate arrays created by binary addition, anticipate the size of the final array and reserve the space upfront, etc.
So how come the alternatives, which go through method calls (so are, a priori, less optimizable by the interpreter), are faster than something in which everything is honestly exposed to the interpreter?
EDIT: more alternatives
require 'benchmark'
$array = (0...100).to_a
def add
  $array + $array + $array
end
def append
  res = $array.dup
  res.append(*$array)
  res.append(*$array)
  res
end
def concat2
  res = []
  res.concat($array)
  res.concat($array)
  res.concat($array)
  res
end
def concat3
  [].concat($array, $array, $array)
end
def concat_splat
  [].concat(*[$array, $array, $array])
end
def flatten
  [$array, $array, $array].flatten
end
def flatten_1
  [$array, $array, $array].flatten(1)
end
def splat
  [*$array, *$array, *$array]
end
n = 100_000
Benchmark.bm do |x|
  x.report('add         ') {n.times{add}}
  x.report('append      ') {n.times{append}}
  x.report('concat2     ') {n.times{concat2}}
  x.report('concat3     ') {n.times{concat3}}
  x.report('concat_splat') {n.times{concat_splat}}
  x.report('flatten     ') {n.times{flatten}}
  x.report('flatten(1)  ') {n.times{flatten_1}}
  x.report('splat       ') {n.times{splat}}
end
This is Machine A, MRI 3.1.3.
                   user     system      total        real
add            0.032841   0.001502   0.034343 (  0.034347)
append         0.059024   0.009869   0.068893 (  0.068944)
concat2        0.047542   0.000144   0.047686 (  0.047690)
concat3        0.062913   0.010196   0.073109 (  0.073111)
concat_splat   0.056044   0.000748   0.056792 (  0.056796)
flatten        0.978091   0.005750   0.983841 (  0.983952)
flatten(1)     0.165467   0.000998   0.166465 (  0.166472)
splat          0.049761   0.000131   0.049892 (  0.049896)
CodePudding user response:
Addition and splat versions emit different bytecode (some output is omitted for brevity):
puts RubyVM::InstructionSequence.compile(<<~ADDITION).disasm
src = (0...100).to_a
res = src + src + src
ADDITION
0000 putobject 0...100 ( 1)[Li]
0002 opt_send_without_block <calldata!mid:to_a, argc:0, ARGS_SIMPLE>
0004 setlocal_WC_0 src@0
0006 getlocal_WC_0 src@0 ( 2)[Li]
0008 getlocal_WC_0 src@0
0010 opt_plus <calldata!mid:+, argc:1, ARGS_SIMPLE>
0012 getlocal_WC_0 src@0
0014 opt_plus <calldata!mid:+, argc:1, ARGS_SIMPLE>
0016 dup
0017 setlocal_WC_0 res@1
0019 leave
vs
puts RubyVM::InstructionSequence.compile(<<~SPLATS).disasm
src = (0...100).to_a
res = [*src, *src, *src]
SPLATS
0000 putobject 0...100 ( 1)[Li]
0002 opt_send_without_block <calldata!mid:to_a, argc:0, ARGS_SIMPLE>
0004 setlocal_WC_0 src@0
0006 getlocal_WC_0 src@0 ( 2)[Li]
0008 splatarray true
0010 getlocal_WC_0 src@0
0012 concatarray
0013 getlocal_WC_0 src@0
0015 concatarray
0016 dup
0017 setlocal_WC_0 res@1
0019 leave
The two snippets above look quite similar; the difference is 2 opt_plus instructions vs a splatarray plus 2 concatarrays. But implementation-wise the difference becomes bigger.
The first one boils down to 2 rb_ary_plus calls; in a nutshell:
- allocate memory for src + src
- copy src + src to a new memory location
- allocate memory for (src + src) + src
- copy (src + src) + src to a new memory location
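As a rough Ruby-level model of that flow (a conceptual sketch only; plus_sketch is a made-up helper that mimics the allocate-then-copy behaviour described above, not the actual rb_ary_plus C code):
src = (0...100).to_a
# Made-up helper: allocate the result once, then copy both operands into it.
def plus_sketch(a, b)
  out = Array.new(a.size + b.size)  # allocate memory for the result
  out[0, a.size] = a                # copy the left operand
  out[a.size, b.size] = b           # copy the right operand
  out
end
tmp = plus_sketch(src, src)         # intermediate array for src + src
res = plus_sketch(tmp, src)         # final array for (src + src) + src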
The latter seems to be more complex internally: splatarray boils down to rb_ary_dup (so we copy the array first), while concatarray under the hood duplicates the target array too and then boils down to rb_ary_splice. The latter is a bit hairy, but I believe we go to a branch where we effectively double the array capacity (which includes copying the 1st array) and then copy the 2nd array. I'm not 100% sure I'm tracing this execution flow properly, but if I am, it gives us:
- duplicate src
- duplicate src again (?)
- double the target capacity (includes copying)
- copy the 2nd array to the space allocated above
- duplicate (src + src)
- double the (src + src) capacity (includes copying the (src + src) elements to a new memory location)
- copy the 3rd array to the allocated space
These additional duplications could explain the difference (not to mention the overall complexity of the latter, which means more conditionals checked, etc.).
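Again as a rough Ruby-level model of the splat path, following the flow traced above (concat_sketch is a made-up helper; I'm not claiming it matches the C code exactly):
src = (0...100).to_a
# Made-up helper mimicking concatarray as traced above: duplicate the target,
# grow it (which may copy the existing elements), then copy the other array in.
def concat_sketch(target, other)
  target = target.dup    # concatarray duplicates the target array first
  target.concat(other)   # growing may reallocate/copy, then other is copied in
  target
end
res = src.dup                   # splatarray true: duplicate src
res = concat_sketch(res, src)   # first concatarray
res = concat_sketch(res, src)   # second concatarray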