I love using splat to build arrays and hashes:
- they are array and hash literals, so you don't have to follow some computation to see what kind of value you get; the syntax is very clear
- they make it easy to build quite complex values in a single expression, instead of using a more imperative style (which, yes, you can turn into a single assignment with things like tap, but it's less readable).
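For instance, this kind of thing reads very naturally to me (a tiny illustration; all the names here are made up):
head = [1, 2]
tail = [8, 9]
defaults  = { timeout: 5, retries: 3 }
overrides = { retries: 10 }
combined_array = [*head, 0, *tail]            # => [1, 2, 0, 8, 9]
combined_hash  = { **defaults, **overrides }  # => {:timeout=>5, :retries=>10}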
However, splatting is costly.
require 'benchmark'
$array = (0...100).to_a
n = 100_000
Benchmark.bm do |x|
  x.report('add   ') {n.times{$array + $array + $array}}
  x.report('splat ') {n.times{[*$array, *$array, *$array]}}
end
On Machine A (MRI 3.1.3) I have:
             user     system      total        real
add       0.031583   0.001421   0.033004 (  0.033006)
splat     0.050174   0.001397   0.051571 (  0.051584)
On Machine B (MRI 2.7.4):
             user     system      total        real
add       0.278377   0.000000   0.278377 (  0.278316)
splat     0.780735   0.043730   0.824465 (  0.824377)
How come splat-based array construction is so slow? I expect splat-based construction to be no slower than plain addition (after all, the AST could even turn one into the other), and I actually expect it to be more efficient: since the language can see everything, it can avoid the intermediate arrays created by binary addition, anticipate the size of the final array and reserve the space upfront, etc.
So how come the alternatives, which go through method calls (so are, a priori, less optimizable by the interpreter), are faster than something in which everything is honestly exposed to the interpreter?
EDIT: more alternatives
require 'benchmark'
$array = (0...100).to_a
def add
  $array + $array + $array
end
def append
  res = $array.dup
  res.append(*$array)
  res.append(*$array)
  res
end
def concat2
  res = []
  res.concat($array)
  res.concat($array)
  res.concat($array)
  res
end
def concat3
  [].concat($array, $array, $array)
end
def concat_splat
  [].concat(*[$array, $array, $array])
end
def flatten
  [$array, $array, $array].flatten
end
def flatten_1
  [$array, $array, $array].flatten(1)
end
def splat
  [*$array, *$array, *$array]
end
n = 100_000
Benchmark.bm do |x|
  x.report('add         ') {n.times{add}}
  x.report('append      ') {n.times{append}}
  x.report('concat2     ') {n.times{concat2}}
  x.report('concat3     ') {n.times{concat3}}
  x.report('concat_splat') {n.times{concat_splat}}
  x.report('flatten     ') {n.times{flatten}}
  x.report('flatten(1)  ') {n.times{flatten_1}}
  x.report('splat       ') {n.times{splat}}
end
This is Machine A, MRI 3.1.3.
                   user     system      total        real
add            0.032841   0.001502   0.034343 (  0.034347)
append         0.059024   0.009869   0.068893 (  0.068944)
concat2        0.047542   0.000144   0.047686 (  0.047690)
concat3        0.062913   0.010196   0.073109 (  0.073111)
concat_splat   0.056044   0.000748   0.056792 (  0.056796)
flatten        0.978091   0.005750   0.983841 (  0.983952)
flatten(1)     0.165467   0.000998   0.166465 (  0.166472)
splat          0.049761   0.000131   0.049892 (  0.049896)
CodePudding user response:
Addition and splat versions emit different bytecode (some output is omitted for brevity):
puts RubyVM::InstructionSequence.compile(<<~ADDITION).disasm
src = (0...100).to_a
res = src + src + src
ADDITION
0000 putobject 0...100 ( 1)[Li]
0002 opt_send_without_block <calldata!mid:to_a, argc:0, ARGS_SIMPLE>
0004 setlocal_WC_0 src@0
0006 getlocal_WC_0 src@0 ( 2)[Li]
0008 getlocal_WC_0 src@0
0010 opt_plus <calldata!mid:+, argc:1, ARGS_SIMPLE>
0012 getlocal_WC_0 src@0
0014 opt_plus <calldata!mid:+, argc:1, ARGS_SIMPLE>
0016 dup
0017 setlocal_WC_0 res@1
0019 leave
vs
puts RubyVM::InstructionSequence.compile(<<~SPLATS).disasm
src = (0...100).to_a
res = [*src, *src, *src]
SPLATS
0000 putobject 0...100 ( 1)[Li]
0002 opt_send_without_block <calldata!mid:to_a, argc:0, ARGS_SIMPLE>
0004 setlocal_WC_0 src@0
0006 getlocal_WC_0 src@0 ( 2)[Li]
0008 splatarray true
0010 getlocal_WC_0 src@0
0012 concatarray
0013 getlocal_WC_0 src@0
0015 concatarray
0016 dup
0017 setlocal_WC_0 res@1
0019 leave
The two snippets above look quite similar; the difference is 2 opt_plus instructions vs a splatarray plus 2 concatarrays. But implementation-wise the difference becomes bigger.
The first one boils down to 2 rb_ary_plus calls; in a nutshell:
- allocate memory for src + src
- copy src + src to a new memory location
- allocate memory for (src + src) + src
- copy (src + src) + src to a new memory location
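As a rough Ruby-level model of that flow (a conceptual sketch only; plus_sketch is a made-up helper that mimics the allocate-then-copy behaviour described above, not the actual rb_ary_plus C code):
src = (0...100).to_a
# Made-up helper: allocate the result once, then copy both operands into it.
def plus_sketch(a, b)
  out = Array.new(a.size + b.size)  # allocate memory for the result
  out[0, a.size] = a                # copy the left operand
  out[a.size, b.size] = b           # copy the right operand
  out
end
tmp = plus_sketch(src, src)         # intermediate array for src + src
res = plus_sketch(tmp, src)         # final array for (src + src) + src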
The latter seems to be more complex internally: splatarray boils down to rb_ary_dup (so we copy the array first), while concatarray under the hood duplicates the target array too and then boils down to rb_ary_splice. The latter is a bit hairy, but I believe we go to a branch where we effectively double the array capacity (which includes copying the 1st array) and then copy the 2nd array. I'm not 100% sure I'm tracing this execution flow properly, but if I am, it gives us:
- duplicate src
- duplicate src again (?)
- double the target capacity (includes copying)
- copy the 2nd array to the space allocated above
- duplicate (src + src)
- double the (src + src) capacity (includes copying the (src + src) elements to a new memory location)
- copy the 3rd array to the allocated space
These additional duplications could explain the difference (not to mention the overall complexity of the latter, which means more conditionals checked, etc.).
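Again as a rough Ruby-level model of the splat path, following the flow traced above (concat_sketch is a made-up helper; I'm not claiming it matches the C code exactly):
src = (0...100).to_a
# Made-up helper mimicking concatarray as traced above: duplicate the target,
# grow it (which may copy the existing elements), then copy the other array in.
def concat_sketch(target, other)
  target = target.dup    # concatarray duplicates the target array first
  target.concat(other)   # growing may reallocate/copy, then other is copied in
  target
end
res = src.dup                   # splatarray true: duplicate src
res = concat_sketch(res, src)   # first concatarray
res = concat_sketch(res, src)   # second concatarray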