Why fill array in while loop is so slow in Scala 3?-CodePudding

I have 2 implementation of filling Array[Int], there are described below. First one executes in 84 ms, and second 100 times slow:

Execute 'filling via Array.fill' in 84 ms
Execute 'filling via while' in 8334 ms

Why 2nd variant takes 100x more time? It is not a GC, because I can remove first execution with the same time for second. I run it on Java 11, with use Scala 3:

.jdks/adopt-openjdk-11.0.11/bin/java ... Fill

More than that, if you open Array.fill implementation, you will see implementation via while...

object Fill extends App {
  def timeMeasure[R](name: String)(block: => R): R = {
    val startTime = System.currentTimeMillis()
    val r = block
    val endTime = System.currentTimeMillis()
    println(s"Execute '$name' in ${endTime - startTime} ms")
    r
  }

  val n = 1000 * 1000 // * 10
  var ar = timeMeasure("filling via Array.fill") {
    Array.fill[Int](n)(10)
  }
  ar = timeMeasure("filling via while") {
    val array = new Array[Int](n)
    var i = 0
    while (i < n) {
      array(i) = i
      i  = 1
    }
    array
  }
}

PS: I repeat this test on Scala 2.12:

Execute 'filling via Array.fill' in 118 ms
Execute 'filling via while' in 6 ms

So problem in Scala 3...

PPS: In this case for (i <- 0 until n) works with normal speed, timing as for Scala 2.12. But there are cases where for works about 2 or 3 times slower compare to while.

PPPS: To whom who think that it is random or something, it is not: 100 tests performs for 516 seconds (today my PC something faster), so average time is so big. But anyway, in some programs some code blocks executes only ones, so you should not to perform mean time for ANY performance test.

UPDATE: I found that when val n located outside of code block, execution time about 1000x slower. But I don't understand why.

CodePudding user response：

Scala 3 is doing something very strange with this code.

If I refactor this into a different structure without changing the logic, then everything works as expected (Array.fill is a bit slower than while ).

object Fill extends App {
  def timeMeasure[R](f: => R): (java.lang.Long, R) = {
    val startTime = System.nanoTime()
    val r = f
    val endTime = System.nanoTime()
    (endTime - startTime, r)
  }

  def warmup(): Unit =  {
    println("== warmup start =====================")
    for (i <- 0 to 10) {
      measureForN(1000000)
    }
    println("== warmup finish =====================")
  }

  def measureForN(n: Int): Unit = {
    val t1 = timeMeasure { Array.fill[Int](n)(10) }

    val t2 = timeMeasure({
      val array = new Array[Int](n)
      var i = 0
      while (i < n) {
        array(i) = 10
        i  = 1
      }
      array
    })

    val t3 = timeMeasure({
      val array = new Array[Int](n)
      var i = 0
      while (i < n) {
        array(i) = i
        i  = 1
      }
      array
    })

    // just to ensure actual array creations
    val length = List(t1._2.length, t2._2.length, t3._2.length).min

    println(s"n: ${n}, length: ${length}, fill: ${t1._1 / 1000} μs , while constant: ${t2._1 / 1000} μs, while changing: ${t3._1 / 1000} μs")
  }

  warmup()

  measureForN(10)
  measureForN(100)
  measureForN(1000)
  measureForN(10000)
  measureForN(100000)
  measureForN(1000000)
  measureForN(10000000)
  measureForN(100000000)
}

Output:

== warmup start =====================
n: 1000000, length: 1000000, fill: 23533 μs , while constant: 3804 μs, while changing: 3716 μs
n: 1000000, length: 1000000, fill: 7070 μs , while constant: 1606 μs, while changing: 1783 μs
n: 1000000, length: 1000000, fill: 3911 μs , while constant: 1497 μs, while changing: 1689 μs
n: 1000000, length: 1000000, fill: 3821 μs , while constant: 1543 μs, while changing: 1718 μs
n: 1000000, length: 1000000, fill: 3798 μs , while constant: 1510 μs, while changing: 1662 μs
n: 1000000, length: 1000000, fill: 3801 μs , while constant: 1524 μs, while changing: 1796 μs
n: 1000000, length: 1000000, fill: 3896 μs , while constant: 1541 μs, while changing: 1703 μs
n: 1000000, length: 1000000, fill: 3805 μs , while constant: 1486 μs, while changing: 1687 μs
n: 1000000, length: 1000000, fill: 3854 μs , while constant: 1606 μs, while changing: 1712 μs
n: 1000000, length: 1000000, fill: 3836 μs , while constant: 1509 μs, while changing: 1698 μs
n: 1000000, length: 1000000, fill: 3846 μs , while constant: 1553 μs, while changing: 1672 μs
== warmup finish =====================
n: 10, length: 10, fill: 3 μs , while constant: 0 μs, while changing: 0 μs
n: 100, length: 100, fill: 2 μs , while constant: 3 μs, while changing: 0 μs
n: 1000, length: 1000, fill: 6 μs , while constant: 1 μs, while changing: 2 μs
n: 10000, length: 10000, fill: 41 μs , while constant: 19 μs, while changing: 17 μs
n: 100000, length: 100000, fill: 378 μs , while constant: 156 μs, while changing: 170 μs
n: 1000000, length: 1000000, fill: 3764 μs , while constant: 1464 μs, while changing: 1676 μs
n: 10000000, length: 10000000, fill: 36976 μs , while constant: 15687 μs, while changing: 10860 μs
n: 100000000, length: 100000000, fill: 312242 μs , while constant: 190274 μs, while changing: 221980 μs

Edit :: The change required is as simple as not directly using the n in the blocks.

object Fill extends App {

  def timeMeasure[R](name: String)(block: => R): R = {
    val startTime = System.currentTimeMillis()
    val r = block
    val endTime = System.currentTimeMillis()
    println(s"Execute '$name' in ${endTime - startTime} ms")
    r
  }

  val n = 1000 * 1000 // * 10

  def measureFill(x: Int): Unit = {
    val ar1 = timeMeasure("filling via Array.fill") {
      Array.fill[Int](x)(10)
    }
  }

  def measureWhile(x: Int): Unit = {
    val ar2 = timeMeasure("filling via while") {
      val array = new Array[Int](x)
      var i = 0
      while (i < x) {
        array(i) = i
        i  = 1
      }
      array
    }
  }

  println("== warmup ==================")
  measureFill(n)
  measureWhile(n)
  println("== warmup ==================")

  measureFill(n)
  measureWhile(n)

}

Output :

== warmup start ==================
Execute 'filling via Array.fill' in 26 ms
Execute 'filling via while' in 5 ms
== warmup finish ==================
Execute 'filling via Array.fill' in 6 ms
Execute 'filling via while' in 1 ms

Edit 2 :: As pointed out by Mikhail, this is being caused because the usage of n is being comipled to usage of method n() in generated Java code.

object Test3 {

  val n = 1

  val k = n

}

which is being compiled to,

//decompiled from Test3.class
public final class Test3 {
   public static int k() {
      return Test3$.MODULE$.k();
   }

   public static int n() {
      return Test3$.MODULE$.n();
   }
}

        //decompiled from Test3$.class
import java.io.Serializable;
import scala.runtime.ModuleSerializationProxy;

public final class Test3$ implements Serializable {
   private static final int n = 1;
   private static final int k;
   public static final Test3$ MODULE$ = new Test3$();

   private Test3$() {
   }

   static {
      k = MODULE$.n();
   }

   private Object writeReplace() {
      return new ModuleSerializationProxy(Test3$.class);
   }

   public int n() {
      return n;
   }

   public int k() {
      return k;
   }
}

But, the Java code generated by Scala 2.13.6 is almost the same (only that ScalaSignature part is different).

//decompiled from Test3.class
import scala.reflect.ScalaSignature;

@ScalaSignature(
   bytes = "\u0006\u0005\r:Qa\u0002\u0005\t\u0002=1Q!\u0005\u0005\t\u0002IAQ!G\u0001\u0005\u0002iAqaG\u0001C\u0002\u0013\u0005A\u0004\u0003\u0004!\u0003\u0001\u0006I!\b\u0005\bC\u0005\u0011\r\u0011\"\u0001\u001d\u0011\u0019\u0011\u0013\u0001)A\u0005;\u0005)A Z:ug)\u0011\u0011BC\u0001\ta\u0016\u0014X.\u00192be*\u00111\u0002D\u0001\u0005g\u0016\u0014\u0018NC\u0001\u000e\u0003\tiWm\u0001\u0001\u0011\u0005A\tQ\"\u0001\u0005\u0003\u000bQ 7\u000f^\u001a\u0014\u0005\u0005\u0019\u0002C\u0001\u000b\u0018\u001b\u0005)\"\"\u0001\f\u0002\u000bM\u001c\u0017\r\\1\n\u0005a)\"AB!osJ g-\u0001\u0004=S:LGO\u0010\u000b\u0002\u001f\u0005\ta.F\u0001\u001e!\t!b$\u0003\u0002  \t\u0019\u0011J\u001c;\u0002\u00059\u0004\u0013!A6\u0002\u0005-\u0004\u0003"
)
public final class Test3 {
   public static int k() {
      return Test3$.MODULE$.k();
   }

   public static int n() {
      return Test3$.MODULE$.n();
   }
}

        //decompiled from Test3$.class
public final class Test3$ {
   public static final Test3$ MODULE$ = new Test3$();
   private static final int n = 1;
   private static final int k;

   static {
      k = MODULE$.n();
   }

   public int n() {
      return n;
   }

   public int k() {
      return k;
   }

   private Test3$() {
   }
}

Which means that this is caused by some other bug (which is causing this n() to have such performance impact) in Scala 3.

CodePudding user response：

There is one notable difference between the implementations, and I suspect that the JIT compiler is exploiting the difference:

Array.fill is filling the array with a constant value
The while implementation is filling the array with a different value every time and the value depends on the iteration step

Even though Array.fill takes a by-name argument (thus the caller has to wrap it in a lambda), the JIT compiler can fairly easily deduce that the result of the lambda is constant and then use tricks to get the desired result (one that comes readily to mind is using memory pages which are zeroed out and then using SIMD instructions to add 10 to every location in parallel).

The SIMD approach doesn't work for setting array(i) = i: the loop actually has to get executed a million times (and the body of the loop isn't that amenable to optimization).

Note that System.currentTimeMillis is unreliable for single-shot benchmarking (for instance, it's allowed to move backwards (e.g. to correct for wall-clock drift)), though it's doubtful that it accounts for much of the difference (80ms vs. 800ms could be attributable to this, but 80ms vs. 8s seems less plausible).