Performance issue with for loop on the initial run on .NET 7

Time:12-20

I'm working on a performance-sensitive application and am considering moving from .NET 6 to .NET 7.

While comparing these two versions, I've found that .NET 7 is slower at executing a for loop on the initial run.

Testing is done with two separate console applications with identical code, one targeting .NET 6 and the other .NET 7, both run in Release mode with the Any CPU platform target.

Test code:

using System.Diagnostics;

int size = 1000000;
Stopwatch sw = new();

//create array
float[] arr = new float[size];
for (int i = 0; i < size; i++)
    arr[i] = i;

Console.WriteLine(AppDomain.CurrentDomain.SetupInformation.TargetFrameworkName);

Console.WriteLine($"\nForLoop1");
ForLoop1();
ForLoop1();
ForLoop1();
ForLoop1();
ForLoop1();

Console.WriteLine($"\nForLoopArray");
ForLoopArray();
ForLoopArray();
ForLoopArray();
ForLoopArray();
ForLoopArray();

Console.WriteLine($"\nForLoop2");
ForLoop2();
ForLoop2();
ForLoop2();
ForLoop2();
ForLoop2();

void ForLoop1()
{
    sw.Restart();

    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

void ForLoopArray()
{
    sw.Restart();

    float sum = 0f;
    for (int i = 0; i < size; i++)
        sum += arr[i];

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

void ForLoop2()
{
    sw.Restart();

    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

The console output for the .NET 6 version:

.NETCoreApp,Version=v6.0

ForLoop1
2989 ticks (1000000)
2846 ticks (1000000)
2851 ticks (1000000)
3180 ticks (1000000)
2841 ticks (1000000)

ForLoopArray
8270 ticks (4.9994036E+11)
8443 ticks (4.9994036E+11)
8354 ticks (4.9994036E+11)
8952 ticks (4.9994036E+11)
8458 ticks (4.9994036E+11)

ForLoop2
2842 ticks (1000000)
2844 ticks (1000000)
3117 ticks (1000000)
2835 ticks (1000000)
2992 ticks (1000000)

And the .NET 7 version:

.NETCoreApp,Version=v7.0

ForLoop1
19658 ticks (1000000)
2921 ticks (1000000)
2967 ticks (1000000)
3190 ticks (1000000)
3722 ticks (1000000)

ForLoopArray
20041 ticks (4.9994036E+11)
8342 ticks (4.9994036E+11)
9212 ticks (4.9994036E+11)
8501 ticks (4.9994036E+11)
9726 ticks (4.9994036E+11)

ForLoop2
14016 ticks (1000000)
3008 ticks (1000000)
2885 ticks (1000000)
2882 ticks (1000000)
2888 ticks (1000000)

As you can see, the .NET 6 timings are consistent across all five calls, whereas on .NET 7 the first call of each method is much slower (19658, 20041 and 14016 ticks).

Fiddling with the environment variables DOTNET_ReadyToRun and DOTNET_TieredPGO just makes things worse.

Why is this and how can it be rectified?

Answer:

My guess is that this is connected to On-Stack Replacement (OSR), which is enabled by default in .NET 7. Enabling DOTNET_JitDisasmSummary "on my machine" (Windows PowerShell: $env:DOTNET_JitDisasmSummary=1) results in the following output:

ForLoop1
   9: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier0, IL size=118, code size=291]
  10: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier1-OSR @0x19, IL size=118, code size=571]
13420 ticks (1000000)
2431 ticks (1000000)
...

ForLoopArray
  11: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier0, IL size=129, code size=339]
  12: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier1-OSR @0x24, IL size=129, code size=609]
  13: JIT compiled System.SpanHelpers:SequenceCompareTo(byref,int,byref,int) [Tier1, IL size=632, code size=329]
19380 ticks (4.9994036E+11)
10694 ticks (4.9994036E+11)
...

ForLoop2
  14: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier0, IL size=118, code size=291]
  15: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier1-OSR @0x19, IL size=118, code size=549]
11720 ticks (1000000)
2549 ticks (1000000)
...
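Judging by that output, the first call of each method starts in the Tier0 code, and once the loop has run long enough the JIT compiles a Tier1-OSR version and switches to it mid-loop, so the first timing includes that extra compilation. If the goal is only to measure steady-state speed, or to pay this cost once at startup instead of on the first real call, a warm-up call is enough. A minimal sketch, where ForLoop and Measure are hypothetical helpers modelled on the question's ForLoop1:

```csharp
using System;
using System.Diagnostics;

const int size = 1_000_000;

// Counting loop equivalent to ForLoop1 in the question, but returning the sum.
int ForLoop()
{
    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;
    return sum;
}

// Times a single call of the given method and returns the elapsed ticks and its result.
(long Ticks, int Result) Measure(Func<int> body)
{
    var sw = Stopwatch.StartNew();
    int result = body();
    sw.Stop();
    return (sw.ElapsedTicks, result);
}

// Warm-up call: with the .NET 7 defaults this runs the Tier0 code and triggers
// the OSR compilation seen in the JitDisasmSummary output; its timing is not recorded.
ForLoop();

// Only calls made after the warm-up are measured.
for (int run = 0; run < 5; run++)
{
    var (ticks, result) = Measure(ForLoop);
    Console.WriteLine($"{ticks} ticks ({result})");
}
```

This does not make the first call faster; it only moves the one-time compilation cost to a point where it is not measured.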

Setting DOTNET_TC_QuickJitForLoops to 0 ($env:DOTNET_TC_QuickJitForLoops=0) "reverts" this behaviour. I wasn't sure why at first, because the docs state that the default is false, but as far as I can tell that default changed in .NET 7 together with OSR being enabled by default:

ForLoop1
   8: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier-0 switched to FullOpts, IL size=118, code size=577]
2590 ticks (1000000)
2535 ticks (1000000)
...

ForLoopArray
   9: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier-0 switched to FullOpts, IL size=129, code size=618]
  10: JIT compiled System.SpanHelpers:SequenceCompareTo(byref,int,byref,int) [Tier1, IL size=632, code size=329]
10759 ticks (4.9994036E+11)
10816 ticks (4.9994036E+11)
...

ForLoop2
  11: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier-0 switched to FullOpts, IL size=118, code size=555]
2446 ticks (1000000)
2509 ticks (1000000)
...
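If you'd rather not change a process-wide setting, a per-method alternative (my suggestion, not something benchmarked above) is MethodImplOptions.AggressiveOptimization. As far as I know, a method marked with it bypasses tiered compilation and is compiled fully optimized on its first call, much like the "Tier-0 switched to FullOpts" lines above. A minimal sketch, where ForLoopOptimized is a hypothetical name:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

const int size = 1_000_000;
var sw = new Stopwatch();

// Same loop as ForLoop1 in the question. AggressiveOptimization asks the JIT to
// compile this method with full optimizations straight away instead of starting
// at Tier0 and later switching to an OSR version.
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
int ForLoopOptimized()
{
    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;
    return sum;
}

// Time five calls, including the first one.
for (int run = 0; run < 5; run++)
{
    sw.Restart();
    int result = ForLoopOptimized();
    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({result})");
}
```

The trade-off is that such a method is never re-jitted at Tier1, so as far as I know it also misses out on tiering-based optimizations such as dynamic PGO.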

Possibly related discussion on GitHub.

P.S.

If your code is performance-sensitive, and especially if startup performance matters, it may be worth looking into Native AOT.
