Why is clang unwilling or unable to eliminate duplicate loads here

Time:05-30

Consider the following C program:

typedef struct { int x; } Foo;

void original(Foo***** xs, Foo* foo) {
    xs[0][1][2][3] = foo;
    xs[0][1][2][3]->x = 42;
}

As far as I understand, per the C standard a Foo** cannot alias a Foo* (and so on), because their types are not compatible. However, compiling the program with clang 14.0 at -O3 results in duplicate loads:

    mov     rax, qword ptr [rdi]
    mov     rax, qword ptr [rax + 8]
    mov     rax, qword ptr [rax + 16]
    mov     qword ptr [rax + 24], rsi
    mov     rax, qword ptr [rdi]
    mov     rax, qword ptr [rax + 8]
    mov     rax, qword ptr [rax + 16]
    mov     rax, qword ptr [rax + 24]
    mov     dword ptr [rax], 42
    ret
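To make the aliasing argument concrete: in xs[0][1][2][3] = foo the store writes an object of type Foo*, while the reloads on the next line read objects of type Foo****, Foo*** and Foo**. The sketch below checks those lvalue types at compile time with C11 _Generic (the function name check_types is made up for illustration). Since none of the three intermediate types is compatible with Foo* (and none is a character type), the aliasing rule in C11 6.5p7 lets a compiler assume the store does not modify them.

```c
typedef struct { int x; } Foo;

/* Hypothetical helper: statically asserts the type of every lvalue in
   the chain. _Generic does not evaluate its controlling expression, so
   nothing here dereferences xs at run time. */
void check_types(Foo *****xs) {
    _Static_assert(_Generic(xs[0],          Foo ****: 1, default: 0), "xs[0] is Foo****");
    _Static_assert(_Generic(xs[0][1],       Foo ***:  1, default: 0), "xs[0][1] is Foo***");
    _Static_assert(_Generic(xs[0][1][2],    Foo **:   1, default: 0), "xs[0][1][2] is Foo**");
    _Static_assert(_Generic(xs[0][1][2][3], Foo *:    1, default: 0), "xs[0][1][2][3] is Foo* (the stored object)");
    (void)xs;
}
```

If the file compiles, the types are as described; calling check_types does nothing at run time.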

I would expect an optimising compiler to either:

(A) Assign to x on foo directly and assign foo to xs (in any order)
(B) Perform address calculations for xs once and use them for assigning foo and x.
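Written out by hand, the two options might look like this (option_a, option_b and the run harness are illustrative names; option_b mirrors the fixed function shown below). The harness builds a concrete four-level chain from plain arrays so the three versions can be checked against each other; it only checks observable behaviour, not the generated assembly.

```c
typedef struct { int x; } Foo;

/* The function from the question: walks the chain twice. */
void original(Foo *****xs, Foo *foo) {
    xs[0][1][2][3] = foo;
    xs[0][1][2][3]->x = 42;
}

/* Option A: write x through foo directly and store foo in the chain.
   Relies on the int store not modifying any of the pointer objects. */
void option_a(Foo *****xs, Foo *foo) {
    foo->x = 42;
    xs[0][1][2][3] = foo;
}

/* Option B: compute the address of the final slot once and reuse it. */
void option_b(Foo *****xs, Foo *foo) {
    Foo **slot = &xs[0][1][2][3];
    *slot = foo;
    (*slot)->x = 42;
}

/* Hypothetical test harness: builds xs[0][1][2][3], runs f, and returns
   1 if the final slot now holds foo and foo->x == 42. */
int run(void (*f)(Foo *****, Foo *)) {
    Foo *l3[4] = {0};      /* xs[0][1][2] points here; l3[3] is the slot */
    Foo **l2[3] = {0};     /* xs[0][1] points here */
    Foo ***l1[2] = {0};    /* xs[0] points here */
    Foo ****l0[1] = {0};   /* xs points here */
    Foo target = {0};

    l2[2] = l3;
    l1[1] = l2;
    l0[0] = l1;
    f(l0, &target);
    return l3[3] == &target && target.x == 42;
}
```

All three versions have the same observable effect; the question is only whether the compiler is allowed to turn the first into either of the others.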

Clang correctly compiles B:

void fixed(Foo***** xs, Foo* foo) {
    Foo** ix = &xs[0][1][2][3];
    *ix = foo;
    (*ix)->x = 42;
}

as follows (actually turning it into A):

    mov     rax, qword ptr [rdi]
    mov     rax, qword ptr [rax + 8]
    mov     rax, qword ptr [rax + 16]
    mov     qword ptr [rax + 24], rsi
    mov     dword ptr [rsi], 42
    ret

Interestingly gcc compiles both definitions into A. Why is clang unwilling or unable to optimise the address calculation in the original definition?

Compiler Explorer Playground

Answer:

This is a partial answer.

The loads are performed twice because the optimizer missed the optimization. It does detect this specific opportunity but fails to apply it, reporting the following remarks:

Missed - load of type ptr not eliminated in favor of load because it is clobbered by store
Missed - load of type ptr not eliminated because it is clobbered by store
Missed - load of type ptr not eliminated because it is clobbered by store
Missed - load of type ptr not eliminated because it is clobbered by store

You can see that by opening the "optimization output" window in Godbolt.

This optimization is performed by LLVM's Global Value Numbering (GVN) pass, and the specific remark appears to be reported from the function reportMayClobberedLoad. The code states that the missed load elimination is due to an intervening store (again). Going further requires delving into the algorithm of this pass; a good starting point seems to be the GVNPass::AnalyzeLoadAvailability function. Fortunately, the code is commented.

Note that a simplified Foo** use case is optimized, while a simplified Foo*** use case is not optimized by default; using restrict fixes the missed optimization (it looks like the optimizer wrongly assumes that the store can cause an aliasing problem here).
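The simplified code is not shown, so here is one plausible reconstruction of the Foo*** case (function names and the run3 harness are hypothetical). With xs restrict-qualified, if the store to xs[0][1] modified *xs, that object would be accessed both through xs and through a pointer not based on xs, which restrict forbids; the compiler may therefore reuse the first load of *xs. According to the answer this is enough for clang; comparing the two functions in Compiler Explorer is the way to confirm it for a given version.

```c
typedef struct { int x; } Foo;

/* Hypothetical simplified Foo*** case: the load of *xs is repeated
   after the store. */
void three_levels(Foo ***xs, Foo *foo) {
    xs[0][1] = foo;
    xs[0][1]->x = 42;
}

/* Same code with xs restrict-qualified: the store through xs[0] + 1
   may not modify *xs, so the second load of *xs can be elided. */
void three_levels_restrict(Foo ***restrict xs, Foo *foo) {
    xs[0][1] = foo;
    xs[0][1]->x = 42;
}

/* Hypothetical harness: builds xs[0][1], runs f, checks the result.
   Qualifiers on parameters do not affect function type compatibility,
   so both functions can be passed here. */
int run3(void (*f)(Foo ***, Foo *)) {
    Foo *l1[2] = {0};
    Foo **l0[1] = {l1};
    Foo target = {0};
    f(l0, &target);
    return l1[1] == &target && target.x == 42;
}
```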

I wonder whether this could be due to the LLVM IR, which seems to make no distinction between Foo** and Foo*** pointer types: they all appear as plain ptr. Thus, the store-forwarding optimization could fail because the store may affect any pointer in the chain, and the optimizer cannot tell which one because of possible aliasing (itself a consequence of the lost pointer types). Here is the LLVM IR produced:

define dso_local void @original(ptr nocapture noundef readonly %0, ptr noundef %1) local_unnamed_addr #0 !dbg !9 {
  call void @llvm.dbg.value(metadata ptr %0, metadata !24, metadata !DIExpression()), !dbg !26
  call void @llvm.dbg.value(metadata ptr %1, metadata !25, metadata !DIExpression()), !dbg !26
  %3 = load ptr, ptr %0, align 8, !dbg !27, !tbaa !28
  %4 = getelementptr inbounds ptr, ptr %3, i64 1, !dbg !27
  %5 = load ptr, ptr %4, align 8, !dbg !27, !tbaa !28
  %6 = getelementptr inbounds ptr, ptr %5, i64 2, !dbg !27
  %7 = load ptr, ptr %6, align 8, !dbg !27, !tbaa !28
  %8 = getelementptr inbounds ptr, ptr %7, i64 3, !dbg !27
  store ptr %1, ptr %8, align 8, !dbg !32, !tbaa !28
  %9 = load ptr, ptr %0, align 8, !dbg !33, !tbaa !28
  %10 = getelementptr inbounds ptr, ptr %9, i64 1, !dbg !33
  %11 = load ptr, ptr %10, align 8, !dbg !33, !tbaa !28
  %12 = getelementptr inbounds ptr, ptr %11, i64 2, !dbg !33
  %13 = load ptr, ptr %12, align 8, !dbg !33, !tbaa !28
  %14 = getelementptr inbounds ptr, ptr %13, i64 3, !dbg !33
  %15 = load ptr, ptr %14, align 8, !dbg !33, !tbaa !28
  store i32 42, ptr %15, align 4, !dbg !34, !tbaa !35
  ret void, !dbg !38
}
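The effect of erasing pointer types can be imitated in C by making every level void *: then nothing in the type system stops the store from landing on one of the pointers loaded earlier, and a reload really can observe a different value. This sketch (all names hypothetical, not taken from LLVM) arranges the cells so that the store overwrites *xs, which is exactly the possibility the optimizer has to rule out.

```c
/* Every cell has type void *, so the store below may legally
   overwrite the pointer stored in *xs. */
int reload_differs(void **xs, void *value) {
    void **first = (void **)xs[0];   /* first load of *xs */
    first[1] = value;                /* store through the loaded pointer */
    return xs[0] != (void *)first;   /* reload of *xs: 1 if it changed */
}

/* Arrange memory so that xs[0] + 1 == xs, i.e. the store hits *xs. */
int demo(void) {
    void *cells[2];
    void **xs = &cells[1];
    cells[1] = &cells[0];            /* xs[0] points one cell before xs */
    int marker = 0;
    return reload_differs(xs, &marker);
}
```

With distinct Foo*, Foo** ... types this arrangement is not valid C, which is the information the IR (as emitted) no longer carries.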

Answer:

The answer seems to be that this is an open LLVM issue: [TBAA] Emit distinct TBAA tags for pointers with different depths,types.

Jérôme's answer tipped me off that this might have something to do with Type Based Alias Analysis (TBAA) when I noticed all loads use the same TBAA metadata.

Right now clang only emits* the following TBAA:

; Descriptors
!15 = !{!"Simple C/C++ TBAA"}
!14 = !{!"omnipotent char", !15, i64 0}
!13 = !{!"any pointer", !14, i64 0}
!21 = !{!"int", !14, i64 0}
!20 = !{!"", !21, i64 0}
; Tags
!12 = !{!13, !13, i64 0}
!19 = !{!20, !21, i64 0}

Looking at the LLVM revision, I figured that clang might eventually be able to emit something along the lines of:

; Type descriptors
!0 = !{!"TBAA Root"}
!1 = !{!"omnipotent char", !0, i64 0}
!3 = !{!"int", !0, i64 0}
!2 = !{!"any pointer", !1, i64 0}
!11 = !{!"p1 foo", !2, i64 0} ; Foo*
!12 = !{!"p2 foo", !2, i64 0} ; Foo**
!13 = !{!"p3 foo", !2, i64 0} ; Foo***
!14 = !{!"p4 foo", !2, i64 0} ; Foo****
!10 = !{!"foo", !3, i64 0} ; struct {int x}

; Access tags
!20 = !{!14, !14, i64 0} ; Foo****
!21 = !{!13, !13, i64 0} ; Foo***
!22 = !{!12, !12, i64 0} ; Foo**
!23 = !{!11, !11, i64 0} ; Foo*
!24 = !{!10, !3, i64 0}  ; Foo.x

(I'm still not sure I fully grok the TBAA metadata format so please excuse any mistakes)
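Why distinct tags help can be seen with a simplified model of the scalar TBAA rule: two accesses may alias only if one access type is an ancestor of, or equal to, the other in the type tree. Struct-path offsets are ignored here, and the Node structure and may_alias function are a hypothetical sketch, not LLVM's implementation. The node names mirror the descriptors above, with "int" placed under "omnipotent char" as in clang's current output.

```c
#include <stddef.h>

/* One node of a simplified TBAA type tree: a name and a parent. */
typedef struct Node {
    const char *name;
    const struct Node *parent; /* NULL for the root */
} Node;

/* Returns 1 if a is b itself or one of b's ancestors. */
static int is_ancestor_or_self(const Node *a, const Node *b) {
    for (; b != NULL; b = b->parent)
        if (a == b)
            return 1;
    return 0;
}

/* Simplified scalar rule: two accesses may alias iff one access type
   is an ancestor of, or equal to, the other. */
int may_alias(const Node *a, const Node *b) {
    return is_ancestor_or_self(a, b) || is_ancestor_or_self(b, a);
}

/* Type tree mirroring the metadata above. */
const Node root      = {"TBAA Root", NULL};
const Node omni_char = {"omnipotent char", &root};
const Node int_ty    = {"int", &omni_char};
const Node any_ptr   = {"any pointer", &omni_char};
const Node p1_foo    = {"p1 foo", &any_ptr}; /* Foo*    */
const Node p2_foo    = {"p2 foo", &any_ptr}; /* Foo**   */
const Node p3_foo    = {"p3 foo", &any_ptr}; /* Foo***  */
const Node p4_foo    = {"p4 foo", &any_ptr}; /* Foo**** */
```

Today every pointer access carries the "any pointer" tag, so may_alias(&any_ptr, &any_ptr) is 1 and the reloads must stay. With depth-specific tags, may_alias(&p1_foo, &p4_foo) is 0: the Foo* store (tag !23) can no longer clobber the Foo**** load (tag !20), and likewise for the p3 and p2 loads.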

Together with the IR below, LLVM produces the expected assembly.

define void @original(ptr %0, ptr %1) {
  %3 = load ptr, ptr %0, !tbaa !20
  %4 = getelementptr ptr, ptr %3, i64 1
  %5 = load ptr, ptr %4, !tbaa !21
  %6 = getelementptr ptr, ptr %5, i64 2
  %7 = load ptr, ptr %6, !tbaa !22
  %8 = getelementptr ptr, ptr %7, i64 3
  store ptr %1, ptr %8, !tbaa !23

  %9 = load ptr, ptr %0, !tbaa !20
  %10 = getelementptr ptr, ptr %9, i64 1
  %11 = load ptr, ptr %10, !tbaa !21
  %12 = getelementptr ptr, ptr %11, i64 2
  %13 = load ptr, ptr %12, !tbaa !22
  %14 = getelementptr ptr, ptr %13, i64 3
  %15 = load ptr, ptr %14, !tbaa !23 ; %15: Foo*
  store i32 42, ptr %15, !tbaa !24

  ret void
}

Compiler Explorer Playground

* Compiler Explorer's LLVM IR view filters these out by default, but you can see them by compiling with -emit-llvm and disabling the "Directives" filter.
