#include <cstdio>
#include <ctime>

void fun() {
    int size_ = 20000;
    clock_t start, end;
    int nc = 0, na = 0;
    float* sumCPP = new float[size_];

    // sumCPP[na] *= sumCPP[na], compiled to movups instructions
    start = clock();
    for (nc = 0; nc < size_; nc++)
        for (na = 0; na < size_; na++)
            sumCPP[na] *= sumCPP[na]; // movups
    end = clock();
    printf("%d, %d,\n",
           nc, // with nc here, the loop above takes 52 clock ticks and uses movups
           (end - start));

    // sumCPP[na] *= sumCPP[na], compiled to mulss instructions
    start = clock();
    for (nc = 0; nc < size_; nc++)
        for (na = 0; na < size_; na++)
            sumCPP[na] *= sumCPP[na]; // mulss
    end = clock();
    printf("%d, %d,\n",
           0, // with 0 here, the loop above takes 713 clock ticks and uses mulss
           (end - start));

    delete[] sumCPP;
}
CodePudding user response:
Put the timed code in a loop that runs a few hundred times more, and run the program a few more times.
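A minimal sketch of that suggestion, assuming a small helper that repeats a measurement and keeps the fastest run. The names `best_time_ms` and `square_passes` are made up for illustration and are not from the original post, and `std::chrono::steady_clock` is swapped in for `clock()`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` `repeats` times and returns the
// fastest run in milliseconds, which damps one-off noise.
template <typename F>
double best_time_ms(F&& work, int repeats) {
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}

// Same loop nest as in the question, but on a caller-initialized buffer.
void square_passes(std::vector<float>& v, int passes) {
    for (int nc = 0; nc < passes; ++nc)
        for (std::size_t na = 0; na < v.size(); ++na)
            v[na] *= v[na];
}
```

For example, with `std::vector<float> v(20000, 1.0f);`, calling `best_time_ms([&] { square_passes(v, 20000); }, 5)` times the loop five times. Filling with 1.0f keeps every value at 1.0f, so the numbers do not drift toward zero or infinity during the run. Printing an element of `v` afterwards helps keep the optimizer from discarding the work.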
CodePudding user response:
That is not the problem; the compiled code itself changes:
    fun();
01117CF1  push        13880h
01117CF6  call        dword ptr [__imp_operator new[] (011272C8h)]
    fun();
01117CFC  add         esp,4
01117CFF  mov         esi,eax
01117D01  mov         dword ptr [ebp-1FA88h],esi
01117D07  call        dword ptr [__imp__clock (011272A8h)]
01117D0D  lea         edi,[esi+20h]
01117D10  mov         dword ptr [ebp-14h],eax
01117D13  mov         esi,4E20h
01117D18  nop         dword ptr [eax+eax]
01117D20  mov         ecx,edi
01117D22  mov         edx,4E2h
01117D27  nop         word ptr [eax+eax]
01117D30  lea         ecx,[ecx+40h]
01117D33  movups      xmm0,xmmword ptr [ecx-60h]   // first sumCPP[na] *= sumCPP[na]: uses movups
01117D37  mulps       xmm0,xmm0
01117D3A  movups      xmmword ptr [ecx-60h],xmm0
01117D3E  movups      xmm0,xmmword ptr [ecx-50h]
01117D42  mulps       xmm0,xmm0
01117D45  movups      xmmword ptr [ecx-50h],xmm0
01117D49  movups      xmm0,xmmword ptr [ecx-40h]
01117D4D  mulps       xmm0,xmm0
01117D50  movups      xmmword ptr [ecx-40h],xmm0
01117D54  movups      xmm0,xmmword ptr [ecx-30h]
01117D58  mulps       xmm0,xmm0
01117D5B  movups      xmmword ptr [ecx-30h],xmm0
01117D5F  sub         edx,1
01117D62  jne         main+70h (01117D30h)
01117D64  sub         esi,1
01117D67  jne         main+60h (01117D20h)
01117D69  mov         edi,dword ptr [__imp__clock (011272A8h)]
01117D6F  call        edi
01117D71  sub         eax,dword ptr [ebp-14h]
01117D74  push        eax
01117D75  push        4E20h
01117D7A  push        offset string "%d, %d,\n" (0112A5A8h)
01117D7F  call        printf (010D1C60h)
01117D84  add         esp,0Ch
01117D87  call        edi
01117D89  mov         ecx,dword ptr [ebp-1FA88h]
01117D8F  mov         esi,4E20h
01117D94  mov         dword ptr [ebp-14h],eax
01117D97  nop         word ptr [eax+eax]
01117DA0  mov         edx,7D0h
01117DA5  nop         word ptr [eax+eax]
01117DB0  movss       xmm0,dword ptr [ecx]
01117DB4  mulss       xmm0,xmm0                    // second sumCPP[na] *= sumCPP[na]: uses mulss
01117DB8  mulss       xmm0,xmm0
01117DBC  mulss       xmm0,xmm0
01117DC0  mulss       xmm0,xmm0
01117DC4  mulss       xmm0,xmm0
01117DC8  mulss       xmm0,xmm0
01117DCC  mulss       xmm0,xmm0
01117DD0  mulss       xmm0,xmm0
01117DD4  mulss       xmm0,xmm0
01117DD8  mulss       xmm0,xmm0
01117DDC  movss       dword ptr [ecx],xmm0
01117DE0  sub         edx,1
01117DE3  jne         main+0F0h (01117DB0h)
01117DE5  add         ecx,4
01117DE8  sub         esi,1
01117DEB  jne         main+0E0h (01117DA0h)
01117DED  call        edi
01117DEF  sub         eax,dword ptr [ebp-14h]
01117DF2  push        eax
01117DF3  push        esi
01117DF4  push        offset string "%d, %d,\n" (0112A5A8h)
01117DF9  call        printf (010D1C60h)
01117DFE  push        dword ptr [ebp-1FA88h]
01117E04  call        dword ptr [__imp_operator delete[] (011272CCh)]
01117E0A  add         esp,10h
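Read back as C++, the two loops in the listing above seem to correspond roughly to the two loop orders sketched below. This is my reading of the disassembly, not compiler output, and the function names are made up for illustration. In the first loop the inner counter is 4E2h (1250) and each iteration squares 16 neighbouring floats with mulps. In the second loop the pointer advances by 4 bytes only in the outer loop, so one element is squared over and over with mulss, and each multiply depends on the result of the previous one. That dependency chain may account for a good part of the gap between 52 and 713.

```cpp
#include <cassert>
#include <cstddef>

// Pass-major order, as written in the source: each pass sweeps the whole
// array, so neighbouring elements are independent of each other and can be
// squared several at a time (the movups / mulps pattern).
void square_pass_major(float* a, std::size_t n, int passes) {
    assert(a != nullptr || n == 0);
    for (int nc = 0; nc < passes; ++nc)
        for (std::size_t na = 0; na < n; ++na)
            a[na] *= a[na];
}

// Element-major order, which is what the second loop appears to have been
// turned into: one element is squared `passes` times before moving on to
// the next, so every multiply waits for the previous one (the mulss chain).
void square_element_major(float* a, std::size_t n, int passes) {
    assert(a != nullptr || n == 0);
    for (std::size_t na = 0; na < n; ++na) {
        float x = a[na];
        for (int nc = 0; nc < passes; ++nc)
            x *= x;
        a[na] = x;
    }
}
```

Both functions apply the same sequence of multiplications to each element, so they should give the same values; only the order in which the work is done differs.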
CodePudding user response:
Comment out the printf statements, then look at the corresponding assembly instructions.
CodePudding user response:
The VC++ optimization here is a bit strange. With other compilers the generated code for the two loops does not differ this much: sometimes the first is faster, sometimes the second.