The weblog of Nicholas Chapman

Reassociation optimisations in Visual Studio 2012
Posted 30 Dec 2014

Multiplication in C++ is left-to-right associative, e.g.

    y = a * b * c

means

    y = (a * b) * c

Note that although the mathematical operation of multiplication is associative, e.g. (a * b) * c = a * (b * c), the same is not true for floating point multiplication, due to intermediate rounding error.

Therefore, by default, the compiler may not optimise y = a * b * c to y = a * (b * c), as it may give different results.

However, if 'fast math' mode is enabled, compilers may in theory perform this 'reassociation'.

This can be important for a number of reasons. One is that b and c may be constants, and the y = a * (b * c) may be inside a loop. Performing reassociation means that the b * c multiplication may be 'hoisted' outside the loop, so only one multiplication is done per iteration instead of two.

Clang performs this reassociation optimisation:

    float f()
    {
        const float b = 2.f;
        const float c = 3.f;
        float sum = 1.f;
        for(int i=0; i<1024; ++i)
        {
            const float a = (float)(sum + 1.f);
            const float y = a * b * c;
            sum += y;
        }
        return sum;
    }

gives:

    .LCPI0_0:
        .long 1065353216 # float 1
    .LCPI0_1:
        .long 1086324736 # float 6
    f(): # @f()
        movss .LCPI0_0(%rip), %xmm1
        movl $1024, %eax # imm = 0x400
        movss .LCPI0_1(%rip), %xmm2
        movaps %xmm1, %xmm0
    .LBB0_1: # =>This Inner Loop Header: Depth=1
        movaps %xmm0, %xmm3
        addss %xmm1, %xmm0
        mulss %xmm2, %xmm0
        addss %xmm3, %xmm0
        decl %eax
        jne .LBB0_1
        ret

(See the program on http://gcc.godbolt.org/)

Note that there is only one multiplication (the mulss instruction) inside the loop. Also, b * c is precomputed (note the 'float 6' constant).
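In source terms, what clang has done is roughly equivalent to doing the reassociation by hand. The following is a minimal sketch of that, not something taken from either compiler's output. The names f_as_written, f_hand_reassociated and check_small_cases are hypothetical, and the iteration count has been made a parameter (an assumption, the original hard-codes 1024) so that small cases can be checked by hand.

```cpp
#include <cassert>

// Hypothetical helper: the loop from the post as written, with the
// iteration count as a parameter instead of the hard-coded 1024.
float f_as_written(int iterations)
{
    const float b = 2.f;
    const float c = 3.f;
    float sum = 1.f;
    for(int i=0; i<iterations; ++i)
    {
        const float a = sum + 1.f;
        const float y = a * b * c; // parsed as (a * b) * c
        sum += y;
    }
    return sum;
}

// Hypothetical helper: the same loop, reassociated by hand.
float f_hand_reassociated(int iterations)
{
    const float b = 2.f;
    const float c = 3.f;
    const float bc = b * c; // computed once, outside the loop
    float sum = 1.f;
    for(int i=0; i<iterations; ++i)
    {
        const float a = sum + 1.f;
        const float y = a * bc; // a * (b * c): one multiply per iteration
        sum += y;
    }
    return sum;
}

// Hypothetical self-check. Hand-computed values of sum: 1 -> 13 -> 97 -> 685.
void check_small_cases()
{
    assert(f_as_written(3) == 685.f);
    assert(f_hand_reassociated(3) == 685.f);
}
```

With b = 2 and c = 3 the two versions happen to return identical results, since multiplying a float by 2 is exact (short of overflow). For general constants the results can differ in the last bit, which is why the compiler is only allowed to make this change itself in fast math mode.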
Visual Studio 2012, however, gives:

    static float func()
    {
    00000001408750A0  sub         rsp,18h
        const float b = 2.f;
        const float c = 3.f;
        float sum = 1.f;
    00000001408750A4  movss       xmm4,dword ptr [__real@3f800000 (0140D9C67Ch)]
    00000001408750AC  movss       xmm5,dword ptr [__real@40000000 (0140D9CBF8h)]
    00000001408750B4  movaps      xmmword ptr [rsp],xmm6
    00000001408750B8  mov         eax,80h
    00000001408750BD  movaps      xmm3,xmm4
    00000001408750C0  movss       xmm6,dword ptr [__real@40400000 (0140D9CD58h)]
    00000001408750C8  nop         dword ptr [rax+rax]
        for(int i=0; i<1024; ++i)
        {
            const float a = (float)(sum + 1.f);
    00000001408750D0  movaps      xmm0,xmm3
    00000001408750D3  addss       xmm0,xmm4
            const float y = a * b * c;
    00000001408750D7  mulss       xmm0,xmm5
    00000001408750DB  mulss       xmm0,xmm6
            sum += y;
    00000001408750DF  addss       xmm3,xmm0
    00000001408750E3  movaps      xmm1,xmm3
    00000001408750E6  addss       xmm1,xmm4
    00000001408750EA  mulss       xmm1,xmm5
            sum += y;
    00000001408750EE  mulss       xmm1,xmm6
    00000001408750F2  addss       xmm3,xmm1
    00000001408750F6  movaps      xmm0,xmm3
    00000001408750F9  addss       xmm0,xmm4
    00000001408750FD  mulss       xmm0,xmm5
    0000000140875101  mulss       xmm0,xmm6
    0000000140875105  addss       xmm3,xmm0
    0000000140875109  movaps      xmm1,xmm3
    000000014087510C  addss       xmm1,xmm4
    0000000140875110  mulss       xmm1,xmm5
    0000000140875114  mulss       xmm1,xmm6
    0000000140875118  addss       xmm3,xmm1
    000000014087511C  movaps      xmm0,xmm3
    000000014087511F  addss       xmm0,xmm4
    0000000140875123  mulss       xmm0,xmm5
    0000000140875127  mulss       xmm0,xmm6
    000000014087512B  addss       xmm3,xmm0
    000000014087512F  movaps      xmm1,xmm3
    0000000140875132  addss       xmm1,xmm4
    0000000140875136  mulss       xmm1,xmm5
    000000014087513A  mulss       xmm1,xmm6
    000000014087513E  addss       xmm3,xmm1
    0000000140875142  movaps      xmm2,xmm3
    0000000140875145  addss       xmm2,xmm4
    0000000140875149  mulss       xmm2,xmm5
    000000014087514D  mulss       xmm2,xmm6
    0000000140875151  addss       xmm3,xmm2
    0000000140875155  movaps      xmm1,xmm3
    0000000140875158  addss       xmm1,xmm4
    000000014087515C  mulss       xmm1,xmm5
    0000000140875160  mulss       xmm1,xmm6
    0000000140875164  addss       xmm3,xmm1
    0000000140875168  dec         rax
    000000014087516B  jne         func+30h (01408750D0h)
        }
        return sum;
    0000000140875171  movaps      xmm0,xmm3
    }
    0000000140875174  movaps      xmm6,xmmword ptr [rsp]
    0000000140875178  add         rsp,18h
    000000014087517C  ret

Although VS has unrolled the loop eight times (the loop counter starts at 80h = 128), you can see that two multiplications are still done for each original iteration, with a dependency between them, i.e. the first has to finish before the second can start.

So VS doesn't seem to do the reassociation optimisation, even with fast maths enabled (/fp:fast), which is unfortunate.

EDIT: Bug posted here: https://connect.microsoft.com/VisualStudio/feedback/details/1075102. Also, the optimisation is still not done in VS 2013.
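A footnote on the claim at the top of this post that floating point multiplication is not associative. The sketch below shows an extreme case, assuming IEEE 754 single precision floats: one grouping overflows to infinity and the other does not. Ordinary rounding differences are much smaller (typically the last bit), but overflow makes the difference unambiguous. The helper names mul_left and mul_right are hypothetical, and the volatile temporaries are only there to stop the compiler folding the arithmetic or keeping intermediates in higher precision.

```cpp
#include <cassert>

// Hypothetical helper: computes (a * b) * c, rounding each step to float.
float mul_left(float a, float b, float c)
{
    volatile float ab = a * b;
    volatile float result = ab * c;
    return result;
}

// Hypothetical helper: computes a * (b * c), rounding each step to float.
float mul_right(float a, float b, float c)
{
    volatile float bc = b * c;
    volatile float result = a * bc;
    return result;
}

// Hypothetical self-check for one extreme input.
void check_not_associative()
{
    const float a = 1e30f;
    const float b = 1e30f;
    const float c = 1e-30f;

    // a * b is about 1e60, which overflows a float, so the left grouping
    // ends up as infinity.
    const float left = mul_left(a, b, c);

    // b * c is about 1, so the right grouping stays finite, around 1e30.
    const float right = mul_right(a, b, c);

    assert(left != right);
}
```

A compiler that silently rewrote (a * b) * c as a * (b * c) would change the result of a program like this, which is why the transformation is only permitted once the programmer has opted in with a fast maths mode.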