For some reason the aligned memory allocator supplied by VS 2015 is really slow.

See the following results:

graph

SSE::alignedMalloc is our implementation, which works in the standard way - it allocates requested size + alignment, then moves up the returned pointer to so that it is aligned, while storing the original pointer so it can be freed later.

_mm_malloc is the VS function that returns aligned memory. It compiles to a call to the same function as _aligned_malloc.

Finally malloc is of course the standard, boring memory allocation function.

As you can see, for some weird reason, VS's _mm_malloc is much slower than malloc and our aligned alloc implementation (SSE::alignedMalloc).

Issue filed with MS here.