Subject: Re: [ublas] Matrix multiplication performance
From: Michael Lehn (michael.lehn_at_[hidden])
Date: 2016-01-27 20:31:31
On 28 Jan 2016, at 01:04, Joaquim Duran Comas <jdurancomas_at_[hidden]> wrote:
> If explicit SIMD should not be used for now, then you should help the compiler generate more optimized code by properly aligning the buffers.
> 
> There is the Boost.Align library, which provides an aligned allocator (http://www.boost.org/doc/libs/1_60_0/doc/html/align.html)
That's good to know.  The functions aligned_alloc and aligned_free from Boost.Align can replace these functions:
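#include <algorithm>  // std::max
#include <cstdint>    // uintptr_t
#include <cstdlib>    // std::malloc, std::free

// Note: alignment must be a power of two.
void *
malloc_(std::size_t alignment, std::size_t size)
{
    alignment = std::max(alignment, alignof(void *));
    size     += alignment;
    void *ptr  = std::malloc(size);
    if (!ptr) {
        return nullptr;
    }
    // Round up to the next multiple of alignment (always past ptr).
    void *ptr2 = (void *)(((uintptr_t)ptr + alignment) & ~(alignment-1));
    // Stash the original pointer just below the aligned block for free_.
    void **vp  = (void**) ptr2 - 1;
    *vp        = ptr;
    return ptr2;
}
void
free_(void *ptr)
{
    if (ptr) {
        std::free(*((void**)ptr-1));
    }
}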
This really should have gone into C++11.
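For illustration, here is a minimal sketch of how a caller might own such an aligned buffer through std::unique_ptr, so the matching free cannot be forgotten. It is not taken from the uBLAS code: the names aligned_malloc_sketch, aligned_free_sketch, AlignedDeleter, AlignedBuffer, make_aligned_buffer and is_aligned are all hypothetical, the default alignment of 32 bytes is an assumption, and the allocation helper simply repeats the over-allocate-and-stash technique of malloc_ above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical helper: same technique as malloc_ above, plus a null check.
// Assumes alignment is a power of two.
inline void *aligned_malloc_sketch(std::size_t alignment, std::size_t size)
{
    assert((alignment & (alignment - 1)) == 0 && "alignment must be a power of two");
    alignment = std::max(alignment, alignof(void *));
    void *raw = std::malloc(size + alignment);
    if (!raw) {
        return nullptr;
    }
    // Round up to the next multiple of alignment strictly past raw.
    std::uintptr_t addr = (reinterpret_cast<std::uintptr_t>(raw) + alignment)
                        & ~static_cast<std::uintptr_t>(alignment - 1);
    // Store the pointer returned by malloc just below the aligned block.
    void **slot = reinterpret_cast<void **>(addr) - 1;
    *slot = raw;
    return reinterpret_cast<void *>(addr);
}

// Hypothetical counterpart of free_: recovers the stashed pointer and frees it.
inline void aligned_free_sketch(void *ptr)
{
    if (ptr) {
        std::free(*(static_cast<void **>(ptr) - 1));
    }
}

// Deleter so that std::unique_ptr releases the buffer with the matching free.
struct AlignedDeleter {
    void operator()(double *p) const { aligned_free_sketch(p); }
};

using AlignedBuffer = std::unique_ptr<double[], AlignedDeleter>;

// Allocates n doubles aligned to the given boundary (32 bytes is an assumed default).
// Returns an empty pointer if the allocation fails.
inline AlignedBuffer make_aligned_buffer(std::size_t n, std::size_t alignment = 32)
{
    void *p = aligned_malloc_sketch(alignment, n * sizeof(double));
    return AlignedBuffer(static_cast<double *>(p));
}

// True if p is a multiple of alignment.
inline bool is_aligned(const void *p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A caller would write something like AlignedBuffer buf = make_aligned_buffer(n); and use buf.get() as the packing buffer. If Boost.Align is available, the two helper functions could presumably be swapped for its aligned_alloc and aligned_free without changing the wrapper.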