From: Joerg Walter (jhr.walter_at_[hidden])
Date: 2003-05-15 16:57:38
Hi Csaba,
you wrote:
> Once I played with comparing Intel Math kernel with some implementations I
> did.
> I found out that loop unrolling may double the speed.
> Still I could never quite get close to the speed of the Intel Math kernel
> (for large matrices), presumably due to insufficient caching.
> The above applies to row-major matrices.
> For column-major matrices loop unrolling achieved the same speed as the
> intel math kernel.
I've been playing with loop unrolling in the past, too (see
BOOST_UBLAS_USE_DUFF_DEVICE), but never found a satisfactory solution to
that performance problem. The compilers already seem to be sufficiently
stressed by the templated code.
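
For illustration only, a Duff's-device style unrolled loop looks roughly
like this (a minimal sketch, not the actual expansion of the uBLAS macro;
the axpy kernel and its name are made up here):

#include <cstddef>

// Unrolled loop in the style of Duff's device: the switch jumps into the
// middle of the unrolled body to handle the remainder, so there is one
// loop branch per four elements instead of one per element.
void axpy_unrolled(std::size_t n, double a, const double* x, double* y) {
    if (n == 0) return;
    std::size_t blocks = (n + 3) / 4;
    switch (n % 4) {
    case 0: do { *y++ += a * *x++;
    case 3:      *y++ += a * *x++;
    case 2:      *y++ += a * *x++;
    case 1:      *y++ += a * *x++;
            } while (--blocks != 0);
    }
}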
> Below is the code if anyone wants to give it a try.
> Maybe ublas could make use of some performance optimizations..
For small matrices I'm still waiting for the first (or next? ;-) compiler
to vectorize inlined template code (ICC is the hottest candidate; I never
had a chance to check KAI). For larger matrices I've been playing with some
crude high-level optimizations, see
http://groups.yahoo.com/group/ublas-dev/message/461
I don't know if they're really useful.
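
(Sketch of the general idea only, not the code from that message: the usual
high-level optimization for large matrices is cache blocking, i.e. working
on small tiles so the operands stay resident in cache. The tile size and
the gemm_blocked name below are illustrative assumptions.)

#include <algorithm>
#include <cstddef>
#include <vector>

// Blocked (tiled) product C += A * B for square row-major matrices of
// order n stored in flat vectors. Processing block_size x block_size tiles
// keeps the working set in cache; block_size is a tuning parameter.
void gemm_blocked(std::size_t n, const std::vector<double>& A,
                  const std::vector<double>& B, std::vector<double>& C,
                  std::size_t block_size = 64) {
    for (std::size_t ii = 0; ii < n; ii += block_size)
        for (std::size_t kk = 0; kk < n; kk += block_size)
            for (std::size_t jj = 0; jj < n; jj += block_size)
                for (std::size_t i = ii; i < std::min(ii + block_size, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + block_size, n); ++k) {
                        const double aik = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + block_size, n); ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}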
> (better it should be connected to some optimized blas implementations..?)
Yep. Either low level (using explicit bindings) or high level (using
specialized evaluators). Both have been discussed in the past and are still
undecided.
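
(A low-level binding could look roughly like this; a sketch only, assuming
a CBLAS header such as ATLAS's or MKL's, and relying on the fact that a
dense row-major ublas::matrix with the default unbounded_array storage
keeps its elements contiguously. prod_via_blas is a made-up name, not a
proposed interface.)

#include <boost/numeric/ublas/matrix.hpp>
extern "C" {
#include <cblas.h>   // any CBLAS implementation (ATLAS, MKL, ...)
}

namespace ublas = boost::numeric::ublas;

// Hand the raw storage of dense row-major ublas matrices to dgemm:
// c = a * b, with the leading dimensions taken from the column counts.
void prod_via_blas(const ublas::matrix<double>& a,
                   const ublas::matrix<double>& b,
                   ublas::matrix<double>& c) {
    const int m = static_cast<int>(a.size1());
    const int k = static_cast<int>(a.size2());
    const int n = static_cast<int>(b.size2());
    c.resize(m, n, false);
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0,
                &a.data()[0], k,
                &b.data()[0], n,
                0.0, &c.data()[0], n);
}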
Thanks,
Joerg