Subject: Re: [ublas] Matrix multiplication performance
From: Michael Lehn (michael.lehn_at_[hidden])
Date: 2016-01-28 15:47:35
On 28 Jan 2016, at 21:15, Riccardo Rossi <rrossi_at_[hidden]> wrote:
> I am impressed. 6x on a quad core!!
> 
Thanks, but actually two quad cores ;-)
And with more than 6 threads it requires a more fine-grained method to scale well.  You have to consider
group hierarchies of threads.  E.g. one group is responsible for packing a block and afterwards multiplying
it multithreaded.  At the moment it's like one group with too many members.
> do you also do sparse linear algebra by chance?
Sorry, not directly.  I have just looked at libraries like SuperLU and UMFPACK, though not as closely as at other BLAS libraries.  But
my impression is that this too could be done much more elegantly in C++.  The big headache in these libraries is that they basically
have the same code for float, double, complex<float> and complex<double>.  Just using C++ as "C plus function templates" would
make it much easier.  And the performance-relevant part in these libraries is again a fast dense BLAS.
Cheers,
Michael