Subject: Re: [ublas] Status of development /Benchmarks
From: oswin krause (oswin.krause_at_[hidden])
Date: 2013-12-08 17:12:51
Hi,
On 08.12.2013 22:04, Petros wrote:
>
> - traits classes that would provide for whether the matrix is dense, 
> banded etc, row/column major, what is the LAPACK leading dimension
This is already there.
Given a matrix type M, we have the following:

M::storage_category is
- dense_tag if the matrix has a dense storage format
- sparse_tag if the matrix has one of the many sparse formats
- packed_tag if the matrix has packed storage (e.g. triangular)
- unknown_tag for everything else (e.g. A+B has unknown_tag even if A and B
are dense)
There are a few more tags regarding proxies, but these are the most
important ones.

M::orientation_category is
either row_major_tag or column_major_tag.
If the storage_category is not unknown, this gives the orientation of the
storage in memory.
The leading dimension is a bit trickier, but there are functions in
detail/raw.hpp.
Maybe you should also take a look at the numeric bindings package; I
think they have complete LAPACK bindings and the traits already implemented.
> etc.
> Thank you for your consideration,
> Petros P. Mamales
> *From:* oswin krause <mailto:oswin.krause_at_[hidden]>
> *Sent:* Sunday, December 08, 2013 4:18 AM
> *To:* ublas_at_[hidden] <mailto:ublas_at_[hidden]>
> *Subject:* Re: [ublas] Status of development /Benchmarks
> Hi,
>
> I agree with you, and I think a rewrite wouldn't change the
> "high-level" syntax. I like the way expressions are formed right now.
> There is no real way to get rid of noalias, as there is no cheap
> solution for alias tests. And honestly, I like prod() over op* because
> prod() screams in your face: "this is going to take a while".
>>
>> for this outer parallelization to be useful, one needs guarantees 
>> that OpenMP is not used in the inner loop... otherwise problems may 
>> arise with nested parallelism.
> I don't see the issue here. In most configurations OpenMP has a
> fixed-size thread pool and would never spawn more threads than allowed.
> Furthermore, an OpenMP thread is itself allowed to spawn more threads.
>
> The real issue is that OpenMP does not and will never support C++. There
> is OpenMP support in gcc, but in fact there is no guarantee that it
> works, as the C++11 memory model is not compatible with the OpenMP
> model. AFAIK there are no plans for a C++11-compatible OpenMP
> standard. Therefore we can't expect that there will ever be OpenMP support
> for clang or other compilers, and I really think that uBLAS would need
> to use a proper threading model, for example an asio-based thread pool.
>
> On 07.12.2013 23:35, Riccardo Rossi wrote:
>> Dear Oswin,
>>
>> My first point is really trivial: I do not doubt that the current
>> expression template mechanism is suboptimal.
>>
>> The only thing I propose is to maintain a small backward-compatibility
>> layer so that code that works with the old uBLAS is not unnecessarily
>> broken.
>>
>> For example, if we assume that the new syntax is
>>
>> C += A*B
>>
>> I believe that it could be useful to define some wrapper between the
>> old and the new syntax. The easiest (dirty) way would be to define two
>> macros,
>>
>> "noalias" and "prod",
>>
>> so that
>>
>> noalias(C) += prod(A,B)
>>
>> expands to the new format... Indeed, using macros is not a good idea,
>> but it makes the point.
>>
>>
>> My second point is mostly about matrices of size, say, 30x30, whose
>> size is NOT known at compile time.
>> For such matrix sizes OpenMP is not useful; however, one may perform
>> a lot of operations on different small matrices in parallel,
>> parallelizing some sort of outer iteration.
>> For this outer parallelization to be useful, one needs guarantees
>> that OpenMP is not used in the inner loop... otherwise problems may
>> arise with nested parallelism.
>> ...I only implied this...
>>
>>
>> ciao
>> Riccardo
>>
>> On Sat, Dec 7, 2013 at 10:10 PM, oswin krause 
>> <oswin.krause_at_[hidden] 
>> <mailto:oswin.krause_at_[hidden]>> wrote:
>>
>>     Hi,
>>
>>     I like your ideas. But i am not sure what you mean with 1. and 2.
>>     The problem I see is, that uBLAS right now doesn't offer the
>>     desired quality and speed for single core, let alone multi core.
>>     Therefore I will focus on the internal infrastructure instead of
>>     motivating new algorithms.
>>
>>     short description:
>>     1. Rock solid implementation of the BLAS algorithms( "compute
>>     kernels")
>>     2. Rewriting of the expression mechanism.
>>     2.1 automatic use of the compute kernels for the operations
>>     2.2 automatic optimisation of expressions
>>
>>
>>     long description
>>     1. Rock-solid implementation of the BLAS algorithms: matrix x
>>     vector and matrix x matrix in all useful combinations
>>     (sparse/sparse, dense/dense, dense/sparse, ...,
>>     triangular/dense, etc.; there are a lot), but also matrix
>>     assignment. uBLAS will be compared not on its number of features
>>     but on runtime against Armadillo/Eigen/ATLAS/GotoBLAS. Having a fast
>>     implementation of this is also crucial for multi-core and
>>     distributed computations. Also, all higher-level decompositions
>>     (Cholesky, LU, SVD, RankDecomp, ...) and triangular solvers rely
>>     on fast BLAS.
>>
>>     2. Rewriting of the expression mechanism. The way uBLAS is
>>     implemented right now, writing
>>
>>     noalias(A) += prod(B,C) + D
>>     or
>>     vec = prod(prod<matrix>(B,C), vec2);
>>
>>     is in most cases a bad decision. The second expression wouldn't
>>     even compile without the template parameter.
>>     What I would like to have is the following:
>>
>>     2.1 automatic use of the fast algorithms. That is, the above
>>     expression should be translated to
>>
>>     axpy_prod(B, C, A, false);
>>     noalias(A) += D;
>>
>>     Right now the matrix_assign algorithms are used, which do a
>>     terrible job on prod(B,C).
>>
>>     2.2 automatic optimization of expressions. The second expression
>>     above is really inefficient: first the matrix-matrix prod is
>>     evaluated, and the result is afterwards used for a single
>>     matrix-vector prod.
>>
>>     Now compare this:
>>     vec = prod(B, prod<vector>(C, vec2));
>>
>>     Only two matrix-vector products are performed, which might save 99%
>>     of the FLOPS of the first expression. So it would be better if
>>     this were transformed automatically. Normally the user could
>>     do that himself, but sometimes it is not possible. Consider a
>>     conjugate gradient solver:
>>
>>     x=CG(A,b);
>>
>>     A usual pattern for A is A = X*X^T + c*Identity. CG internally uses
>>     only matrix-vector products - and possibly not many of them.
>>     Therefore you don't want to compute A explicitly, because it is
>>     a) slow and b) might require more memory than is available. Without
>>     expression optimizations it would be impossible to write a good CG
>>     that could do that.
>>
>>     Greetings,
>>     Oswin
>>
>>     On 07.12.2013 11:38, David Bellot wrote:
>>>     Hi,
>>>
    It has been a long discussion we have all had for many months now:
    should we rewrite uBLAS from scratch or simply improve it?
    Joaquim and Oswin want a brand new uBLAS;
    Nasos was more in favor of improving it.
>>>
    I personally find the idea of writing something new very exciting,
    but uBLAS is very well known now. On the other hand, Eigen
    and Armadillo took the crown of the leading C++ BLAS library in
    users' hearts.
>>>
>>>     On my todo list for ublas, there are things that will require
>>>     ublas to be deeply changed. At this stage, we can almost talk
>>>     about a new library.
>>>
    Christmas is very close now, so maybe it's a good time to start
    talking about the features we wish for in uBLAS and to see if they
    can be implemented with the current version or if a new uBLAS 2.0 is
    necessary.
    After all, Boost.Signals did the same a few years ago. We can
    definitely do the transition.
>>>
>>>
>>>     I begin:
>>>
    - unified representation of vectors and matrices, to reflect
    the fact that a vector IS a matrix. Matlab does the same.
    - automated selection among different algorithms, letting the
    compiler "choose" the best implementation (if possible) and switch
    on SSE, distributed computation, or whatever we need
    - implementation of solvers and decomposition algorithms
>>>
    and this is what Nasos and I think should be integrated too:
    1. Matrix multiplication
    2. Algorithms infrastructure (so that we can have really useful
    features)
    3. Matrix/vector views for interoperability <- I think this is
    ultra-critical, because right now uBLAS is monolithic in the sense
    that you have to use it everywhere you manipulate data. Views would
    really help by letting people, for example, keep a list of
    vectors (which they are plotting) and have uBLAS work on top of
    that to do transformations
    4. NEW DOCUMENTATION - examples and the rest
    5. Incorporate some critical bindings (e.g. MUMPS bindings; MUMPS
    is currently probably the most efficient SMP and distributed
    open-source linear algebra solver)
    6. Matlab bindings?
    7. distributed uBLAS
>>>
>>>
    Please add to this list and, ESPECIALLY, please tell me your view
    on the current infrastructure of uBLAS. It seems many people are
    not happy with the current "expression template" grammar.
>>>
>>>     I'm open to everything
>>>
>>>     Best,
>>>     David
>>>
>>>
>>>     On Thu, Dec 5, 2013 at 11:14 AM, Joaquim Duran
>>>     <jduran.gm_at_[hidden] <mailto:jduran.gm_at_[hidden]>> wrote:
>>>
        I think that all the stuff pending merge listed by David
        should be merged and migrated to uBLAS 2.0, and while uBLAS
        2.0 is in development/maintenance, uBLAS 3.0 can be designed
        from scratch.
>>>
>>>     _______________________________________________
>>>     ublas mailing list
>>>     ublas_at_[hidden]  <mailto:ublas_at_[hidden]>
>>>     http://listarchives.boost.org/mailman/listinfo.cgi/ublas
>>>     Sent to:Oswin.Krause_at_[hidden]  <mailto:Oswin.Krause_at_[hidden]>
>>
>>
>>
>>
>>
>>
>> -- 
>>
>> Dr. Riccardo Rossi, Civil Engineer
>>
>> Member of Kratos Team
>>
>> International Center for Numerical Methods in Engineering - CIMNE
>> Campus Norte, Edificio C1
>>
>> c/ Gran Capitán s/n
>>
>> 08034 Barcelona, España
>>
>> Tel: (+34) 93 401 56 96
>>
>> Fax: (+34) 93.401.6517
>>
>> web:www.cimne.com <http://www.cimne.com/>
>>
>>
>