$include_dir="/home/hyper-archives/ublas/include"; include("$include_dir/msg-header.inc") ?>
From: Sourabh (sourabh_at_[hidden])
Date: 2007-03-03 01:52:27
On Fri, 2 Mar 2007, Gunter Winkler wrote:
> On Friday 02 March 2007 13:21, Sourabh wrote:
> > Hi,
> > Can anyone please suggest me some optimizations in the loop to improve
> > the overall speed of the following program ?
> >
> > The steps I am doing are:
> > 1. Solve for Y the equation: (I-X) Y = B, where X is a lower triangular
> > sparse matrix.
> >
> > 2. Multiply Y with lower triangular sparse matrix Z, A = Y*Z.
> > 3. Add A to another vector.
> >
> > The sizes in the above steps are of order of 100.
> 
> Why don't you use dense matrices and fast BLAS (see 
> http://math-atlas.sourceforge.net/)? This is usually extremly fast for small 
> matrices (read: smaller than the processor cache)
> 
> Without any knowledge of the structure of your matrices, no-one can make any 
> suggestions. The general answer is: Try to find an ordering of rows/columns 
> such that X and Y are band matrices and use banded BLAS. 
> 
Hi Gunter, 
Thanks for your kind reply.
The structure of my matrices are as follows :
X is a lower triangular matrix of size (100 x 100) (in addtion, all 
diagonal elements are also 
zero), with at most 3 or 4 non-zero entries per column.
B is a dense vector of 100.
Y is to be solved (size obviously 100)
> > The above mentioned sequence is repeated by 2e9 times. So it is taking too
> > much time.
> 
> Precompute as much as possible outside your loop. If necessary split the loop 
> and try to use large dense matrices and fast BLAS. (No, ublas is not fast - 
> ublas is _reliable_)
> 
What is your opinion, is construction of these matrices take much of the 
time ? 
> > I am sure if I optimize the inner-most loop, it can be made fast.
> > What optimizations can be made in the loop ?
> 
> > int main (int argc, char* argv[])
> > {
> >     unsigned int samples    = atoi (argv[1]);
> >     unsigned int numpaths   = atoi (argv[2]);
> >     unsigned short gatesperpath = atoi (argv[3]);
> >
> >     for (unsigned int i = 0; i < samples; ++i) {
> > 	for (unsigned int j = 0; j < numpaths ; ++ j) {
> 
> > 	    mapped_matrix<double> 	lembda (gatesperpath, gatesperpath);
> > 	    identity_matrix<double> ident (gatesperpath);
> 
> are these matrices constant and independent of i and j? If so, initialize them 
> before the loop and use a compressed_matrix.
The matrix ident is obvioulsy const. I can move it above. However, 
the matrix lembda is different for each j. However, since I compute delays 
based on current value of lembda, I don't need previous values of lembda.
So, I can modify already constructed lembda in each j. Which approach is 
more efficient :
1. Create lembda each time in the innermost loop
2. Create lembda once and modify it in each innermost loop.
(lembda does not vary with i).
> 
> > 	    slopes = solve (lembda, omegas, lower_tag ());
> 
> You mean unit_lower_tag(): Solve (I+lembda) x = omegas?
I need to solve  the equation    (I-lembda) slopes = omegas
> 
> > 	    vector <double> delays (gatesperpath);
> 
> move the construction out off the loop
> 
Same question as above. The delays vector creation or assignment to delays 
in each loop?
> > 	    delays = prod (lembda, slopes);
> > 	    delays += omegas;
> 
> try: noalias( delays ) = omegas;
>      axpy_prod( lambda, slopes, delays, false ); // from operation.hpp
> 
> > 	}
> >     }
> >
> > }
> 
> mfg
> Gunter
> 
Thank you very much for your help.
-- -- Sourabh