$include_dir="/home/hyper-archives/ublas/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [ublas] pointer to matrix structure
From: Marcel Rehberg (Marcel.Rehberg_at_[hidden])
Date: 2011-08-08 08:14:34
On Mon, 08 Aug 2011 11:41:31 +0200, Kraus Philipp  
<philipp.kraus_at_[hidden]> wrote:
>
> Am 08.08.2011 um 11:20 schrieb Marcel Rehberg:
>
>> Right, this would copy the data. What I meant was to use  
>> ublas::<T,ublas::column_major> for your original matrix, right from the  
>> beginning.
>>
>> It does not make a difference from the ublas interface point of view.  
>> So it wouldn't require any code changes except in the definition of the  
>> matrices*. To make things easier you could use a typedef like
>
> I think there is a difference: I read the matrix data often in the  
> row-order, a loop over the rows and a inner loop over the columns, so if  
> I use a column-major the loops must / should be changed. If I use a  
> column-major matrix and iterate first over the rows and with the inner  
> loop over the columns, it can be create a poor performance, so I would  
> like to have a row-major matrix.
Ah sorry, I wasn't even aware of the performance penalty* and I don't know  
a way to avoid it. My approach would be to choose the ordering based on  
what is the most time or memory consuming operation. In my case I use  
lapack quite allot and so I went with ublas::column_major just to avoid  
the copying. Sorry that I can't help more.
regards
Marcel
* I wrote a little test program:
#include <iostream>
#include "boost/numeric/ublas/matrix.hpp"
#include "boost/numeric/ublas/io.hpp"
#include "boost/progress.hpp"
int main() {
   namespace ublas = boost::numeric::ublas;
   typedef ublas::matrix<double, ublas::column_major> myMatrixColMaj;
   typedef ublas::matrix<double> myMatrixRowMaj;
   unsigned n_rows=5000;
   unsigned n_cols=5000;
   unsigned i,j;
   myMatrixColMaj M1(n_rows,n_cols);
   myMatrixRowMaj M2(n_rows,n_cols);
   // fill both matrices before testing, for more reliable timings.
   for (j=0; j<n_cols;j++) {
     for (i=0; i<n_rows;i++) {
       M1(i,j)=i+j+1;
       M2(i,j)=i+j+1;
     }
   }
   {
     boost::progress_timer t;
     for (i=0; i<n_rows;i++)
       for (j=0; j<n_cols;j++)
        M1(i,j)=i+j+1;
   }
   {
     boost::progress_timer t;
     for (j=0; j<n_cols;j++)
       for (i=0; i<n_rows;i++)
        M1(i,j)=i+j+1;
   }
   {
     boost::progress_timer t;
     for (i=0; i<n_rows;i++)
       for (j=0; j<n_cols;j++)
        M2(i,j)=i+j+1;
   }
   {
     boost::progress_timer t;
     for (j=0; j<n_cols;j++)
       for (i=0; i<n_rows;i++)
        M2(i,j)=i+j+1;
   }
   return 0;
}
Output on my box is:
0.30 s
0.08 s
0.08 s
0.30 s
So traversing the wrong order nearly takes four times as long as  
traversing the right order. (in all cases M1==M2).
Regards
Marcel