Subject: Re: [ublas] Matrix multiplication performance
From: Riccardo Rossi (rrossi_at_[hidden])
Date: 2016-01-28 15:15:12
I am impressed. 6x on a quad-core!
Do you also do sparse linear algebra, by chance?
Cheers,
Riccardo
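
A quick way to check the scaling from the figures quoted below is to divide each multi-threaded rate by the single-threaded one. The following is only a sketch: the helper names `mflops`, `speedup` and `efficiency` are hypothetical and not part of matprod.cc, and the operation count of 2*m*n*k is an assumption, although it appears consistent with the t1/MFLOPS columns in the quoted output.

```cpp
#include <cassert>
#include <cstddef>

// Rate in MFLOPS for an m x n x k matrix product that took `seconds`,
// assuming 2*m*n*k floating point operations (one multiply and one add
// per inner-loop step).
double mflops(std::size_t m, std::size_t n, std::size_t k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * m * n * k / seconds / 1e6;
}

// Speedup of a multi-threaded run relative to the single-threaded run.
double speedup(double rate_p_threads, double rate_1_thread)
{
    assert(rate_1_thread > 0.0);
    return rate_p_threads / rate_1_thread;
}

// Parallel efficiency: speedup divided by the number of threads used.
double efficiency(double rate_p_threads, double rate_1_thread, int threads)
{
    assert(threads > 0);
    return speedup(rate_p_threads, rate_1_thread) / threads;
}
```

With the quoted numbers, `speedup(180.5, 32.9)` comes out at about 5.5 for 8 threads, which corresponds to an efficiency of roughly 0.69.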
On Thu, Jan 28, 2016 at 7:49 PM, Michael Lehn <michael.lehn_at_[hidden]> wrote:
> In the meantime, some results from my Haswell machine.  It has 4 quad
> cores, but there are other jobs running, so I only went up to 8 threads.
> Anyway, the parallelisation is simple.  For the maximal matrix dimension
> N=M=K=4000 it reaches
>
> 1) 32.9 GFLOPS with 1 thread
> 2) 63 GFLOPS with 2 threads
> 3) 104.6 GFLOPS with 4 threads
> 4) 180.5 GFLOPS with 8 threads
>
> That is OK for a simple implementation, but it can be done better.  Most
> of all, it takes much too long (or much too big problem sizes) to scale
> well.  But for the moment we should focus on a good single-threaded
> implementation and do the parallel stuff the right way later, as this
> will require more than just a single
> #pragma omp parallel for
>
>
>
> [lehn_at_node042 session4]$ g++ -Ofast -Wall -std=c++11 -DNDEBUG -DHAVE_FMA -I ../boost_1_60_0/ -fopenmp matprod.cc
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=1; ./a.out
> #   m     n     k  uBLAS:   t1       MFLOPS   Blocked:   t2      MFLOPS     Diff nrm1
>   100   100   100   0.00119632      1671.79      0.00089036      2246.28   3.90562e-14
>   200   200   200   0.00322943      4954.44      0.00082579      19375.4   1.50135e-15
>   300   300   300    0.0108177      4991.81      0.00221283      24403.1   2.18434e-16
>   400   400   400    0.0247278      5176.35      0.00429661      29790.9   5.58593e-17
>   500   500   500     0.053677      4657.49      0.00822185      30406.8   1.94899e-17
>   600   600   600    0.0820133      5267.44       0.0136631      31617.9   1.16524e-17
>   700   700   700     0.129231      5308.34       0.0208619      32882.9   6.82385e-18
>   800   800   800      0.19206      5331.67       0.0309358      33100.8   4.08617e-18
>   900   900   900     0.272354      5353.34       0.0430091      33899.8   2.54117e-18
>  1000  1000  1000     0.372831      5364.36       0.0582482      34335.8   1.64011e-18
>  1100  1100  1100     0.494906       5378.8       0.0796676      33413.8   1.08587e-18
>  1200  1200  1200     0.642926      5375.43        0.098814      34974.8   7.43828e-19
>  1300  1300  1300     0.815164      5390.32        0.125541      35000.5   5.26152e-19
>  1400  1400  1400      1.04147      5269.48        0.154808      35450.4   3.81507e-19
>  1500  1500  1500      1.24516      5420.99        0.187327      36033.2   2.82388e-19
>  1600  1600  1600       1.5581      5257.68        0.236257        34674   2.12031e-19
>  1700  1700  1700      2.57574      3814.82        0.273446      35933.9   1.61384e-19
>  1800  1800  1800      3.24948       3589.5        0.319974        36453   1.25033e-19
>  1900  1900  1900      4.01719      3414.82        0.378235      36268.4   9.86666e-20
>  2000  2000  2000      4.82997      3312.65        0.438886        36456   7.86863e-20
>  2100  2100  2100      5.88206      3148.89        0.517821      35769.1   6.31726e-20
>  2200  2200  2200      6.87358      3098.24        0.590235      36080.6    5.1152e-20
>  2300  2300  2300      8.08021      3011.55        0.659934      36873.4   4.19219e-20
>  2400  2400  2400      9.31063      2969.51        0.748285      36948.5   3.46865e-20
>  2500  2500  2500      10.5343      2966.51         0.84448        37005   2.88942e-20
>  2600  2600  2600      11.8768      2959.71        0.984227      35715.3   2.42294e-20
>  2700  2700  2700      13.3378      2951.45         1.06838      36846.5   2.04036e-20
>  2800  2800  2800      14.9304      2940.57         1.18762      36968.2   1.73201e-20
>  2900  2900  2900      16.8965      2886.87         1.33445      36552.8   1.47904e-20
>  3000  3000  3000      18.7376       2881.9         1.49449      36132.7   1.27205e-20
>  3100  3100  3100      20.8439      2858.48         1.66163      35857.5   1.09759e-20
>  3200  3200  3200      22.9032      2861.44         1.82771      35856.9   9.49415e-21
>  3300  3300  3300      28.2407      2545.05         2.08438      34482.2   8.25868e-21
>  3400  3400  3400      27.5374       2854.6         2.18449      35984.7   7.22064e-21
>  3500  3500  3500       29.925       2865.5         2.34372      36587.1   6.34137e-21
>  3600  3600  3600      32.6588      2857.17         2.56586      36366.7    5.5874e-21
>  3700  3700  3700      34.5032      2936.14         2.77154      36552.2   4.92873e-21
>  3800  3800  3800      36.9099      2973.29         2.97732      36860.1   4.36811e-21
>  3900  3900  3900      44.6497      2657.09         3.24271      36586.1   3.88313e-21
>  4000  4000  4000      56.9767      2246.53         3.88046      32985.8   3.46672e-21
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=2; ./a.out
> #   m     n     k  uBLAS:   t1       MFLOPS   Blocked:   t2      MFLOPS     Diff nrm1
>   100   100   100   0.00120386      1661.33     0.000876976      2280.56   3.95867e-14
>   200   200   200   0.00323702      4942.82      0.00099518      16077.5   1.50256e-15
>   300   300   300    0.0106352       5077.5      0.00286667      18837.2   2.19644e-16
>   400   400   400    0.0247765      5166.19      0.00610925      20951.8   5.61969e-17
>   500   500   500    0.0478359       5226.2      0.00707235      35348.9   1.94268e-17
>   600   600   600     0.082058      5264.57       0.0108406      39850.3   1.16982e-17
>   700   700   700     0.129637      5291.71       0.0170924      40134.8    6.8281e-18
>   800   800   800       0.1925      5319.48       0.0214161      47814.4   4.09348e-18
>   900   900   900     0.273022      5340.22       0.0298684      48814.2   2.54562e-18
>  1000  1000  1000     0.373113       5360.3       0.0417747      47875.9   1.64027e-18
>  1100  1100  1100     0.499034       5334.3       0.0527302      50483.4   1.08356e-18
>  1200  1200  1200      0.64351      5370.55       0.0624654      55326.6   7.44302e-19
>  1300  1300  1300     0.829601      5296.52       0.0793488      55375.8   5.25547e-19
>  1400  1400  1400      1.13615      4830.35       0.0937135      58561.5    3.8117e-19
>  1500  1500  1500      1.38215      4883.71         0.11078      60931.4   2.82628e-19
>  1600  1600  1600      2.34569      3492.37        0.148535      55152.1   2.11636e-19
>  1700  1700  1700      2.80764      3499.73        0.166754      58925.2   1.61617e-19
>  1800  1800  1800      3.65597       3190.4        0.183227      63658.6   1.25225e-19
>  1900  1900  1900      6.04791      2268.22        0.229272      59832.8    9.8624e-20
>  2000  2000  2000      5.41562      2954.41        0.244907        65331    7.8709e-20
>  2100  2100  2100      5.79329      3197.15        0.320638      57766.1   6.31124e-20
>  2200  2200  2200      10.1105      2106.32        0.348126      61173.2   5.11424e-20
>  2300  2300  2300       11.746      2071.68        0.385373        63144   4.18844e-20
>  2400  2400  2400      13.4099      2061.77        0.438608      63035.8   3.46829e-20
>  2500  2500  2500      14.8645      2102.32        0.491434      63589.4   2.88839e-20
>  2600  2600  2600      17.1602      2048.46        0.550163      63893.8   2.42378e-20
>  2700  2700  2700        19.24      2046.05        0.616314      63873.3   2.03993e-20
>  2800  2800  2800      14.8633      2953.85        0.675975      64949.2   1.73082e-20
>  2900  2900  2900       18.533      2631.96         0.72636      67154.1   1.47984e-20
>  3000  3000  3000      18.2701      2955.64        0.804625        67112   1.27211e-20
>  3100  3100  3100      20.2371      2944.19        0.938507      63485.9   1.09831e-20
>  3200  3200  3200      22.6838      2889.11         1.07581      60918.1   9.49232e-21
>  3300  3300  3300      25.0228      2872.33         1.06473      67504.6   8.25942e-21
>  3400  3400  3400      27.3561      2873.51         1.16247      67621.6   7.21511e-21
>  3500  3500  3500      29.7889      2878.59         1.32098      64913.8    6.3385e-21
>  3600  3600  3600      34.8098      2680.62         1.37908      67662.7   5.58738e-21
>  3700  3700  3700      37.6151      2693.23         1.52253      66538.1   4.92976e-21
>  3800  3800  3800        38.99      2814.67         1.63282      67211.5   4.36537e-21
>  3900  3900  3900      57.5765      2060.53         1.75221      67707.4   3.88246e-21
>  4000  4000  4000      51.1335      2503.25         2.03062      63035.1   3.46549e-21
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=4; ./a.out
> #   m     n     k  uBLAS:   t1       MFLOPS   Blocked:   t2      MFLOPS     Diff nrm1
>   100   100   100   0.00119733      1670.39      0.00124331      1608.61   3.84618e-14
>   200   200   200   0.00427996      3738.35     0.000965206      16576.8   1.47604e-15
>   300   300   300    0.0146617      3683.06      0.00235442      22935.6   2.18643e-16
>   400   400   400    0.0301558      4244.62      0.00431311        29677   5.57089e-17
>   500   500   500    0.0509763      4904.24      0.00541684      46152.4   1.94817e-17
>   600   600   600    0.0823676      5244.78      0.00815973        52943   1.16851e-17
>   700   700   700     0.131064      5234.07       0.0133055      51557.7   6.81692e-18
>   800   800   800     0.198438       5160.3       0.0208701      49065.4   4.09087e-18
>   900   900   900     0.273346      5333.91       0.0244156        59716   2.53963e-18
>  1000  1000  1000     0.374021       5347.3       0.0252625      79168.7   1.64654e-18
>  1100  1100  1100     0.502426      5298.29         0.05022      53006.7   1.08395e-18
>  1200  1200  1200     0.865696      3992.16       0.0443738      77883.9   7.44661e-19
>  1300  1300  1300      1.00063      4391.23       0.0544683      80670.8   5.25559e-19
>  1400  1400  1400      1.26828      4327.13       0.0599685      91514.7   3.80933e-19
>  1500  1500  1500       1.3623      4954.86       0.0826977      81622.6    2.8281e-19
>  1600  1600  1600      2.14419      3820.56       0.0940622      87091.3   2.11718e-19
>  1700  1700  1700      2.98106      3296.14        0.104828      93734.3   1.61252e-19
>  1800  1800  1800      4.10679      2840.17        0.125856      92677.2   1.25247e-19
>  1900  1900  1900      7.25737      1890.22        0.137977      99422.2   9.85647e-20
>  2000  2000  2000       9.0378      1770.34        0.195959      81649.8   7.86877e-20
>  2100  2100  2100      7.43091      2492.56        0.205205        90261   6.31814e-20
>  2200  2200  2200      8.01552      2656.84        0.229878      92640.5   5.11206e-20
>  2300  2300  2300      11.3209      2149.47        0.242479       100355   4.19281e-20
>  2400  2400  2400      11.7655      2349.91        0.267819       103234    3.4696e-20
>  2500  2500  2500        14.75      2118.65        0.318302      98177.1   2.89065e-20
>  2600  2600  2600      16.1598      2175.27        0.349963       100445   2.42432e-20
>  2700  2700  2700      19.6465      2003.72        0.384713       102326   2.04284e-20
>  2800  2800  2800      18.5487      2366.95        0.422473       103922   1.73051e-20
>  2900  2900  2900      18.4844      2638.87        0.431616       113012   1.48037e-20
>  3000  3000  3000      18.3601      2941.16        0.487947       110668   1.27205e-20
>  3100  3100  3100      20.1449      2957.67        0.555138       107328   1.09745e-20
>  3200  3200  3200      22.2403      2946.72        0.597566       109672   9.49034e-21
>  3300  3300  3300      24.3526      2951.39        0.635492       113100   8.25459e-21
>  3400  3400  3400      26.5834      2957.04        0.693353       113374   7.22134e-21
>  3500  3500  3500      28.9996      2956.93        0.753307       113831   6.33808e-21
>  3600  3600  3600      31.4492      2967.07        0.793409       117609   5.58761e-21
>  3700  3700  3700      34.9533      2898.33        0.959263       105608   4.93129e-21
>  3800  3800  3800      38.2463       2869.4         1.01686       107924   4.36735e-21
>  3900  3900  3900      42.3957      2798.35         1.08582       109262   3.88282e-21
>  4000  4000  4000      44.7076      2863.05         1.22383       104590     3.469e-21
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=8; ./a.out
> #   m     n     k  uBLAS:   t1       MFLOPS   Blocked:   t2      MFLOPS     Diff nrm1
>   100   100   100   0.00120762      1656.15        0.001279      1563.72    3.8463e-14
>   200   200   200    0.0036143      4426.86     0.000631185      25349.1   1.48858e-15
>   300   300   300    0.0108139      4993.56      0.00204664      26384.7   2.20015e-16
>   400   400   400    0.0251417      5091.13      0.00316074      40496.9   5.58204e-17
>   500   500   500    0.0482996      5176.03      0.00479854      52099.2    1.9429e-17
>   600   600   600    0.0830052      5204.49       0.0074349      58104.3   1.16567e-17
>   700   700   700      0.13281      5165.28       0.0134778      50898.6   6.82167e-18
>   800   800   800      0.19639      5214.12       0.0143988      71117.2   4.08235e-18
>   900   900   900     0.279542      5215.68       0.0186552        78155   2.54218e-18
>  1000  1000  1000     0.381906      5236.89        0.020541      97366.2   1.63963e-18
>  1100  1100  1100     0.509376         5226       0.0338259      78697.1   1.08399e-18
>  1200  1200  1200     0.760565      4543.99       0.0317094       108990   7.44215e-19
>  1300  1300  1300      1.04442      4207.14       0.0419104       104843   5.25101e-19
>  1400  1400  1400      1.47537      3719.75       0.0450985       121689   3.81236e-19
>  1500  1500  1500      1.90994      3534.15       0.0514728       131137   2.82394e-19
>  1600  1600  1600      1.56705      5227.67       0.0599189       136718   2.11847e-19
>  1700  1700  1700      2.62892      3737.66       0.0756787       129838   1.61316e-19
>  1800  1800  1800      3.29831      3536.35       0.0827417       140969   1.25087e-19
>  1900  1900  1900      4.03473      3399.98       0.0915113       149905   9.85857e-20
>  2000  2000  2000      4.87315       3283.3        0.105251       152017   7.86417e-20
>  2100  2100  2100      5.87975      3150.13        0.123634       149813   6.31281e-20
>  2200  2200  2200      7.06021      3016.34        0.134536       158293   5.11845e-20
>  2300  2300  2300      10.6045      2294.69        0.162671       149590   4.19035e-20
>  2400  2400  2400      9.31785      2967.21        0.160164       172623   3.46453e-20
>  2500  2500  2500      10.4852      2980.38        0.181067       172588   2.89024e-20
>  2600  2600  2600      11.8263      2972.35        0.208792       168359   2.42313e-20
>  2700  2700  2700      13.2755      2965.32        0.226646       173690   2.04063e-20
>  2800  2800  2800      14.8042      2965.65         0.24966       175855   1.73142e-20
>  2900  2900  2900      16.9983      2869.58        0.287892       169432   1.47875e-20
>  3000  3000  3000      19.7129      2739.32        0.330801       163240   1.27204e-20
>  3100  3100  3100      21.4476      2778.02        0.382773       155659   1.09704e-20
>  3200  3200  3200      22.7482      2880.93        0.440904       148640   9.49823e-21
>  3300  3300  3300      25.1449      2858.39        0.416183       172698   8.25712e-21
>  3400  3400  3400      27.4412       2864.6        0.497616       157969    7.2164e-21
>  3500  3500  3500      30.5976      2802.51         0.49974       171589   6.33781e-21
>  3600  3600  3600      32.5102      2870.24        0.558002       167225   5.59021e-21
>  3700  3700  3700      35.3566      2865.26        0.571003       177418   4.93007e-21
>  3800  3800  3800       37.321      2940.55        0.578064       189847   4.36563e-21
>  3900  3900  3900      40.1645       2953.8        0.623894       190157    3.8876e-21
>  4000  4000  4000      43.1753      2964.66        0.709328       180452   3.46575e-21
>
>
> On 28 Jan 2016, at 18:41, Michael Lehn <michael.lehn_at_[hidden]> wrote:
>
> > Also, the parallelisation with OpenMP is done in a pretty cheap and
> > simple way at the moment.  So you might also want to check how it
> > scales with
> >
> >  export OMP_NUM_THREADS=2; ./a.out
> >  export OMP_NUM_THREADS=4; ./a.out
> >  export OMP_NUM_THREADS=6; ./a.out
> > ...
>
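
For readers wondering what "just a single #pragma omp parallel for" might look like in practice, here is a minimal sketch. It is not the blocked kernel from matprod.cc (which is not shown in this message); the function name `gemm_naive_omp` and the row-major `std::vector` storage are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: C = A * B with row-major storage, where
// A is m x k, B is k x n and C is m x n.
// Each row of C is computed independently, so the outer loop can be
// split across threads with one pragma. When compiled without -fopenmp
// the pragma is ignored and the loop simply runs serially.
void gemm_naive_omp(std::size_t m, std::size_t n, std::size_t k,
                    const std::vector<double> &A,
                    const std::vector<double> &B,
                    std::vector<double> &C)
{
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);

    // A signed loop index keeps the loop valid for older OpenMP versions.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(m); ++i) {
        const std::size_t row = static_cast<std::size_t>(i);

        // Clear the output row before accumulating into it.
        for (std::size_t j = 0; j < n; ++j) {
            C[row * n + j] = 0.0;
        }
        // Accumulate A(row, l) * B(l, :) into C(row, :).
        for (std::size_t l = 0; l < k; ++l) {
            const double a = A[row * k + l];
            for (std::size_t j = 0; j < n; ++j) {
                C[row * n + j] += a * B[l * n + j];
            }
        }
    }
}
```

Built with `g++ -fopenmp`, the number of threads can then be varied through `OMP_NUM_THREADS`, as in the runs quoted above. In this simple split every thread still reads all of B, so it does nothing for cache reuse, which may be one reason a proper parallel version needs more than the pragma alone.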
--
Riccardo Rossi
PhD, Civil Engineer
Member of the Kratos Team: www.cimne.com/kratos
Lecturer at Universitat Politècnica de Catalunya, BarcelonaTech (UPC)
Research fellow at International Center for Numerical Methods in Engineering (CIMNE)

C/ Gran Capità, s/n, Campus Nord UPC, Ed. C1, Despatx C9
08034 Barcelona, Spain
www.cimne.com - T.(+34) 93 401 56 96
skype: rougered4