Subject: Re: [ublas] Matrix multiplication performance
From: Oswin Krause (Oswin.Krause_at_[hidden])
Date: 2016-01-29 02:50:44
Hi,
I would like to contribute some benchmarks as well. Is the code
available for testing?
Best,
Oswin
On 2016-01-28 19:49, Michael Lehn wrote:
> In the meantime, here are some results from my Haswell machine.  It has
> 4 quad cores, but other jobs are running on it, so I only went up to
> 8 threads.  The parallelisation is simple; for the maximal matrix
> dimension N=M=K=4000 it reaches
> 
> 1) 32.9 GFLOPS with 1 thread
> 2) 63 GFLOPS with 2 threads
> 3) 104.6 GFLOPS with 4 threads
> 4) 180.5 GFLOPS with 8 threads
> 
> That is OK for a simple implementation, but it can be done better.  Most
> of all, it takes much too long (or much too big problem sizes) to scale
> well.  But for the moment we should focus on a good single-threaded
> implementation and do the parallel stuff the right way later, as this
> will require more than just a single
> #pragma omp parallel for
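For readers unfamiliar with the directive, the kind of one-line
parallelisation referred to above can be sketched roughly as follows. This
is only an illustration, not the code from matprod.cc: the function name
gemm_naive_parallel, the row-major storage and the unblocked triple loop
are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from matprod.cc): computes C = A * B for
// row-major matrices, where A is m x k, B is k x n and C is m x n.
void gemm_naive_parallel(std::size_t m, std::size_t n, std::size_t k,
                         const std::vector<double> &A,
                         const std::vector<double> &B,
                         std::vector<double> &C)
{
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);

    // Every iteration of the outer loop writes a different row of C, so the
    // rows can be handed to different threads without synchronisation.
    // If the compiler is not invoked with -fopenmp, the pragma is ignored
    // and the loop simply runs serially.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(m); ++i) {
        const std::size_t row = static_cast<std::size_t>(i);
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t l = 0; l < k; ++l) {
                sum += A[row * k + l] * B[l * n + j];
            }
            C[row * n + j] = sum;
        }
    }
}
```

The number of threads is then picked up from OMP_NUM_THREADS at run time,
as in the runs below. As the quoted message says, scaling well is likely to
need more than this single directive.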
> 
> 
> 
> [lehn_at_node042 session4]$ g++ -Ofast -Wall -std=c++11 -DNDEBUG -DHAVE_FMA -I ../boost_1_60_0/ -fopenmp matprod.cc
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=1; ./a.out
> #   m     n     k  uBLAS:   t1       MFLOPS   Blocked:   t2       MFLOPS        Diff nrm1
>   100   100   100   0.00119632      1671.79      0.00089036      2246.28     3.90562e-14
>   200   200   200   0.00322943      4954.44      0.00082579      19375.4     1.50135e-15
>   300   300   300    0.0108177      4991.81      0.00221283      24403.1     2.18434e-16
>   400   400   400    0.0247278      5176.35      0.00429661      29790.9     5.58593e-17
>   500   500   500     0.053677      4657.49      0.00822185      30406.8     1.94899e-17
>   600   600   600    0.0820133      5267.44       0.0136631      31617.9     1.16524e-17
>   700   700   700     0.129231      5308.34       0.0208619      32882.9     6.82385e-18
>   800   800   800      0.19206      5331.67       0.0309358      33100.8     4.08617e-18
>   900   900   900     0.272354      5353.34       0.0430091      33899.8     2.54117e-18
>  1000  1000  1000     0.372831      5364.36       0.0582482      34335.8     1.64011e-18
>  1100  1100  1100     0.494906       5378.8       0.0796676      33413.8     1.08587e-18
>  1200  1200  1200     0.642926      5375.43        0.098814      34974.8     7.43828e-19
>  1300  1300  1300     0.815164      5390.32        0.125541      35000.5     5.26152e-19
>  1400  1400  1400      1.04147      5269.48        0.154808      35450.4     3.81507e-19
>  1500  1500  1500      1.24516      5420.99        0.187327      36033.2     2.82388e-19
>  1600  1600  1600       1.5581      5257.68        0.236257        34674     2.12031e-19
>  1700  1700  1700      2.57574      3814.82        0.273446      35933.9     1.61384e-19
>  1800  1800  1800      3.24948       3589.5        0.319974        36453     1.25033e-19
>  1900  1900  1900      4.01719      3414.82        0.378235      36268.4     9.86666e-20
>  2000  2000  2000      4.82997      3312.65        0.438886        36456     7.86863e-20
>  2100  2100  2100      5.88206      3148.89        0.517821      35769.1     6.31726e-20
>  2200  2200  2200      6.87358      3098.24        0.590235      36080.6      5.1152e-20
>  2300  2300  2300      8.08021      3011.55        0.659934      36873.4     4.19219e-20
>  2400  2400  2400      9.31063      2969.51        0.748285      36948.5     3.46865e-20
>  2500  2500  2500      10.5343      2966.51         0.84448        37005     2.88942e-20
>  2600  2600  2600      11.8768      2959.71        0.984227      35715.3     2.42294e-20
>  2700  2700  2700      13.3378      2951.45         1.06838      36846.5     2.04036e-20
>  2800  2800  2800      14.9304      2940.57         1.18762      36968.2     1.73201e-20
>  2900  2900  2900      16.8965      2886.87         1.33445      36552.8     1.47904e-20
>  3000  3000  3000      18.7376       2881.9         1.49449      36132.7     1.27205e-20
>  3100  3100  3100      20.8439      2858.48         1.66163      35857.5     1.09759e-20
>  3200  3200  3200      22.9032      2861.44         1.82771      35856.9     9.49415e-21
>  3300  3300  3300      28.2407      2545.05         2.08438      34482.2     8.25868e-21
>  3400  3400  3400      27.5374       2854.6         2.18449      35984.7     7.22064e-21
>  3500  3500  3500       29.925       2865.5         2.34372      36587.1     6.34137e-21
>  3600  3600  3600      32.6588      2857.17         2.56586      36366.7      5.5874e-21
>  3700  3700  3700      34.5032      2936.14         2.77154      36552.2     4.92873e-21
>  3800  3800  3800      36.9099      2973.29         2.97732      36860.1     4.36811e-21
>  3900  3900  3900      44.6497      2657.09         3.24271      36586.1     3.88313e-21
>  4000  4000  4000      56.9767      2246.53         3.88046      32985.8     3.46672e-21
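The MFLOPS columns above appear consistent with the usual operation count
of 2*m*n*k for a matrix product, divided by the measured time in seconds.
For example, the first row gives 2*100^3 / 0.00119632 s, which is about
1671.79 MFLOPS. A minimal sketch of that arithmetic, with the helper name
mflops being an assumption and not a function from the benchmark:

```cpp
#include <cassert>

// Hypothetical helper: MFLOPS of an m x n x k matrix product that took
// `seconds`, counting one multiplication and one addition per inner
// iteration (2 * m * n * k floating point operations in total).
double mflops(double m, double n, double k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * m * n * k / seconds / 1.0e6;
}
```

Calling mflops(100, 100, 100, 0.00119632) reproduces the uBLAS figure of
the first row to the printed precision.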
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=2; ./a.out
> #   m     n     k  uBLAS:   t1       MFLOPS   Blocked:   t2       MFLOPS        Diff nrm1
>   100   100   100   0.00120386      1661.33     0.000876976      2280.56     3.95867e-14
>   200   200   200   0.00323702      4942.82      0.00099518      16077.5     1.50256e-15
>   300   300   300    0.0106352       5077.5      0.00286667      18837.2     2.19644e-16
>   400   400   400    0.0247765      5166.19      0.00610925      20951.8     5.61969e-17
>   500   500   500    0.0478359       5226.2      0.00707235      35348.9     1.94268e-17
>   600   600   600     0.082058      5264.57       0.0108406      39850.3     1.16982e-17
>   700   700   700     0.129637      5291.71       0.0170924      40134.8      6.8281e-18
>   800   800   800       0.1925      5319.48       0.0214161      47814.4     4.09348e-18
>   900   900   900     0.273022      5340.22       0.0298684      48814.2     2.54562e-18
>  1000  1000  1000     0.373113       5360.3       0.0417747      47875.9     1.64027e-18
>  1100  1100  1100     0.499034       5334.3       0.0527302      50483.4     1.08356e-18
>  1200  1200  1200      0.64351      5370.55       0.0624654      55326.6     7.44302e-19
>  1300  1300  1300     0.829601      5296.52       0.0793488      55375.8     5.25547e-19
>  1400  1400  1400      1.13615      4830.35       0.0937135      58561.5      3.8117e-19
>  1500  1500  1500      1.38215      4883.71         0.11078      60931.4     2.82628e-19
>  1600  1600  1600      2.34569      3492.37        0.148535      55152.1     2.11636e-19
>  1700  1700  1700      2.80764      3499.73        0.166754      58925.2     1.61617e-19
>  1800  1800  1800      3.65597       3190.4        0.183227      63658.6     1.25225e-19
>  1900  1900  1900      6.04791      2268.22        0.229272      59832.8      9.8624e-20
>  2000  2000  2000      5.41562      2954.41        0.244907        65331      7.8709e-20
>  2100  2100  2100      5.79329      3197.15        0.320638      57766.1     6.31124e-20
>  2200  2200  2200      10.1105      2106.32        0.348126      61173.2     5.11424e-20
>  2300  2300  2300       11.746      2071.68        0.385373        63144     4.18844e-20
>  2400  2400  2400      13.4099      2061.77        0.438608      63035.8     3.46829e-20
>  2500  2500  2500      14.8645      2102.32        0.491434      63589.4     2.88839e-20
>  2600  2600  2600      17.1602      2048.46        0.550163      63893.8     2.42378e-20
>  2700  2700  2700        19.24      2046.05        0.616314      63873.3     2.03993e-20
>  2800  2800  2800      14.8633      2953.85        0.675975      64949.2     1.73082e-20
>  2900  2900  2900       18.533      2631.96         0.72636      67154.1     1.47984e-20
>  3000  3000  3000      18.2701      2955.64        0.804625        67112     1.27211e-20
>  3100  3100  3100      20.2371      2944.19        0.938507      63485.9     1.09831e-20
>  3200  3200  3200      22.6838      2889.11         1.07581      60918.1     9.49232e-21
>  3300  3300  3300      25.0228      2872.33         1.06473      67504.6     8.25942e-21
>  3400  3400  3400      27.3561      2873.51         1.16247      67621.6     7.21511e-21
>  3500  3500  3500      29.7889      2878.59         1.32098      64913.8      6.3385e-21
>  3600  3600  3600      34.8098      2680.62         1.37908      67662.7     5.58738e-21
>  3700  3700  3700      37.6151      2693.23         1.52253      66538.1     4.92976e-21
>  3800  3800  3800        38.99      2814.67         1.63282      67211.5     4.36537e-21
>  3900  3900  3900      57.5765      2060.53         1.75221      67707.4     3.88246e-21
>  4000  4000  4000      51.1335      2503.25         2.03062      63035.1     3.46549e-21
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=4; ./a.out
> #   m     n     k  uBLAS:   t1       MFLOPS   Blocked:   t2       MFLOPS        Diff nrm1
>   100   100   100   0.00119733      1670.39      0.00124331      1608.61     3.84618e-14
>   200   200   200   0.00427996      3738.35     0.000965206      16576.8     1.47604e-15
>   300   300   300    0.0146617      3683.06      0.00235442      22935.6     2.18643e-16
>   400   400   400    0.0301558      4244.62      0.00431311        29677     5.57089e-17
>   500   500   500    0.0509763      4904.24      0.00541684      46152.4     1.94817e-17
>   600   600   600    0.0823676      5244.78      0.00815973        52943     1.16851e-17
>   700   700   700     0.131064      5234.07       0.0133055      51557.7     6.81692e-18
>   800   800   800     0.198438       5160.3       0.0208701      49065.4     4.09087e-18
>   900   900   900     0.273346      5333.91       0.0244156        59716     2.53963e-18
>  1000  1000  1000     0.374021       5347.3       0.0252625      79168.7     1.64654e-18
>  1100  1100  1100     0.502426      5298.29         0.05022      53006.7     1.08395e-18
>  1200  1200  1200     0.865696      3992.16       0.0443738      77883.9     7.44661e-19
>  1300  1300  1300      1.00063      4391.23       0.0544683      80670.8     5.25559e-19
>  1400  1400  1400      1.26828      4327.13       0.0599685      91514.7     3.80933e-19
>  1500  1500  1500       1.3623      4954.86       0.0826977      81622.6      2.8281e-19
>  1600  1600  1600      2.14419      3820.56       0.0940622      87091.3     2.11718e-19
>  1700  1700  1700      2.98106      3296.14        0.104828      93734.3     1.61252e-19
>  1800  1800  1800      4.10679      2840.17        0.125856      92677.2     1.25247e-19
>  1900  1900  1900      7.25737      1890.22        0.137977      99422.2     9.85647e-20
>  2000  2000  2000       9.0378      1770.34        0.195959      81649.8     7.86877e-20
>  2100  2100  2100      7.43091      2492.56        0.205205        90261     6.31814e-20
>  2200  2200  2200      8.01552      2656.84        0.229878      92640.5     5.11206e-20
>  2300  2300  2300      11.3209      2149.47        0.242479       100355     4.19281e-20
>  2400  2400  2400      11.7655      2349.91        0.267819       103234      3.4696e-20
>  2500  2500  2500        14.75      2118.65        0.318302      98177.1     2.89065e-20
>  2600  2600  2600      16.1598      2175.27        0.349963       100445     2.42432e-20
>  2700  2700  2700      19.6465      2003.72        0.384713       102326     2.04284e-20
>  2800  2800  2800      18.5487      2366.95        0.422473       103922     1.73051e-20
>  2900  2900  2900      18.4844      2638.87        0.431616       113012     1.48037e-20
>  3000  3000  3000      18.3601      2941.16        0.487947       110668     1.27205e-20
>  3100  3100  3100      20.1449      2957.67        0.555138       107328     1.09745e-20
>  3200  3200  3200      22.2403      2946.72        0.597566       109672     9.49034e-21
>  3300  3300  3300      24.3526      2951.39        0.635492       113100     8.25459e-21
>  3400  3400  3400      26.5834      2957.04        0.693353       113374     7.22134e-21
>  3500  3500  3500      28.9996      2956.93        0.753307       113831     6.33808e-21
>  3600  3600  3600      31.4492      2967.07        0.793409       117609     5.58761e-21
>  3700  3700  3700      34.9533      2898.33        0.959263       105608     4.93129e-21
>  3800  3800  3800      38.2463       2869.4         1.01686       107924     4.36735e-21
>  3900  3900  3900      42.3957      2798.35         1.08582       109262     3.88282e-21
>  4000  4000  4000      44.7076      2863.05         1.22383       104590       3.469e-21
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=8; ./a.out
> #   m     n     k  uBLAS:   t1       MFLOPS   Blocked:   t2       MFLOPS        Diff nrm1
>   100   100   100   0.00120762      1656.15        0.001279      1563.72      3.8463e-14
>   200   200   200    0.0036143      4426.86     0.000631185      25349.1     1.48858e-15
>   300   300   300    0.0108139      4993.56      0.00204664      26384.7     2.20015e-16
>   400   400   400    0.0251417      5091.13      0.00316074      40496.9     5.58204e-17
>   500   500   500    0.0482996      5176.03      0.00479854      52099.2      1.9429e-17
>   600   600   600    0.0830052      5204.49       0.0074349      58104.3     1.16567e-17
>   700   700   700      0.13281      5165.28       0.0134778      50898.6     6.82167e-18
>   800   800   800      0.19639      5214.12       0.0143988      71117.2     4.08235e-18
>   900   900   900     0.279542      5215.68       0.0186552        78155     2.54218e-18
>  1000  1000  1000     0.381906      5236.89        0.020541      97366.2     1.63963e-18
>  1100  1100  1100     0.509376         5226       0.0338259      78697.1     1.08399e-18
>  1200  1200  1200     0.760565      4543.99       0.0317094       108990     7.44215e-19
>  1300  1300  1300      1.04442      4207.14       0.0419104       104843     5.25101e-19
>  1400  1400  1400      1.47537      3719.75       0.0450985       121689     3.81236e-19
>  1500  1500  1500      1.90994      3534.15       0.0514728       131137     2.82394e-19
>  1600  1600  1600      1.56705      5227.67       0.0599189       136718     2.11847e-19
>  1700  1700  1700      2.62892      3737.66       0.0756787       129838     1.61316e-19
>  1800  1800  1800      3.29831      3536.35       0.0827417       140969     1.25087e-19
>  1900  1900  1900      4.03473      3399.98       0.0915113       149905     9.85857e-20
>  2000  2000  2000      4.87315       3283.3        0.105251       152017     7.86417e-20
>  2100  2100  2100      5.87975      3150.13        0.123634       149813     6.31281e-20
>  2200  2200  2200      7.06021      3016.34        0.134536       158293     5.11845e-20
>  2300  2300  2300      10.6045      2294.69        0.162671       149590     4.19035e-20
>  2400  2400  2400      9.31785      2967.21        0.160164       172623     3.46453e-20
>  2500  2500  2500      10.4852      2980.38        0.181067       172588     2.89024e-20
>  2600  2600  2600      11.8263      2972.35        0.208792       168359     2.42313e-20
>  2700  2700  2700      13.2755      2965.32        0.226646       173690     2.04063e-20
>  2800  2800  2800      14.8042      2965.65         0.24966       175855     1.73142e-20
>  2900  2900  2900      16.9983      2869.58        0.287892       169432     1.47875e-20
>  3000  3000  3000      19.7129      2739.32        0.330801       163240     1.27204e-20
>  3100  3100  3100      21.4476      2778.02        0.382773       155659     1.09704e-20
>  3200  3200  3200      22.7482      2880.93        0.440904       148640     9.49823e-21
>  3300  3300  3300      25.1449      2858.39        0.416183       172698     8.25712e-21
>  3400  3400  3400      27.4412       2864.6        0.497616       157969      7.2164e-21
>  3500  3500  3500      30.5976      2802.51         0.49974       171589     6.33781e-21
>  3600  3600  3600      32.5102      2870.24        0.558002       167225     5.59021e-21
>  3700  3700  3700      35.3566      2865.26        0.571003       177418     4.93007e-21
>  3800  3800  3800       37.321      2940.55        0.578064       189847     4.36563e-21
>  3900  3900  3900      40.1645       2953.8        0.623894       190157      3.8876e-21
>  4000  4000  4000      43.1753      2964.66        0.709328       180452     3.46575e-21
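To put the N=M=K=4000 figures quoted at the top (32.9, 63, 104.6 and 180.5
GFLOPS for 1, 2, 4 and 8 threads) into perspective, one can look at the
speedup over the single-threaded run and at the parallel efficiency, i.e.
the speedup divided by the number of threads. The sketch below only
restates that arithmetic; the helper names speedup and efficiency are
assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical helper: speedup of a run with p threads over the
// single-threaded run, computed from the two throughput figures.
double speedup(double gflops_p, double gflops_1)
{
    assert(gflops_1 > 0.0);
    return gflops_p / gflops_1;
}

// Hypothetical helper: parallel efficiency, i.e. speedup per thread.
// A value of 1.0 would mean perfect linear scaling.
double efficiency(double gflops_p, double gflops_1, int threads)
{
    assert(threads > 0);
    return speedup(gflops_p, gflops_1) / threads;
}
```

With the numbers above this gives a speedup of roughly 1.9, 3.2 and 5.5
for 2, 4 and 8 threads, i.e. an efficiency of roughly 96%, 79% and 69%.
Since other jobs were running on the machine, these values should be read
as rough indications only.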
> 
> 
> On 28 Jan 2016, at 18:41, Michael Lehn <michael.lehn_at_[hidden]> wrote:
> 
>> Also, the parallelisation with OpenMP is pretty cheap and simple
>> at the moment.  So you might also
>> want to check how it scales with
>> 
>>  export OMP_NUM_THREADS=2; ./a.out
>>  export OMP_NUM_THREADS=4; ./a.out
>>  export OMP_NUM_THREADS=6; ./a.out
>> ...
> 