Subject: Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2016-03-14 11:12:34
Only in exceptional cases do we make pull requests or changes in
master. Master only merges from develop (which in turn merges from
feature/bug branches). So
https://github.com/uBLAS/ublas/tree/feature/ublas00004_simd_gemm is the
correct branch to open a pull request against.
Pull requests go to https://github.com/uBLAS/ublas and NOT to
https://github.com/boostorg/ublas. I see the pull request in the
boostorg repo, so please open it in the uBLAS repo instead. I need to
clarify this in the wiki, because it is probably not very obvious.
-Nasos
On 03/13/2016 02:26 PM, palik imre wrote:
> A bit of confusion here.
>
> I created a fork of the feature branch you sent, as I didn't have the 
> rights to push there.  Then I sent a pull request for that.
>
> Should I fork the master instead?
>
>
> Thanks,
>
> Imre
>
>
> On Sunday, 13 March 2016, 19:03, palik imre <imre_palik_at_[hidden]> 
> wrote:
>
>
> Results for low dimensions.  More data would exceed the mailing list limits:
>
> # m original: t1   MFLOPS original: t1    MFLOPS Diff nrm3  gemm:   t2    MFLOPS   Diff nrm4 mixed:   t2    MFLOPS   Diff nrm5
>   1   2.1263e-07  9.40601  1.32802e-07    15.06            0  1.36006e-07  14.7052            0  6.31318e-07  3.16798  0
>   2  2.28189e-07  70.1173  1.37767e-07  116.138            0  1.59801e-07  100.125            0    6.653e-07  24.0493  0
>   3    2.649e-07  203.851   1.5541e-07  347.468            0  1.54267e-07  350.042            0  6.98766e-07  77.2791  0
>   4  3.35269e-07  381.783   2.4183e-07  529.297            0  2.12891e-07  601.247            0  6.65688e-07  192.282  0
>   5  3.53868e-07  706.478  2.30977e-07  1082.36            0  2.53215e-07  987.303            0  7.07933e-07  353.141  0
>   6   4.2987e-07  1004.95  2.59713e-07  1663.37            0  2.54867e-07     1695            0  8.17448e-07  528.474  0
>   7  5.39621e-07  1271.26  4.51043e-07  1520.92  7.98975e-09  5.12948e-07  1337.37  7.98975e-09  8.76363e-07  782.781  0
>   8  6.18993e-07   1654.3  6.38988e-07  1602.53  4.17673e-09  6.37931e-07  1605.19  4.17673e-09  8.92556e-07  1147.27  0
>   9  7.73683e-07  1884.49  7.26336e-07  2007.34  3.30697e-09  8.00656e-07  1821.01  3.30697e-09  1.09762e-06  1328.33  0
>  10  9.27569e-07  2156.17  8.31827e-07  2404.35  1.94317e-09  8.72131e-07  2293.23  1.94317e-09  1.16572e-06  1715.68  0
>  11  1.13882e-06   2337.5  1.03275e-06  2577.58  1.27501e-09  1.08775e-06  2447.25  1.27501e-09  1.16439e-06  2286.17  0
>  12  1.26427e-06  2733.59  1.40013e-06  2468.34  8.50076e-10  1.39562e-06  2476.32  8.50076e-10  1.01202e-06  3414.97  0
>  13   1.5751e-06  2789.66  1.64811e-06  2666.09  5.39864e-10  1.66862e-06  2633.32  5.39864e-10  1.61517e-06  2720.45  0
>  14  1.79595e-06  3055.77  1.89937e-06  2889.37  4.08632e-10   1.6485e-06  3329.09            0  1.65016e-06  3325.73  0
>  15  2.14056e-06  3153.37  2.24248e-06  3010.06  2.73316e-10   1.6875e-06  3999.99            0  1.80164e-06  3746.59  0
>  16  2.38996e-06  3427.67  2.63386e-06  3110.27  2.30152e-10  1.74627e-06  4691.14            0  1.91648e-06  4274.49  0
>  17  2.93315e-06  3349.98  3.08031e-06  3189.94  1.85538e-10  2.17697e-06  4513.62            0  2.13505e-06  4602.23  0
>  18   3.3771e-06  3453.85  3.23863e-06  3601.52  1.20251e-10  2.23225e-06  5225.23            0  2.36877e-06  4924.07  0
>  19  4.19699e-06  3268.53  4.02621e-06  3407.17  1.07796e-10  2.29651e-06   5973.4            0  2.44714e-06  5605.72  0
>  20  4.27777e-06  3740.27  4.86115e-06   3291.4  8.37665e-11  2.26798e-06  7054.74            0  2.44016e-06  6556.96  0
>  21  5.58038e-06  3319.13  5.51606e-06  3357.83  5.93714e-11  2.61705e-06  7077.43            0  2.90197e-06  6382.56  0
>  22  5.46208e-06  3898.88  5.50258e-06  3870.19  5.76987e-11  2.85448e-06  7460.56            0  3.09923e-06  6871.39  0
>  23  7.26813e-06  3348.04  6.48407e-06  3752.89  4.47169e-11  3.03986e-06  8004.98            0  3.16566e-06  7686.86  0
>  24  6.56421e-06  4211.93  7.20581e-06   3836.9  3.61275e-11  2.84288e-06  9725.35            0  2.81577e-06  9818.99  0
>  25  7.97135e-06  3920.29  7.80654e-06  4003.06  3.02957e-11  4.04575e-06  7724.16            0  4.15001e-06  7530.11  0
>  26  8.59272e-06   4090.9  8.46934e-06   4150.5  2.53217e-11   4.1795e-06  8410.58            0  4.36958e-06  8044.71  0
>  27  1.05527e-05  3730.41  9.66865e-06  4071.51  1.97479e-11  4.24268e-06  9278.57            0  4.64476e-06  8475.37  0
>  28  9.77679e-06  4490.63   1.0918e-05  4021.26  1.71505e-11  4.41728e-06  9939.14            0  4.55165e-06  9645.73  0
>  29  1.23574e-05  3947.28  1.15308e-05  4230.22  1.54399e-11  4.96383e-06  9826.69            0  5.27042e-06  9255.05  0
>  30  1.25312e-05  4309.24  1.23192e-05   4383.4  1.38837e-11  5.36616e-06  10063.1            0  5.57707e-06  9682.51  0
>  31  1.41019e-05  4225.11  1.41554e-05  4209.15  1.12822e-11  5.56749e-06  10701.8            0  5.87983e-06  10133.3  0
>  32  1.44935e-05  4521.76  1.74419e-05  3757.38   9.5502e-12  5.91291e-06  11083.5            0  6.07622e-06  10785.7  0
>  33  1.68922e-05  4254.86  1.62224e-05  4430.55  8.00562e-12  6.51645e-06  11029.6            0  6.62821e-06  10843.7  0
>  34  1.73001e-05   4543.8  1.68924e-05  4653.46  7.54927e-12  6.83433e-06  11501.9            0  6.95343e-06  11304.9  0
>  35  2.07166e-05   4139.2  2.15962e-05  3970.61  6.52939e-12  7.06462e-06  12137.9            0  7.53811e-06  11375.5  0
>  36  1.98326e-05  4704.97  2.13473e-05  4371.14  5.68874e-12    6.703e-06  13920.9            0  6.99365e-06  13342.4  0
>  37   2.3838e-05  4249.78  2.23655e-05  4529.56  5.11318e-12  8.87253e-06  11417.9            0  9.13862e-06  11085.5  0
>  38  2.35903e-05  4652.09  2.48122e-05  4422.99  4.71306e-12  9.24238e-06    11874            0  9.27922e-06  11826.9  0
>  39  2.79913e-05  4238.39  2.64576e-05  4484.09  4.20714e-12  9.68511e-06  12249.5            0  9.95689e-06  11915.2  0
>  40  2.60131e-05   4920.6   2.9098e-05  4398.93  3.42002e-12  9.80308e-06  13057.1            0  1.04198e-05  12284.3  0
>  41  3.13419e-05  4398.01  3.03942e-05  4535.14  3.13757e-12  1.07587e-05  12812.2            0  1.10016e-05  12529.3  0
>  42  3.10015e-05  4779.64  3.20343e-05  4625.54  2.91245e-12  1.09989e-05  13471.9            0  1.16031e-05  12770.4  0
>  43   3.6527e-05  4353.33  3.49908e-05  4544.46  2.71446e-12  1.13164e-05  14051.7            0  1.20516e-05  13194.4  0
>  44  3.36654e-05  5060.62  3.86435e-05  4408.71  2.49076e-12  1.16151e-05  14667.8            0  1.21377e-05  14036.2  0
>  45  3.95282e-05  4610.63  3.98562e-05  4572.69  2.12037e-12  1.26784e-05  14374.8            0  1.32723e-05  13731.6  0
>  46  3.96351e-05   4911.6  4.17105e-05  4667.22  1.96734e-12  1.27302e-05  15292.2            0  1.34346e-05  14490.3  0
>  47  4.63424e-05  4480.69  4.50811e-05  4606.05  1.77515e-12  1.33133e-05  15596.9            0  1.39354e-05  14900.7  0
>  48  4.31748e-05  5122.99   5.0325e-05  4395.11  1.75073e-12  1.32491e-05  16694.3            0   1.3501e-05  16382.8  0
>  49  4.93001e-05  4772.77  5.11402e-05  4601.03  1.48788e-12   1.6222e-05  14504.9            0  1.72531e-05    13638  0
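
For reading the table: the MFLOPS figures appear to follow the usual
2*m^3 flop count for an m x m product divided by the measured time.
The first rows are consistent with that (m = 1 with t = 2.1263e-07 s
gives about 9.406). A minimal sketch of that arithmetic follows; the
function name mflops is only an illustration and is not taken from the
benchmark source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the benchmark code): MFLOPS for the
// product of two m x m matrices, assuming the conventional 2*m^3
// floating-point operation count.
double mflops(std::size_t m, double seconds)
{
    assert(seconds > 0.0);
    const double flops = 2.0 * static_cast<double>(m) * m * m;
    return flops / seconds / 1.0e6;
}
```

For example, mflops(2, 2.28189e-07) comes out at roughly 70.1, in line
with the second row.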
>
>
> First group is legacy axpy_prod(), second group is legacy prod(),
> third group is legacy prod() for low dimensions and gemm() for high
> dimensions, fourth group is gemm().
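
To make the mixed (third) variant concrete, here is a minimal sketch of
a size-based runtime switch. Everything in it is an assumption made for
illustration: Matrix, legacy_prod, blocked_gemm, mixed_prod and
gemm_threshold are made-up stand-ins, not the uBLAS code from the
patch. In the table the third group's Diff column matches legacy prod()
up to m = 13 and is 0 from m = 14 on, which would be consistent with a
cut-over around there, so the sketch uses 14 as a guess.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major square matrix; a stand-in for ublas::matrix<double>.
struct Matrix {
    std::size_t n;
    std::vector<double> a;
    explicit Matrix(std::size_t n_) : n(n_), a(n_ * n_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

// Stand-in for the legacy product: plain triple loop, C = A * B.
void legacy_prod(const Matrix& A, const Matrix& B, Matrix& C)
{
    const std::size_t n = A.n;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += A(i, k) * B(k, j);
            C(i, j) = s;
        }
}

// Stand-in for gemm: the same product, computed block by block.
void blocked_gemm(const Matrix& A, const Matrix& B, Matrix& C)
{
    const std::size_t n = A.n;
    const std::size_t bs = 16;  // illustrative block size
    std::fill(C.a.begin(), C.a.end(), 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs)
                for (std::size_t i = ii; i < std::min(ii + bs, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + bs, n); ++k) {
                        const double aik = A(i, k);
                        for (std::size_t j = jj; j < std::min(jj + bs, n); ++j)
                            C(i, j) += aik * B(k, j);
                    }
}

// Assumed cut-over size; it should be tuned by measurement.
const std::size_t gemm_threshold = 14;

// Runtime switch: legacy kernel for small sizes, gemm-style kernel otherwise.
void mixed_prod(const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(A.n == B.n && B.n == C.n);
    if (A.n < gemm_threshold)
        legacy_prod(A, B, C);
    else
        blocked_gemm(A, B, C);
}
```

The branch itself is cheap, but for very small m it is part of the
fixed overhead that the benchmark is trying to expose.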
>
> As the legacy version is expression template based, it can possibly 
> provide some further advantages when the operations are chained.
>
> I put some defines in place that make it possible to force the
> legacy version as the default, as opposed to the runtime-switched version.
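
As an illustration of what such a define could look like, a minimal
sketch follows. The macro names EXAMPLE_FORCE_LEGACY_PROD and
EXAMPLE_GEMM_THRESHOLD, the enum and the function are all hypothetical;
the names actually used in the patch are not given in this thread.

```cpp
#include <cstddef>

enum class ProdKernel { Legacy, Gemm };

// Hypothetical macro: the size at which the runtime switch would start
// using the gemm kernel. The default of 14 is an assumption.
#ifndef EXAMPLE_GEMM_THRESHOLD
#define EXAMPLE_GEMM_THRESHOLD 14
#endif

// Hypothetical macro: building with -DEXAMPLE_FORCE_LEGACY_PROD would
// pin the legacy kernel regardless of size.
#ifdef EXAMPLE_FORCE_LEGACY_PROD
constexpr bool force_legacy = true;
#else
constexpr bool force_legacy = false;
#endif

// Which kernel would be used for an m x m product.
constexpr ProdKernel select_kernel(std::size_t m)
{
    return (force_legacy || m < EXAMPLE_GEMM_THRESHOLD) ? ProdKernel::Legacy
                                                       : ProdKernel::Gemm;
}
```

With the macro undefined, the choice depends on m; with it defined, the
size test is never consulted.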
>
> Imre
>
>
> On Friday, 11 March 2016, 14:21, Nasos Iliopoulos 
> <nasos_i_at_[hidden]> wrote:
>
>
> Regardless, these are great figures.
>
> Can you please run them comparing the simple uBLAS implementation for
> matrices from 2 to 100 with the gemm-based one with a single thread? I
> wonder when the control statement starts to play a role.
>
> What do you think should be the plan for switching from multi-core to
> single-threaded execution, so that smaller matrices do not take the
> full communication hit?
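
One possible shape for such a plan, purely as a sketch and not
something decided in this thread: pick the thread count from the amount
of work, so that small products stay on one thread. The function name
and the one-million-flops-per-thread figure below are placeholders that
would have to be replaced by measured values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical heuristic: use only as many threads as can each be given
// at least min_flops_per_thread of work, and never fewer than one.
std::size_t choose_threads(std::size_t m, std::size_t max_threads,
                           double min_flops_per_thread = 1.0e6)
{
    assert(max_threads >= 1);
    assert(min_flops_per_thread > 0.0);
    const double flops = 2.0 * static_cast<double>(m) * m * m;
    const std::size_t wanted =
        static_cast<std::size_t>(flops / min_flops_per_thread);
    return std::min(max_threads, std::max<std::size_t>(1, wanted));
}
```

With these placeholder numbers a 10 x 10 product stays single-threaded,
while a 1000 x 1000 product uses every available thread.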
>
>
> - Nasos
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
> http://listarchives.boost.org/mailman/listinfo.cgi/ublas
> Sent to: imre_palik_at_[hidden] <mailto:imre_palik_at_[hidden]>