Toying with R9 295×2 and SGEMM

So I built this little red and black thing

IMG_4199

and I put the R9 295×2 into the second PCI slot. Then I installed clAmdBLAS, OpenCL3 and the Catalyst driver for Ubuntu 14, and modified the BLAS example so that it uses one or both graphics chips. These are the performance plots: SGEMM can achieve 3.4 TFLOPS of sustained peak performance. See below for a quick summary of all the GEMMs.

 

SGEMM CGEMM

Single Precision above and double precision below.

DGEMM ZGEMM

In practice, the plots show when it is better to have two GPUs instead of one on the R9 295×2, for every GEMM. Performance in GigaFLOPS is measured using an execution time that includes all data movement to and from memory, and the number of operations is normalized to 2N^3. Please note that complex matrix multiplications require more operations by a factor of 4.
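To make the normalization concrete, here is a minimal Python sketch of the arithmetic. It is not the code I used for the plots: the function name `gemm_gflops` and the timing in the example are hypothetical, and it assumes square N×N matrices and a wall-clock time that already includes the transfers to and from the GPU.

```python
def gemm_gflops(n, seconds, complex_valued=False):
    """Return (normalized, actual) GFLOPS for an n x n by n x n GEMM.

    `seconds` is assumed to be the full wall-clock time, including the
    data movement to and from the device, as in the plots.
    """
    if n <= 0 or seconds <= 0:
        raise ValueError("n and seconds must be positive")
    # Every GEMM is normalized to 2*N^3 operations, real or complex.
    normalized_ops = 2.0 * n ** 3
    # A complex GEMM does about 4 times as many real operations.
    actual_ops = normalized_ops * (4.0 if complex_valued else 1.0)
    return normalized_ops / seconds / 1e9, actual_ops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing, chosen only to illustrate the arithmetic.
    normalized, actual = gemm_gflops(10000, 1.0, complex_valued=True)
    print(f"normalized: {normalized:.0f} GFLOPS, actual: {actual:.0f} GFLOPS")
```

With this normalization a complex GEMM that looks slower than its real counterpart in the plots may in fact be executing more real operations per second, which is why the factor of 4 matters when comparing SGEMM with CGEMM or DGEMM with ZGEMM.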