So I built this little red and black thing
and put the R9 295X2 into the second PCI slot. Then I installed clAmdBLAS, OpenCL3 and the Catalyst driver for Ubuntu 14, and modified the BLAS example so that it can use one or both graphics chips. These are the performance plots: SGEMM achieves a sustained peak of 3.4 TFLOPS. Below is a quick summary for all GEMMs.
Single precision above, double precision below.
In practice, I show when it is better to use both GPUs of the R9 295X2 instead of one, for every GEMM. Performance in GFLOPS is computed from an execution time that includes all data movement to and from memory, and the operation count is normalized to 2N^3 for every routine. Note that a complex matrix multiplication requires four times as many real operations, so for the complex GEMMs the actual real-operation rate is four times the plotted value.
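The post does not say how the work is divided between the two chips, so here is a minimal sketch under one common assumption: a row-block split of C = A * B, where each device computes its own block of rows of C and needs the whole of B. It also shows the GFLOPS metric described above (2N^3 operations over end-to-end time, with the optional factor of 4 for complex inputs). The names `gemm_rows`, `split_gemm` and `gflops` are hypothetical, and the "devices" are simulated sequentially on the CPU in plain Python rather than through clAmdBLAS.

```python
import time


def gemm_rows(a, b, row_start, row_end):
    """Return rows [row_start, row_end) of C = A * B (plain Python, real or complex)."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(row_start, row_end)
    ]


def split_gemm(a, b, num_devices=2):
    """Hypothetical row-block split of C = A * B over num_devices devices.

    Block d covers rows [bounds[d], bounds[d + 1]) of A and C; every block needs all of B.
    Here the blocks run one after another on the CPU; on real hardware each block
    would be submitted to its own GPU.
    """
    n = len(a)
    bounds = [n * d // num_devices for d in range(num_devices + 1)]
    c = []
    for d in range(num_devices):
        c.extend(gemm_rows(a, b, bounds[d], bounds[d + 1]))
    return c


def gflops(n, seconds, is_complex=False, normalized=True):
    """GFLOPS for an N x N GEMM that took `seconds` end to end.

    normalized=True counts 2*N^3 operations for every routine (the convention in the text);
    normalized=False counts 4x as many real operations for complex inputs.
    """
    ops = 2.0 * n ** 3
    if is_complex and not normalized:
        ops *= 4
    return ops / seconds / 1e9


if __name__ == "__main__":
    n = 64
    a = [[complex(i + j, i - j) for j in range(n)] for i in range(n)]
    b = [[complex(i * j % 7, 1) for j in range(n)] for i in range(n)]

    start = time.perf_counter()
    c = split_gemm(a, b, num_devices=2)
    elapsed = time.perf_counter() - start  # wall-clock time of the whole call

    assert c == gemm_rows(a, b, 0, n)  # split result matches the single-device result
    print(f"normalized: {gflops(n, elapsed, is_complex=True):.4f} GFLOPS")
    print(f"real ops:   {gflops(n, elapsed, is_complex=True, normalized=False):.4f} GFLOPS")
```

Splitting by rows keeps each device's block of C independent, at the cost of sending all of B to both devices; that extra transfer is one reason two GPUs pay off only beyond some matrix size.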