The Better Accuracy of Fast Matrix Multiply III

In the previous post, I showed that Strassen-Winograd algorithms have a directionality in their errors. This directionality explains their numerical properties and, so far, it has been used mainly to express error bounds. Today, I use it to improve those numerical properties: after all, if a constructive proof gives you a bound, the least you can do is use it to optimize the algorithm itself.
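
To make the directionality concrete, here is a minimal NumPy sketch of my own (not the FastMMW code): one level of the Strassen-Winograd recursion in single precision, compared quadrant by quadrant against a double-precision reference. The name winograd_2x2, the matrix size, and the input distribution are just choices for the example.

```python
import numpy as np

def winograd_2x2(A, B):
    """One level of the Strassen-Winograd recursion (7 multiplies, 15 adds).
    The quadrant products themselves are delegated to the ordinary dot product."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]

    S1 = A21 + A22; S2 = S1 - A11; S3 = A11 - A21; S4 = A12 - S2
    T1 = B12 - B11; T2 = B22 - T1; T3 = B22 - B12; T4 = T2 - B21

    M1 = A11 @ B11; M2 = A12 @ B21; M3 = S4 @ B22; M4 = A22 @ T4
    M5 = S1 @ T1;   M6 = S2 @ T2;   M7 = S3 @ T3

    U2 = M1 + M6; U3 = U2 + M7; U4 = U2 + M5
    C11 = M1 + M2; C12 = U4 + M3; C21 = U3 - M4; C22 = U3 + M5
    return np.block([[C11, C12], [C21, C22]])

rng = np.random.default_rng(0)
n = 128
A = rng.uniform(-1, 1, (n, n)).astype(np.float32)
B = rng.uniform(-1, 1, (n, n)).astype(np.float32)

exact = A.astype(np.float64) @ B.astype(np.float64)   # double-precision reference
err = np.abs(winograd_2x2(A, B).astype(np.float64) - exact)

# The error is not spread uniformly: some quadrants carry more of it than others.
h = n // 2
for name, blk in [("C11", err[:h, :h]), ("C12", err[:h, h:]),
                  ("C21", err[h:, :h]), ("C22", err[h:, h:])]:
    print(name, blk.max())
```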

I admit there are several ways to improve the algorithm. Randomization was my first guess. It is simple and it breaks patterns, or rather it makes the patterns unpredictable and thus no longer patterns. In fact, randomization can be used to spread the error across the result matrix, but I could not make it reduce the intensity of the error.
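
As an illustration of what spreading means, here is a sketch of one randomization strategy: wrapping the fast multiply in random permutations of the operands, which leaves the product unchanged but relocates the error pattern. This is not necessarily the randomization I experimented with; randomized_mm is a name made up for the example, and it reuses winograd_2x2 from the sketch above.

```python
import numpy as np

def randomized_mm(A, B, mm, rng):
    """Hide the error pattern of a fast multiply `mm` behind random
    permutations of the operands; the product itself is unchanged."""
    n = A.shape[0]
    p, q, r = (rng.permutation(n) for _ in range(3))
    Cp = mm(A[np.ix_(p, q)], B[np.ix_(q, r)])        # Cp[i, j] = C[p[i], r[j]]
    return Cp[np.ix_(np.argsort(p), np.argsort(r))]  # undo the permutations

rng = np.random.default_rng(1)
n = 128
A = rng.uniform(-1, 1, (n, n)).astype(np.float32)
B = rng.uniform(-1, 1, (n, n)).astype(np.float32)
exact = A.astype(np.float64) @ B.astype(np.float64)

for _ in range(5):
    err = np.abs(randomized_mm(A, B, winograd_2x2, rng) - exact)
    i, j = np.unravel_index(err.argmax(), err.shape)
    # The location of the worst entry moves around; its magnitude stays
    # about the same, which is the limitation of this kind of randomization.
    print(f"max error {err.max():.2e} at ({i}, {j})")
```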

I came to the idea of using permutations designed to tilt the error without involving extra computation. These permutations must be tailored to the algorithm, but they are not difficult to grasp or implement. What these permutations are and how they are applied is in the paper. Here I want to show their effect on the maximum error and its locations:
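
The permutations themselves are in the paper, so I will not reproduce them here. For reference, a hedged sketch of how a heat map of error locations can be collected, reusing winograd_2x2 from above: error_heatmap, the 1,000-trial default, and the uniform inputs are illustrative choices (the figure below reports 86x86 matrices over 10,000 runs). Any multiply routine, permuted or not, can be plugged in for comparison.

```python
import numpy as np

def error_heatmap(mm, n=86, trials=1000, seed=2):
    """Accumulate, entry by entry, the largest error observed over many random
    trials of a multiply routine `mm`. A flat map means the error is spread out;
    a hot region means it is concentrated in a predictable place."""
    rng = np.random.default_rng(seed)
    heat = np.zeros((n, n))
    for _ in range(trials):
        A = rng.uniform(-1, 1, (n, n)).astype(np.float32)
        B = rng.uniform(-1, 1, (n, n)).astype(np.float32)
        err = np.abs(mm(A, B) - A.astype(np.float64) @ B.astype(np.float64))
        np.maximum(heat, err, out=heat)
    return heat

heat = error_heatmap(winograd_2x2)   # plug in any multiply routine here
h = heat.shape[0] // 2
print("max error per quadrant:",
      heat[:h, :h].max(), heat[:h, h:].max(),
      heat[h:, :h].max(), heat[h:, h:].max())
```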

[Figure: heat map of the error, 86x86 matrices, 10,000 runs, inputs in [-1,1], orthogonal.]

Take the previous post and compare the scatter plots (this one and the previous one). First of all, I reduced the maximum error by half. I can do that because I spread the error.

Once again, the geo-locality of the error is the cause of the constant-factor extra error FastMMW algorithms carry. On one side, I find comfort in knowing where the error is. On the other side, this knowledge can be used to break the error pattern and improve the error we commit. I see a nice connection with yet another application of Werner Heisenberg's principle.