Once upon a time I showed the adaptive Strassen-Winograd algorithm presented in the previous post, Prof. Nicolau (the co-author) asked why this did not come up in the literature before. Why is it new?

Let us breath and let the question sink in …. I always find these questions really fun(ny) because they are * unanswerable*. I will not know that ever, the previous authors know that, probably. Unfortunately, researchers do not publish or better they won’t publish it (like a car will not start without a battery) and, often, they want to forget why something did not work out. I will feel like a moron to publish a paper saying that “

*Well, I liked this problem, but I am an idiot and I could not solve it, better luck to you all*“. It is not good for your career and for your self esteem, it is just not right.

Ironically, here I give an answer even though I may out of my element. If you try to reproduce the previous algorithm without paying attention at the matrix sizes of the temporary results, very likely you will have an incorrect algorithm. I know this because I tried to take the original algorithm and apply it. I failed miserably. Very likely you would fail at first and I guess this is what happened for previous practitioners. Previous researchers had to try, they may had the same problems but went for the path with least resistance. For a solution they understood.

When in CMU, I could chat with one professor who was involved in the implementation of Winograd (not the balanced one). I could finally ask the unanswerable question. What I discovered is kind of illuminating how we think: his main goal was to apply matrix tensors for the description of fast algorithms (Winograd) and he really did not care about a balanced or not balanced algorithm. In CMU I was working on the implementation of the Fast Fourier Transforms (FFT) and *matrix tensors *are powerful tools for the description of FFT implementations —there is a tensor in every SPIRAL paper :).

Why is this illuminating?

I cared for a balanced Winograd algorithm because I wanted to use a set of tools that would fit a balanced computation: optimality of the computation, simplicity, and exploiting data locality in caches. So I went through a path to show that I could use such tools. The previous researcher probably never consider my problem because they had other tools they wanted to use. They may have never tried to have a balanced computation, thus never failed.

## Conclusions:

When somebody ask you an answerable question that you cannot know the answer; for example, your co-author, your wife, your boss or your daughter, you must give an answer, hopefully in the future you will stumble upon the right one.