The Better Accuracy of Strassen-Winograd Matrix Multiply I

On March 1, 2014, a paper of mine titled as above will be published in Advances in Linear Algebra and Matrix Theory, Vol. 4, Number 1. This will be an open publishing venue. I am new to this kind of publishing experience, and I will provide here support, considerations, and opinions. I will post a commentary series to help the distribution and to explain the contents.

The title pokes fun at the subject. So far, the numerical analysis of these algorithms has been stale, in the sense that it has not been used to design more accurate algorithms: the analysis has been used to describe the worst-case error for FastMMW, but it has been used neither to design better algorithms nor to select the best algorithm for the input at hand. This point is embarrassingly important.

Let’s have a thought experiment about sorting algorithms; this will appeal to CS people and Google interviewers. There are about three dozen sorting algorithms with different best, average, and worst-case scenarios. It seems we have a lot of algorithms to choose from. For example, merge sort and quick sort have the same best and average performance, O(n log n), but quick sort has a worst-case performance of O(n^2). That happens when, every time we choose the pivot element to split the vector in two and sort recursively, we choose the smallest or the largest element, n consecutive times: either we are very unlucky (Albert King would say that he would not know luck at all if it was not for bad luck) or the vector is constant. Would you reject quick sort a priori for that?
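To make the worst case concrete, here is a minimal sketch, assuming the simplest possible pivot rule (always the first element) and counting one comparison per element against the pivot. The names quicksort and count_comparisons are mine, for illustration only. On an already sorted vector or on a constant vector, every split puts all the remaining elements on one side, so the count is exactly n(n-1)/2; on a shuffled vector it is much smaller.

```python
import random


def quicksort(values, counter):
    """Quick sort with the first element as pivot.

    counter is a one-element list; counter[0] accumulates the number of
    elements compared against a pivot.
    """
    if len(values) <= 1:
        return list(values)
    pivot, rest = values[0], values[1:]
    counter[0] += len(rest)  # every remaining element is compared to the pivot
    smaller = [v for v in rest if v < pivot]
    larger = [v for v in rest if v >= pivot]
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)


def count_comparisons(values):
    """Sort values and return how many pivot comparisons were needed."""
    counter = [0]
    result = quicksort(values, counter)
    assert result == sorted(values)  # sanity check: the sort is correct
    return counter[0]


n = 200
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)

print("sorted input:  ", count_comparisons(list(range(n))))
print("constant input:", count_comparisons([7] * n))
print("shuffled input:", count_comparisons(shuffled))
print("n(n-1)/2:      ", n * (n - 1) // 2)
```

The bad inputs are specific and easy to recognize, which is exactly why nobody throws quick sort away because of them.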

The community did not: quick sort comes in multiple-pivot variations and in hierarchical algorithms that improve the overall performance: we choose the pivot and the number of recursions a finite number of times for the specific input, etc. For sorting the attitude is different: the variety and number of algorithms is exciting, the combination of different algorithms to fit a specific purpose is elegant, and it makes a great subject for job interviews (usually not a very deep one; after all, you need to know three dozen algorithms and you cannot use Google while interviewing at Google).

FastMMW algorithms have thousands of variations, without considering the even faster ones. I am not kidding: during a personal conversation with smarter people, one counted them in the Ks (thousands). The bounds are for the family of algorithms (as if they were for all sorting algorithms), not for each implementation; the bounds provide only worst-case scenarios (forget about best and average); and there is no consideration of the inputs and their nature.

I understand that we are talking about precision and not performance, and we must be cautious. However, it is true that if we apply Strassen two times recursively, the error increases by a bounded constant. It is like choosing to use quick sort for two levels of recursion and then yielding to merge sort.
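The idea of a fixed number of recursions can be sketched in a few lines. This is only an illustration under my own assumptions, not the code from the paper: it uses the original Strassen formulas (not the Winograd variant), square matrices stored as lists of lists with a power-of-two size, and a hypothetical parameter levels that says how many times we recurse before yielding to the classical algorithm. The error is measured against an exact product computed with Python fractions.

```python
import random
from fractions import Fraction


def classical(A, B):
    """Classical row-by-column matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, levels):
    """Apply Strassen for at most `levels` recursions, then go classical."""
    n = len(A)
    if levels == 0 or n < 2 or n % 2:
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), levels - 1)
    M2 = strassen(add(A21, A22), B11, levels - 1)
    M3 = strassen(A11, sub(B12, B22), levels - 1)
    M4 = strassen(A22, sub(B21, B11), levels - 1)
    M5 = strassen(add(A11, A12), B22, levels - 1)
    M6 = strassen(sub(A21, A11), add(B11, B12), levels - 1)
    M7 = strassen(sub(A12, A22), add(B21, B22), levels - 1)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Stitch the four quadrants back into one matrix.
    return ([l + r for l, r in zip(C11, C12)] +
            [l + r for l, r in zip(C21, C22)])


def exact_product(A, B):
    """Reference product in exact rational arithmetic."""
    to_frac = lambda M: [[Fraction(x) for x in row] for row in M]
    return classical(to_frac(A), to_frac(B))


def max_abs_error(C, exact):
    """Largest entry-wise distance between C and the exact product."""
    return float(max(abs(Fraction(c) - e)
                     for rc, re in zip(C, exact) for c, e in zip(rc, re)))


rng = random.Random(0)
n = 16
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
reference = exact_product(A, B)
for levels in (0, 1, 2):
    print(levels, "levels, max error:", max_abs_error(strassen(A, B, levels), reference))
```

With levels equal to 0 this is the classical algorithm; with 1 or 2 it is the hybrid. The printed errors depend on the input, which is the point: the input matters, and the worst-case bound alone does not tell the whole story.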

In this work, The Better Accuracy of Strassen-Winograd Matrix Multiply, I show that we can design more accurate algorithms even if we use a finite number of recursions, and that the theory started 40 years ago is a great starting point if used in combination with new tools and ideas, especially new ideas.

The format of this series is still in my head and thus … but I will start with a new idea.

 

Clearness, contents, contributions, and rejections

I have seen different facets of the presentation of new ideas. This is my collection of thoughts about a few hurdles thrown at any naive contributor of original content in the scientific field.

Point A: I am reading an old booklet, An Introduction to Mathematics by Alfred North Whitehead (1910). You can find it for a penny on Amazon, but a few consider it one of the top 50 books of scientific literature. There is a paragraph about mathematical clearness I shall quote here:

“…. In the second place it is necessary for research. It makes for clearness of thought, and thence for boldness of thought and for fertility in trying new combinations of ideas. When the initial statements are vague and slipshod, at every subsequent stage of thought common sense has to step in to limit applications and to explain meanings. Now in creative thought common sense is a bad master. Its sole criterion for judgment is that the new ideas shall look like the old ones. In other words it can only act by suppressing originality.”

This is the real reason for clarity, and it shows the hard consequences of lacking it. I like that mathematical clearness is the foundation of creative thinking and not just a notational game: if I do not have to think all the time about the meaning of terms and ideas, my creative side will be able to tackle new ideas. This also requires some flexibility from the reader: if I write u and explain that it is a unit of measure, please let it be. If the reader (you) is more familiar with the term b instead, remember that it is just a shorthand. If my notation has to be exactly like yours to make sense to you, it sounds more like forcing me to conform to your common sense, and it makes it even less likely that I will convince you of a result that is truly original and not by you.

Point B: I am a reviewer for a journal, and recently I rejected a paper. The authors presented two algorithms, A and B; they described them in fairly good detail, and they chose to present experiments only for one, B. Algorithm A did not have experiments because it was not appealing, and it was not appealing because its implementation was poorly parallel with respect to B: one processor was doing 1/4 of the work no matter how many processors were used. In my review, I presented a different algorithm: that particular 1/4 of the work can be highly parallel, the parallel algorithm is elegant and short to describe, and there is code for it. My concern was about the contents and what was missing: a better algorithm A. Strangely, if the authors had given me only B, I would never have bothered with it.

Sincerely, I did not fully understand the paper. That was not my reason for the rejection. What I did, unconsciously and consciously, was to focus on the contents and see whether B was better than A. Clearness should have provided that, because it was the main goal of the paper. The authors may think that I am an idiot and that their paper is just fine, and they are writing a rebuttal to me and the editor to reconsider the publication. If you are small potatoes, being reviewed sucks. If you are small potatoes and you are reviewing, you may get carried away.

Point C: At the beginning of 2013, I worked on a project for a few months. In the second half of the year, I tried to collect my thoughts and results into a paper and submit it for publication. The next post will be about the contents of this publication. If you follow my posts, you may know that I am an experimental person; that is, observations and experiments are the first part of my work. Nowadays, I often communicate these results during the investigation to researchers who worked on the same subject, especially if my work may shed a different light on their work. Mostly I want to know if I did something obscenely wrong. Other researchers go to great lengths to avoid sharing or paying their dues (as B.B. King would say). Sometimes I even understand the reasons for not sharing, but I understand better the reasons for sharing.

For me, mathematical clearness is an attempt to organize and write down a process that is often scruffy, based on induction and sometimes intuition. Sometimes I start writing during the experiments, to guide me in finding the right questions and the right answers. More often the writing comes afterwards, when I have a good enough answer. The experiments are driven by curiosity: the desire to know and the disappointment of remaining an idiot (i.e., one who ignores). Though the writing is hard, I like seeing the white and empty paper become an article: an original contribution that describes, in the best way I can, what I did and shares it. Without considering the publication process, this may take months. With the publication process, it may take years.

I submitted this work to an old and good journal where I had already published, but I knew the process would be tough. I believed, and I still believe, it was the right course. During the preparation of the work, I thought I had the blessing of my predecessors (or at least of those who matter to the topic and to me). In fact, the paper was rejected. The reviews were mostly aimed at the writing and at how the writing cluttered the understanding of the material. In practice, clarity is the responsibility of the authors. A paper cannot be published if it is not clear. I rewrote the paper, wrote a detailed rebuttal, and submitted it somewhere else; it will be published shortly. This paper will prove better bounds on the error of FastMMW: the last such result was published 40 years ago. Yep, creative thinking in action.

Conclusions: With experience and knowledge, I hope to reach the mathematical clearness that will help my successors and convince my contemporaries.