My Bio: Paolo D’Alberto

Expertise: Matching, stochastic distance measures, correlation measures, compilers, high performance and embedded computing.

  • Mobile Campaign Measure and Optimizations
    • Graph algorithms for location and user profiling
    • Lift and Matching as campaign measure at scale
      • Spark, MLlib, R, PostgreSQL
  • Time series forecasting, Bayesian and generalized linear models
    • For the forecast of inventory and campaign budgeting.
    • Model fitting using and extending R, Python
    • Classification for bid optimizations
    • Correlation of multidimensional time series
  • Native Hadoop applied to Training/Scoring/Evaluation of Maximum entropy models (i.e., Java)
  • Hadoop Streaming (Java, Python, Perl, shell)
  • Fast Matrix Multiplication for multi-core multi-processors, multi GPUs, and APU
    • Multithreaded C/C++ library and OpenCL
  • Change Detection in multidimensional time series
    • Vector Sorting algorithms
    • Kernel methods, Compression, Martingale methods, POSET
  • Stochastic distance measures and correlation
    • Extension and generalization for the anomaly detection in time series
    • For the comparison of search engines
  • Data-mining of web corpora by contents
    • Signature and duplicate detection
  • High performance algorithms for linear algebra
    • Algorithm engineering
    • Application optimizations
  • Compiler analysis tools and compiler optimizations (e.g., static, symbolic and dynamic optimizations) for recursive and non-recursive applications
    • Data locality through code reorganization and data layout


Industrial Experience

FastMMW (present research)

  • APU+GPU build and OpenCL-based implementations of in-memory Matrix Multiplications
  • Performance engineering (HW configuration and software tuning)

NinthDecimal (formerly JiWire) (2014 – present)

  • Geographical Distributions and Data organization for Grid graph algorithms
  • Graph algorithms for location and user profiling
  • Lift and Matching as campaign measure at scale

a Valassis Company (2012 – 2014)

  • Time series forecasting for supply inventory
  • Cost, availability, and geographical distribution

33 Across (2011 – 2012)

  • Bridging the research of Max-Entropy like modeling and the engineering production pipeline
  • Machine Learning introduction and Hadoop map-reduce application
  • Performance engineering

Yahoo! Inc. (2007 – 2011)

  • Correlation measure for the comparison of search engines
  • Data mining of web corpora by signature base and contents
    • Anomaly detection tools for time series
    • Extension and generalization of information-theoretic measures to one-dimensional distribution functions

Independently and with Yahoo!

  • Supervising the research of graduate students (statistics + compilers, algorithm engineering)


Research Experience

Carnegie Mellon University, Department of Electrical and Computer Engineering

  • SPIRAL: SW/HW generation for DSP algorithms (Post-Doc Fellow, Jun 2005-2007)

University of California at Irvine, Dep. Computer Science (Ph.D.)

  • JuliusC: Compiler Optimizations for Divide & Conquer Algorithms (2003-2005)
  • ARMR Adaptive Memory Reconfiguration & Management (2000-2002)

Personal Research

  • FASTMM: Fast Matrix Multiplication software package. The implementation of fast matrix multiplication algorithms such as Winograd’s for multicore multiprocessor systems: The fastest MM.

Technical Skills

  • Languages: C/C++, Java JDK, Perl, LaTeX, Python, Spark, Pig, Hadoop, Hive, OpenCL


  • Postdoctoral fellowship at CMU ECE (SPIRAL FFT).
  • Ph.D. in Computer Science, UC at Irvine, 2005
  • Doctorate, Computer Science, University of Bologna, Italy, 2000


  • Books
    1. P.D’Alberto The X-Legion Compiler: A Compiler Approach to Write and to Optimize Divide-And-Conquer Algorithms (Paperback) VDM Verlag (June 26, 2009) amazon link
  • Journals
    1. P.D’Alberto The Better Accuracy of Winograd-Strassen Algorithms (FastMMW) Advances in Linear Algebra and Matrix Theory Pub. Date: March 5, 2014
    2. P.D’Alberto, M. Bodrato, and A.Nicolau Exploiting Parallelism in Matrix-Computation Kernels for Symmetric Multiprocessor Systems: Matrix-Multiplication and Matrix-Addition Algorithm Optimizations by Software Pipeline and Threads Allocation ACM Transactions on Mathematical Software 2011 dalbertoBN2011.pdf
    3. P.D’Alberto and A.Nicolau Adaptive Winograd’s Matrix Multiplications ACM Transactions on Mathematical Software 2009 dalberto-nicolau.winograd.TOMS.pdf
    4. P.D’Alberto and A. Nicolau R-Kleene: A High-Performance Divide-and-Conquer Algorithm for the All-Pair Shortest Path for Densely Connected Networks (2005) Algorithmica (2007) paoloN.Algorithmica.pdf
    5. P.D’Alberto, A.Nicolau, A. Veidenbaum, and R.Gupta Line Size Adaptivity Analysis of Parameterized Loop Nests for Direct Mapped Data Cache (2005) Transactions on Computers, IEEE Society, Feb 2005. 118021-2.pdf
  • Proceedings and Posters
    1. M.Badin, P.D’Alberto, A.Nicolau, M. Dillencourt, and L. Bic Improving Numerical Accuracy for Non-Negative Matrix Multiplication on GPUs using Recursive Algorithms 27th International Conference on Supercomputing 2013 ICS2013_NNMatrix_GPU_mbadin
    2. M.Badin, P.D’Alberto, A.Nicolau, M. Dillencourt, and L. Bic Improving the Accuracy of High Performance BLAS Implementations using Adaptive Blocked Algorithms 23rd International Symposium on Computer Architecture and High Performance Computing 2011 mark17(2)
    3. R. Cammarota, A. Kejariwal, P. D’Alberto, S. Panigrahi, A. Veidenbaum, and A.Nicolau Pruning Hardware Evaluation Space via Causality-Driven Application Similarity Analysis ACM International Conference on Computing Frontiers (2011) Patent Awarded cf122-Cammarota
    4. A. Dasdan , P.D’Alberto, S.Kolay, and C. Drome Automatic Retrieval of Similar Content Using Search Engine Query Interface The 18th ACM Conference on Information and Knowledge Management (2009) ind1356-dasdan.pdf
    5. P.D’Alberto and A. Dasdan Non-Parametric Information-Theoretic Measures of One-Dimensional Distribution Functions from Continuous Series The 2009 SIAM International Conference on Data Mining SDM 2009 (accepted — extended version) paoloAli.sdm2009.pdf
    6. F.Franchetti, Y.Voronenko, P.Milder, S.Chellappa, M.Telgarsky, H.Shen, P.D’Alberto, Mesmay, J.Hoe, J.Moura, and M.Puschel Domain-Specific Library Generation for Parallel Software and Hardware Platforms  NSFNGS 2008 NFSNGS2008.pdf
    7. S.Kolay, P.D’Alberto, A.Dasdan, and A.Bhattacharjee A Larger Scale Study of Robots.txt (Poster) WWW 2008 santanuPAA2008.pdf
    8. P.D’Alberto and A.Nicolau Adaptive Strassen’s Matrix Multiplication (2007) The 21st International Conference on Supercomputing paoloA.Strassen.ICS2007.pdf
    9. P.D’Alberto, P.Milder, A.Sandryhaila, F.Franchetti, J.Hoe, J.Johnson, J.Moura, and M.Pueschel Generating FPGA-Accelerated DFT Libraries (2007) The Field-Programmable Custom Computing Machines IEEE Symposium (FCCM) 2007 dalberto-FPGAAccelleratedDFT.pdf
    10. P.D’Alberto, F.Franchetti, and M.Pueschel Performance/Energy Optimization of DSP Transforms on the Scale Processor (2007) The 2007 International Conference on High Performance Embedded Architectures & Compilers paoloMF.hipeac.pdf
    11. P.D’Alberto, P.Milder, F.Franchetti, J.C.Hoe, M.Pueschel and J.Moura Discrete Fourier Transform Compiler for FPGA and CPU/FPGA Partitioned Implementations (2006) The 10th Workshop on High Performance Embedded Computing paoloPFJMJ.hpec.2006.pdf
    12. P.D’Alberto and A. Nicolau Adaptive Strassen and ATLAS’s DGEMM: A Fast Square-Matrix Multiply for Modern High-Performance Systems (2005) The 8th International Conference on High Performance Computing in Asia Pacific Region (HPC asia) paoloN.hpcasia.strassen.pdf
    13. P.D’Alberto and A. Nicolau Using Recursion to Boost ATLAS’s Performance (2005) The Sixth International Symposium on High Performance Computing (ISHPC-VI) paoloA.ishp-vi.pdf
    14. A.Kejariwal, P.D’Alberto, A.Nicolau, and C.D.Polychronopoulos A Geometric Approach for Partitioning N-Dimensional Non-Rectangular Iteration Space (2004) LCPC 2004: The 17th International Workshop on Languages and Compilers for Parallel Computing lcpc04.KejariwalDNP.pdf
    15. P.D’Alberto and A.Nicolau JuliusC: A Practical Approach for the Analysis of Divide-And-Conquer Algorithms (2004) LCPC 2004: The 17th International Workshop on Languages and Compilers for Parallel Computing
    16. P.D’Alberto, A.Nicolau, and A.Veidenbaum A Data Cache with Dynamic Mapping (2003) LCPC 2003: The 16th International Workshop on Languages and Compilers for Parallel Computing paoloNV.lcpc03.pdf
    17. P.D’Alberto, A.Veidenbaum, A.Nicolau, and R.Gupta Static Analysis of Parameterized Loop Nests for Energy Efficient Use of Data Caches (2001) COLP 2001 paoloNVR.pdf
    18. G.Bilardi, P.D’Alberto, and A.Nicolau Fractal Matrix Multiplication: a Case Study on Portability of Cache Performance (2001) WAE 2001 wae.pdf
    19. G.Bilardi, A.Pietracaprina, and P.D’Alberto On the space and access complexity of computation DAGs Workshop on Graph-Theoretic Concepts in Computer Science
  • Talks
    1. P.D’Alberto A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU + Fast Matrix Multiply (2012) AMD Fusion Developer Summit 2012, June 11-14, Bellevue WA arXiv, OpenCL+APU+GPU+FastMM
    2. Fast Matrix Multiplications for Multicore Multiprocessor Systems 2010 SIAM Conference on Parallel Processing for Scientific Computing (PP10) February 24-26, 2010, Grand Hyatt Seattle, Seattle, Washington pp10-winograd-presentation.pdf
  • Technical Reports and Theses
    1. P.D’Alberto X-Legion: a compiler-approach to exploit locality and portability of divide-and-conquer algorithms (2005) Ph.D. Thesis paolo-ICS-Thesis-2005.pdf
    2. H.Du, P.D’Alberto, R.Gupta, A.Nicolau, and A.Veidenbaum A quantitative evaluation of adaptive memory hierarchy (2002) Technical Report duDGNV.pdf
    3. P.D’Alberto MIPS R12000 Processor Performance Evaluation by SPEC2000 Benchmarks and Performance Counters MIPS12K.pdf
    4. P.D’Alberto, G.Bilardi, and A.Nicolau Fractal LU-decomposition with partial pivoting technical report FractalLU.pdf
    5. P.D’Alberto Performance Evaluation of Data Locality Exploitation (2000) Thesis Dottorato di ricerca in Computer Science Bologna-Padua-Venice alberto00performance.pdf