An optimized sparse approximate matrix multiply for matrices with decay

We present an optimized single-precision implementation of the sparse approximate matrix multiply (SpAMM) [M. Challacombe and N. Bock, “Fast multiplication of matrices with decay”, arXiv:1011.3534 (2010)], a fast algorithm for multiplying matrices with decay that achieves 𝒪(n log n) computational complexity with respect to the matrix dimension n. For dense quantum chemical matrices, we find that a SpAMM tolerance below 2×10⁻⁸ yields a lower max-norm error than the single-precision general matrix-matrix multiply (SGEMM), while SpAMM already outperforms SGEMM at small matrix sizes (crossover at n ∼ 1000). Our optimized version is significantly faster than naive implementations of SpAMM built on Intel’s Math Kernel Library or AMD’s Core Math Library. We present detailed performance comparisons for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2x to 3x speedups.
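The abstract does not spell out how SpAMM decides what to skip. As we understand the cited reference, it recurses over a quadtree decomposition of the operands and drops any sub-block product whose norm product ‖A_ik‖·‖B_kj‖ falls below the tolerance τ; by the submultiplicativity of the Frobenius norm this bounds the dropped contribution. The following is a minimal, unoptimized pure-Python sketch under that assumption, restricted to square matrices whose size is a power of two. The names `spamm`, `frob`, and `leaf` are hypothetical and do not come from the paper's implementation.

```python
import math

def frob(M, r0, c0, size):
    """Frobenius norm of the size x size sub-block of M starting at (r0, c0)."""
    return math.sqrt(sum(M[i][j] ** 2
                         for i in range(r0, r0 + size)
                         for j in range(c0, c0 + size)))

def spamm(A, B, tau, leaf=2):
    """Approximate C = A @ B (square, power-of-two size), skipping any
    sub-block product whose Frobenius norm product is below tau.
    Sketch only: assumes a quadtree recursion with a norm-product test."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]

    def rec(ar, ac, br, bc, size):
        # ||X Y||_F <= ||X||_F ||Y||_F, so a small norm product means a small
        # contribution to C; skip the whole sub-tree in that case.
        if frob(A, ar, ac, size) * frob(B, br, bc, size) < tau:
            return
        if size <= leaf:
            # Dense leaf multiply, accumulated into the C block at (ar, bc).
            for i in range(size):
                for k in range(size):
                    a = A[ar + i][ac + k]
                    if a == 0.0:
                        continue
                    for j in range(size):
                        C[ar + i][bc + j] += a * B[br + k][bc + j]
            return
        h = size // 2
        # Eight quadrant products: C_ij += A_ik * B_kj.
        for i in (0, h):
            for j in (0, h):
                for k in (0, h):
                    rec(ar + i, ac + k, br + k, bc + j, h)

    rec(0, 0, 0, 0, n)
    return C
```

With `tau = 0` the sketch reproduces the exact product. For a matrix with decay such as `A[i][j] = 0.1 ** abs(i - j)`, a small positive `tau` prunes far-from-diagonal block products. For clarity this sketch recomputes block norms on every call; an efficient implementation would instead precompute and cache them, for example in the nodes of the quadtree.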