NumPy vs BLAS: Losing 90% of Throughput
Downloaded over 5 Billion times, NumPy is the most popular library for numerical computing in Python. It wraps low-level HPC libraries like BLAS and LAPACK, providing a high-level interface for matrix operations. BLAS is mainly implemented in C, Fortran, or Assembly and is available for most modern chips, not just CPUs. BLAS is fast, but bindings aren’t generally free. So, how much of the BLAS performance is NumPy leaving on the table?...