BLASFEO (BLAS For Embedded Optimization) provides a set of basic linear algebra routines, performance-optimized for matrices of moderate size (up to a couple of hundred elements in each dimension), as typically encountered in embedded optimization applications.
In the target matrix size range, the optimized version of BLASFEO outperforms both open-source (e.g. OpenBLAS, BLIS, ATLAS) and proprietary (e.g. MKL) BLAS and LAPACK implementations.

Figure: DGEMM_NT and DPOTRF_L routines on an Intel Haswell CPU, Core i7 4810MQ @ 3.4 GHz (theoretical maximum throughput of 54.4 Gflops).
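
The 54.4 Gflops figure can be reproduced with a short calculation. The sketch below assumes the usual Haswell per-core resources, which are not stated in this README: two FMA units, 256-bit registers holding 4 doubles each, and 2 flops per fused multiply-add. The function names are hypothetical helpers and not part of BLASFEO.

```c
/* Peak double-precision throughput of one core, in Gflops:
   clock (GHz) x FMA units x doubles per SIMD register x 2 flops per FMA.
   The Haswell values (2 FMA units, 4 doubles) are an assumption here. */
double peak_gflops(double clock_ghz, int fma_units, int simd_doubles)
{
    return clock_ghz * fma_units * simd_doubles * 2.0;
}

/* Flop count of a matrix-matrix product of an m x k by a k x n matrix:
   one multiply and one add per inner-product term. */
double gemm_flops(int m, int n, int k)
{
    return 2.0 * m * n * k;
}

/* Fraction of the peak reached by a gemm call that took `seconds`. */
double gemm_efficiency(int m, int n, int k, double seconds, double peak)
{
    double gflops = gemm_flops(m, n, k) / seconds / 1e9;
    return gflops / peak;
}
```

With these assumptions, `peak_gflops(3.4, 2, 4)` gives 3.4 x 2 x 4 x 2 = 54.4, and `gemm_efficiency` turns a measured run time into a fraction of that peak.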

The currently supported computer architectures (TARGET) are:

The BLASFEO backend provides three possible implementations of each linear algebra routine (LA):

The BLASFEO API is always exported.
Optionally, the BLAS_API flag additionally exports a BLAS API for selected routines.
A further flag, FORTRAN_BLAS_API, controls the naming of the exported BLAS API routines: either the form blasfeo_dgemm or the form dgemm_.
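
The trailing-underscore form dgemm_ is the naming used by Fortran BLAS libraries, where every argument is passed by pointer and matrices are stored column-major. The following is a minimal reference sketch of a routine with that argument list. `ref_dgemm_` is a hypothetical name and is not a BLASFEO function; that the exported BLASFEO routine takes exactly these arguments is an assumption.

```c
/* Hypothetical, unoptimized reference with the argument list of the standard
   Fortran BLAS dgemm: C = alpha * op(A) * op(B) + beta * C, column-major.
   Only 'N' and 'T' are distinguished; any other value is treated as 'N'. */
void ref_dgemm_(const char *transa, const char *transb,
                const int *m, const int *n, const int *k,
                const double *alpha, const double *A, const int *lda,
                const double *B, const int *ldb,
                const double *beta, double *C, const int *ldc)
{
    int ta = (*transa == 'T' || *transa == 't');
    int tb = (*transb == 'T' || *transb == 't');

    for (int j = 0; j < *n; j++) {
        for (int i = 0; i < *m; i++) {
            double acc = 0.0;
            for (int l = 0; l < *k; l++) {
                /* op(A)(i,l) and op(B)(l,j), read from column-major storage */
                double a = ta ? A[l + i * *lda] : A[i + l * *lda];
                double b = tb ? B[j + l * *ldb] : B[l + j * *ldb];
                acc += a * b;
            }
            C[i + j * *ldc] = *alpha * acc + *beta * C[i + j * *ldc];
        }
    }
}
```

Calling it with `transa = 'N'` and `transb = 'T'` computes the same operation as the DGEMM_NT case named above, that is, A times the transpose of B.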

The currently supported operating systems (OS) are:

BLASFEO employs structures to describe matrices (blasfeo_dmat) and vectors (blasfeo_dvec), defined in include/blasfeo_common.h.
The actual implementation of blasfeo_dmat and blasfeo_dvec depends on the LA and TARGET choice.
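
To illustrate why a descriptor structure is useful when the storage format can change, here is a toy sketch in which the same caller code works with two different layouts selected at compile time. `toy_dmat`, its fields, the accessor functions, and the `TOY_BLOCKED` macro are all hypothetical; this is not the definition of blasfeo_dmat, whose actual fields are found in include/blasfeo_common.h.

```c
#include <stdlib.h>

#define TOY_BS 4 /* rows per block in the hypothetical blocked layout */

/* Hypothetical matrix descriptor; NOT the real blasfeo_dmat. */
struct toy_dmat {
    int m;        /* number of rows */
    int n;        /* number of columns */
    int ld;       /* allocated row count (padded in the blocked layout) */
    double *data; /* zero-initialized storage owned by the descriptor */
};

/* Allocate storage for an m x n matrix; returns 0 on success, -1 on failure. */
int toy_dmat_create(struct toy_dmat *A, int m, int n)
{
#ifdef TOY_BLOCKED
    A->ld = (m + TOY_BS - 1) / TOY_BS * TOY_BS; /* pad rows to full blocks */
#else
    A->ld = m;                                  /* plain column-major */
#endif
    A->m = m;
    A->n = n;
    A->data = calloc((size_t)A->ld * (size_t)n, sizeof(double));
    return A->data != NULL ? 0 : -1;
}

/* Pointer to element (i, j); callers never compute the index themselves. */
double *toy_dmat_el(struct toy_dmat *A, int i, int j)
{
#ifdef TOY_BLOCKED
    /* blocks of TOY_BS rows stored one after another,
       column-major inside each block */
    return &A->data[(i / TOY_BS) * TOY_BS * A->n + j * TOY_BS + i % TOY_BS];
#else
    return &A->data[i + j * A->ld];
#endif
}

void toy_dmat_free(struct toy_dmat *A)
{
    free(A->data);
    A->data = NULL;
}
```

Code that only goes through `toy_dmat_create`, `toy_dmat_el` and `toy_dmat_free` compiles unchanged with or without `-DTOY_BLOCKED`, which is the kind of independence from the storage format that a descriptor structure makes possible.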

More information about BLASFEO can be found in the arXiv paper, in the slides from the 2017 BLIS Retreat, or in the video.

BLASFEO is used, for example, in the Model Predictive Control software packages HPIPM, HPMPC and acados.