BLASFEO

BLASFEO (BLAS For Embedded Optimization) provides a set of basic linear algebra routines, performance-optimized for matrices of moderate size (up to a couple of hundred elements in each dimension), as typically encountered in embedded optimization applications.
In the target matrix size range, the optimized version of BLASFEO outperforms both open-source (e.g. OpenBLAS, BLIS, ATLAS) and proprietary (e.g. MKL) BLAS and LAPACK implementations.

Figure: performance of the DGEMM_NT and DPOTRF_L routines on an Intel Haswell CPU, Core i7 4810MQ @ 3.4 GHz (theoretical max throughput of 54.4 GFlops).
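
The quoted peak is consistent with the usual single-core, double-precision estimate for Haswell. As a rough check, assuming two 256-bit FMA units per core, each processing 4 doubles per cycle and counting 2 flops per fused multiply-add:

$$
3.4\ \text{GHz} \times 2\ \text{FMA units} \times 4\ \text{doubles} \times 2\ \text{flops} = 3.4 \times 16 = 54.4\ \text{GFlops}
$$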

The currently supported computer architectures (TARGET) are:

X64_INTEL_HASWELL Intel Haswell architecture or newer, AVX2 and FMA ISA, 64-bit OS.
X64_INTEL_SANDY_BRIDGE Intel Sandy-Bridge architecture, AVX ISA, 64-bit OS.
X64_INTEL_CORE Intel Core architecture, SSE3 ISA, 64-bit OS.
X64_AMD_BULLDOZER AMD Bulldozer architecture, AVX and FMA ISAs, 64-bit OS.
X86_AMD_JAGUAR AMD Jaguar architecture, AVX ISA, 32-bit OS.
X86_AMD_BARCELONA AMD Barcelona architecture, SSE3 ISA, 32-bit OS.
ARMV8A_ARM_CORTEX_A57 ARMv8A architecture, VFPv4 and NEONv2 ISAs, 64-bit OS.
ARMV8A_ARM_CORTEX_A53 ARMv8A architecture, VFPv4 and NEONv2 ISAs, 64-bit OS.
ARMV7A_ARM_CORTEX_A15 ARMv7A architecture, VFPv3 and NEON ISAs, 32-bit OS.
ARMV7A_ARM_CORTEX_A7 ARMv7A architecture, VFPv3 and NEON ISAs, 32-bit OS.
GENERIC Generic target, coded in C, giving better performance if the architecture provides more than 16 scalar FP registers (e.g. many RISC architectures, such as ARM).

The BLASFEO backend provides three possible implementations of each linear algebra routine (LA):

HIGH_PERFORMANCE: target-tailored; performance-optimized for cache resident matrices; panel-major matrix format
REFERENCE: target-unspecific; lightly optimized; small code footprint; column-major matrix format
BLAS_WRAPPER: call to external BLAS and LAPACK libraries; column-major matrix format
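
To make the difference between the two matrix formats concrete, the following is a minimal sketch of how an element (i, j) could be located in a column-major array versus a panel-major one. It is an illustration only, not BLASFEO code: the function names, the fixed panel height `PS` of 4 and the parameter `sda` (number of columns stored per panel) are assumptions made for this example, and the actual panel size in BLASFEO depends on the TARGET and the precision.

```c
#include <assert.h>
#include <stddef.h>

/* Panel height assumed for this sketch (hypothetical, not read from BLASFEO). */
#define PS 4

/* Offset of element (i, j) in a column-major array with leading dimension lda. */
size_t colmaj_index(int i, int j, int lda)
{
    assert(i >= 0 && j >= 0 && i < lda);
    return (size_t)j * (size_t)lda + (size_t)i;
}

/* Offset of element (i, j) in a panel-major array.
 * Rows are grouped into horizontal panels of PS rows. Inside a panel,
 * the PS elements of one column are contiguous, followed by the next column.
 * sda is the number of columns stored in each panel. */
size_t panelmaj_index(int i, int j, int sda)
{
    assert(i >= 0 && j >= 0 && j < sda);
    return (size_t)(i / PS) * PS * (size_t)sda   /* skip the full panels above */
         + (size_t)j * PS                        /* skip the columns to the left */
         + (size_t)(i % PS);                     /* row inside the panel */
}

/* Number of doubles needed to store m rows in panel-major format
 * (the row count is rounded up to a multiple of PS). */
size_t panelmaj_size(int m, int sda)
{
    return (size_t)((m + PS - 1) / PS) * PS * (size_t)sda;
}

/* Copy an m x n column-major matrix A into the panel-major buffer pA.
 * pA must hold at least panelmaj_size(m, sda) doubles, with sda >= n. */
void pack_panelmaj(int m, int n, const double *A, int lda, double *pA, int sda)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            pA[panelmaj_index(i, j, sda)] = A[colmaj_index(i, j, lda)];
}
```

With these assumptions, a 5 x 3 matrix occupies two panels (24 doubles, of which 9 are padding), and element (4, 0), the first row of the second panel, sits at offset 12. Keeping the PS elements of each column contiguous inside a panel is what lets a kernel load them with a single vector instruction.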

The BLASFEO API is always exported.
Optionally, the flag BLAS_API can be set to also export a BLAS API for selected routines.
The further flag FORTRAN_BLAS_API controls whether the BLAS API naming is exported in the form blasfeo_dgemm or dgemm_.

The currently supported operating systems (OS) are:

LINUX Linux for x86_64 64-bit, x86 32-bit, ARMv8A 64-bit, ARMv7A 32-bit
WINDOWS Windows for x86_64 64-bit
MAC macOS for x86_64 64-bit

BLASFEO employs structures to describe matrices (blasfeo_dmat) and vectors (blasfeo_dvec), defined in include/blasfeo_common.h.
The actual implementation of blasfeo_dmat and blasfeo_dvec depends on the LA and TARGET choice.

More information about BLASFEO can be found in the arXiv paper, in the slides from the BLIS Retreat 2017, or in the video.

As application examples, BLASFEO is employed in the Model Predictive Control software packages HPIPM, HPMPC and acados.

Notes:

06-01-2018: BLASFEO now employs a new naming convention. The bash script change_name.sh can be used to automatically update the source code of any software using BLASFEO to the new naming convention.