BLASFEO (BLAS For Embedded Optimization) provides a set of basic linear algebra routines that are performance-optimized for matrices of moderate size (up to a couple of hundred elements in each dimension), as typically encountered in embedded optimization applications.
In the target matrix size range, the optimized version of BLASFEO outperforms both open-source (e.g. OpenBLAS, BLIS, ATLAS) and proprietary (e.g. MKL) BLAS and LAPACK implementations.
The currently supported computer architectures are:
X64_INTEL_HASWELL: Intel Haswell architecture or newer, AVX2 and FMA ISA, 64-bit OS.
X64_INTEL_SANDY_BRIDGE: Intel Sandy-Bridge architecture, AVX ISA, 64-bit OS.
X64_INTEL_CORE: Intel Core architecture, SSE3 ISA, 64-bit OS.
X64_AMD_BULLDOZER: AMD Bulldozer architecture, AVX and FMA ISAs, 64-bit OS.
X86_AMD_JAGUAR: AMD Jaguar architecture, AVX ISA, 32-bit OS.
X86_AMD_BARCELONA: AMD Barcelona architecture, SSE3 ISA, 32-bit OS.
ARMV8A_ARM_CORTEX_A57: ARMv8A architecture, VFPv4 and NEONv2 ISAs, 64-bit OS.
ARMV8A_ARM_CORTEX_A53: ARMv8A architecture, VFPv4 and NEONv2 ISAs, 64-bit OS.
ARMV7A_ARM_CORTEX_A15: ARMv7A architecture, VFPv3 and NEON ISAs, 32-bit OS.
ARMV7A_ARM_CORTEX_A7: ARMv7A architecture, VFPv3 and NEON ISAs, 32-bit OS.
GENERIC: Generic target, coded in C, giving better performance if the architecture provides more than 16 scalar FP registers (e.g. many RISC architectures such as ARM).
The BLASFEO backend provides three possible implementations of each linear algebra routine:
HIGH_PERFORMANCE: target-tailored; performance-optimized for cache-resident matrices; panel-major matrix format
REFERENCE: target-unspecific, lightly optimized; small code footprint; column-major matrix format
BLAS_WRAPPER: call to external BLAS and LAPACK libraries; column-major matrix format
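
The difference between the column-major and panel-major formats mentioned above can be illustrated with a small indexing sketch. This is not BLASFEO's own code: the panel height `PS` of 4 rows and the helper names `col_major_index`, `panel_major_index`, `padded_rows` and `pack_panel_major` are assumptions made for illustration only, and the actual panel size depends on the target.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed panel height (rows per panel); hypothetical, target-dependent in practice. */
#define PS 4

/* Column-major: element (i, j) of a matrix with leading dimension lda. */
size_t col_major_index(int i, int j, int lda)
{
    assert(i >= 0 && j >= 0 && i < lda);
    return (size_t)j * (size_t)lda + (size_t)i;
}

/* Panel-major: rows are grouped into horizontal panels of PS rows each,
 * and inside a panel the elements are stored column by column.
 * ncols is the number of columns stored per panel. */
size_t panel_major_index(int i, int j, int ncols)
{
    assert(i >= 0 && j >= 0 && j < ncols);
    size_t panel = (size_t)(i / PS);        /* panel that holds row i */
    size_t row_in_panel = (size_t)(i % PS); /* position of row i inside it */
    return panel * PS * (size_t)ncols + (size_t)j * PS + row_in_panel;
}

/* Number of rows after rounding m up to a whole number of panels. */
int padded_rows(int m)
{
    return (m + PS - 1) / PS * PS;
}

/* Copy an m x n column-major matrix into a panel-major buffer.
 * dst must hold at least padded_rows(m) * n doubles. */
void pack_panel_major(int m, int n, const double *src, int lda, double *dst)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            dst[panel_major_index(i, j, n)] = src[col_major_index(i, j, lda)];
}
```

In this sketch, the `PS` elements of one column within a panel are contiguous in memory, and consecutive columns of the same panel follow each other directly, which is the kind of access pattern a cache-resident kernel can exploit. Rows beyond `m` in the last panel are padding and are left untouched by `pack_panel_major`.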
The BLASFEO API is always exported.
Optionally, the flag BLAS_API gives the possibility to export a BLAS API for selected routines.
The further flag FORTRAN_BLAS_API controls whether the BLAS API naming is exported in the form
The currently supported operating systems are:
LINUX: Linux for x86_64 64-bit, x86 32-bit, ARMv8A 64-bit, ARMv7A 32-bit
WINDOWS: Windows for x86_64 64-bit
MAC: macOS for x86_64 64-bit
BLASFEO employs structures to describe matrices (blasfeo_dmat) and vectors (blasfeo_dvec), defined in include/blasfeo_common.h.
The actual implementation of blasfeo_dmat and blasfeo_dvec depends on the