To use the library you have to compile it from source; pre-built binaries are not provided yet.
BLASFEO supports two build systems;
make is the suggested one.
You can clone the repository and move inside the project folder with:
git clone https://github.com/giaf/blasfeo.git; cd blasfeo
Some compilation options can be tuned by directly modifying the file
Makefile.rule, or by adding the overriding values to a newly created
Makefile.local, which is not tracked by git.
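As an illustration of the second approach, the sketch below creates a Makefile.local from the shell. It assumes that Makefile.local accepts the same VARIABLE = value assignments as Makefile.rule; the option names are the ones described in this section, and the values are only examples to adapt to your machine.

```shell
#!/bin/sh
# Sketch: write local overrides into Makefile.local (run from the blasfeo folder).
# Note: this overwrites an existing Makefile.local.
# The values below are illustrative, not recommendations.
cat > Makefile.local <<'EOF'
# Local overrides, not tracked by git
TARGET = X64_INTEL_HASWELL
LA = HIGH_PERFORMANCE
MACRO_LEVEL = 0
EOF

# Show the result
cat Makefile.local
```

Keeping the overrides in Makefile.local leaves Makefile.rule unmodified, so updating the repository does not conflict with local settings.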
The most important options can be specified with the following flags:
BLASFEO provides different implementations optimized for different computer architectures.
The TARGET flag is used in the selection of hand-crafted assembly kernels (for
LA=HIGH_PERFORMANCE), and in the choice of compilation flags (for all
LA choices).
The target architecture has to be specified manually.
If you are unsure about the correct target, on Linux you can check the CPU model with the command
cat /proc/cpuinfo | grep name
Given the CPU model, the CPU architecture can easily be found, e.g. with a web search.
Furthermore, the command
cat /proc/cpuinfo | grep flags returns a list of the flags (e.g.
fma) describing the supported ISAs.
The current values for TARGET are:
TARGET=X64_INTEL_HASWELL: Intel Haswell architecture, AVX2 and FMA ISAs, 64-bit OS.
TARGET=X64_INTEL_SANDY_BRIDGE: Intel Sandy-Bridge architecture, AVX ISA, 64-bit OS.
TARGET=X64_INTEL_CORE: Intel Core architecture, SSE3 ISA, 64-bit OS.
TARGET=X64_AMD_BULLDOZER: AMD Bulldozer architecture, AVX and FMA ISAs, 64-bit OS.
TARGET=ARMV8A_ARM_CORTEX_A57: ARMv8A architecture, VFPv4 and NEONv2 ISAs, 64-bit OS.
TARGET=ARMV7A_ARM_CORTEX_A15: ARMv7A architecture, VFPv3 and NEON ISAs, 32-bit OS.
TARGET=GENERIC: Generic target, coded in C, giving better performance if the architecture provides more than 16 scalar FP registers (e.g. many RISC architectures such as ARM).
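As a rough aid for the choice above, the following sketch maps the flags reported by /proc/cpuinfo to one of the x86-64 targets in the list. The helpers has_flag and suggest_target are hypothetical names, not part of BLASFEO. The mapping is a heuristic based only on the ISAs named in the list (SSE3 is reported as pni in /proc/cpuinfo); it does not distinguish Intel from AMD CPUs and does not cover the ARM targets, so treat the result as a starting point.

```shell
#!/bin/sh
# has_flag FLAGS NAME: succeed if NAME is one of the space-separated FLAGS.
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# suggest_target FLAGS: print a candidate TARGET for an x86-64 CPU.
# Heuristic only: checks the most capable ISA first and falls back to GENERIC.
suggest_target() {
    if has_flag "$1" avx2 && has_flag "$1" fma; then
        echo "X64_INTEL_HASWELL"
    elif has_flag "$1" avx; then
        echo "X64_INTEL_SANDY_BRIDGE"
    elif has_flag "$1" sse3 || has_flag "$1" pni; then
        echo "X64_INTEL_CORE"
    else
        echo "GENERIC"
    fi
}

# On Linux, read the flags of the first CPU and print a suggestion.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "Suggested TARGET=$(suggest_target "$cpu_flags")"
fi
```

The target still has to be set manually, for example in Makefile.local; the script only prints a suggestion.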
The BLASFEO backend provides three possible implementations of each linear algebra routine (selected with the LA flag):
HIGH_PERFORMANCE: target-tailored; performance-optimized for cache-resident matrices; panel-major matrix format
REFERENCE: target-unspecific; lightly optimized; small code footprint; column-major matrix format
BLAS_WRAPPER: calls to external BLAS and LAPACK libraries; column-major matrix format
The BLASFEO API is always exported.
Optionally, the flag
BLAS_API gives the possibility to export a BLAS API for selected routines.
The further flag
FORTRAN_BLAS_API controls whether the BLAS API naming is exported in the form
With LA=HIGH_PERFORMANCE, the majority of BLASFEO code is assembly.
Code modularity and reuse in assembly are achieved by using assembly subroutines with a custom calling convention, which perform elementary operations on register-fitting sub-matrices.
The linear algebra kernels are coded by gluing together the assembly subroutines.
In BLASFEO, assembly subroutines can optionally be coded as macros and expanded into the linear algebra kernels. This reduces the overhead of the subroutine calls (noticeable for very small matrices), at the expense of an increase in the library size.
The macro behavior is controlled using the option MACRO_LEVEL:
MACRO_LEVEL=0: no macros expanded (default)
MACRO_LEVEL=1: all macros expanded except those of the gemm and gemv kernels, which remain subroutines
MACRO_LEVEL=2: all macros expanded
make static_library -j $(nproc)
compiles the sources and creates the static library
libblasfeo.a in the folder lib.
make shared_library -j $(nproc) is the equivalent for the shared library.
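The nproc command used above comes from GNU coreutils and may be missing on some systems. A possible fallback is sketched below; job_count is a hypothetical helper name, and the script only prints the resulting build command instead of running it.

```shell
#!/bin/sh
# job_count: print the number of online processors, falling back to 1.
job_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif n=$(getconf _NPROCESSORS_ONLN 2>/dev/null); then
        echo "$n"
    else
        echo 1
    fi
}

# Print the build command with the detected number of parallel jobs.
echo "make static_library -j $(job_count)"
```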
make clean clears previous builds. It is necessary to do so when changing
make deep_clean additionally removes compiled libraries, generated headers and test/benchmark results.
make install_static will copy the BLASFEO static library and headers to the installation path PREFIX.
Note that the default path,
PREFIX=/opt/blasfeo, requires admin privileges.
make install_shared is the equivalent for the shared library.
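Whether an installation path needs admin privileges can be checked before installing. The sketch below tests whether the path, or its closest existing parent folder, is writable by the current user; needs_admin is a hypothetical helper name, and /opt/blasfeo is the default PREFIX mentioned above.

```shell
#!/bin/sh
# needs_admin PATH: succeed if the current user cannot write to PATH
# (or, if PATH does not exist yet, to its closest existing parent folder).
needs_admin() {
    dir=$1
    while [ ! -e "$dir" ]; do
        dir=$(dirname "$dir")
    done
    [ ! -w "$dir" ]
}

prefix=/opt/blasfeo
if needs_admin "$prefix"; then
    echo "Installing to $prefix requires admin privileges."
else
    echo "Installing to $prefix is possible as the current user."
fi
```

Since variables given on the make command line take precedence over ordinary assignments in a makefile, a user-writable location can usually be chosen at install time, for example with make install_shared PREFIX=$HOME/blasfeo.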