Installation instructions

Linux and macOS users

To use the library you have to compile it from source; pre-built binaries are not provided yet.

BLASFEO supports two build systems, make and CMake. make is the recommended one.

You can clone the repository and move inside the project folder with:

git clone https://github.com/giaf/blasfeo.git; cd blasfeo

Configuration

Some compilation options can be tuned by directly modifying the file Makefile.rule, or by adding the overriding values to a newly created Makefile.local, which is not tracked by git.
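As a minimal sketch of the second approach, the snippet below writes a Makefile.local from the root of the checkout. The option names are the ones documented on this page; the chosen values are only illustrative, and the snippet assumes that Makefile.local uses the same plain make assignment syntax as Makefile.rule.

```shell
# Run from the root of the blasfeo checkout.
# Note: this overwrites an existing Makefile.local.
cat > Makefile.local <<'EOF'
# Local overrides, not tracked by git (values are illustrative)
TARGET = X64_INTEL_HASWELL
LA = HIGH_PERFORMANCE
MACRO_LEVEL = 1
EOF

# Show the overrides that the next build is expected to pick up.
cat Makefile.local
```

Keeping the overrides in Makefile.local leaves Makefile.rule unmodified, so the working tree stays clean with respect to git.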

The most important options can be specified with the following flags:

TARGET

BLASFEO provides different implementations optimized for different computer architectures. The TARGET flag is used in the selection of hand-crafted assembly kernels (for LA=HIGH_PERFORMANCE), and in the choice of compilation flags (for all LA).

The target architecture has to be specified manually. If you are unsure about the correct target, on Linux you can check the CPU model with the command cat /proc/cpuinfo | grep name. Given the CPU model, the CPU architecture can be easily identified, e.g. with a web search. Furthermore, the command cat /proc/cpuinfo | grep flags returns the list of flags (e.g. ssse3, avx, avx2, fma) describing the supported ISAs.
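The flag check above can be roughly automated. The sketch below is a heuristic, not part of BLASFEO: suggest_target is a hypothetical helper that maps a list of x86-64 CPU flags to one of the Intel targets listed on this page. It assumes that SSE3 shows up in /proc/cpuinfo as pni (as it does on Linux), and it does not try to distinguish AMD or ARM CPUs, for which the target still has to be chosen by hand.

```shell
#!/bin/sh
# Heuristic sketch: map a space-separated list of CPU flags to a BLASFEO TARGET.
# suggest_target is a hypothetical helper, not something BLASFEO provides.
suggest_target() {
    flags=" $1 "   # pad with spaces so that whole-word matching works
    case "$flags" in
        *" avx2 "*)
            case "$flags" in
                *" fma "*) echo X64_INTEL_HASWELL; return ;;
            esac ;;
    esac
    case "$flags" in
        *" avx "*) echo X64_INTEL_SANDY_BRIDGE; return ;;
    esac
    case "$flags" in
        *" pni "*|*" sse3 "*|*" ssse3 "*) echo X64_INTEL_CORE; return ;;
    esac
    echo GENERIC
}

# Usage on Linux: take the flags line of the first core, if available.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "Suggested TARGET: $(suggest_target "$cpu_flags")"
fi
```

The result is only a starting point: a newer CPU that supports AVX2 and FMA is mapped to X64_INTEL_HASWELL, and anything without a recognized flag falls back to GENERIC.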

The current values for TARGET are:

TARGET=X64_INTEL_HASWELL: Intel Haswell architecture, AVX2 and FMA ISAs, 64-bit OS.
TARGET=X64_INTEL_SANDY_BRIDGE: Intel Sandy-Bridge architecture, AVX ISA, 64-bit OS.
TARGET=X64_INTEL_CORE: Intel Core architecture, SSE3 ISA, 64-bit OS.
TARGET=X64_AMD_BULLDOZER: AMD Bulldozer architecture, AVX and FMA ISAs, 64-bit OS.
TARGET=ARMV8A_ARM_CORTEX_A57: ARMv8A architecture, VFPv4 and NEONv2 ISAs, 64-bit OS.
TARGET=ARMV7A_ARM_CORTEX_A15: ARMv7A architecture, VFPv3 and NEON ISAs, 32-bit OS.
TARGET=GENERIC: generic target, coded in C; gives better performance if the architecture provides more than 16 scalar FP registers (e.g. many RISC architectures such as ARM).

LA backend

The BLASFEO backend provides three possible implementations of each linear algebra routine (LA):

HIGH_PERFORMANCE: target-tailored; performance-optimized for cache-resident matrices; panel-major matrix format
REFERENCE: target-unspecific; lightly optimized; small code footprint; column-major matrix format
BLAS_WRAPPER: calls to external BLAS and LAPACK libraries; column-major matrix format

API

The BLASFEO API is always exported.
Optionally, the flag BLAS_API enables exporting a BLAS API for selected routines.
The additional flag FORTRAN_BLAS_API controls whether the BLAS API routines are exported with names of the form blasfeo_dgemm or dgemm_.

MACRO_LEVEL

For LA=HIGH_PERFORMANCE, the majority of BLASFEO code is assembly. Code modularity and reuse in assembly are achieved by using assembly subroutines with a custom calling convention, which perform elementary operations on register-fitting sub-matrices. The linear algebra kernels are coded by gluing together the assembly subroutines.

In BLASFEO, assembly subroutines can optionally be coded as macros and expanded into the linear algebra kernels. This reduces the overhead of the subroutine calls (noticeable for very small matrices), at the expense of an increase in the library size.

The macro behavior is controlled using the option MACRO_LEVEL:

MACRO_LEVEL=0: No macros expanded (default)
MACRO_LEVEL=1: All macros expanded except the gemm and gemv kernels, which remain subroutines
MACRO_LEVEL=2: All macros expanded

Compilation

The command

make static_library -j $(nproc)

compiles the sources and creates the static library libblasfeo.a in the folder lib.
The command make shared_library -j $(nproc) is the equivalent for the shared library libblasfeo.so.

The command

make clean

clears previous builds. It is necessary to do so when changing TARGET or LA.
The command make deep_clean additionally removes compiled libraries, generated headers and test/benchmark results.
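Putting the two steps together, a rebuild after changing TARGET or LA could look like the sketch below. Since nproc is typically not available on macOS, the job count falls back to sysctl -n hw.ncpu there; the JOBS variable and the check for Makefile.rule (used here only to detect that the script runs inside the checkout) are illustrative choices, not something BLASFEO requires.

```shell
# Portable job count: nproc on Linux (GNU coreutils),
# sysctl -n hw.ncpu on macOS; fall back to a single job otherwise.
if command -v nproc >/dev/null 2>&1; then
    JOBS=$(nproc)
else
    JOBS=$(sysctl -n hw.ncpu 2>/dev/null || echo 1)
fi
echo "Using $JOBS parallel jobs"

# Only run the build inside a blasfeo checkout, where Makefile.rule is present.
if [ -f Makefile.rule ]; then
    make clean                       # required after changing TARGET or LA
    make static_library -j "$JOBS"   # rebuild lib/libblasfeo.a
fi
```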

Installation

The command

make install_static

copies the BLASFEO static library and headers to the installation path PREFIX. Note that the default path, PREFIX=/opt/blasfeo, requires admin privileges.
The command make install_shared is the equivalent for the shared library.
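To install without admin privileges, one option is to point PREFIX at a directory you own. The sketch below relies on the general make rule that a variable given on the command line takes precedence over an ordinary assignment in the makefiles; it assumes that the BLASFEO makefiles set PREFIX that way, and the path $HOME/blasfeo is only an example. Setting PREFIX in Makefile.local is an alternative under the same assumption.

```shell
# Hypothetical user-writable install location; any directory you own works.
PREFIX="$HOME/blasfeo"

# Only run the install inside a blasfeo checkout, where Makefile.rule is present.
if [ -f Makefile.rule ]; then
    # A command-line variable overrides an ordinary assignment
    # of the same variable inside the makefiles.
    make install_static PREFIX="$PREFIX"
fi
echo "Install prefix: $PREFIX"
```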
