To use the library you have to compile it from source; we do not yet provide any pre-built binaries.
BLASFEO supports two build systems, make and CMake.
make is the suggested one.
You can clone the repository and move inside the project folder with:
git clone https://github.com/giaf/blasfeo.git; cd blasfeo
Some compilation options can be tuned by directly modifying the file Makefile.rule, or by adding the overridden values to a newly created Makefile.local, which is not tracked by git.
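As a minimal sketch, a Makefile.local could be created as follows. The option names TARGET, LA and MACRO_LEVEL are the ones described in the rest of this section; the chosen values are examples only, and the `NAME = value` assignment syntax is an assumption based on ordinary make conventions, so check Makefile.rule for the exact spelling of each option.

```shell
# Run from inside the blasfeo project folder.
# Example values only: pick the TARGET that matches your CPU.
cat > Makefile.local <<'EOF'
TARGET = X64_INTEL_HASWELL
LA = HIGH_PERFORMANCE
MACRO_LEVEL = 1
EOF

# Show the overrides that the next build is expected to pick up.
cat Makefile.local
```

Keeping local choices in Makefile.local rather than editing Makefile.rule leaves the tracked file untouched, so later `git pull` updates do not conflict with them.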
The most important options can be specified with the following flags:
BLASFEO provides different implementations optimized for different computer architectures.
The TARGET flag is used in the selection of hand-crafted assembly kernels (for LA=HIGH_PERFORMANCE), and in the choice of compilation flags (for all LA).
The target architecture has to be specified manually.
If you are unsure about the correct target, on Linux you can check the CPU model with the command cat /proc/cpuinfo | grep name.
Given the CPU model, the CPU architecture can be easily found, e.g. with a browser search.
Furthermore, the command cat /proc/cpuinfo | grep flags returns a list of flags (e.g. ssse3, avx, avx2, fma) describing the supported ISAs.
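The flag check can be scripted. The sketch below is a rough heuristic, not part of BLASFEO: `suggest_target` is a hypothetical helper that maps a list of x86-64 CPU flags to one of the Intel TARGET values documented in this section, falling back to GENERIC. It does not distinguish AMD or ARM CPUs, so treat its output as a starting point and confirm the architecture from the CPU model.

```shell
#!/bin/sh
# Hypothetical helper (not provided by BLASFEO): suggest a TARGET value
# from a space-separated list of CPU flags.
suggest_target() {
    flags=" $1 "   # pad with spaces so whole-word matching works
    # Haswell needs both AVX2 and FMA.
    case "$flags" in
        *" avx2 "*)
            case "$flags" in
                *" fma "*) echo "X64_INTEL_HASWELL"; return ;;
            esac
            ;;
    esac
    case "$flags" in
        *" avx "*) echo "X64_INTEL_SANDY_BRIDGE" ;;
        # Linux reports SSE3 as "pni"; ssse3 implies SSE3 as well.
        *" ssse3 "*|*" pni "*) echo "X64_INTEL_CORE" ;;
        *) echo "GENERIC" ;;
    esac
}

# Usage on Linux: read the flags of the first CPU listed in /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "Suggested TARGET: $(suggest_target "$cpu_flags")"
fi
```

The checks run from the most capable ISA to the least capable one, so a CPU that supports AVX2 and FMA is not mistakenly assigned an older target.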
The current values for TARGET are:
TARGET=X64_INTEL_HASWELL: Intel Haswell architecture, AVX2 and FMA ISAs, 64-bit OS.
TARGET=X64_INTEL_SANDY_BRIDGE: Intel Sandy-Bridge architecture, AVX ISA, 64-bit OS.
TARGET=X64_INTEL_CORE: Intel Core architecture, SSE3 ISA, 64-bit OS.
TARGET=X64_AMD_BULLDOZER: AMD Bulldozer architecture, AVX and FMA ISAs, 64-bit OS.
TARGET=ARMV8A_ARM_CORTEX_A57: ARMv8A architecture, VFPv4 and NEONv2 ISAs, 64-bit OS.
TARGET=ARMV7A_ARM_CORTEX_A15: ARMv7A architecture, VFPv3 and NEON ISAs, 32-bit OS.
TARGET=GENERIC: generic target, coded in C, giving better performance if the architecture provides more than 16 scalar FP registers (e.g. many RISC architectures such as ARM).
The BLASFEO backend provides three possible implementations of each linear algebra routine (LA):
HIGH_PERFORMANCE: target-tailored; performance-optimized for cache-resident matrices; panel-major matrix format
REFERENCE: target-unspecific, lightly optimized; small code footprint; column-major matrix format
BLAS_WRAPPER: calls to external BLAS and LAPACK libraries; column-major matrix format
The BLASFEO API is always exported.
Optionally, the flag BLAS_API makes it possible to export a BLAS API for selected routines.
The further flag FORTRAN_BLAS_API controls whether the BLAS API naming is exported in the form blasfeo_dgemm or dgemm_.
For LA=HIGH_PERFORMANCE, the majority of BLASFEO code is assembly.
Code modularity and reuse in assembly are achieved by using assembly subroutines with a custom calling convention, which perform elementary operations on register-fitting sub-matrices.
The linear algebra kernels are coded by gluing together the assembly subroutines.
In BLASFEO, assembly subroutines can optionally be coded as macros and expanded into the linear algebra kernels. This reduces the overhead of the subroutine calls (noticeable for very small matrices), at the expense of an increase in the library size.
The macro behavior is controlled using the option MACRO_LEVEL:
MACRO_LEVEL=0: no macros expanded (default)
MACRO_LEVEL=1: all macros expanded except the gemm and gemv kernels, which remain subroutines
MACRO_LEVEL=2: all macros expanded
The command
make static_library -j $(nproc)
compiles the sources and creates the static library libblasfeo.a in the folder lib.
The command make shared_library -j $(nproc) is the equivalent for the shared library libblasfeo.so.
The command
make clean
clears previous builds. It is necessary to do so when changing TARGET or LA.
The command make deep_clean additionally removes compiled libraries, generated headers and test/benchmark results.
The command
make install_static
copies the BLASFEO static library and headers to the installation path PREFIX.
Note that the default path, PREFIX=/opt/blasfeo, requires admin privileges.
The command make install_shared is the equivalent for the shared library.