faiss/INSTALL.md
vorj 4eeaa42b93 Add sve targets (#2886)
Summary:
related: https://github.com/facebookresearch/faiss/issues/2884

This PR contains below changes:

- Add new optlevel `sve`
    - ARM SVE is an _extension_ of ARMv8, so it should be treated similarly to AVX2, IMO
- Add targets for ARM SVE, `faiss_sve` and `swigfaiss_sve`
    - These targets will be built when you give `-DFAISS_OPT_LEVEL=sve` at build time
    - Design decision: Don't fix SVE register length.
        - The faiss Python package is a "fat binary" (for example, the package for avx2 contains `_swigfaiss_avx2.so` and `_swigfaiss.so`).
        - SVE is a scalable instruction set (i.e., it doesn't fix the vector length), but we can actually specify the vector length at compile time.
            - [with the `-msve-vector-length=` option](https://developer.arm.com/documentation/101726/4-0/Coding-for-Scalable-Vector-Extension--SVE-/SVE-Vector-Length-Specific--VLS--programming)
            - When this option is specified, the binary can't work correctly on a CPU whose vector length differs from the one specified at compile time.
        - If we used a fixed vector length, the SVE-supported faiss Python package would contain 7 shared libraries: `_swigfaiss.so`, `_swigfaiss_sve.so`, `_swigfaiss_sve128.so`, `_swigfaiss_sve256.so`, `_swigfaiss_sve512.so`, `_swigfaiss_sve1024.so`, and `_swigfaiss_sve2048.so`. The package size would explode.
        - For these reasons, I don't specify the vector length at compile time; `faiss_sve` detects the vector length at run time.
- Add a mechanism for detecting ARM SVE in the runtime environment and importing `swigfaiss_sve` dynamically
    - Currently it only supports Linux, but as far as I know there is no SVE environment with a non-Linux OS now
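The runtime check described in the last bullet can be sketched as follows. This is not the code from the PR: it assumes detection works by looking for the exact `sve` flag on the `Features` lines of `/proc/cpuinfo` (hence Linux-only), then trying `swigfaiss_sve` before falling back to `swigfaiss`. The helpers `cpuinfo_has_sve`, `candidate_modules` and `import_first` are hypothetical names.

```python
import importlib
import platform
from typing import List, Optional


def cpuinfo_has_sve(cpuinfo_text: str) -> bool:
    """True if a "Features" line of /proc/cpuinfo lists the exact flag "sve"."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact token match, so "sve2" alone does not count as "sve".
        if sep and key.strip() == "Features" and "sve" in value.split():
            return True
    return False


def candidate_modules(cpuinfo_text: Optional[str] = None) -> List[str]:
    """Module names to try, most optimized first (hypothetical loader order)."""
    if cpuinfo_text is None:
        # Detection is Linux/aarch64-only; everything else uses the generic build.
        if platform.system() != "Linux" or platform.machine() != "aarch64":
            return ["swigfaiss"]
        with open("/proc/cpuinfo") as f:
            cpuinfo_text = f.read()
    if cpuinfo_has_sve(cpuinfo_text):
        return ["swigfaiss_sve", "swigfaiss"]
    return ["swigfaiss"]


def import_first(names: List[str], package: Optional[str] = None):
    """Import the first module in `names` that loads, falling back on ImportError."""
    last_error: Optional[ImportError] = None
    for name in names:
        try:
            return importlib.import_module(name, package)
        except ImportError as exc:
            last_error = exc
    raise ImportError(f"none of {names} could be imported") from last_error
```

Trying the SVE module first and catching `ImportError` means a package built without the `sve` targets still loads the generic bindings.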

NOTE: I plan to open one more PR adding some SVE implementations after this PR is merged. This PR only adds the sve targets.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2886

Reviewed By: ramilbakhshyiev

Differential Revision: D60386983

Pulled By: mengdilin

fbshipit-source-id: 7e66162ee53ce88fbfb6636e7bf705b44e6c3282
2024-07-29 15:05:17 -07:00


Installing Faiss via conda

The supported way to install Faiss is through conda. Stable releases are pushed regularly to the pytorch conda channel, as well as pre-release nightly builds.

  • The CPU-only faiss-cpu conda package is currently available on Linux (x86_64 and arm64), OSX (arm64 only), and Windows (x86_64)
  • faiss-gpu, containing both CPU and GPU indices, is available on Linux (x86_64 only) for CUDA 11.4 and 12.1
  • NEW: faiss-gpu-raft, containing both CPU and GPU indices provided by NVIDIA RAFT, is available on Linux (x86_64 only) for CUDA 11.8 and 12.1.

To install the latest stable release:

# CPU-only version
$ conda install -c pytorch faiss-cpu=1.8.0

# GPU(+CPU) version
$ conda install -c pytorch -c nvidia faiss-gpu=1.8.0

# GPU(+CPU) version with NVIDIA RAFT
$ conda install -c pytorch -c nvidia -c rapidsai -c conda-forge faiss-gpu-raft=1.8.0

For faiss-gpu, the nvidia channel is required for CUDA, which is not published in the main anaconda channel.

For faiss-gpu-raft, the nvidia, rapidsai and conda-forge channels are required.
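After installing, a quick way to confirm which build you got is to import the package from Python. A minimal sketch, assuming your faiss version exposes `faiss.__version__`, `faiss.get_compile_options()` and `faiss.get_num_gpus()`; `describe_install` is a hypothetical helper name.

```python
try:
    import faiss
except ImportError:  # faiss is not installed in this environment
    faiss = None


def describe_install():
    """Summarize the installed faiss package, or return None if it is missing."""
    if faiss is None:
        return None
    info = {
        "version": faiss.__version__,
        # Compile-time options of the loaded library, such as its SIMD level.
        "compile_options": faiss.get_compile_options(),
    }
    # Guarded with hasattr in case this binding is absent from the build.
    if hasattr(faiss, "get_num_gpus"):
        info["num_gpus"] = faiss.get_num_gpus()
    return info


if __name__ == "__main__":
    print(describe_install())
```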

Nightly pre-release packages can be installed as follows:

# CPU-only version
$ conda install -c pytorch/label/nightly faiss-cpu

# GPU(+CPU) version
$ conda install -c pytorch/label/nightly -c nvidia faiss-gpu=1.8.0

# GPU(+CPU) version with NVIDIA RAFT
$ conda install -c pytorch -c nvidia -c rapidsai -c conda-forge faiss-gpu-raft=1.8.0 pytorch pytorch-cuda numpy

In the above command, pytorch-cuda=11 or pytorch-cuda=12 can be used instead of pytorch-cuda to select a specific CUDA version, if one is required.

A combination of versions that installs GPU Faiss with CUDA and PyTorch (as of 2024-05-15):

$ conda create --name faiss_1.8.0
$ conda activate faiss_1.8.0
$ conda install -c pytorch -c nvidia faiss-gpu=1.8.0 "pytorch=*=*cuda*" pytorch-cuda=11 numpy

Installing from conda-forge

Faiss is also being packaged by conda-forge, the community-driven packaging ecosystem for conda. The packaging effort is collaborating with the Faiss team to ensure high-quality package builds.

Thanks to conda-forge's comprehensive infrastructure, some build combinations may be supported on conda-forge that are not available through the pytorch channel. To install, use:

# CPU version
$ conda install -c conda-forge faiss-cpu

# GPU version
$ conda install -c conda-forge faiss-gpu

You can tell which channel your conda packages come from by using conda list. If you are having problems using a package built by conda-forge, please raise an issue on the conda-forge package "feedstock".
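To check the channel from a script rather than by reading `conda list` output, `conda list --json` can be parsed. This sketch assumes each JSON entry carries `name` and `channel` fields; `faiss_channels` and `conda_list_json` are illustrative names.

```python
import json
import subprocess
from typing import Dict


def faiss_channels(conda_list_json: str, pattern: str = "faiss") -> Dict[str, str]:
    """Map installed packages whose name contains `pattern` to their channel."""
    packages = json.loads(conda_list_json)
    return {
        pkg["name"]: pkg.get("channel", "")
        for pkg in packages
        if pattern in pkg["name"]
    }


def conda_list_json() -> str:
    """Output of `conda list --json` for the active environment (needs conda on PATH)."""
    result = subprocess.run(
        ["conda", "list", "--json"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `faiss_channels(conda_list_json())` then shows, for example, whether `faiss-cpu` came from `pytorch` or `conda-forge`. The substring match also picks up companion packages such as `libfaiss`.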

Building from source

Faiss can be built from source using CMake.

Faiss is supported on x86_64 machines on Linux, OSX, and Windows. It has been found to run on other platforms as well; see "other platforms" on the Faiss wiki.

The basic requirements are:

  • a C++17 compiler (with support for OpenMP version 2 or higher),
  • a BLAS implementation (on Intel machines we strongly recommend using Intel MKL for best performance).

The optional requirements are:

  • for GPU indices:
    • nvcc,
    • the CUDA toolkit,
  • for the python bindings:
    • python 3,
    • numpy,
    • and swig.
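Before configuring, a short preflight script can catch missing tools early. It is only a sketch: it checks that executables are on `PATH` and that `numpy` is importable, and cannot verify C++17/OpenMP support or which BLAS is installed. `preflight` is a hypothetical name.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def preflight(with_gpu: bool = False, with_python: bool = True) -> Dict[str, bool]:
    """Report which build prerequisites can be found on this machine.

    Only checks executables on PATH and importable modules: it cannot tell
    whether the compiler supports C++17/OpenMP or which BLAS is installed.
    """
    found = {"cmake": shutil.which("cmake") is not None}
    if with_gpu:
        found["nvcc"] = shutil.which("nvcc") is not None
    if with_python:
        found["python3"] = sys.version_info.major == 3
        found["numpy"] = importlib.util.find_spec("numpy") is not None
        found["swig"] = shutil.which("swig") is not None
    return found


if __name__ == "__main__":
    for tool, ok in preflight(with_gpu=False).items():
        print(f"{tool:8s} {'ok' if ok else 'MISSING'}")
```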

Instructions for specific configurations are available in the troubleshooting section of the wiki.

Step 1: Invoking CMake

$ cmake -B build .

This generates the system-dependent configuration/build files in the build/ subdirectory.

Several options can be passed to CMake, including:

  • general options:
    • -DFAISS_ENABLE_GPU=OFF in order to disable building GPU indices (possible values are ON and OFF),
    • -DFAISS_ENABLE_PYTHON=OFF in order to disable building python bindings (possible values are ON and OFF),
    • -DFAISS_ENABLE_RAFT=ON in order to enable building the RAFT implementations of the IVF-Flat and IVF-PQ GPU-accelerated indices (default is OFF, possible values are ON and OFF),
    • -DBUILD_TESTING=OFF in order to disable building C++ tests,
    • -DBUILD_SHARED_LIBS=ON in order to build a shared library (possible values are ON and OFF),
    • -DFAISS_ENABLE_C_API=ON in order to enable building the C API (possible values are ON and OFF),
  • optimization-related options:
    • -DCMAKE_BUILD_TYPE=Release in order to enable generic compiler optimization options (enables -O3 on gcc for instance),
    • -DFAISS_OPT_LEVEL=avx2 in order to enable the required compiler flags to generate code using optimized SIMD/vector instructions. Possible values are:
      • on x86_64: generic, avx2 and avx512, in increasing order of optimization,
      • on aarch64: generic and sve, in increasing order of optimization,
  • BLAS-related options:
    • -DBLA_VENDOR=Intel10_64_dyn -DMKL_LIBRARIES=/path/to/mkl/libs to use the Intel MKL BLAS implementation, which is significantly faster than OpenBLAS (more information about the values for the BLA_VENDOR option can be found in the CMake docs),
  • GPU-related options:
    • -DCUDAToolkit_ROOT=/path/to/cuda-10.1 in order to hint at the path of the CUDA toolkit (for more information, see CMake docs),
    • -DCMAKE_CUDA_ARCHITECTURES="75;72" for specifying which GPU architectures to build against (see CUDA docs to determine which architecture(s) you should pick),
  • python-related options:
    • -DPython_EXECUTABLE=/path/to/python3.7 in order to build a python interface for a different python than the default one (see CMake docs).
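For scripted builds, the options above can be assembled into a `cmake` invocation. A minimal sketch: the `cmake_command` helper and the `OPT_LEVELS` table are illustrative, not part of Faiss, and the helper rejects an opt level that is not listed for the target architecture.

```python
import shlex
from typing import Dict, List, Optional

# FAISS_OPT_LEVEL values listed above, by architecture (increasing optimization).
OPT_LEVELS = {
    "x86_64": ("generic", "avx2", "avx512"),
    "aarch64": ("generic", "sve"),
}


def cmake_command(
    arch: str,
    opt_level: str = "generic",
    gpu: bool = False,
    python: bool = True,
    build_type: str = "Release",
    extra: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble a `cmake -B build .` invocation from the options above."""
    if opt_level not in OPT_LEVELS.get(arch, ()):
        raise ValueError(f"FAISS_OPT_LEVEL={opt_level} is not listed for {arch}")
    options = {
        "FAISS_ENABLE_GPU": "ON" if gpu else "OFF",
        "FAISS_ENABLE_PYTHON": "ON" if python else "OFF",
        "CMAKE_BUILD_TYPE": build_type,
        "FAISS_OPT_LEVEL": opt_level,
    }
    options.update(extra or {})  # e.g. {"BLA_VENDOR": "Intel10_64_dyn"}
    return ["cmake", "-B", "build", "."] + [f"-D{k}={v}" for k, v in options.items()]


if __name__ == "__main__":
    print(shlex.join(cmake_command("aarch64", "sve")))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting issues. Every option is passed explicitly, so the result does not depend on CMake's defaults.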

Step 2: Invoking Make

$ make -C build -j faiss

This builds the C++ library (libfaiss.a by default, and libfaiss.so if -DBUILD_SHARED_LIBS=ON was passed to CMake).

The -j option enables parallel compilation of multiple units, which speeds up the build but increases the chance of running out of memory. If that happens, set -j to a fixed value (such as -j4).
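One way to pick a fixed `-j` value is to bound it by both the CPU count and the available memory. The 2 GiB-per-job figure below is an assumption, not a measured requirement of the Faiss build, and `parallel_jobs`/`make_command` are hypothetical helpers.

```python
import os
from typing import List


def parallel_jobs(mem_gib: float, gib_per_job: float = 2.0) -> int:
    """Choose a -j value limited by both CPU count and available memory.

    gib_per_job is an assumed memory budget per compiler process, not a
    measured figure; raise it if the build still runs out of memory.
    """
    cpus = os.cpu_count() or 1
    by_memory = int(mem_gib // gib_per_job)
    return max(1, min(cpus, by_memory))


def make_command(target: str, jobs: int) -> List[str]:
    """Equivalent of `make -C build -j<jobs> <target>`."""
    return ["make", "-C", "build", f"-j{jobs}", target]
```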

Step 3: Building the python bindings (optional)

$ make -C build -j swigfaiss
$ (cd build/faiss/python && python setup.py install)

The first command builds the python bindings for Faiss, while the second one generates and installs the python package.
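When Faiss was configured with a non-generic -DFAISS_OPT_LEVEL, the optimized bindings are separate targets (the commit message above names `swigfaiss_sve` for `sve`). A sketch that lists the targets to build, assuming the other levels follow the same `swigfaiss_<level>` naming; `swig_targets` and `bindings_command` are hypothetical helpers, and the generated build files remain the authority.

```python
from typing import List


def swig_targets(opt_level: str) -> List[str]:
    """Make targets for the Python bindings at a given FAISS_OPT_LEVEL.

    Assumes every non-generic level adds a `swigfaiss_<level>` target, as
    `swigfaiss_sve` does for sve; check your CMake configuration, since
    some levels (e.g. avx512) may need additional targets.
    """
    targets = ["swigfaiss"]  # the generic bindings are always built
    if opt_level != "generic":
        targets.append(f"swigfaiss_{opt_level}")
    return targets


def bindings_command(opt_level: str) -> str:
    """Shell command that builds all binding targets in one make call."""
    return "make -C build -j " + " ".join(swig_targets(opt_level))
```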

Step 4: Installing the C++ library and headers (optional)

$ make -C build install

This will make the compiled library (either libfaiss.a or libfaiss.so on Linux) available system-wide, as well as the C++ headers. This step is not needed if you only want to install the python package.

Step 5: Testing (optional)

Running the C++ test suite

To run the whole test suite, make sure that cmake was invoked with -DBUILD_TESTING=ON, and run:

$ make -C build test

Running the python test suite

$ (cd build/faiss/python && python setup.py build)
$ PYTHONPATH="$(ls -d ./build/faiss/python/build/lib*/)" pytest tests/test_*.py
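The same test run can be driven from Python, for example on systems without a POSIX shell. A sketch mirroring the two commands above; it picks the first `lib*` directory if several exist, and `find_build_lib`/`run_python_tests` are hypothetical helpers.

```python
import glob
import os
import subprocess
import sys


def find_build_lib(root: str = ".") -> str:
    """Python equivalent of `ls -d ./build/faiss/python/build/lib*/`."""
    pattern = os.path.join(root, "build", "faiss", "python", "build", "lib*")
    matches = sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
    if not matches:
        raise FileNotFoundError(
            "no lib* directory found; run `python setup.py build` in build/faiss/python first"
        )
    return matches[0]


def run_python_tests(root: str = ".") -> int:
    """Run pytest on tests/test_*.py with PYTHONPATH pointing at the built package."""
    env = dict(os.environ, PYTHONPATH=find_build_lib(root))
    tests = sorted(glob.glob(os.path.join(root, "tests", "test_*.py")))
    return subprocess.run([sys.executable, "-m", "pytest", *tests], env=env).returncode
```

From the repository root, `sys.exit(run_python_tests())` propagates pytest's exit status.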

Basic example

A basic usage example is available in demos/demo_ivfpq_indexing.cpp.

It creates a small index, stores it and performs some searches. A normal runtime is around 20s. With a fast machine and Intel MKL's BLAS it runs in 2.5s.

It can be built with

$ make -C build demo_ivfpq_indexing

and subsequently run with

$ ./build/demos/demo_ivfpq_indexing
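For readers using the Python bindings, roughly the same steps as the C++ demo (train, add, store, reload, search) look like this with the faiss Python API. This is a sketch on random data, not a port of the demo's exact parameters; `ivfpq_demo` is an illustrative name.

```python
import os
import tempfile

import numpy as np

try:
    import faiss
except ImportError:  # lets the sketch load where faiss is not installed
    faiss = None


def ivfpq_demo(d=64, nb=20000, nq=5, nlist=100, m=8, nbits=8, k=4, nprobe=10):
    """Build an IndexIVFPQ, store and reload it, then search; returns (D, I)."""
    rng = np.random.default_rng(123)
    xb = rng.random((nb, d), dtype=np.float32)  # database vectors
    xq = rng.random((nq, d), dtype=np.float32)  # query vectors

    quantizer = faiss.IndexFlatL2(d)  # coarse quantizer assigning vectors to lists
    index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)  # m sub-vectors, nbits each
    index.train(xb)  # learns the coarse centroids and the PQ codebooks
    index.add(xb)

    # Store the index on disk and read it back, as the C++ demo does.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ivfpq.index")
        faiss.write_index(index, path)
        index = faiss.read_index(path)

    # Number of inverted lists visited per query.
    faiss.ParameterSpace().set_index_parameter(index, "nprobe", nprobe)
    return index.search(xq, k)
```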

Basic GPU example

$ make -C build demo_ivfpq_indexing_gpu
$ ./build/demos/demo_ivfpq_indexing_gpu

This produces the GPU equivalent of the CPU demo_ivfpq_indexing. It also shows how to move indexes between CPU and GPU.
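The CPU/GPU translation can also be done from Python with `faiss.StandardGpuResources`, `faiss.index_cpu_to_gpu` and `faiss.index_gpu_to_cpu`. A minimal sketch that returns `None` when faiss, GPU support, or a visible GPU is missing; `gpu_roundtrip` is a hypothetical helper.

```python
import numpy as np

try:
    import faiss
except ImportError:  # lets the sketch load where faiss is not installed
    faiss = None


def gpu_roundtrip(d=32, nb=1000, nq=3, k=4, device=0):
    """Search a flat index on GPU, copy it back to CPU and search again.

    Returns (gpu_ids, cpu_ids), or None without GPU-enabled faiss and a GPU.
    """
    if faiss is None or not hasattr(faiss, "StandardGpuResources"):
        return None  # faiss missing or CPU-only build
    if faiss.get_num_gpus() == 0:
        return None  # GPU build, but no visible device
    rng = np.random.default_rng(0)
    xb = rng.random((nb, d), dtype=np.float32)
    xq = rng.random((nq, d), dtype=np.float32)

    cpu_index = faiss.IndexFlatL2(d)
    cpu_index.add(xb)

    # Holds GPU memory and streams; must stay alive while gpu_index is used.
    res = faiss.StandardGpuResources()
    gpu_index = faiss.index_cpu_to_gpu(res, device, cpu_index)
    _, gpu_ids = gpu_index.search(xq, k)

    back_on_cpu = faiss.index_gpu_to_cpu(gpu_index)
    _, cpu_ids = back_on_cpu.search(xq, k)
    return gpu_ids, cpu_ids
```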

A real-life benchmark

A longer example runs and evaluates Faiss on the SIFT1M dataset. To run it, please download the ANN_SIFT1M dataset from http://corpus-texmex.irisa.fr/ and unzip it to the subdirectory sift1M at the root of the source directory for this repository.

Then compile and run the following (after ensuring you have installed faiss):

$ make -C build demo_sift1M
$ ./build/demos/demo_sift1M

This is a demonstration of the high-level auto-tuning API. You can try setting a different index_key to find the indexing structure that gives the best performance.
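The effect of changing the index key can also be explored in Python with `faiss.index_factory`, measuring 1-recall@1 against exact search. A sketch on random data: the keys, sizes and `nprobe` are arbitrary choices, `compare_index_keys` is a hypothetical name, and `faiss.ParameterSpace` is used to set the search-time parameter by name.

```python
import time

import numpy as np

try:
    import faiss
except ImportError:  # lets the sketch load where faiss is not installed
    faiss = None


def compare_index_keys(keys=("Flat", "IVF64,Flat", "IVF64,PQ8"), d=32, nb=10000,
                       nq=100, nprobe=8):
    """Return {index_key: (1-recall@1 vs exact search, search seconds)}."""
    rng = np.random.default_rng(0)
    xb = rng.random((nb, d), dtype=np.float32)
    xq = rng.random((nq, d), dtype=np.float32)

    exact = faiss.IndexFlatL2(d)  # ground truth by brute force
    exact.add(xb)
    _, gt = exact.search(xq, 1)

    results = {}
    params = faiss.ParameterSpace()  # sets search-time parameters by name
    for key in keys:
        index = faiss.index_factory(d, key)
        index.train(xb)  # no-op for index types that need no training, like Flat
        index.add(xb)
        if key.startswith("IVF"):
            params.set_index_parameter(index, "nprobe", nprobe)
        start = time.perf_counter()
        _, ids = index.search(xq, 1)
        elapsed = time.perf_counter() - start
        results[key] = (float((ids[:, 0] == gt[:, 0]).mean()), elapsed)
    return results
```

Uniformly random vectors are typically harder for approximate indexes than real descriptors, so do not expect these recall figures to match what demo_sift1M reports on SIFT1M.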

Real-life test

The following script extends the demo_sift1M test to several types of indexes. This must be run from the root of the source directory for this repository:

$ mkdir tmp  # graphs of the output will be written here
$ python demos/demo_auto_tune.py

It will cycle through a few types of indexes and find optimal operating points. You can play around with the types of indexes.

Real-life test on GPU

The example above also runs on GPU. Edit demos/demo_auto_tune.py at line 100 to set the values

keys_to_test = keys_gpu
use_gpu = True

and you can run

$ python demos/demo_auto_tune.py

to test the GPU code.