faiss/contrib
Chengqi Deng c087f87730 Add LocalSearchQuantizer (#1906)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1906

This PR implemented LSQ/LSQ++, a vector quantization technique described in the following two papers:

1. Revisiting additive quantization
2. LSQ++: Lower running time and higher recall in multi-codebook quantization

Here is a benchmark running on SIFT1M for 64 bits encoding:
```
===== lsq:
        mean square error = 17335.390208
        training time: 312.729779958725 s
        encoding time: 244.6277096271515 s
===== pq:
        mean square error = 23743.004672
        training time: 1.1610801219940186 s
        encoding time: 2.636141061782837 s
===== rq:
        mean square error = 20999.737344
        training time: 31.813055515289307 s
        encoding time: 307.51959800720215 s
```

Changes:

1. Add LocalSearchQuantizer object
2. Fix an out of memory bug in ResidualQuantizer
3. Add a benchmark for evaluating quantizers
4. Add tests for LocalSearchQuantizer

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1862

Test Plan:
```
buck test //faiss/tests/:test_lsq

buck run mode/opt //faiss/benchs/:bench_quantizer -- lsq pq rq
```

Reviewed By: beauby

Differential Revision: D28376369

Pulled By: mdouze

fbshipit-source-id: 2a394d38bf75b9de0a1c2cd6faddf7dd362a6fa8
2021-05-21 01:33:55 -07:00
README.md Add preassigned functions to contrib 2021-02-25 11:39:07 -08:00
client_server.py
datasets.py Add LocalSearchQuantizer (#1906) 2021-05-21 01:33:55 -07:00
evaluation.py Add range search accuracy evaluation 2020-12-17 17:17:09 -08:00
exhaustive_search.py Fix unused variables in python 2021-02-24 11:52:18 -08:00
factory_tools.py PQ4 fast scan benchmarks (#1555) 2020-12-16 01:18:58 -08:00
inspect_tools.py Implementation of PQ4 search with SIMD instructions (#1542) 2020-12-03 10:06:38 -08:00
ivf_tools.py Add preassigned functions to contrib 2021-02-25 11:39:07 -08:00
ondisk.py
rpc.py
torch_utils.py Raw all-pairwise distance function on GPU 2021-04-13 12:06:04 -07:00
vecs_io.py PQ4 fast scan benchmarks (#1555) 2020-12-16 01:18:58 -08:00

README.md

The contrib modules

The contrib directory contains helper modules for Faiss for various tasks.

Code structure

The contrib directory gets compiled into the module faiss.contrib. Some of the modules depend on additional packages (e.g. GPU Faiss, PyTorch, hdf5); to avoid adding dependencies, these are not necessarily compiled in. It is the user's responsibility to provide them.

In contrib, we are progressively dropping Python 2 support.

List of contrib modules

rpc.py

A very simple Remote Procedure Call library, where function parameters and results are pickled, for use with client_server.py.
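The idea of pickling parameters and results can be illustrated with a minimal sketch using only the standard library. This is not the code in rpc.py: the names `call_remote`, `handle_request`, `Adder` and `transport` are hypothetical, and an in-process function stands in for the network socket.

```python
import pickle


def call_remote(send, fname, *args, **kwargs):
    """Client side: pickle the call, pass it to a transport, unpickle the reply."""
    request = pickle.dumps((fname, args, kwargs))
    status, payload = pickle.loads(send(request))
    if status == "error":
        raise RuntimeError(payload)
    return payload


def handle_request(target, request):
    """Server side: unpickle the call, run it on `target`, pickle the outcome."""
    fname, args, kwargs = pickle.loads(request)
    try:
        result = getattr(target, fname)(*args, **kwargs)
        return pickle.dumps(("ok", result))
    except Exception as exc:
        # Report the failure to the client instead of crashing the server.
        return pickle.dumps(("error", repr(exc)))


class Adder:
    """Hypothetical object exposed by the server."""

    def add(self, a, b):
        return a + b


server_object = Adder()


def transport(request):
    # Stand-in for a socket round trip: bytes in, bytes out.
    return handle_request(server_object, request)


result = call_remote(transport, "add", 2, 3)
```

Because unpickling can execute arbitrary code, a scheme like this is only appropriate between machines that trust each other.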

client_server.py

The server handles requests to a Faiss index. The client calls the remote index. This is mainly used to shard datasets over several machines; see Distributed index.

ondisk.py

Encloses the main logic to merge indexes into an on-disk index. See On-disk storage

exhaustive_search.py

Computes the ground-truth search results for a dataset that possibly does not fit in RAM. Uses GPU if available. Tested in tests/test_contrib.TestComputeGT
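A rough sketch of how exact results can be computed when the database is streamed in blocks: only the current block and a running top-k per query are kept in memory. This is an illustration in pure Python, not the implementation in exhaustive_search.py; `blockwise_knn` and `squared_l2` are hypothetical names, and the GPU path is not shown.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def blockwise_knn(queries, database_blocks, k):
    """Exact k-NN over a database that is provided one block at a time.

    Returns, for each query, a list of (distance, database_id) pairs
    sorted from nearest to farthest.
    """
    best = [[] for _ in queries]  # running top-k per query
    offset = 0                    # global id of the first vector in the block
    for block in database_blocks:
        for qi, q in enumerate(queries):
            candidates = [
                (squared_l2(q, v), offset + j) for j, v in enumerate(block)
            ]
            # Merge the previous top-k with this block's candidates.
            best[qi] = heapq.nsmallest(k, best[qi] + candidates)
        offset += len(block)
    return best


database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
# Generator yielding blocks of two vectors, as if read from disk.
blocks = (database[i:i + 2] for i in range(0, len(database), 2))
queries = [[0.0, 0.0], [3.0, 2.0]]
result = blockwise_knn(queries, blocks, k=2)
```

Merging per block keeps memory use proportional to the block size plus k entries per query, independent of the total database size.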

torch_utils.py

Interoperability functions for pytorch and Faiss: Importing this will allow pytorch Tensors (CPU or GPU) to be used as arguments to Faiss indexes and other functions. Torch GPU tensors can only be used with Faiss GPU indexes. If this is imported with a package that supports Faiss GPU, the necessary stream synchronization with the current pytorch stream will be automatically performed.

Numpy ndarrays can continue to be used in the Faiss python interface after importing this file. All arguments must be uniformly either numpy ndarrays or Torch tensors; no mixing is allowed.

Tested in tests/test_contrib_torch.py (CPU) and gpu/test/test_contrib_torch_gpu.py (GPU).
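A minimal usage sketch, assuming both faiss and torch are installed. The function name `torch_search_demo` and the sizes are illustrative only; the imports sit inside the function so that the snippet can be loaded without either package present.

```python
def torch_search_demo(n=1000, d=32, k=4):
    """Add and search PyTorch CPU tensors in a CPU Faiss index."""
    import torch
    import faiss
    import faiss.contrib.torch_utils  # noqa: F401  (imported for its side effect)

    index = faiss.IndexFlatL2(d)
    xb = torch.rand(n, d)           # float32 CPU tensor
    index.add(xb)                   # tensor passed directly, no numpy conversion
    D, I = index.search(xb[:5], k)  # query is a tensor as well
    return D, I
```

Every argument here is a Torch tensor, in line with the rule that ndarrays and tensors must not be mixed in a single call.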

inspect_tools.py

Functions to inspect C++ objects wrapped by SWIG. Most often this just means reading fields and converting them to the proper python array.

ivf_tools.py

A few functions to override the coarse quantizer in IVF, providing additional flexibility for assignment.

datasets.py

(may require h5py)

Definition of how to access data for some standard datasets.

factory_tools.py

Functions related to factory strings.

evaluation.py

A few non-trivial evaluation functions for search results
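As an illustration of the kind of measure such functions compute, here is a generic sketch of a k-NN intersection score: the fraction of reference neighbors that also appear in the returned results, pooled over all queries. Which functions evaluation.py actually provides is not stated above, so the name `knn_intersection` and its signature are assumptions made for this example.

```python
def knn_intersection(found_ids, reference_ids):
    """Fraction of reference neighbors that were also found, over all queries.

    Both arguments are lists with one list of database ids per query.
    The order of ids within a query does not matter.
    """
    if len(found_ids) != len(reference_ids):
        raise ValueError("both inputs need one entry per query")
    hits = 0
    total = 0
    for found, ref in zip(found_ids, reference_ids):
        hits += len(set(found) & set(ref))
        total += len(ref)
    return hits / total


found = [[0, 4], [3, 2]]      # ids returned by an approximate search
reference = [[0, 4], [3, 1]]  # exact ground-truth ids
score = knn_intersection(found, reference)
```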