faiss/contrib
Matthijs Douze dd814b5f14 IVF filtering based on IDSelector (no init split) (#2483)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2483

This diff changes the following:
1. all search functions now take a `SearchParameters` argument that overrides the internal search parameters
2. the default implementation for most classes throws when the params argument is non-nullptr / non-None
3. the IndexIVF and IndexHNSW classes have functioning SearchParameters
4. the SearchParameters includes an IDSelector that restricts the search to a subset of the index, defined by a set of ids

There is also some refactoring: the IDSelector was moved to its own .h/.cpp and python/__init__.py is split into parts.

The diff is quite bulky because the search function prototypes need to be changed in all index classes.

Things to fix in subsequent diffs:

- support SearchParameters for more index types (Flat variants)

- better sub-object ownership for SearchParams (with std::unique_ptr?)

- special handling of IDSelectorRange to make it faster

Reviewed By: alexanderguzhva

Differential Revision: D39852589

fbshipit-source-id: 4988bdb5b9bee1207cd327d3f80bf5e0e2467fe1
2022-09-30 06:40:03 -07:00
README.md docs: Improve readability (#2378) 2022-07-08 09:19:07 -07:00
__init__.py Update codebooks with double type (#1975) 2021-07-07 03:29:49 -07:00
client_server.py
clustering.py contrib clustering module (#2217) 2022-02-28 14:18:47 -08:00
datasets.py Implement LCC's RCQ + ITQ in Faiss (#2123) 2021-11-25 15:59:18 -08:00
evaluation.py three small fixes (#1972) 2021-07-01 16:08:45 -07:00
exhaustive_search.py Fix exhaustive search GT computation with IP distance (#2212) 2022-02-07 19:36:21 -08:00
factory_tools.py
inspect_tools.py IndexFlatCodes: a single parent for all flat codecs (#2132) 2021-12-07 01:31:07 -08:00
ivf_tools.py Add preassigned functions to contrib 2021-02-25 11:39:07 -08:00
ondisk.py Add assertion to merge_ondisk.py (#2190) 2022-02-03 05:14:22 -08:00
rpc.py Fix RPC lib logging (#2433) 2022-09-08 01:05:52 -07:00
torch_utils.py IVF filtering based on IDSelector (no init split) (#2483) 2022-09-30 06:40:03 -07:00
vecs_io.py

README.md

The contrib modules

The contrib directory contains helper modules for Faiss for various tasks.

Code structure

The contrib directory is compiled into the module faiss.contrib. Although some of the modules depend on additional packages (e.g. GPU Faiss, PyTorch, hdf5), these are not necessarily compiled in, to avoid adding dependencies. It is the user's responsibility to provide them.

In contrib, we are progressively dropping Python 2 support.

List of contrib modules

rpc.py

A very simple Remote Procedure Call library, in which function parameters and results are pickled, for use with client_server.py.
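To make the idea of pickled parameters and results concrete, here is a minimal sketch using only the standard library. It is not the code in rpc.py: the helper names (`send_msg`, `recv_msg`, `serve_one`, `call`) and the length-prefixed framing are assumptions made for illustration.

```python
import pickle
import socket
import struct
import threading


def send_msg(sock, obj):
    """Pickle obj and send it with a 4-byte big-endian length prefix."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed pickled object."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, size))


def serve_one(sock, handlers):
    """Server side: answer a single (function name, args) request."""
    name, args = recv_msg(sock)
    send_msg(sock, handlers[name](*args))


def call(sock, name, *args):
    """Client side: send the call, then wait for the pickled result."""
    send_msg(sock, (name, args))
    return recv_msg(sock)


# Demo over a local socket pair, with the server in a background thread.
client_sock, server_sock = socket.socketpair()
handlers = {"add": lambda a, b: a + b}  # hypothetical remote function
server = threading.Thread(target=serve_one, args=(server_sock, handlers))
server.start()
result = call(client_sock, "add", 2, 3)
server.join()
client_sock.close()
server_sock.close()
```

Because unpickling can execute arbitrary code, a scheme like this is only appropriate between machines that trust each other.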

client_server.py

The server handles requests to a Faiss index. The client calls the remote index. This is mainly used to shard datasets over several machines; see Distributed index.

ondisk.py

Contains the main logic to merge indexes into an on-disk index. See On-disk storage.

exhaustive_search.py

Computes the ground-truth search results for a dataset that may not fit in RAM. Uses the GPU if available. Tested in tests/test_contrib.TestComputeGT.
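One plausible way to compute exact results without loading the whole dataset is to stream the database in blocks and keep a running top-k per query. The sketch below shows that idea in plain Python with squared L2 distances; it does not use Faiss or the GPU, and the names `sq_l2`, `blocks` and this `knn_ground_truth` signature are assumptions for illustration, not the module's API.

```python
import heapq


def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_ground_truth(queries, db_blocks, k):
    """Exact k-NN over a database delivered block by block.

    db_blocks yields lists of vectors, so only one block is needed at a time.
    Returns, for each query, a list of (distance, id) sorted by distance.
    """
    best = [[] for _ in queries]  # running top-k per query
    offset = 0  # global id of the first vector in the current block
    for block in db_blocks:
        for qi, q in enumerate(queries):
            candidates = best[qi] + [
                (sq_l2(q, v), offset + j) for j, v in enumerate(block)
            ]
            best[qi] = heapq.nsmallest(k, candidates)
        offset += len(block)
    return best


def blocks(vectors, size):
    """Yield consecutive slices of at most `size` vectors."""
    for i in range(0, len(vectors), size):
        yield vectors[i:i + size]


database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
gt = knn_ground_truth([[0.0, 0.0]], blocks(database, 2), k=2)
```

Memory use is bounded by one block plus k candidates per query, which is what makes the approach workable for datasets larger than RAM.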

torch_utils.py

Interoperability functions for PyTorch and Faiss. Importing this module allows PyTorch tensors (CPU or GPU) to be used as arguments to Faiss indexes and other functions. Torch GPU tensors can only be used with Faiss GPU indexes. If the module is imported with a package that supports Faiss GPU, the necessary stream synchronization with the current PyTorch stream is performed automatically.

NumPy ndarrays can still be used in the Faiss Python interface after importing this file. All arguments to a call must be uniformly either NumPy ndarrays or Torch tensors; no mixing is allowed.

Tested in tests/test_contrib_torch.py (CPU) and gpu/test/test_contrib_torch_gpu.py (GPU).

inspect_tools.py

Functions to inspect C++ objects wrapped by SWIG. Most often this just means reading fields and converting them to the appropriate Python array.

ivf_tools.py

A few functions to override the coarse quantizer in IVF, providing additional flexibility for assignment.

datasets.py

(may require h5py)

Definition of how to access data for some standard datasets.

factory_tools.py

Functions related to factory strings.

evaluation.py

A few non-trivial evaluation functions for search results.
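As an example of the kind of measure such a module might provide, the sketch below computes the fraction of true nearest neighbors recovered in the top-k results, averaged over queries. The function name `intersection_at_k` and its signature are hypothetical and are not taken from evaluation.py.

```python
def intersection_at_k(found_ids, true_ids, k):
    """Mean fraction of the true top-k ids that appear in the found top-k.

    found_ids and true_ids are lists (one entry per query) of ranked id lists.
    """
    assert len(found_ids) == len(true_ids)
    total = 0
    for found, true in zip(found_ids, true_ids):
        # Order inside the top-k does not matter for this measure.
        total += len(set(found[:k]) & set(true[:k]))
    return total / (k * len(true_ids))


found = [[4, 7, 1], [2, 9, 5]]  # ids returned by some index, per query
truth = [[4, 1, 8], [2, 5, 9]]  # exact nearest neighbors, per query
score = intersection_at_k(found, truth, k=3)
```

Here the first query recovers two of its three true neighbors and the second recovers all three, so the score is 5/6.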

clustering.py

Contains:

  • a Python implementation of k-means that can be used for special datatypes (e.g. sparse matrices).

  • a 2-level clustering routine and a function that can apply it to train an IndexIVF.
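To illustrate why a Python k-means is useful for special datatypes, here is a small sketch that clusters sparse vectors stored as `{dimension: value}` dictionaries. It is not the implementation in clustering.py: the dictionary representation, the function names, and the naive initialization from the first k points are all assumptions chosen to keep the example short and deterministic.

```python
def sparse_sq_dist(a, b):
    """Squared L2 distance between two sparse vectors stored as {dim: value}."""
    keys = set(a) | set(b)
    return sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in keys)


def sparse_mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    acc = {}
    for v in vectors:
        for i, x in v.items():
            acc[i] = acc.get(i, 0.0) + x
    return {i: s / len(vectors) for i, s in acc.items()}


def kmeans(points, k, n_iter=10):
    """Lloyd's algorithm; returns the centroids and the last assignment used."""
    centroids = [dict(p) for p in points[:k]]  # naive init: first k points
    assign = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        assign = [
            min(range(k), key=lambda c: sparse_sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid from its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sparse_mean(members)
    return centroids, assign


points = [{0: 1.0}, {5: 1.0}, {0: 0.9, 1: 0.1}, {5: 1.1}, {0: 1.1}]
centroids, assign = kmeans(points, k=2)
```

Only the distance and mean functions depend on the data representation, so swapping in another datatype means replacing those two helpers.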