104 Commits

Author SHA1 Message Date
Kumar Saurabh Arora
2379b45f82 Few fixes in bench_fw to enable IndexFromCodec (#3383)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3383

In this diff, I am fixing minor issues in bench_fw where certain fields are not accessible when the index is built from a codec. It also requires the index to be discovered using the codec alias, as the index factory is not always available.

A subsequent diff, internal to Meta, will add a test case that executes this path.

Reviewed By: algoriddle

Differential Revision: D56444641

fbshipit-source-id: b7af7e7bb47b20bbb5515a66f41dd24f42459d52
2024-04-24 09:42:05 -07:00
Andres Suarez
ab2b7f5093 Apply clang-format 18
Summary: Previously this code conformed to clang-format 12.

Reviewed By: igorsugak

Differential Revision: D56065247

fbshipit-source-id: f5a985dd8f8b84f2f9e1818b3719b43c5a1b05b3
2024-04-14 11:28:32 -07:00
Matthijs Douze
fa1f39ec9f Fix HNSW stats (#3309)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3309

Make sure that the HNSW search stats work, remove stats for deprecated functionality.
Remove the code for the Link and Code paper, which is not supported anymore.

Reviewed By: kuarora, junjieqi

Differential Revision: D55247802

fbshipit-source-id: 03f176be092bff6b2db359cc956905d8646ea702
2024-03-22 12:55:30 -07:00
Tarang Jain
27b1055cc6 Integrate IVF-PQ from RAFT (#3044)
Summary:
Imports changes from https://github.com/facebookresearch/faiss/issues/3133 and https://github.com/facebookresearch/faiss/issues/3171. So this single PR adds all the changes together.

- [x] Implement RaftIVFPQ class
- [x] Update gtests to test correctness with RAFT enabled
- [x] All googleTests for RAFT enabled IVFPQ pass
- [x] Move some common functions in RaftIVFFlat and RaftIVFPQ to helper: RaftUtils.h
- [x] update Quantizer retroactively after building RAFT index -- both IVFFlat and IVFPQ
- [x] resolve failing LargeBatch (classical GPU)
- [x] add checks for Pascal deprecation
- [x] apply RMM changes from https://github.com/facebookresearch/faiss/issues/3171
- [x] apply robertmaynard's changes from https://github.com/facebookresearch/faiss/issues/3133

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3044

Reviewed By: junjieqi

Differential Revision: D51074065

Pulled By: algoriddle

fbshipit-source-id: 6871257921bcaff2064a20637e2ed358acbdc363
2024-02-21 06:41:08 -08:00
Gergely Szilvasy
1d0e8d489f index optimizer (#3154)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3154

Using the benchmark to find Pareto optimal indices, in this case on BigANN as an example.

Separately optimize the coarse quantizer and the vector codec and use Pareto optimal configurations to construct IVF indices, which are then retested at various scales. See `optimize()` in `optimize.py` as the main function driving the process.

The results can be interpreted with `bench_fw_notebook.ipynb`, which allows:
* filtering by maximum code size
* maximum time
* minimum accuracy
* space or time Pareto optimal options
* and visualize the results and output them as a table.

This version is intentionally limited to IVF(Flat|HNSW),PQ|SQ indices...
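
As an illustration of the Pareto filtering mentioned above, here is a minimal sketch in plain Python. It is not the code from `optimize.py`; the function name `pareto_front` and the `(time, accuracy)` tuple layout are assumptions made for this example.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (time, accuracy) points that no other point dominates.

    A point dominates another if it is no slower and no less accurate,
    and strictly better on at least one of the two criteria.
    """
    front = []
    best_accuracy = float("-inf")
    # Sort by time ascending; for equal times, the most accurate point comes first.
    for time, accuracy in sorted(points, key=lambda p: (p[0], -p[1])):
        # A point survives only if it beats every faster (or equally fast) point.
        if accuracy > best_accuracy:
            front.append((time, accuracy))
            best_accuracy = accuracy
    return front


# Hypothetical benchmark results: (search time, accuracy).
results = [(1.0, 0.50), (2.0, 0.45), (2.0, 0.70), (3.0, 0.70), (5.0, 0.90)]
front = pareto_front(results)
```

The same routine gives a space-optimal front if the first tuple element is the code size instead of the search time.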

Reviewed By: mdouze

Differential Revision: D51781670

fbshipit-source-id: 2c0f800d374ea845255934f519cc28095c00a51f
2024-01-30 10:58:13 -08:00
Matthijs Douze
32f0e8cf92 Generalize ResultHandler, support range search for HNSW and Fast Scan (#3190)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3190

This diff adds more result handlers in order to expose them externally.
This enables range search for HNSW and Fast Scan, and nprobe parameter support for FastScan.
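
The idea of swapping result handlers under one scanning loop can be sketched in plain Python. This is a conceptual illustration only, not the Faiss C++ classes; the names `TopKHandler`, `RangeHandler` and `scan` are made up for the example, and the "strictly below the radius" convention is an assumption.

```python
import heapq


class TopKHandler:
    """Keeps the k smallest distances seen so far (k-NN search)."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # max-heap emulated by storing negated distances

    def add_result(self, distance, idx):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (-distance, idx))
        elif distance < -self.heap[0][0]:
            # Better than the current worst result: replace the top.
            heapq.heapreplace(self.heap, (-distance, idx))

    def results(self):
        return sorted((-d, i) for d, i in self.heap)


class RangeHandler:
    """Keeps every result strictly below a distance threshold (range search)."""

    def __init__(self, radius):
        self.radius = radius
        self.hits = []

    def add_result(self, distance, idx):
        if distance < self.radius:
            self.hits.append((distance, idx))

    def results(self):
        return sorted(self.hits)


def scan(distances, handler):
    """The scanning loop is the same whichever handler collects the results."""
    for idx, distance in enumerate(distances):
        handler.add_result(distance, idx)
    return handler.results()


distances = [0.9, 0.2, 0.5, 0.1, 0.7]
knn = scan(distances, TopKHandler(k=2))
within = scan(distances, RangeHandler(radius=0.6))
```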

Reviewed By: pemazare

Differential Revision: D52547384

fbshipit-source-id: 271da5ffea6411df3d8e50641abade18bd7b774b
2024-01-11 11:46:30 -08:00
Gergely Szilvasy
beef6107fc faiss paper benchmarks (#3189)
Summary:
- IVF benchmarks: `bench_fw_ivf.py bench_fw_ivf.py bigann /checkpoint/gsz/bench_fw/ivf`
- Codec benchmarks: `bench_fw_codecs.py contriever /checkpoint/gsz/bench_fw/codecs` and `bench_fw_codecs.py deep1b /checkpoint/gsz/bench_fw/codecs`
- A range codec evaluation: `bench_fw_range.py ssnpp /checkpoint/gsz/bench_fw/range`
- Visualize with `bench_fw_notebook.ipynb`
- Support for running on a cluster

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3189

Reviewed By: mdouze

Differential Revision: D52544642

Pulled By: algoriddle

fbshipit-source-id: 21dcdfd076aef6d36467c908e6be78ef851b0e98
2024-01-05 09:27:04 -08:00
Gergely Szilvasy
4c83965d2b benchmark view results (#3144)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3144

Visualize results of running the benchmark with Pareto optima filtering:
1. per index or across indices
2. for space, time or space & time
3. knn or range search, the latter @ specific precision

Reviewed By: mdouze

Differential Revision: D51552775

fbshipit-source-id: d4f29e3d46ef044e71b54439b3972548c86af5a7
2023-12-04 05:53:17 -08:00
Gergely Szilvasy
9519a19f42 benchmark refactor
Summary:
1. Support for index construction parameters outside of the factory string (arbitrary depth of quantizers).
2. Refactor that provides an index wrapper which is a prereq for the optimizer, which will generate indices from pre-optimized components (particularly quantizers)

Reviewed By: mdouze

Differential Revision: D51427452

fbshipit-source-id: 014d05dd798d856360f2546963e7cad64c2fcaeb
2023-12-04 05:53:17 -08:00
Matthijs Douze
b109d086a2 Search and return codes (#3143)
Summary:
This PR adds functionality to search an IVF index and return the corresponding codes. It also adds a few functions to compress int arrays into a bit-compact representation.
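
To illustrate what a bit-compact representation of an int array means, here is a small sketch in plain Python. It is not the Faiss implementation; `pack_bits` and `unpack_bits` are hypothetical helpers, and the little-endian bit layout is an assumption chosen for simplicity.

```python
def pack_bits(values, nbits):
    """Pack non-negative ints, each < 2**nbits, into a bytes object."""
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << nbits):
            raise ValueError(f"value {v} does not fit in {nbits} bits")
        # Value i occupies bits [i * nbits, (i + 1) * nbits).
        acc |= v << (i * nbits)
    nbytes = (len(values) * nbits + 7) // 8  # round up to whole bytes
    return acc.to_bytes(nbytes, "little")


def unpack_bits(data, nbits, n):
    """Recover n values of nbits each from the packed bytes."""
    acc = int.from_bytes(data, "little")
    mask = (1 << nbits) - 1
    return [(acc >> (i * nbits)) & mask for i in range(n)]


ids = [5, 0, 7, 3, 6]
packed = pack_bits(ids, nbits=3)  # 15 bits of payload stored in 2 bytes
restored = unpack_bits(packed, nbits=3, n=len(ids))
```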

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3143

Test Plan:
```
buck test //faiss/tests/:test_index_composite -- TestSearchAndReconstruct

buck test //faiss/tests/:test_standalone_codec -- test_arrays
```

Reviewed By: algoriddle

Differential Revision: D51544613

Pulled By: mdouze

fbshipit-source-id: 875f72d0f9140096851592422570efa0f65431fc
2023-11-25 13:57:25 -08:00
Gergely Szilvasy
c3b9374984 bench_fw - fixes & nits for oss (#3102)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3102

Reviewed By: pemazare

Differential Revision: D50426528

Pulled By: algoriddle

fbshipit-source-id: 886960b8b522318967fc5ec305666871b496cae8
2023-10-20 07:53:56 -07:00
Gergely Szilvasy
0a00d8137a offline index evaluation (#3097)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3097

A framework for evaluating indices offline.

Long term objectives:
1. Generate offline similarity index performance data with test datasets both for existing indices and automatically generated alternatives. That is, given a dataset and some constraints this workflow should automatically discover optimal index types and parameter choices as well as evaluate the performance of existing production indices and their parameters.
2. Allow researchers, platform owners (Laser, Unicorn) and product teams to understand how different index types perform on their datasets and make optimal choices wrt their objectives. Longer term to enable automatic decision-making/auto-tuning.

Constraints, design choices:
1. I want to run the same evaluation on Meta-internal (fblearner, data from hive and manifold) or the local machine + research cluster (data on local disk or NFS) via OSS Faiss. Via fblearner, I want this to work in a way that it can be turned into a service and plugged into Unicorn or Laser, while the core Faiss part can be used/referred to in our research and to update the wiki with the latest results/recommendations for public datasets.
2. It must support a range of metrics for KNN and range search, and it should be easy to add new ones. Cost metrics need to be fine-grained to allow extrapolation.
3. It should automatically sweep all query time params (eg. nprobe, polysemous code hamming distance, params of quantizers), using `OperatingPointsWithRanges` to cut down the optimal param search space. (For now, it sweeps nprobes only.)
4. [FUTURE] It will generate/sweep index creation hyperparams (factory strings, quantizer sizes, quantizer params), using heuristics.
5. [FUTURE] It will sweep the dataset size: start small test with e.g. 100K db vectors and go up to millions, billions potentially, while narrowing down the index+param choices at each step.
6. [FUTURE] Extrapolate perf metrics (cost and accuracy)
7. Intermediate results must be saved (to disk, to manifold) throughout, and reused as much as possible to cut down on overall runtime and enable faster iteration during development.

For range search, this diff supports the metric proposed in https://docs.google.com/document/d/1v5OOj7kfsKJ16xzaEHuKQj12Lrb-HlWLa_T2ct0LJiw/edit?usp=sharing I also added support for the classical case where the scoring function steps from 1 to 0 at some arbitrary threshold.

For KNN, I added knn_intersection, but other metrics, particularly recall@1 will also be interesting. I also added the distance_ratio metric, which we previously discussed as an interesting alternative, since it shows how much the returned results approximate the ground-truth nearest-neighbours in terms of distances.
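
A rough sketch of the two KNN metrics in plain Python follows. The exact definitions used by the framework are not given here, so both formulas are assumptions: `knn_intersection` is taken as the mean overlap between returned and ground-truth ids, and `distance_ratio` as the total ground-truth distance divided by the total returned distance.

```python
def knn_intersection(returned_ids, gt_ids):
    """Mean fraction of ground-truth neighbours that were returned, per query."""
    total = 0.0
    for ret, gt in zip(returned_ids, gt_ids):
        total += len(set(ret) & set(gt)) / len(gt)
    return total / len(gt_ids)


def distance_ratio(returned_dist, gt_dist):
    """Sum of ground-truth distances over sum of returned distances.

    Assumed definition: with L2 distances and exact ground truth the value
    is at most 1, and it gets closer to 1 as the results get closer to the
    true neighbours.
    """
    return sum(map(sum, gt_dist)) / sum(map(sum, returned_dist))


# Two queries, k = 3 (toy data).
returned_ids = [[1, 2, 3], [4, 5, 6]]
gt_ids = [[1, 2, 9], [4, 5, 6]]
returned_dist = [[1.0, 2.0, 5.0], [1.0, 1.0, 2.0]]
gt_dist = [[1.0, 2.0, 3.0], [1.0, 1.0, 2.0]]

overlap = knn_intersection(returned_ids, gt_ids)
ratio = distance_ratio(returned_dist, gt_dist)
```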

In the test case, I evaluated three current production indices for VCE with 1M vectors in the database and 10K queries. Each index is tested at various operating points (nprobes), which are shown on the charts. The results are not extrapolated to the true scale of these indices.

Reviewed By: yonglimeta

Differential Revision: D49958434

fbshipit-source-id: f7f567b299118003955dc9e2d9c5b971e0940fc5
2023-10-17 13:56:02 -07:00
chasingegg
6218111233 Fix some typos (#3056)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3056

Reviewed By: pemazare

Differential Revision: D49617607

Pulled By: mlomeli1

fbshipit-source-id: b2d5df67e88e029882e697597af9f3fc8fe1e64c
2023-09-27 03:17:41 -07:00
generatedunixname89002005287564
d85601d972 fairring, faiss, fairness (4401366386162573988)
Reviewed By: r-barnes

Differential Revision: D49181434

fbshipit-source-id: 0554ec62155b422e4abe9cec709b69587f71dea0
2023-09-14 00:50:50 -07:00
Sid Jha
d48e777412 Fix import (#2936)
Summary:
Previous import does not exist.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2936

Reviewed By: mlomeli1

Differential Revision: D47221019

Pulled By: mdouze

fbshipit-source-id: 9ceeba229a10dd4b66da3483cc7695b198e1a8d8
2023-07-05 06:59:05 -07:00
Matthijs Douze
a91a2887fe use dispatcher function to call HammingComputer (#2918)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2918

The HammingComputer class is optimized for several vector sizes. So far it's been the caller's responsibility to instantiate the relevant optimized version.

This diff introduces a `dispatch_HammingComputer` function that can be called with a template class that is instantiated for all existing optimized HammingComputers.
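
The dispatch pattern can be sketched in Python as a loose analogue of the C++ template mechanism. This is not the Faiss function; `dispatch_hamming`, `hamming_8`, `hamming_generic` and `nearest` are hypothetical names for this example.

```python
import struct


def hamming_8(a, b):
    """Specialized for 8-byte codes: one 64-bit XOR followed by a popcount."""
    (x,) = struct.unpack("<Q", a)
    (y,) = struct.unpack("<Q", b)
    return bin(x ^ y).count("1")


def hamming_generic(a, b):
    """Fallback for any code size: byte-wise XOR and popcount."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Code sizes that have a specialized implementation.
_SPECIALIZED = {8: hamming_8}


def dispatch_hamming(code_size, consumer):
    """Pick the computer for code_size once and hand it to the consumer."""
    computer = _SPECIALIZED.get(code_size, hamming_generic)
    return consumer(computer)


def nearest(query, database):
    """The caller is written once and never selects a computer itself."""

    def run(hamming):
        return min(range(len(database)), key=lambda i: hamming(query, database[i]))

    return dispatch_hamming(len(query), run)


database = [bytes([0xFF] * 8), bytes([0x0F] * 8), bytes(8)]
query = bytes([0x0E] * 8)
best = nearest(query, database)
```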

Reviewed By: algoriddle

Differential Revision: D46858553

fbshipit-source-id: 32c31689bba7c0b406b309fc8574c95fa24022ba
2023-06-26 14:06:10 -07:00
Matthijs Douze
a27036aa72 add small benchmark for hamming computers
Summary: to measure impact of hamming computer diff

Reviewed By: algoriddle

Differential Revision: D46913890

fbshipit-source-id: 7b9850205885b9b7c5f394f17a79ba222e7b1e2e
2023-06-26 14:06:10 -07:00
Alexandr Guzhva
d407d3fd03 Improve GenHammingDistance for AVX2 (#2815)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2815

Reviewed By: mdouze

Differential Revision: D44817810

fbshipit-source-id: 3d392a6a87ef0192b9ae06fc934fe980596d96a7
2023-04-18 12:56:58 -07:00
Alexandr Guzhva
8d82d24b89 Hamming distance refactoring & ARM version (#2782)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2782

Add a separate branch for ARM Hamming distance computations. Also improve the benchmark for the Hamming computer.

Reviewed By: mdouze

Differential Revision: D44397463

fbshipit-source-id: 1e44e8e7dd1c5b92e95e8afc754170b501d0feed
2023-03-28 13:44:48 -07:00
Matthijs Douze
c78b18cdb2 remove setNumProbes (#2797)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2797

This is the last code instance of setNumProbes
Removing because some people still seem to run into errors due to this.

Reviewed By: algoriddle

Differential Revision: D44421600

fbshipit-source-id: fbc1a9d49a0175ddf24c32dab5c1bdb5f1bbbac6
2023-03-28 07:02:53 -07:00
Matthijs Douze
a80c96c0de Evaluation script for hybrid CPU / GPU search
Summary:
Implementation of various combinations of coarse quantization / scanning code on CPU and GPU.

Used to generate the results of

https://github.com/facebookresearch/faiss/wiki/Hybrid-CPU-GPU-search-and-multiple-GPUs

Reviewed By: alexanderguzhva

Differential Revision: D43041802

fbshipit-source-id: 12608812ab351d60d4a6dc45be1ca493f76d4375
2023-02-15 12:55:06 -08:00
Alexandr Guzhva
868e17f294 OSS legal requirements (#2698)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2698

Add headers about copyright.

Reviewed By: algoriddle

Differential Revision: D43085637

fbshipit-source-id: 5a57876b7047097ffe01cd79322674625d9bca34
2023-02-07 14:32:56 -08:00
Matthijs Douze
8fc3775472 building blocks for hybrid CPU / GPU search (#2638)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2638

This diff is a more streamlined way of searching IVF indexes with precomputed clusters.
This will be used for experiments with hybrid CPU / GPU search.

Reviewed By: algoriddle

Differential Revision: D41301032

fbshipit-source-id: a1d645fd0f2bf806454dfd04971edc0a6200d20d
2023-01-12 13:34:44 -08:00
chasingegg
adc9d1a0cd Refactor prepare cache code in cmp_with_scann benchmark (#2573)
Summary:
In `cmp_with_scann.py`, the base vectors, query vectors and ground truth are saved as npy files, but only when the lib is faiss. If the script is run directly with the scann lib, it complains that the files do not exist.
Therefore, the code is refactored to save the npy files from the beginning, so that both libs can find them.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2573

Reviewed By: mdouze

Differential Revision: D42338435

Pulled By: algoriddle

fbshipit-source-id: 9227f95e1ff79f5329f6206a0cb7ca169185fdb3
2023-01-04 02:35:18 -08:00
zh Wang
60c850e296 Fix hnsw benchmark (#2591)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2591

Reviewed By: mdouze

Differential Revision: D42337854

Pulled By: algoriddle

fbshipit-source-id: 222885fe0e1562deddd0f37c0dbedd1963c885e5
2023-01-04 01:49:09 -08:00
Matthijs Douze
fa53e2c941 Implementation of big-batch IVF search (single machine) (#2567)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2567

Intuitively, it should be easier to handle big-batch searches because all distance computations for a set of queries can be done locally within each inverted list.

This benchmark implements this in pure python (but should be close to optimal in terms of speed), on CPU for IndexIVFFlat, IndexIVFPQ and IndexIVFScalarQuantizer. GPU is also supported.
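
The intuition can be sketched in a few lines of plain Python: instead of looping over queries, loop over inverted lists and handle all queries that probe each list together. This is a toy illustration, not the benchmark code; `big_batch_search` and its data layout are assumptions.

```python
from collections import defaultdict


def big_batch_search(queries, assignments, inverted_lists, k):
    """Toy big-batch IVF search.

    queries:        list of query vectors
    assignments:    assignments[q] = ids of the inverted lists probed by query q
    inverted_lists: inverted_lists[l] = list of (vector_id, vector) pairs
    """
    # Invert the mapping query -> lists into list -> queries.
    queries_per_list = defaultdict(list)
    for q, lists in enumerate(assignments):
        for l in lists:
            queries_per_list[l].append(q)

    candidates = [[] for _ in queries]
    # Visit each inverted list once and serve every query that probes it.
    for l, qs in queries_per_list.items():
        for vid, vec in inverted_lists[l]:
            for q in qs:
                d = sum((a - b) ** 2 for a, b in zip(queries[q], vec))
                candidates[q].append((d, vid))
    # Keep the k closest candidates per query.
    return [sorted(c)[:k] for c in candidates]


inverted_lists = {
    0: [(10, (0.0, 0.0)), (11, (1.0, 0.0))],
    1: [(20, (5.0, 5.0))],
}
queries = [(0.1, 0.0), (4.0, 5.0)]
assignments = [[0], [0, 1]]
results = big_batch_search(queries, assignments, inverted_lists, k=1)
```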

The results are not systematically better, see https://docs.google.com/document/d/1d3YuV8uN7hut6aOATCOMx8Ut-QEl_oRnJdPgDBRF1QA/edit?usp=sharing

Reviewed By: algoriddle

Differential Revision: D41098338

fbshipit-source-id: 479e471b0d541f242d420f581775d57b708a61b8
2022-12-09 08:53:13 -08:00
Matthijs Douze
9f13e43486 Building blocks for big batch IVF search
Summary:
Adds:
-  a sparse update function to the heaps
- bucket sort functions
- an IndexRandom index to serve as a dummy coarse quantizer for testing
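
As a sketch of what a bucket sort helper provides for big-batch search, the following plain Python groups item indices by bucket id. The `(lims, perm)` return convention is an assumption for this example, not necessarily the signature added by the diff.

```python
def bucket_sort(values, nbucket):
    """Group item indices by bucket id.

    Returns (lims, perm) such that perm[lims[b]:lims[b + 1]] lists the
    indices i with values[i] == b, in their original order.
    """
    # Count the items per bucket, shifted by one slot.
    lims = [0] * (nbucket + 1)
    for v in values:
        lims[v + 1] += 1
    # A prefix sum turns the counts into bucket boundaries.
    for b in range(nbucket):
        lims[b + 1] += lims[b]
    perm = [0] * len(values)
    cursor = lims[:-1].copy()  # next free slot in each bucket
    for i, v in enumerate(values):
        perm[cursor[v]] = i
        cursor[v] += 1
    return lims, perm


values = [2, 0, 2, 1, 0]
lims, perm = bucket_sort(values, nbucket=3)
```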

Reviewed By: algoriddle

Differential Revision: D41804055

fbshipit-source-id: 9402b31c37c367aa8554271d8c88bc93cc1e2bda
2022-12-08 09:34:16 -08:00
Matthijs Douze
a996a4a052 Put idx_t in the faiss namespace (#2582)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2582

A few more or less cosmetic improvements
* Index::idx_t was in the Index object, which does not make much sense, this diff moves it to faiss::idx_t
* replace multiprocessing.dummy with multiprocessing.pool
* add Alexandr as a core contributor of Faiss in the README ;-)

```
for i in $( find . -name \*.cu -o -name \*.cuh -o -name \*.h -o -name \*.cpp ) ; do
  sed -i s/Index::idx_t/idx_t/ $i
done
```

For the fbcode deps:
```
for i in $( fbgs Index::idx_t --exclude fbcode/faiss -l ) ; do
   sed -i s/Index::idx_t/idx_t/ $i
done
```

Reviewed By: algoriddle

Differential Revision: D41437507

fbshipit-source-id: 8300f2a3ae97cace6172f3f14a9be3a83999fb89
2022-11-30 08:25:30 -08:00
Alexandr Guzhva
0a622d2d78 Update docs for benchmarks in benchs/ directory (#2565)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2565

Reviewed By: mdouze

Differential Revision: D40856253

fbshipit-source-id: 78f549bb37cdb3e6f562d877f5e33fa1c20834dc
2022-11-08 08:44:42 -08:00
Alexandr Guzhva
771b1a8e37 Introduce transposed centroid table to speedup ProductQuantizer::compute_codes() (#2562)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2562

Introduce a table of transposed centroids in ProductQuantizer that significantly speeds up the ProductQuantizer::compute_codes() call for certain PQ parameters, which in turn speeds up search queries.

* ::sync_transposed_centroids() call is used to fill the table
* ::clear_transposed_centroids() call clears the table, so that the original baseline code is used for ::compute_codes()

Reviewed By: mdouze

Differential Revision: D40763338

fbshipit-source-id: 87b40e5dd2f8c3cadeb94c1cd9e8a4a5b6ffa97d
2022-11-06 08:32:54 -08:00
Alexandr Guzhva
e11a3cf292 Benchmark for SADecodeKernels (#2554)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2554

Reviewed By: mdouze

Differential Revision: D40484008

fbshipit-source-id: c5f9b3c1a42a4ff4ff565d7c3f96af58c967b599
2022-10-31 14:55:08 -07:00
Matthijs Douze
dd814b5f14 IVF filtering based on IDSelector (no init split) (#2483)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2483

This diff changes the following:
1. all search functions now take a `SearchParameters` argument that overrides the internal search parameters
2. the default implementation for most classes throws when the params argument is non-nullptr / non-None
3. the IndexIVF and IndexHNSW classes have functioning SearchParameters
4. the SearchParameters includes an IDSelector that can search only in a subset of the index based on a defined subset of ids

There is also some refactoring: the IDSelector was moved to its own .h/.cpp and python/__init__.py is split into parts.

The diff is quite bulky because the search function prototypes need to be changed in all index classes.
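
The behaviour described in points 1 and 4 can be sketched in plain Python. This does not use the Faiss API; `RangeSelector`, `SearchParams` and `search` are hypothetical stand-ins that only mirror the idea of an optional per-call parameter object carrying an id selector.

```python
class RangeSelector:
    """Selects the ids in [imin, imax)."""

    def __init__(self, imin, imax):
        self.imin = imin
        self.imax = imax

    def is_member(self, idx):
        return self.imin <= idx < self.imax


class SearchParams:
    """Per-call parameters; sel=None means no filtering."""

    def __init__(self, sel=None):
        self.sel = sel


def search(query, database, k, params=None):
    """Brute-force k-NN that skips the ids rejected by the selector."""
    sel = params.sel if params is not None else None
    scored = []
    for idx, vec in enumerate(database):
        if sel is not None and not sel.is_member(idx):
            continue  # filtered out before any distance is computed
        d = sum((a - b) ** 2 for a, b in zip(query, vec))
        scored.append((d, idx))
    return sorted(scored)[:k]


database = [(0.0,), (1.0,), (2.0,), (3.0,)]
query = (0.2,)
unfiltered = [idx for _, idx in search(query, database, k=2)]
filtered = [idx for _, idx in search(query, database, k=2,
                                     params=SearchParams(sel=RangeSelector(2, 4)))]
```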

Things to fix in subsequent diffs:

- support SearchParameters for more index types (Flat variants)

- better sub-object ownership for SearchParams (with std::unique_ptr?)

- special handling of IDSelectorRange to make it faster

Reviewed By: alexanderguzhva

Differential Revision: D39852589

fbshipit-source-id: 4988bdb5b9bee1207cd327d3f80bf5e0e2467fe1
2022-09-30 06:40:03 -07:00
alemagnani
230a97f7cb Support for parallelization in IVFFastScan over both queries and probes (#2380)
Summary:
For search requests with few queries or a single query, this PR adds the ability to run threads over both queries and different clusters of the IVF. For applications where latency is important this can **dramatically reduce latency for single query requests**.

A new implementation (#14) is added. The new implementation could be merged into implementation 12, but for simplicity I created a separate function in this PR.

Tests are added to cover the new implementation and new tests are added to specifically cover the case when a single query is used.

In my benchmarks a very good reduction of latency is observed for single query requests.
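
The work decomposition can be sketched in plain Python: one task per (query, probed list) pair, followed by a merge of the partial results per query. This only illustrates how the work is split; it is not the FastScan code, Python threads will not reproduce the reported speedup, and all names are assumptions.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def scan_list(query, inverted_list, k):
    """Return the k closest (distance, id) pairs within one inverted list."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(query, vec)), vid)
        for vid, vec in inverted_list
    )
    return heapq.nsmallest(k, scored)


def search_parallel(queries, probes, inverted_lists, k, max_workers=4):
    # One task per (query, list) pair, so a single query still yields
    # several independent units of work.
    tasks = [(q, l) for q, lists in enumerate(probes) for l in lists]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(
            pool.map(lambda t: scan_list(queries[t[0]], inverted_lists[t[1]], k), tasks)
        )
    # Merge the partial top-k lists of each query.
    merged = [[] for _ in queries]
    for (q, _), partial in zip(tasks, partials):
        merged[q].extend(partial)
    return [heapq.nsmallest(k, m) for m in merged]


inverted_lists = [
    [(0, (0.0, 0.0)), (1, (3.0, 0.0))],
    [(2, (1.0, 0.0)), (3, (9.0, 9.0))],
]
queries = [(1.0, 0.0)]
probes = [[0, 1]]  # the single query probes both lists
top2 = search_parallel(queries, probes, inverted_lists, k=2)
```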

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2380

Test Plan:
```
buck test //faiss/tests/:test_fast_scan_ivf -- implem14
buck test //faiss/tests/:test_fast_scan_ivf -- implem15
```

Reviewed By: alexanderguzhva

Differential Revision: D38074577

Pulled By: mdouze

fbshipit-source-id: e7a20b6ea2f9216e0a045764b5d7b7f550ea89fe
2022-08-31 05:37:53 -07:00
Ryan Russell
d2806286d2 docs: Improve readability (#2378)
Summary:
Signed-off-by: Ryan Russell <git@ryanrussell.org>

Various readability fixes focused on `.md` files:
- Grammar
- Fix some incorrect command references to `distributed_kmeans.py`
- Styling the markdown bash code snippets sections so they format

Attempted to put a lot of little things into one PR and commit; let me know if any mods are needed!

Best,
Ryan

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2378

Reviewed By: alexanderguzhva

Differential Revision: D37717671

Pulled By: mdouze

fbshipit-source-id: 0039192901d98a083cd992e37f6b692d0572103a
2022-07-08 09:19:07 -07:00
Patrick Somaru
578fbc9a8e faiss 6bit benchmark config (#2329)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2329

Reviewed By: beauby

Differential Revision: D36003967

fbshipit-source-id: 1167d028477ab6f42fe8d3cfd2f198c274c0fe9a
2022-05-17 05:19:54 -07:00
Check Deng
9b1982262a Add ProductAdditiveQuantizer (#2286)
Summary:
This diff added ProductAdditiveQuantizer.

A Simple Algo description:

1. Divide the vector space into several orthogonal sub-spaces, just like PQ does.
2. Quantize each sub-space by an independent additive quantizer.

Usage:

Construct a ProductAdditiveQuantizer object:
- `d`: dimensionality of the input vectors
- `nsplits`: number of sub-spaces the vector space is divided into
- `Msub`: `M` of each additive quantizer
- `nbits`: `nbits` of each additive quantizer

```python
d = 128
nsplits = 2
Msub = 4
nbits = 8
plsq = faiss.ProductLocalSearchQuantizer(d, nsplits, Msub, nbits)
prq = faiss.ProductResidualQuantizer(d, nsplits, Msub, nbits)
```

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2286

Test Plan:
```

buck test //faiss/tests/:test_local_search_quantizer -- TestProductLocalSearchQuantizer

buck test //faiss/tests/:test_residual_quantizer -- TestProductResidualQuantizer

```

Reviewed By: alexanderguzhva

Differential Revision: D35907702

Pulled By: mdouze

fbshipit-source-id: 7428a196e6bd323569caa585c57281dd70e547b1
2022-05-05 15:14:07 -07:00
Lucas Hosseini
7c9d979d66 Enable servicelab regression testing.
Summary:
Start migration of existing benchmarks to Google's Benchmark library + register benchmark to servicelab.

The benchmark should be automatically registered to servicelab once this diff lands according to https://www.internalfb.com/intern/wiki/ServiceLab/Use_Cases/Benchmarks_(C++)/#servicelab-job.

Reviewed By: mdouze

Differential Revision: D35397782

fbshipit-source-id: 317db2527f12ddde0631cacc3085c634afdd0e37
2022-04-07 02:45:55 -07:00
Matthijs Douze
b8fe92dfee contrib clustering module (#2217)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2217

This diff introduces a new Faiss contrib module that contains:
- generic k-means implemented in python (was in distributed_ondisk)
- the two-level clustering code, including a simple function that runs it on a Faiss IVF index.
- sparse clustering code (new)

The main idea is that this code is often reused, so it is better to have it in contrib.

Reviewed By: beauby

Differential Revision: D34170932

fbshipit-source-id: cc297cc56d241b5ef421500ed410d8e2be0f1b77
2022-02-28 14:18:47 -08:00
Check Deng
41007232d6 AQ fastscan (#2169)
Summary:
Work in progress.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2169

Test Plan:
buck test mode/opt //faiss/tests/:test_fast_scan
buck test mode/opt //faiss/tests/:test_fast_scan_ivf

Reviewed By: beauby

Differential Revision: D34208813

Pulled By: mdouze

fbshipit-source-id: 74b72e07dc537667a7def403c4e46d3d05408c27
2022-02-22 15:24:31 -08:00
Chengqi Deng
eba1cb1a90 Support LSQ on GPU (#1978)
Summary:
## Description

This PR added support for LSQ on GPU. Only the encoding part is running on GPU and the others are still running on CPU.

Multi-GPU is also supported.

## Usage

``` python
lsq = faiss.LocalSearchQuantizer(d, M, nbits)
ngpus = faiss.get_num_gpus()
lsq.icm_encoder_factory = faiss.GpuIcmEncoderFactory(ngpus)  # we use all gpus

lsq.train(xt)
codes = lsq.compute_codes(xb)
decoded = lsq.decode(codes)
```

## Performance on SIFT1M

On 1 GPU:
```
===== lsq-gpu:
        mean square error = 17337.878528
        training time: 40.9857234954834 s
        encoding time: 27.12640070915222 s
```

On 2 GPUs:
```
===== lsq-gpu:
        mean square error = 17364.658176
        training time: 25.832106113433838 s
        encoding time: 14.879548072814941 s
```

On CPU:
```
===== lsq:
        mean square error = 17305.880576
        training time: 152.57522344589233 s
        encoding time: 110.01779270172119 s
```

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1978

Test Plan: buck test mode/dev-nosan //faiss/gpu/test/:test_gpu_index_py -- TestLSQIcmEncoder

Reviewed By: wickedfoo

Differential Revision: D29609763

Pulled By: mdouze

fbshipit-source-id: b6ffa2a3c02bf696a4e52348132affa0dd838870
2021-09-09 09:13:15 -07:00
Matthijs Douze
760cce7f3a Support for additive quantizer search (#1961)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1961

This diff implements LUT-based search for additive quantizers.
It also further merges code for LSQ and the ResidualQuantizer.

The documentation + evaluation is on github:

https://github.com/facebookresearch/faiss/wiki/Additive-quantizers

Reviewed By: wickedfoo

Differential Revision: D29395079

fbshipit-source-id: b8a24a647bbdc4cda2a699e791ffdb2a12bfa9c6
2021-08-20 01:00:10 -07:00
Check Deng
48ae55348a Update codebooks with double type (#1975)
Summary:
## Description

The process of updating the codebook in LSQ may be unstable if the data is not zero-centered. This diff fixes it by using `double` instead of `float` during codebook updating. This does not affect performance since the update process is quite fast.

Users could switch back to `float` mode by setting `update_codebooks_with_double = False`

## Changes

1. Support `double` during codebook updating.
2. Add a unit test.
3. Add `__init__.py` under `contrib/` to avoid warnings.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1975

Reviewed By: wickedfoo

Differential Revision: D29565632

Pulled By: mdouze

fbshipit-source-id: 932d7932ae9725c299cd83f87495542703ad6654
2021-07-07 03:29:49 -07:00
Chengqi Deng
c087f87730 Add LocalSearchQuantizer (#1906)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1906

This PR implemented LSQ/LSQ++, a vector quantization technique described in the following two papers:

1. Revisiting additive quantization
2. LSQ++: Lower running time and higher recall in multi-codebook quantization

Here is a benchmark running on SIFT1M for 64 bits encoding:
```
===== lsq:
        mean square error = 17335.390208
        training time: 312.729779958725 s
        encoding time: 244.6277096271515 s
===== pq:
        mean square error = 23743.004672
        training time: 1.1610801219940186 s
        encoding time: 2.636141061782837 s
===== rq:
        mean square error = 20999.737344
        training time: 31.813055515289307 s
        encoding time: 307.51959800720215 s
```

Changes:

1. Add LocalSearchQuantizer object
2. Fix an out of memory bug in ResidualQuantizer
3. Add a benchmark for evaluating quantizers
4. Add tests for LocalSearchQuantizer

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1862

Test Plan:
```
buck test //faiss/tests/:test_lsq

buck run mode/opt //faiss/benchs/:bench_quantizer -- lsq pq rq
```

Reviewed By: beauby

Differential Revision: D28376369

Pulled By: mdouze

fbshipit-source-id: 2a394d38bf75b9de0a1c2cd6faddf7dd362a6fa8
2021-05-21 01:33:55 -07:00
Chengqi Deng
6f6e90162b Fix typo in bench_index_flat (#1810)
Summary:
This PR fixed the typo in `bench_index_flat.py`.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1810

Reviewed By: beauby

Differential Revision: D27706115

Pulled By: mdouze

fbshipit-source-id: 35515450be8eb45d6a2e98c7372333d98fc0f7b4
2021-04-15 22:58:42 -07:00
Check Deng
b35103a138 Add NSG (#1707)
Summary:
## Description:
This diff implemented Navigating Spreading-out Graph (NSG) which accepts a KNN graph as input.
Here is the interface of building an NSG graph:
``` c++
void IndexNSG::build(idx_t n, const float *x, idx_t *knn_graph, int GK);
```
where `GK` is the number of neighbors per node and `knn_graph[i * GK + j]` is the j-th neighbor of node i.
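
The flat row-major layout of `knn_graph` can be illustrated with a short Python sketch. The helper names `flatten` and `neighbors_of` are made up for this example; only the `i * GK + j` indexing comes from the description above.

```python
def flatten(adjacency):
    """Turn a list of per-node neighbor lists into one flat row-major list."""
    GK = len(adjacency[0])
    flat = []
    for row in adjacency:
        if len(row) != GK:
            raise ValueError("every node must have exactly GK neighbors")
        flat.extend(row)
    return flat, GK


def neighbors_of(knn_graph, GK, i):
    """Neighbors of node i occupy the slice [i * GK, (i + 1) * GK)."""
    return knn_graph[i * GK:(i + 1) * GK]


# Three nodes, GK = 2 neighbors each.
adjacency = [[1, 2], [0, 2], [0, 1]]
knn_graph, GK = flatten(adjacency)
second_neighbor_of_node_1 = knn_graph[1 * GK + 1]
```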

The `add` method is not implemented yet.

The unit tests could be found in `tests/test_nsg.cpp`.

mdouze beauby Maybe I need some advice on how to design the interface and support python.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1707

Test Plan: buck test //faiss/tests/:test_index -- TestNSG

Reviewed By: beauby

Differential Revision: D26748498

Pulled By: mdouze

fbshipit-source-id: 3280f705fb1b5f9c8cc5efeba63b904c3b832544
2021-03-10 15:03:00 -08:00
Lucas Hosseini
e86bf8cae1 Enable clang-format + autofix.
Summary: Format whole codebase with clang-format.

Reviewed By: mdouze

Differential Revision: D22891341

fbshipit-source-id: 673032b2444d61026d1e2c3fa2c5659f178cf58b
2021-02-25 04:46:10 -08:00
Lucas Hosseini
6d51766607 Fix unused variables in python
Reviewed By: mdouze

Differential Revision: D26633983

fbshipit-source-id: 32b9f95ed9647716f65b93f2713a8d5bad6abe78
2021-02-24 11:52:18 -08:00
Lucas Hosseini
2a01135127 Add missing copyright headers. (#1689)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1689

Reviewed By: mdouze

Differential Revision: D26460606

Pulled By: beauby

fbshipit-source-id: ad35dd2ea3fb23a0b87bc04597a8fbc38393c997
2021-02-16 09:11:30 -08:00
shengjun.li
cf33102a7e Improve performance of Hamming computer (#1661)
Summary:
Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

Improve performance of Hamming computer

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1661

Reviewed By: wickedfoo

Differential Revision: D26222892

Pulled By: mdouze

fbshipit-source-id: 5c1228b9e6c0f196ebcdfb0227ecdf7a02610871
2021-02-03 10:32:24 -08:00
shengjun.li
908812266c Add heap_replace_top to simplify heap_pop + heap_push (#1597)
Summary:
Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

Add heap_replace_top to simplify heap_pop + heap_push
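
The same idea exists in Python's standard library as `heapq.heapreplace`, which makes for a compact illustration. The sketch below is not the Faiss code; it only compares a top-k loop written with pop + push against one written with a single replace-top call.

```python
import heapq


def topk_pop_push(distances, k):
    """Top-k smallest values using a separate pop and push (two sift passes)."""
    heap = []  # max-heap emulated with negated values
    for d in distances:
        if len(heap) < k:
            heapq.heappush(heap, -d)
        elif d < -heap[0]:
            heapq.heappop(heap)
            heapq.heappush(heap, -d)
    return sorted(-x for x in heap)


def topk_replace_top(distances, k):
    """Same result, but the top is replaced in a single operation."""
    heap = []
    for d in distances:
        if len(heap) < k:
            heapq.heappush(heap, -d)
        elif d < -heap[0]:
            heapq.heapreplace(heap, -d)
    return sorted(-x for x in heap)


distances = [5.0, 1.0, 4.0, 2.0, 3.0]
a = topk_pop_push(distances, k=3)
b = topk_replace_top(distances, k=3)
```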

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1597

Test Plan:
OMP_NUM_THREADS=1 buck run mode/opt //faiss/benchs/:bench_heap_replace
OMP_NUM_THREADS=8 buck run mode/opt //faiss/benchs/:bench_heap_replace

Reviewed By: beauby

Differential Revision: D25943140

Pulled By: mdouze

fbshipit-source-id: 66fe67779dd281a7753f597542c2e797ba0d7df5
2021-01-20 11:28:08 -08:00