67 Commits

Author SHA1 Message Date
Matthijs Douze
ab109c28d5 Add search functionality to FlatCodes (#3611)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3611

Using the new dispatcher functions, add search func to flat codes. To test it, make IndexLattice a subclass of FlatCodes and check the resonstruction there.

Reviewed By: asadoughi

Differential Revision: D59367989

fbshipit-source-id: 405dab4358fe34b2e38ac8bcc222b19f58643229
2024-07-11 02:40:38 -07:00
Gergely Szilvasy
3d32330e3d add use_raft to knn_gpu (torch) (#3509)
Summary:
Add support for `use_raft` in the torch version of `knn_gpu`. The numpy version already has this support, see https://github.com/facebookresearch/faiss/blob/main/faiss/python/gpu_wrappers.py#L59

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3509

Reviewed By: mlomeli1, junjieqi

Differential Revision: D58489851

Pulled By: algoriddle

fbshipit-source-id: cfad722fefd4809b135b765d0d43587cfd782d0e
2024-06-12 19:19:23 -07:00
Kumar Saurabh Arora
22304340d2 Adding buck target for experiment bench_fw_ivf (#3423)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3423

Adding small fixes to run experiments from fbcode.
1. Added buck target
2. Full import path of faiss bench_fw modules
3. new dataset path to run tests locally as we can't use  an existing directory ./data in fbcode.

Reviewed By: algoriddle, junjieqi

Differential Revision: D57235092

fbshipit-source-id: f78a23199e619b640a19ca37f8b52ff0abdd8298
2024-05-31 14:30:39 -07:00
Kumar Saurabh Arora
0beecb4c85 sys.big_endian to sys.byteorder (#3422)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3422

Found vec_io failing when running some benchmarking.
There is no such field named big_endian in sys. So, reverting it to original field byteorder

Reviewed By: algoriddle

Differential Revision: D56718607

fbshipit-source-id: 553f1d2d6bc967581142a92282e534f3f164e8f9
2024-05-30 09:27:55 -07:00
Alexandr Guzhva
6a94c67a2f QT_bf16 for scalar quantizer for bfloat16 (#3444)
Summary:
mdouze Please let me know if any additional unit tests are needed

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3444

Reviewed By: algoriddle

Differential Revision: D57665641

Pulled By: mdouze

fbshipit-source-id: 9bec91306a1c31ea4f1f1d726c9d60ac6415fdfc
2024-05-23 02:59:15 -07:00
Matthijs Douze
783e044a2d support big-endian machines (#3361)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3361

Fix a few issues in the PR.
Normally all tests should pass on a litlle-endian machine

Reviewed By: junjieqi

Differential Revision: D56003181

fbshipit-source-id: 405dec8c71898494f5ddcd2718c35708a1abf9cb
2024-04-24 05:40:49 -07:00
Aditya Vidyadhar Kamath
67574aabbc Fix the endianness issue in AIX while running the benchmark. (#3345)
Summary:
This pull request is for issue https://github.com/facebookresearch/faiss/issues/3330. This patch makes sure that packed code arrays are in big endian format. Kindly let us know if we need any changes or if we can have a better approach.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3345

Reviewed By: junjieqi

Differential Revision: D55957630

Pulled By: mdouze

fbshipit-source-id: f728f9563f6b942af9d8899b54662d7ceb811206
2024-04-24 05:40:49 -07:00
Kumar Saurabh Arora
da9f292a4b Support of skip_ids in merge_from_multiple function of OnDiskInvertedLists (#3327)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3327

**Context**
1. [Issue 2621](https://github.com/facebookresearch/faiss/issues/2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](https://github.com/facebookresearch/faiss/issues/2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address #1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address #2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Reviewed By: mdouze

Differential Revision: D55482518

fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083
2024-04-03 10:36:56 -07:00
Ramil Bakhshyiev
8274c38f27 Remove TypedStorage usage when working with torch_utils (#3301)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3301

In `torch_utils.py`, changed `storage()' references to `untyped_storage()`.

Reviewed By: junjieqi

Differential Revision: D55167842

fbshipit-source-id: 911eda1c22f10595663fb4416ab992903390d457
2024-03-21 10:30:44 -07:00
Maria Lomeli
420d25f51c Index pretransform support in search_preassigned (#3225)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3225

This diff fixes issue [#3113](https://github.com/facebookresearch/faiss/issues/3113), e.g. introduces support for index pretransform in `search_preassigned`.

Reviewed By: mdouze

Differential Revision: D53188584

fbshipit-source-id: 8189c0a59f957a2606391f22cf3fdc8874110a6e
2024-01-30 09:20:07 -08:00
Gergely Szilvasy
c4b91a54d1 Replace pickle serialization to address security vulnerability
Summary: This diff replaces the use of pickle serialization with json to address a security vulnerability. Adding a warning message that this code is for demonstration purposes only.

Reviewed By: mdouze

Differential Revision: D52777650

fbshipit-source-id: d9d6a00fd341b29ac854adcbf675d2cd303d2f29
2024-01-25 07:24:04 -08:00
Matthijs Douze
32f0e8cf92 Generalize ResultHanlder, support range search for HNSW and Fast Scan (#3190)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3190

This diff adds more result handlers in order to expose them externally.
This enables range search for HSNW and Fast Scan, and nprobe parameter support for FastScan.

Reviewed By: pemazare

Differential Revision: D52547384

fbshipit-source-id: 271da5ffea6411df3d8e50641abade18bd7b774b
2024-01-11 11:46:30 -08:00
Gergely Szilvasy
beef6107fc faiss paper benchmarks (#3189)
Summary:
- IVF benchmarks: `bench_fw_ivf.py bench_fw_ivf.py bigann /checkpoint/gsz/bench_fw/ivf`
- Codec benchmarks: `bench_fw_codecs.py contriever /checkpoint/gsz/bench_fw/codecs` and `bench_fw_codecs.py deep1b /checkpoint/gsz/bench_fw/codecs`
- A range codec evaluation: `bench_fw_range.py ssnpp /checkpoint/gsz/bench_fw/range`
- Visualize with `bench_fw_notebook.ipynb`
- Support for running on a cluster

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3189

Reviewed By: mdouze

Differential Revision: D52544642

Pulled By: algoriddle

fbshipit-source-id: 21dcdfd076aef6d36467c908e6be78ef851b0e98
2024-01-05 09:27:04 -08:00
Maria Lomeli
be1242775a Upstream changes to big batch search (#3170)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3170

Logging info, adding to heap and wait_in and out times.

Reviewed By: algoriddle

Differential Revision: D52034667

fbshipit-source-id: 8ab864c5c43d534d094c6e81bb810c74e20c9ac2
2023-12-12 09:51:05 -08:00
Gergely Szilvasy
9519a19f42 benchmark refactor
Summary:
1. Support for index construction parameters outside of the factory string (arbitrary depth of quantizers).
2. Refactor that provides an index wrapper which is a prereq for the optimizer, which will generate indices from pre-optimized components (particularly quantizers)

Reviewed By: mdouze

Differential Revision: D51427452

fbshipit-source-id: 014d05dd798d856360f2546963e7cad64c2fcaeb
2023-12-04 05:53:17 -08:00
Matthijs Douze
b109d086a2 Search and return codes (#3143)
Summary:
This PR adds a functionality where an IVF index can be searched and the corresponding codes be returned. It also adds a few functions to compress int arrays into a bit-compact representation.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3143

Test Plan:
```
buck test //faiss/tests/:test_index_composite -- TestSearchAndReconstruct

buck test //faiss/tests/:test_standalone_codec -- test_arrays
```

Reviewed By: algoriddle

Differential Revision: D51544613

Pulled By: mdouze

fbshipit-source-id: 875f72d0f9140096851592422570efa0f65431fc
2023-11-25 13:57:25 -08:00
Matthijs Douze
9db182460c Relax IVFFlatDedup test (#3077)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3077

This diff relaxes some IVFFlatDedup tests where distances are slighlty different over runs.
Should fix

https://app.circleci.com/pipelines/github/facebookresearch/faiss/4709/workflows/8c8213bf-8fe0-4c4e-9a7d-991f44bf1010/jobs/25551

https://app.circleci.com/pipelines/github/facebookresearch/faiss/4709/workflows/8c8213bf-8fe0-4c4e-9a7d-991f44bf1010/jobs/25547

Reviewed By: algoriddle

Differential Revision: D49732349

fbshipit-source-id: 728b9885c6b7d6ba697ccb6bacc0abd0ee2b0679
2023-09-29 01:16:59 -07:00
Matthijs Douze
687457b2f4 Access graph structure for NSG (#2984)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2984

It is not entirely trivial to access the NSG graph structure from Python (although it is a fixed size N-by-K matrix of vector ids).
This diff adds an inspect_tools function to do that.

Reviewed By: algoriddle

Differential Revision: D48026775

fbshipit-source-id: 94cd7be7f656bcd333d62586531f287ea8e052e5
2023-08-04 06:55:24 -07:00
Matthijs Douze
07fe2b622f Binary cloning and GPU range search (#2916)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2916

Overall better support for binary indexes:
- cloning (to CPU and GPU), only for BinaryFlat for now
- fix bug in reconstruct_n
- range_search_max_results

Reviewed By: algoriddle

Differential Revision: D46755778

fbshipit-source-id: 777ad90aff5c54a77f9685ed6512247a922c6ef5
2023-06-19 06:05:14 -07:00
Gergely Szilvasy
092606b293 bbs producer/consumer threading (#2901)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2901

This diff allows each GPU to work independently, a hot centroid (eg. out-of-distribution queries that hit a centroid heavily) will only block the one GPU that is processing it, others will continue to pick up work independently.

Reviewed By: mdouze

Differential Revision: D46521298

fbshipit-source-id: 171cb06cce8b2d16b7bd744799b105b3cd525be3
2023-06-14 07:58:44 -07:00
Matthijs Douze
6800ebef83 Support independent IVF coarse quantizer
Summary: In the IndexIVFIndepenentQuantizer, the coarse quantizer is applied on the input vectors, but the encoding is performed on a vector-transformed version of the database elements.

Reviewed By: alexanderguzhva

Differential Revision: D45950970

fbshipit-source-id: 30f6cf46d44174b1d99a12384b7d5e2d475c1f88
2023-05-26 02:59:01 -07:00
Matthijs Douze
b9ea339617 support range search from GPU (#2860)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2860

Optimized range search function where the GPU computes by default and falls back on gpu for queries where there are too many results.

Parallelize the CPU to GPU cloning, it seems to work.

Support range_search_preassigned in Python

Fix long-standing issue with SWIG exposed functions that did not release the GIL (in particular the MapLong2Long).

Adds a MapInt64ToInt64 that is more efficient than MapLong2Long.

Reviewed By: algoriddle

Differential Revision: D45672301

fbshipit-source-id: 2e77397c40083818584dbafa5427149359a2abfd
2023-05-16 00:27:53 -07:00
Matthijs Douze
2d8886cd4f IVF sorting routine (#2846)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2846

Adds a function to ivf_contrib to sort the inverted lists by size without changing the results. Also moves big_batch_search to its own module.

Reviewed By: algoriddle

Differential Revision: D45565880

fbshipit-source-id: 091a1c1c074f860d6953bf20d04523292fb55e1a
2023-05-04 09:59:06 -07:00
Matthijs Douze
3704bbe4a7 Add GIST1M to datasets
Summary: GIST1M is on the fair cluster but was not added to the datsets.py

Reviewed By: alexanderguzhva

Differential Revision: D45276664

fbshipit-source-id: 8db41d61b78983f5d01dedca1790618f80f6bc78
2023-04-26 02:07:11 -07:00
Matthijs Douze
016aa04602 make balanced clusters the default (#2796)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2796

This diff makes balanced clusters the default for 2-level clustering. This seems to improve a bit over the default uniform clusters, see

https://github.com/fairinternal/faiss_improvements/blob/master/better_coarse_quantizer/two_level_clustering.ipynb

Warning: the nc2 argument of two_level_clustering becomes the *total* number of clusters.

Reviewed By: algoriddle

Differential Revision: D44421222

fbshipit-source-id: 951b7fc043be4a41762a7e6f7a6fcfb71e303832
2023-03-28 07:23:30 -07:00
Matthijs Douze
581760302f evaluation script + IVF block search (#2781)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2781

This is a benchmarking script for keypoint matching with labelled ground-truth.

Reviewed By: alexanderguzhva

Differential Revision: D44036091

fbshipit-source-id: d9d7c089c4d172b66f33dc968c00713a1b79c2d1
2023-03-24 13:54:08 -07:00
Matthijs Douze
0200d131fc fix windows test (#2775)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2775

Reviewed By: algoriddle

Differential Revision: D44210010

fbshipit-source-id: b9b620a4b0a874e09ee2f6082ff0f9463716fdf4
2023-03-21 05:34:50 -07:00
Matthijs Douze
2d7dd5b0a6 support checkpointing in big batch search
Summary: Big batch search can be running for hours so it's useful to have a checkpointing mechanism in case it's run on a best-effort cluster queue.

Reviewed By: algoriddle

Differential Revision: D44059758

fbshipit-source-id: 5cb5e80800c6d2bf76d9f6cb40736009cd5d4b8e
2023-03-14 11:11:50 -07:00
Iurii Makarov
8ccd800986 Implemented Jaccard distance (#2684)
Summary:
**Summary**
Implemented Jaccard distance requested in this issue: https://github.com/facebookresearch/faiss/issues/1299
**Test plan**
Run: make -C build test
Output: 100% tests passed, 0 tests failed out of 174

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2684

Reviewed By: mdouze

Differential Revision: D43398833

Pulled By: lvoursl

fbshipit-source-id: b38cf27a7858842efe967bcb1033977863716a76
2023-02-27 07:49:42 -08:00
Matthijs Douze
a80c96c0de Evaluation script for hybrid CPU / GPU search
Summary:
Implementation of various combinations of coarse quantization / scaning code on CPU and GPU.

Used to generate the results of

https://github.com/facebookresearch/faiss/wiki/Hybrid-CPU-GPU-search-and-multiple-GPUs

Reviewed By: alexanderguzhva

Differential Revision: D43041802

fbshipit-source-id: 12608812ab351d60d4a6dc45be1ca493f76d4375
2023-02-15 12:55:06 -08:00
Matthijs Douze
8fc3775472 building blocks for hybrid CPU / GPU search (#2638)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2638

This diff is a more streamlined way of searching IVF indexes with precomputed clusters.
This will be used for experiments with hybrid CPU / GPU search.

Reviewed By: algoriddle

Differential Revision: D41301032

fbshipit-source-id: a1d645fd0f2bf806454dfd04971edc0a6200d20d
2023-01-12 13:34:44 -08:00
Jeff Johnson
590f6fb47d Faiss pytorch bridge: revert to TypedStorage (#2631)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2631

The pytorch in fbcode complains about `storage()` saying it is deprecated and we need to move to UntypedStorage `_storage()`, while github CI is using an older version of pytorch where `_storage()` doesn't exist.

As it is only a warning not an error in fbcode, revert to the old form, but we'll likely have to change to `_storage()` eventually.

Reviewed By: alexanderguzhva

Differential Revision: D42107029

fbshipit-source-id: 699c15932e6ae48cd1c60ebb7212dcd9b47626f6
2022-12-16 15:38:08 -08:00
Jeff Johnson
4bb7aa4b77 Faiss + Torch fixes, re-enable k = 2048
Summary:
This diff fixes four separate issues:

- Using the pytorch bridge produces the following deprecation warning. We switch to `_storage()` instead.
```
torch_utils.py:51: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor._storage() instead of tensor.storage()
  x.storage().data_ptr() + x.storage_offset() * 4)
```
- The `storage_offset` for certain types was wrong, but this would only affect torch tensors that were a view into a storage that didn't begin at the beginning.
- The `reconstruct_n` numpy pytorch bridge function allowed passing `-1` for `ni` which indicated that all vectors should be reconstructed. The torch bridge didn't follow this and throw an error:
```
TypeError: torch_replacement_reconstruct_n() missing 2 required positional arguments: 'n0' and 'ni'
```
- Choosing values in the range (1024, 2048] for `k` or `nprobe` were broken in D37777979; this is now fixed again.

Reviewed By: alexanderguzhva

Differential Revision: D42041239

fbshipit-source-id: c7d9b4aba63db8ac73e271c8ef34e231002963d9
2022-12-14 16:21:22 -08:00
Matthijs Douze
fa53e2c941 Implementation of big-batch IVF search (single machine) (#2567)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2567

Intuitively, it should be easier to handle big-batch searches because all distance computations for a set of queries can be done locally within each inverted list.

This benchmark implements this in pure python (but should be close to optimal in terms of speed), on CPU for IndexIVFFlat, IndexIVFPQ and IndexIVFScalarQuantizer. GPU is also supported.

The results are not systematically better, see https://docs.google.com/document/d/1d3YuV8uN7hut6aOATCOMx8Ut-QEl_oRnJdPgDBRF1QA/edit?usp=sharing

Reviewed By: algoriddle

Differential Revision: D41098338

fbshipit-source-id: 479e471b0d541f242d420f581775d57b708a61b8
2022-12-09 08:53:13 -08:00
Matthijs Douze
a996a4a052 Put idx_t in the faiss namespace (#2582)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2582

A few more or less cosmetic improvements
* Index::idx_t was in the Index object, which does not make much sense, this diff moves it to faiss::idx_t
* replace multiprocessing.dummy with multiprocessing.pool
* add Alexandr as a core contributor of Faiss in the README ;-)

```
for i in $( find . -name \*.cu -o -name \*.cuh -o -name \*.h -o -name \*.cpp ) ; do
  sed -i s/Index::idx_t/idx_t/ $i
done
```

For the fbcode deps:
```
for i in $( fbgs Index::idx_t --exclude fbcode/faiss -l ) ; do
   sed -i s/Index::idx_t/idx_t/ $i
done
```

Reviewed By: algoriddle

Differential Revision: D41437507

fbshipit-source-id: 8300f2a3ae97cace6172f3f14a9be3a83999fb89
2022-11-30 08:25:30 -08:00
Jeff Johnson
e3d12c7133 Faiss GPU: add device specifier for bfKnn (#2584)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2584

The `bfKnn` C++ function and `knn_gpu` Python functions for running brute-force k-NN on the GPU did not have a way to specify the GPU device on which the search should run, as it simply used the current thread-local `cudaGetDevice(...)` setting in the CUDA runtime API.

This is unlike the GPU index classes which takes a device argument in the index config struct. Now, both the C++ and Python interface to bfKnn have an optional argument to specify the device.

Default behavior is the current behavior; if the `device` is -1 then the current CUDA thread-local device is used, otherwise we perform the work on the desired device.

Reviewed By: mdouze

Differential Revision: D41448254

fbshipit-source-id: a63c68c12edbe4d725b9fc2a749d5dc935574e12
2022-11-21 18:20:32 -08:00
Matthijs Douze
dd814b5f14 IVF filtering based on IDSelector (no init split) (#2483)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2483

This diff changes the following:
1. all search functions now take a `SearchParameters` argument that overrides the internal search parameters
2. the default implementation for most classes throws when the params argument is non-nullptr / non-None
3. the IndexIVF and IndexHNSW classes have functioning SearchPArameters
4. the SearchParameters includes an IDSelector that can search only in a subset of the index based on a defined subset of ids

There is also some refactoring: the IDSelector was moved to its own .h/.cpp and python/__init__.py is spit in parts.

The diff is quite bulky because the search function prototypes need to be changed in all index classes.

Things to fix in subsequent diffs:

- support SearchParameters for more index types (Flat variants)

- better sub-object ownership for SearchParams (with std::unique_ptr?)

- special handling of IDSelectorRange to make it faster

Reviewed By: alexanderguzhva

Differential Revision: D39852589

fbshipit-source-id: 4988bdb5b9bee1207cd327d3f80bf5e0e2467fe1
2022-09-30 06:40:03 -07:00
François Travais
b74ac352d3 Fix RPC lib logging (#2433)
Summary:
I couldn't run the client-server implementation because of the logging. Indeed `LOG.info('Connected by', addr, end=' ')` raised an exception (`end` is not recognised as a valid argument).

Other warnings are also showing up. This PR clean things up a bit and fixes the client-server.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2433

Reviewed By: alexanderguzhva

Differential Revision: D39167576

Pulled By: mdouze

fbshipit-source-id: 6f74d582f14e353e04029e6465bd6e488a865289
2022-09-08 01:05:52 -07:00
Ryan Russell
d2806286d2 docs: Improve readability (#2378)
Summary:
Signed-off-by: Ryan Russell <git@ryanrussell.org>

Various readability fixes focused on `.md` files:
- Grammar
- Fix some incorrect command references to `distributed_kmeans.py`
- Styling the markdown bash code snippets sections so they format

Attempted to put a lot of little things into one PR and commit; let me know if any mods are needed!

Best,
Ryan

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2378

Reviewed By: alexanderguzhva

Differential Revision: D37717671

Pulled By: mdouze

fbshipit-source-id: 0039192901d98a083cd992e37f6b692d0572103a
2022-07-08 09:19:07 -07:00
Matthijs Douze
b8fe92dfee contrib clustering module (#2217)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2217

This diff introduces a new Faiss contrib module that contains:
- generic k-means implemented in python (was in distributed_ondisk)
- the two-level clustering code, including a simple function that runs it on a Faiss IVF index.
- sparse clustering code (new)

The main idea is that that code is often re-used so better have it in contrib.

Reviewed By: beauby

Differential Revision: D34170932

fbshipit-source-id: cc297cc56d241b5ef421500ed410d8e2be0f1b77
2022-02-28 14:18:47 -08:00
Matthijs Douze
eb8781557f Fix exhaustive search GT computation with IP distance (#2212)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2212

Fixes issue

https://github.com/facebookresearch/faiss/issues/2205

clear bug report
easy fix
easy to accept ;-)

Reviewed By: beauby

Differential Revision: D33975281

fbshipit-source-id: 088e1f3078dc79402563be7fac3530d76b197006
2022-02-07 19:36:21 -08:00
Ben Mann
30abcd6a86 Add assertion to merge_ondisk.py (#2190)
Summary:
Fixes https://github.com/facebookresearch/faiss/issues/2188

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2190

Reviewed By: beauby

Differential Revision: D33975889

Pulled By: mdouze

fbshipit-source-id: 364eeac8de02f3ae00c9676198ed2ce27cfcd12b
2022-02-03 05:14:22 -08:00
Matthijs Douze
c0052c1533 IndexFlatCodes: a single parent for all flat codecs (#2132)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2132

This diff adds the class IndexFlatCodes that becomes the parent of all "flat" encodings.
IndexPQ
IndexFlat
IndexAdditiveQuantizer
IndexScalarQuantizer
IndexLSH
Index2Layer

The other changes are:
- for IndexFlat, there is no vector<float> with the data anymore. It is replaced with a `get_xb()` function. This broke quite a few external codes, that this diff also attempts to fix.
- I/O functions needed to be adapted. This is done without changing the I/O format for any index.
- added a small contrib function to get the data from the IndexFlat
- the functionality has been made uniform, for example remove_ids and add are now in the parent class.

Eventually, we may support generic storage for flat indexes, similar to `InvertedLists`, eg to memmap the data, but this will again require a big change.

Reviewed By: wickedfoo

Differential Revision: D32646769

fbshipit-source-id: 04a1659173fd51b130ae45d345176b72183cae40
2021-12-07 01:31:07 -08:00
Matthijs Douze
bef12cf51b Implement LCC's RCQ + ITQ in Faiss (#2123)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2123

One of the encodings used by LCC is based on a RCQ coarse quantizer and a "payload" of ITQ. The codes are compared with Hamming distances.

The index type `IndexIVFSpectralHash` can be re-purposed to perfrorm this type of index.

This diff contains a small demo demo_rcq_itq script in python to show how:
* the RCQ + ITQ are trained
* the RCQ + ITQ index add and search work (with a very inefficient python implementation)
* they can be transferred to an `IndexIVFSpectralHash`
* the python implementation and `IndexIVFSpectralHash` give the same results

The advantage of using to an `IndexIVFSpectralHash` is that in C++ it offers an `InvertedListScanner` object that can be used to compute query to code distances with its `distance_to_code` method. This is generic and will generalize to  other types of encodings and coarse quantizers.

What is missing is an index_factory to make instanciation easier.

Reviewed By: sc268

Differential Revision: D32642900

fbshipit-source-id: 284f3029d239b7946bbca44a748def4e058489bd
2021-11-25 15:59:18 -08:00
Lucas Hosseini
b4eb51dae8 Change default branch references from master to main. (#2029)
Summary:
This is required for the renaming of the default branch from `master` to `main`, in accordance with the new Facebook OSS guidelines.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2029

Reviewed By: mdouze

Differential Revision: D30672862

Pulled By: beauby

fbshipit-source-id: 0b6458a4ff02a12aae14cf94057e85fdcbcbff96
2021-09-01 09:26:20 -07:00
Matthijs Douze
760cce7f3a Support for additive quantizer search (#1961)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1961

This diff implements LUT-based search for additive quantizers.
It also further merges code for LSQ and the RedisualQuantizer.

The documentation + evaluation is on github:

https://github.com/facebookresearch/faiss/wiki/Additive-quantizers

Reviewed By: wickedfoo

Differential Revision: D29395079

fbshipit-source-id: b8a24a647bbdc4cda2a699e791ffdb2a12bfa9c6
2021-08-20 01:00:10 -07:00
Check Deng
48ae55348a Update codebooks with double type (#1975)
Summary:
## Description

The process of updating the codebook in LSQ may be unstable if the data is not zero-centering. This diff fixed it by using `double` instead of `float` during codebook updating. This would not affect the performance since the update process is quite fast.

Users could switch back to `float` mode by setting `update_codebooks_with_double = False`

## Changes

1. Support `double` during codebook updating.
2. Add a unit test.
3. Add `__init__.py` under `contrib/` to avoid warnings.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1975

Reviewed By: wickedfoo

Differential Revision: D29565632

Pulled By: mdouze

fbshipit-source-id: 932d7932ae9725c299cd83f87495542703ad6654
2021-07-07 03:29:49 -07:00
Matthijs Douze
1829aa92a1 three small fixes (#1972)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1972

This fixes a few issues that I ran into + adds tests:

- range_search_max_results with IP search

- a few missing downcasts for VectorTRansforms

- ResultHeap supports max IP search

Reviewed By: wickedfoo

Differential Revision: D29525093

fbshipit-source-id: d4ff0aff1d83af9717ff1aaa2fe3cda7b53019a3
2021-07-01 16:08:45 -07:00
Chengqi Deng
c087f87730 Add LocalSearchQuantizer (#1906)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1906

This PR implemented LSQ/LSQ++, a vector quantization technique described in the following two papers:

1. Revisiting additive quantization
2. LSQ++: Lower running time and higher recall in multi-codebook quantization

Here is a benchmark running on SIFT1M for 64 bits encoding:
```
===== lsq:
        mean square error = 17335.390208
        training time: 312.729779958725 s
        encoding time: 244.6277096271515 s
===== pq:
        mean square error = 23743.004672
        training time: 1.1610801219940186 s
        encoding time: 2.636141061782837 s
===== rq:
        mean square error = 20999.737344
        training time: 31.813055515289307 s
        encoding time: 307.51959800720215 s
```

Changes:

1. Add LocalSearchQuantizer object
2. Fix an out of memory bug in ResidualQuantizer
3. Add a benchmark for evaluating quantizers
4. Add tests for LocalSearchQuantizer

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1862

Test Plan:
```
buck test //faiss/tests/:test_lsq

buck run mode/opt //faiss/benchs/:bench_quantizer -- lsq pq rq
```

Reviewed By: beauby

Differential Revision: D28376369

Pulled By: mdouze

fbshipit-source-id: 2a394d38bf75b9de0a1c2cd6faddf7dd362a6fa8
2021-05-21 01:33:55 -07:00
Jeff Johnson
b544db24a8 Raw all-pairwise distance function on GPU
Summary:
This diff implements brute-force all-pairwise distances between two different sets of vectors using any of the Faiss supported metrics on the GPU (L2, IP, L1, Lp, Linf, etc).

It is implemented using the same C++ interface as `bfKnn`, except when `k == -1`, all pairwise distances will be returned (no k-selection is made). A restriction exists at present where the entire output data must be able to reside on the same GPU which may be lifted at a subsequent point.

This interface is available in python via `faiss.pairwise_distance_gpu(res, xq, xb, D, metric)` with both numpy and pytorch support which will return all of the distances in D.

Also cleaned up CUDA stream usage a little bit in Distance.cu/Distance.cuh in the C++ implementation.

Reviewed By: mdouze

Differential Revision: D27686773

fbshipit-source-id: 8de6a699cda5d7077f0ab583e9ce76e630f0f687
2021-04-13 12:06:04 -07:00