faiss

mirror of https://github.com/facebookresearch/faiss.git synced 2025-06-03 21:54:02 +08:00

Author	SHA1	Message	Date
Matthijs Douze	ab109c28d5	Add search functionality to FlatCodes (#3611) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3611 Using the new dispatcher functions, add search func to flat codes. To test it, make IndexLattice a subclass of FlatCodes and check the reconstruction there. Reviewed By: asadoughi Differential Revision: D59367989 fbshipit-source-id: 405dab4358fe34b2e38ac8bcc222b19f58643229	2024-07-11 02:40:38 -07:00
Gergely Szilvasy	3d32330e3d	add use_raft to knn_gpu (torch) (#3509) Summary: Add support for `use_raft` in the torch version of `knn_gpu`. The numpy version already has this support, see https://github.com/facebookresearch/faiss/blob/main/faiss/python/gpu_wrappers.py#L59 Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3509 Reviewed By: mlomeli1, junjieqi Differential Revision: D58489851 Pulled By: algoriddle fbshipit-source-id: cfad722fefd4809b135b765d0d43587cfd782d0e	2024-06-12 19:19:23 -07:00
Kumar Saurabh Arora	22304340d2	Adding buck target for experiment bench_fw_ivf (#3423) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3423 Adding small fixes to run experiments from fbcode. 1. Added buck target 2. Full import path of faiss bench_fw modules 3. New dataset path to run tests locally, as we can't use an existing directory ./data in fbcode. Reviewed By: algoriddle, junjieqi Differential Revision: D57235092 fbshipit-source-id: f78a23199e619b640a19ca37f8b52ff0abdd8298	2024-05-31 14:30:39 -07:00
Kumar Saurabh Arora	0beecb4c85	sys.big_endian to sys.byteorder (#3422) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3422 Found vec_io failing when running some benchmarks. There is no field named big_endian in sys, so revert it to the original field, byteorder. Reviewed By: algoriddle Differential Revision: D56718607 fbshipit-source-id: 553f1d2d6bc967581142a92282e534f3f164e8f9	2024-05-30 09:27:55 -07:00
Alexandr Guzhva	6a94c67a2f	QT_bf16 for scalar quantizer for bfloat16 (#3444) Summary: mdouze Please let me know if any additional unit tests are needed Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3444 Reviewed By: algoriddle Differential Revision: D57665641 Pulled By: mdouze fbshipit-source-id: 9bec91306a1c31ea4f1f1d726c9d60ac6415fdfc	2024-05-23 02:59:15 -07:00
Matthijs Douze	783e044a2d	support big-endian machines (#3361) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3361 Fix a few issues in the PR. Normally all tests should pass on a little-endian machine. Reviewed By: junjieqi Differential Revision: D56003181 fbshipit-source-id: 405dec8c71898494f5ddcd2718c35708a1abf9cb	2024-04-24 05:40:49 -07:00
Aditya Vidyadhar Kamath	67574aabbc	Fix the endianness issue in AIX while running the benchmark. (#3345) Summary: This pull request is for issue https://github.com/facebookresearch/faiss/issues/3330. This patch makes sure that packed code arrays are in big endian format. Kindly let us know if we need any changes or if we can have a better approach. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3345 Reviewed By: junjieqi Differential Revision: D55957630 Pulled By: mdouze fbshipit-source-id: f728f9563f6b942af9d8899b54662d7ceb811206	2024-04-24 05:40:49 -07:00
Kumar Saurabh Arora	da9f292a4b	Support of skip_ids in merge_from_multiple function of OnDiskInvertedLists (#3327) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3327 Context: 1. [Issue 2621](https://github.com/facebookresearch/faiss/issues/2621) discusses the inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle multiple disk-based index shards, so the function should be named differently when merging invls from index shards. 2. [Issue 2876](https://github.com/facebookresearch/faiss/issues/2876) provides a use case for shifting ids when merging invls from different shards. In this diff: 1. To address #1 above, I renamed the merge_from function to merge_from_multiple without touching the merge_from base class. Why? To continue to allow merging an invl from one index into an ondiskinvl from another index. 2. To address #2 above, I added support for shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has the same set of ids but different data. It is not recommended if ids are already unique across shards. Reviewed By: mdouze Differential Revision: D55482518 fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083	2024-04-03 10:36:56 -07:00
Ramil Bakhshyiev	8274c38f27	Remove TypedStorage usage when working with torch_utils (#3301) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3301 In `torch_utils.py`, changed `storage()` references to `untyped_storage()`. Reviewed By: junjieqi Differential Revision: D55167842 fbshipit-source-id: 911eda1c22f10595663fb4416ab992903390d457	2024-03-21 10:30:44 -07:00
Maria Lomeli	420d25f51c	Index pretransform support in search_preassigned (#3225) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3225 This diff fixes issue [#3113](https://github.com/facebookresearch/faiss/issues/3113), i.e. it introduces support for index pretransform in `search_preassigned`. Reviewed By: mdouze Differential Revision: D53188584 fbshipit-source-id: 8189c0a59f957a2606391f22cf3fdc8874110a6e	2024-01-30 09:20:07 -08:00
Gergely Szilvasy	c4b91a54d1	Replace pickle serialization to address security vulnerability Summary: This diff replaces the use of pickle serialization with json to address a security vulnerability. Adding a warning message that this code is for demonstration purposes only. Reviewed By: mdouze Differential Revision: D52777650 fbshipit-source-id: d9d6a00fd341b29ac854adcbf675d2cd303d2f29	2024-01-25 07:24:04 -08:00
Matthijs Douze	32f0e8cf92	Generalize ResultHandler, support range search for HNSW and Fast Scan (#3190) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3190 This diff adds more result handlers in order to expose them externally. This enables range search for HNSW and Fast Scan, and nprobe parameter support for FastScan. Reviewed By: pemazare Differential Revision: D52547384 fbshipit-source-id: 271da5ffea6411df3d8e50641abade18bd7b774b	2024-01-11 11:46:30 -08:00
Gergely Szilvasy	beef6107fc	faiss paper benchmarks (#3189) Summary: - IVF benchmarks: `bench_fw_ivf.py bigann /checkpoint/gsz/bench_fw/ivf` - Codec benchmarks: `bench_fw_codecs.py contriever /checkpoint/gsz/bench_fw/codecs` and `bench_fw_codecs.py deep1b /checkpoint/gsz/bench_fw/codecs` - A range codec evaluation: `bench_fw_range.py ssnpp /checkpoint/gsz/bench_fw/range` - Visualize with `bench_fw_notebook.ipynb` - Support for running on a cluster Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3189 Reviewed By: mdouze Differential Revision: D52544642 Pulled By: algoriddle fbshipit-source-id: 21dcdfd076aef6d36467c908e6be78ef851b0e98	2024-01-05 09:27:04 -08:00
Maria Lomeli	be1242775a	Upstream changes to big batch search (#3170) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3170 Logging info, adding to heap and wait_in and out times. Reviewed By: algoriddle Differential Revision: D52034667 fbshipit-source-id: 8ab864c5c43d534d094c6e81bb810c74e20c9ac2	2023-12-12 09:51:05 -08:00
Gergely Szilvasy	9519a19f42	benchmark refactor Summary: 1. Support for index construction parameters outside of the factory string (arbitrary depth of quantizers). 2. Refactor that provides an index wrapper which is a prereq for the optimizer, which will generate indices from pre-optimized components (particularly quantizers) Reviewed By: mdouze Differential Revision: D51427452 fbshipit-source-id: 014d05dd798d856360f2546963e7cad64c2fcaeb	2023-12-04 05:53:17 -08:00
Matthijs Douze	b109d086a2	Search and return codes (#3143) Summary: This PR adds functionality to search an IVF index and return the corresponding codes. It also adds a few functions to compress int arrays into a bit-compact representation. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3143 Test Plan: ``` buck test //faiss/tests/:test_index_composite -- TestSearchAndReconstruct buck test //faiss/tests/:test_standalone_codec -- test_arrays ``` Reviewed By: algoriddle Differential Revision: D51544613 Pulled By: mdouze fbshipit-source-id: 875f72d0f9140096851592422570efa0f65431fc	2023-11-25 13:57:25 -08:00
Matthijs Douze	9db182460c	Relax IVFFlatDedup test (#3077) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3077 This diff relaxes some IVFFlatDedup tests where distances are slightly different over runs. Should fix https://app.circleci.com/pipelines/github/facebookresearch/faiss/4709/workflows/8c8213bf-8fe0-4c4e-9a7d-991f44bf1010/jobs/25551 https://app.circleci.com/pipelines/github/facebookresearch/faiss/4709/workflows/8c8213bf-8fe0-4c4e-9a7d-991f44bf1010/jobs/25547 Reviewed By: algoriddle Differential Revision: D49732349 fbshipit-source-id: 728b9885c6b7d6ba697ccb6bacc0abd0ee2b0679	2023-09-29 01:16:59 -07:00
Matthijs Douze	687457b2f4	Access graph structure for NSG (#2984) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2984 It is not entirely trivial to access the NSG graph structure from Python (although it is a fixed-size N-by-K matrix of vector ids). This diff adds an inspect_tools function to do that. Reviewed By: algoriddle Differential Revision: D48026775 fbshipit-source-id: 94cd7be7f656bcd333d62586531f287ea8e052e5	2023-08-04 06:55:24 -07:00
Matthijs Douze	07fe2b622f	Binary cloning and GPU range search (#2916) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2916 Overall better support for binary indexes: - cloning (to CPU and GPU), only for BinaryFlat for now - fix bug in reconstruct_n - range_search_max_results Reviewed By: algoriddle Differential Revision: D46755778 fbshipit-source-id: 777ad90aff5c54a77f9685ed6512247a922c6ef5	2023-06-19 06:05:14 -07:00
Gergely Szilvasy	092606b293	bbs producer/consumer threading (#2901) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2901 This diff allows each GPU to work independently: a hot centroid (e.g. out-of-distribution queries that hit a centroid heavily) will only block the one GPU that is processing it, while the others continue to pick up work independently. Reviewed By: mdouze Differential Revision: D46521298 fbshipit-source-id: 171cb06cce8b2d16b7bd744799b105b3cd525be3	2023-06-14 07:58:44 -07:00
Matthijs Douze	6800ebef83	Support independent IVF coarse quantizer Summary: In the IndexIVFIndependentQuantizer, the coarse quantizer is applied on the input vectors, but the encoding is performed on a vector-transformed version of the database elements. Reviewed By: alexanderguzhva Differential Revision: D45950970 fbshipit-source-id: 30f6cf46d44174b1d99a12384b7d5e2d475c1f88	2023-05-26 02:59:01 -07:00
Matthijs Douze	b9ea339617	support range search from GPU (#2860) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2860 Optimized range search function where the GPU computes by default and falls back on the CPU for queries where there are too many results. Parallelize the CPU-to-GPU cloning; it seems to work. Support range_search_preassigned in Python. Fix a long-standing issue with SWIG-exposed functions that did not release the GIL (in particular the MapLong2Long). Adds a MapInt64ToInt64 that is more efficient than MapLong2Long. Reviewed By: algoriddle Differential Revision: D45672301 fbshipit-source-id: 2e77397c40083818584dbafa5427149359a2abfd	2023-05-16 00:27:53 -07:00
Matthijs Douze	2d8886cd4f	IVF sorting routine (#2846) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2846 Adds a function to ivf_contrib to sort the inverted lists by size without changing the results. Also moves big_batch_search to its own module. Reviewed By: algoriddle Differential Revision: D45565880 fbshipit-source-id: 091a1c1c074f860d6953bf20d04523292fb55e1a	2023-05-04 09:59:06 -07:00
Matthijs Douze	3704bbe4a7	Add GIST1M to datasets Summary: GIST1M is on the fair cluster but was not added to datasets.py Reviewed By: alexanderguzhva Differential Revision: D45276664 fbshipit-source-id: 8db41d61b78983f5d01dedca1790618f80f6bc78	2023-04-26 02:07:11 -07:00
Matthijs Douze	016aa04602	make balanced clusters the default (#2796) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2796 This diff makes balanced clusters the default for 2-level clustering. This seems to improve a bit over the default uniform clusters, see https://github.com/fairinternal/faiss_improvements/blob/master/better_coarse_quantizer/two_level_clustering.ipynb Warning: the nc2 argument of two_level_clustering becomes the total number of clusters. Reviewed By: algoriddle Differential Revision: D44421222 fbshipit-source-id: 951b7fc043be4a41762a7e6f7a6fcfb71e303832	2023-03-28 07:23:30 -07:00
Matthijs Douze	581760302f	evaluation script + IVF block search (#2781) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2781 This is a benchmarking script for keypoint matching with labelled ground-truth. Reviewed By: alexanderguzhva Differential Revision: D44036091 fbshipit-source-id: d9d7c089c4d172b66f33dc968c00713a1b79c2d1	2023-03-24 13:54:08 -07:00
Matthijs Douze	0200d131fc	fix windows test (#2775) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2775 Reviewed By: algoriddle Differential Revision: D44210010 fbshipit-source-id: b9b620a4b0a874e09ee2f6082ff0f9463716fdf4	2023-03-21 05:34:50 -07:00
Matthijs Douze	2d7dd5b0a6	support checkpointing in big batch search Summary: Big batch search can be running for hours so it's useful to have a checkpointing mechanism in case it's run on a best-effort cluster queue. Reviewed By: algoriddle Differential Revision: D44059758 fbshipit-source-id: 5cb5e80800c6d2bf76d9f6cb40736009cd5d4b8e	2023-03-14 11:11:50 -07:00
Iurii Makarov	8ccd800986	Implemented Jaccard distance (#2684) Summary: Implemented Jaccard distance, requested in this issue: https://github.com/facebookresearch/faiss/issues/1299 Test plan: Run `make -C build test`. Output: 100% tests passed, 0 tests failed out of 174 Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2684 Reviewed By: mdouze Differential Revision: D43398833 Pulled By: lvoursl fbshipit-source-id: b38cf27a7858842efe967bcb1033977863716a76	2023-02-27 07:49:42 -08:00
Matthijs Douze	a80c96c0de	Evaluation script for hybrid CPU / GPU search Summary: Implementation of various combinations of coarse quantization / scanning code on CPU and GPU. Used to generate the results of https://github.com/facebookresearch/faiss/wiki/Hybrid-CPU-GPU-search-and-multiple-GPUs Reviewed By: alexanderguzhva Differential Revision: D43041802 fbshipit-source-id: 12608812ab351d60d4a6dc45be1ca493f76d4375	2023-02-15 12:55:06 -08:00
Matthijs Douze	8fc3775472	building blocks for hybrid CPU / GPU search (#2638) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2638 This diff is a more streamlined way of searching IVF indexes with precomputed clusters. This will be used for experiments with hybrid CPU / GPU search. Reviewed By: algoriddle Differential Revision: D41301032 fbshipit-source-id: a1d645fd0f2bf806454dfd04971edc0a6200d20d	2023-01-12 13:34:44 -08:00
Jeff Johnson	590f6fb47d	Faiss pytorch bridge: revert to TypedStorage (#2631) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2631 The pytorch in fbcode complains about `storage()` saying it is deprecated and we need to move to UntypedStorage `_storage()`, while github CI is using an older version of pytorch where `_storage()` doesn't exist. As it is only a warning, not an error, in fbcode, revert to the old form, but we'll likely have to change to `_storage()` eventually. Reviewed By: alexanderguzhva Differential Revision: D42107029 fbshipit-source-id: 699c15932e6ae48cd1c60ebb7212dcd9b47626f6	2022-12-16 15:38:08 -08:00
Jeff Johnson	4bb7aa4b77	Faiss + Torch fixes, re-enable k = 2048 Summary: This diff fixes four separate issues: - Using the pytorch bridge produces the following deprecation warning. We switch to `_storage()` instead. ``` torch_utils.py:51: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor._storage() instead of tensor.storage() x.storage().data_ptr() + x.storage_offset() * 4) ``` - The `storage_offset` for certain types was wrong, but this would only affect torch tensors that were a view into a storage that didn't begin at the beginning. - The numpy bridge version of `reconstruct_n` allowed passing `-1` for `ni` to indicate that all vectors should be reconstructed. The torch bridge didn't follow this and threw an error: ``` TypeError: torch_replacement_reconstruct_n() missing 2 required positional arguments: 'n0' and 'ni' ``` - Choosing values in the range (1024, 2048] for `k` or `nprobe` was broken in D37777979; this is now fixed again. Reviewed By: alexanderguzhva Differential Revision: D42041239 fbshipit-source-id: c7d9b4aba63db8ac73e271c8ef34e231002963d9	2022-12-14 16:21:22 -08:00
Matthijs Douze	fa53e2c941	Implementation of big-batch IVF search (single machine) (#2567) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2567 Intuitively, it should be easier to handle big-batch searches because all distance computations for a set of queries can be done locally within each inverted list. This benchmark implements this in pure python (but should be close to optimal in terms of speed), on CPU for IndexIVFFlat, IndexIVFPQ and IndexIVFScalarQuantizer. GPU is also supported. The results are not systematically better, see https://docs.google.com/document/d/1d3YuV8uN7hut6aOATCOMx8Ut-QEl_oRnJdPgDBRF1QA/edit?usp=sharing Reviewed By: algoriddle Differential Revision: D41098338 fbshipit-source-id: 479e471b0d541f242d420f581775d57b708a61b8	2022-12-09 08:53:13 -08:00
Matthijs Douze	a996a4a052	Put idx_t in the faiss namespace (#2582) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2582 A few more or less cosmetic improvements: * Index::idx_t was in the Index object, which does not make much sense; this diff moves it to faiss::idx_t * replace multiprocessing.dummy with multiprocessing.pool * add Alexandr as a core contributor of Faiss in the README ;-) ``` for i in $( find . -name \*.cu -o -name \*.cuh -o -name \*.h -o -name \*.cpp ) ; do sed -i s/Index::idx_t/idx_t/ $i ; done ``` For the fbcode deps: ``` for i in $( fbgs Index::idx_t --exclude fbcode/faiss -l ) ; do sed -i s/Index::idx_t/idx_t/ $i ; done ``` Reviewed By: algoriddle Differential Revision: D41437507 fbshipit-source-id: 8300f2a3ae97cace6172f3f14a9be3a83999fb89	2022-11-30 08:25:30 -08:00
Jeff Johnson	e3d12c7133	Faiss GPU: add device specifier for bfKnn (#2584) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2584 The `bfKnn` C++ function and `knn_gpu` Python function for running brute-force k-NN on the GPU did not have a way to specify the GPU device on which the search should run, as they simply used the current thread-local `cudaGetDevice(...)` setting in the CUDA runtime API. This is unlike the GPU index classes, which take a device argument in the index config struct. Now, both the C++ and Python interfaces to bfKnn have an optional argument to specify the device. Default behavior is the current behavior; if the `device` is -1 then the current CUDA thread-local device is used, otherwise we perform the work on the desired device. Reviewed By: mdouze Differential Revision: D41448254 fbshipit-source-id: a63c68c12edbe4d725b9fc2a749d5dc935574e12	2022-11-21 18:20:32 -08:00
Matthijs Douze	dd814b5f14	IVF filtering based on IDSelector (no init split) (#2483) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2483 This diff changes the following: 1. all search functions now take a `SearchParameters` argument that overrides the internal search parameters 2. the default implementation for most classes throws when the params argument is non-nullptr / non-None 3. the IndexIVF and IndexHNSW classes have functioning SearchParameters 4. the SearchParameters includes an IDSelector that restricts the search to a subset of the index, defined by a subset of ids There is also some refactoring: the IDSelector was moved to its own .h/.cpp and python/__init__.py is split into parts. The diff is quite bulky because the search function prototypes need to be changed in all index classes. Things to fix in subsequent diffs: - support SearchParameters for more index types (Flat variants) - better sub-object ownership for SearchParams (with std::unique_ptr?) - special handling of IDSelectorRange to make it faster Reviewed By: alexanderguzhva Differential Revision: D39852589 fbshipit-source-id: 4988bdb5b9bee1207cd327d3f80bf5e0e2467fe1	2022-09-30 06:40:03 -07:00
François Travais	b74ac352d3	Fix RPC lib logging (#2433) Summary: I couldn't run the client-server implementation because of the logging. Indeed `LOG.info('Connected by', addr, end=' ')` raised an exception (`end` is not recognised as a valid argument). Other warnings are also showing up. This PR cleans things up a bit and fixes the client-server. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2433 Reviewed By: alexanderguzhva Differential Revision: D39167576 Pulled By: mdouze fbshipit-source-id: 6f74d582f14e353e04029e6465bd6e488a865289	2022-09-08 01:05:52 -07:00
Ryan Russell	d2806286d2	docs: Improve readability (#2378) Summary: Signed-off-by: Ryan Russell <git@ryanrussell.org> Various readability fixes focused on `.md` files: - Grammar - Fix some incorrect command references to `distributed_kmeans.py` - Styling the markdown bash code snippets sections so they format Attempted to put a lot of little things into one PR and commit; let me know if any mods are needed! Best, Ryan Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2378 Reviewed By: alexanderguzhva Differential Revision: D37717671 Pulled By: mdouze fbshipit-source-id: 0039192901d98a083cd992e37f6b692d0572103a	2022-07-08 09:19:07 -07:00
Matthijs Douze	b8fe92dfee	contrib clustering module (#2217) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2217 This diff introduces a new Faiss contrib module that contains: - generic k-means implemented in python (was in distributed_ondisk) - the two-level clustering code, including a simple function that runs it on a Faiss IVF index. - sparse clustering code (new) The main idea is that this code is often re-used, so it is better to have it in contrib. Reviewed By: beauby Differential Revision: D34170932 fbshipit-source-id: cc297cc56d241b5ef421500ed410d8e2be0f1b77	2022-02-28 14:18:47 -08:00
Matthijs Douze	eb8781557f	Fix exhaustive search GT computation with IP distance (#2212) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2212 Fixes issue https://github.com/facebookresearch/faiss/issues/2205 clear bug report easy fix easy to accept ;-) Reviewed By: beauby Differential Revision: D33975281 fbshipit-source-id: 088e1f3078dc79402563be7fac3530d76b197006	2022-02-07 19:36:21 -08:00
Ben Mann	30abcd6a86	Add assertion to merge_ondisk.py (#2190) Summary: Fixes https://github.com/facebookresearch/faiss/issues/2188 Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2190 Reviewed By: beauby Differential Revision: D33975889 Pulled By: mdouze fbshipit-source-id: 364eeac8de02f3ae00c9676198ed2ce27cfcd12b	2022-02-03 05:14:22 -08:00
Matthijs Douze	c0052c1533	IndexFlatCodes: a single parent for all flat codecs (#2132) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2132 This diff adds the class IndexFlatCodes that becomes the parent of all "flat" encodings: IndexPQ, IndexFlat, IndexAdditiveQuantizer, IndexScalarQuantizer, IndexLSH, Index2Layer. The other changes are: - for IndexFlat, there is no vector<float> with the data anymore. It is replaced with a `get_xb()` function. This broke quite a bit of external code, which this diff also attempts to fix. - I/O functions needed to be adapted. This is done without changing the I/O format for any index. - added a small contrib function to get the data from the IndexFlat - the functionality has been made uniform, for example remove_ids and add are now in the parent class. Eventually, we may support generic storage for flat indexes, similar to `InvertedLists`, e.g. to memmap the data, but this will again require a big change. Reviewed By: wickedfoo Differential Revision: D32646769 fbshipit-source-id: 04a1659173fd51b130ae45d345176b72183cae40	2021-12-07 01:31:07 -08:00
Matthijs Douze	bef12cf51b	Implement LCC's RCQ + ITQ in Faiss (#2123) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2123 One of the encodings used by LCC is based on an RCQ coarse quantizer and a "payload" of ITQ. The codes are compared with Hamming distances. The index type `IndexIVFSpectralHash` can be re-purposed to implement this type of index. This diff contains a small demo_rcq_itq script in python to show how: * the RCQ + ITQ are trained * the RCQ + ITQ index add and search work (with a very inefficient python implementation) * they can be transferred to an `IndexIVFSpectralHash` * the python implementation and `IndexIVFSpectralHash` give the same results The advantage of using an `IndexIVFSpectralHash` is that in C++ it offers an `InvertedListScanner` object that can be used to compute query-to-code distances with its `distance_to_code` method. This is generic and will generalize to other types of encodings and coarse quantizers. What is missing is an index_factory to make instantiation easier. Reviewed By: sc268 Differential Revision: D32642900 fbshipit-source-id: 284f3029d239b7946bbca44a748def4e058489bd	2021-11-25 15:59:18 -08:00
Lucas Hosseini	b4eb51dae8	Change default branch references from master to main. (#2029) Summary: This is required for the renaming of the default branch from `master` to `main`, in accordance with the new Facebook OSS guidelines. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2029 Reviewed By: mdouze Differential Revision: D30672862 Pulled By: beauby fbshipit-source-id: 0b6458a4ff02a12aae14cf94057e85fdcbcbff96	2021-09-01 09:26:20 -07:00
Matthijs Douze	760cce7f3a	Support for additive quantizer search (#1961) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1961 This diff implements LUT-based search for additive quantizers. It also further merges code for LSQ and the ResidualQuantizer. The documentation + evaluation is on github: https://github.com/facebookresearch/faiss/wiki/Additive-quantizers Reviewed By: wickedfoo Differential Revision: D29395079 fbshipit-source-id: b8a24a647bbdc4cda2a699e791ffdb2a12bfa9c6	2021-08-20 01:00:10 -07:00
Check Deng	48ae55348a	Update codebooks with double type (#1975) Summary: ## Description The process of updating the codebook in LSQ may be unstable if the data is not zero-centered. This diff fixes it by using `double` instead of `float` during codebook updating. This should not affect performance, since the update process is quite fast. Users can switch back to `float` mode by setting `update_codebooks_with_double = False` ## Changes 1. Support `double` during codebook updating. 2. Add a unit test. 3. Add `__init__.py` under `contrib/` to avoid warnings. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1975 Reviewed By: wickedfoo Differential Revision: D29565632 Pulled By: mdouze fbshipit-source-id: 932d7932ae9725c299cd83f87495542703ad6654	2021-07-07 03:29:49 -07:00
Matthijs Douze	1829aa92a1	three small fixes (#1972) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1972 This fixes a few issues that I ran into + adds tests: - range_search_max_results with IP search - a few missing downcasts for VectorTransforms - ResultHeap supports max IP search Reviewed By: wickedfoo Differential Revision: D29525093 fbshipit-source-id: d4ff0aff1d83af9717ff1aaa2fe3cda7b53019a3	2021-07-01 16:08:45 -07:00
Chengqi Deng	c087f87730	Add LocalSearchQuantizer (#1906) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1906 This PR implements LSQ/LSQ++, a vector quantization technique described in the following two papers: 1. Revisiting additive quantization 2. LSQ++: Lower running time and higher recall in multi-codebook quantization Here is a benchmark running on SIFT1M for 64-bit encoding: ``` ===== lsq: mean square error = 17335.390208 training time: 312.729779958725 s encoding time: 244.6277096271515 s ===== pq: mean square error = 23743.004672 training time: 1.1610801219940186 s encoding time: 2.636141061782837 s ===== rq: mean square error = 20999.737344 training time: 31.813055515289307 s encoding time: 307.51959800720215 s ``` Changes: 1. Add LocalSearchQuantizer object 2. Fix an out-of-memory bug in ResidualQuantizer 3. Add a benchmark for evaluating quantizers 4. Add tests for LocalSearchQuantizer Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1862 Test Plan: ``` buck test //faiss/tests/:test_lsq buck run mode/opt //faiss/benchs/:bench_quantizer -- lsq pq rq ``` Reviewed By: beauby Differential Revision: D28376369 Pulled By: mdouze fbshipit-source-id: 2a394d38bf75b9de0a1c2cd6faddf7dd362a6fa8	2021-05-21 01:33:55 -07:00
Jeff Johnson	b544db24a8	Raw all-pairwise distance function on GPU Summary: This diff implements brute-force all-pairwise distances between two different sets of vectors using any of the Faiss supported metrics on the GPU (L2, IP, L1, Lp, Linf, etc). It is implemented using the same C++ interface as `bfKnn`, except when `k == -1`, all pairwise distances will be returned (no k-selection is made). A restriction exists at present where the entire output data must be able to reside on the same GPU which may be lifted at a subsequent point. This interface is available in python via `faiss.pairwise_distance_gpu(res, xq, xb, D, metric)` with both numpy and pytorch support which will return all of the distances in D. Also cleaned up CUDA stream usage a little bit in Distance.cu/Distance.cuh in the C++ implementation. Reviewed By: mdouze Differential Revision: D27686773 fbshipit-source-id: 8de6a699cda5d7077f0ab583e9ce76e630f0f687	2021-04-13 12:06:04 -07:00


67 Commits