faiss

mirror of https://github.com/facebookresearch/faiss.git synced 2025-06-03 21:54:02 +08:00

Author	SHA1	Message	Date
Di-Is	905963f344	Add `ngpu` default argument to `knn_ground_truth` (#4123 ) Summary: This pull request introduces a new default argument, `ngpu=-1`, to the `knn_ground_truth` function in the `faiss.contrib`. ## Purpose of Change ### Bug Fix In the current implementation, running tests under the tests directory (CPU tests) in an environment with faiss-gpu installed would inadvertently use the GPU and cause unintended behavior. This pull request prevents the GPU from being used during CPU-only tests by explicitly controlling GPU allocation via the ngpu parameter. ### API Consistency Other functions that call `faiss.get_num_gpus` in `faiss.contrib`, such as `range_search_max_results` and `range_ground_truth`, already include the `ngpu` argument. Adding this parameter to `knn_ground_truth` will ensure consistency across the API, reduce potential confusion, and improve ease of use. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4123 Reviewed By: asadoughi Differential Revision: D68199506 Pulled By: junjieqi fbshipit-source-id: cb50e206d8a1a982c21b0ccb42825ea45873f3ef	2025-01-21 11:09:02 -08:00
Maria Lomeli	9590ad2746	PQ with pytorch (#4116 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4116 This diff implements Product Quantization using Pytorch only. Reviewed By: mdouze Differential Revision: D67766798 fbshipit-source-id: fe2d44a674fc2056f7e2082e9765052c98fdc8f8	2025-01-06 09:48:32 -08:00
Jeff Johnson	eaab46c870	Faiss GPU: bfloat16 brute-force kNN support (#4018 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4018 Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4014 This diff adds support for bfloat16 vector/query data types with the GPU brute-force k-nearest neighbor function (`bfKnn`). The change is largely just plumbing the new data type through the template hierarchy (so distances can be computed in bfloat16). Of note, by design, all final distance results are produced in float32 regardless of input data type (float32, float16, bfloat16). This is because the true nearest neighbors in many data sets can often differ by only ~1000 float32 ULPs in terms of distance which will result in possible false equivalency. This seems to be one area where lossy compression/quantization throughout does not work as well (and is also why `CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION` is set in `StandardGpuResources.cpp`). However, given that there is native bf16 x bf16 = fp32 tensor core support on Ampere+ architectures, the matrix multiplication itself should use them. As bfloat16 support is quite lacking on AMD/ROCm (see [here](https://rocm.docs.amd.com/projects/HIPIFY/en/latest/tables/CUDA_Device_API_supported_by_HIP.html), very few bf16 functions implemented), bf16 functionality is completely disabled / not compiled for AMD ROCm. Reviewed By: mdouze Differential Revision: D65459723 fbshipit-source-id: 8a6aec843f7e37c205d95f2485442a26c402a3b0	2024-11-19 19:37:03 -08:00
Tarang Jain	134922061c	Migrate from RAFT to CUVS (#3549 ) Summary: Remove the dependency on `raft::compiled` and modify GPU implementations to use cuVS backend in place of RAFT. A deeper insight into the dependency: FAISS gets the ANN algorithm implementations such as IVF-Flat and IVF-PQ from cuVS. RAFT is meant to be a lightweight C++ header-only template library that cuVS relies on for the more fundamental / low-level utilities. Some examples of these are RAFT's device mdarray and mdspan objects; the RAFT resource object (`raft::resource`) that takes care of the stream ordering of device functions; linear algebra functions such as mapping, reduction, BLAS routines etc. A lot of the cuVS functions take the RAFT mdspan objects as arguments (for example `raft::device_matrix_view`). Therefore FAISS relies on both cuVS and RAFT. FAISS gets RAFT headers through cuVS and uses them to create the function arguments that can be consumed by cuVS. Note that we are not explicitly linking FAISS against `raft::raft` or `raft::compiled`. Only the required headers are included and compiled rather than compiling the whole RAFT shared library. This is the reason we still see mentions of `raft` in FAISS. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3549 Reviewed By: ramilbakhshyiev Differential Revision: D62041013 Pulled By: asadoughi fbshipit-source-id: 7230dcc06cf47baf95873adc1dec2adca4a8f82a	2024-11-14 11:30:16 -08:00
Michael Norris	eff0898a13	Enable linting: lint config changes plus arc lint command (#3966 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3966 This actually enables the linting. Manual changes: - tools/arcanist/lint/fbsource-licenselint-config.toml - tools/arcanist/lint/fbsource-lint-engine.toml Automated changes: `arc lint --apply-patches --take LICENSELINT --paths-cmd 'hg files faiss'` Reviewed By: asadoughi Differential Revision: D64484165 fbshipit-source-id: 4f2f6e953c94ef6ebfea8a5ae035ccfbea65ed04	2024-10-22 09:46:48 -07:00
Matthijs Douze	2e6551ffa3	Support search_preassigned in torch (#3916 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3916 Adding missing wrapper to the torch wrappers in Faiss + test it. Also factorized a bit of code between search functions. Reviewed By: algoriddle Differential Revision: D63974821 fbshipit-source-id: a0415a57a763e2d1896956c503e503615c167860	2024-10-08 02:46:10 -07:00
Matthijs Douze	838612c9d7	torch.distributed kmeans (#3876 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3876 Demo script for distributed kmeans. It provides a `DatasetAssign` object and shows how to run it with torch.distributed. Reviewed By: asadoughi, pankajsingh88 Differential Revision: D63013820 fbshipit-source-id: 22c959f3afdc04fd4aa8b9aeed309ea6290b1328	2024-09-20 09:15:27 -07:00
Matthijs Douze	6baebe2cee	begin torch_contrib (#3872 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3872 The contrib.torch subdirectory is intended to receive modules in python that are useful for similarity search and that apply to CPU or GPU pytorch tensors. The current version includes CPU clustering on torch tensors. To be added: * implementation of PQ Reviewed By: asadoughi Differential Revision: D62759207 fbshipit-source-id: 87dbaa5083e3f2f4f60526815e22ded4e83e8559	2024-09-20 09:15:27 -07:00
Matthijs Douze	0d7817e88f	rewrite python kmeans without scipy (#3873 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3873 The previous version required scipy to do the accumulation, which is replaced here with a nifty piece of numpy accumulation. This removes the need for scipy for non-sparse data. Reviewed By: junjieqi Differential Revision: D62884307 fbshipit-source-id: 5443634e487387a2b518fd2a7f9a3d9a40abd4b4	2024-09-20 09:15:27 -07:00
Kumar Saurabh Arora	a166e13a25	Adding bucket/path (blobstore) in dataset descriptor (#3848 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3848 same as title. Dataset can be referred from blobstore Reviewed By: satymish Differential Revision: D62476993 fbshipit-source-id: db2b4088ab6e02278b8b91194bf916fc476b79ec	2024-09-11 20:01:04 -07:00
Matthijs Douze	ab109c28d5	Add search functionality to FlatCodes (#3611 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3611 Using the new dispatcher functions, add search func to flat codes. To test it, make IndexLattice a subclass of FlatCodes and check the reconstruction there. Reviewed By: asadoughi Differential Revision: D59367989 fbshipit-source-id: 405dab4358fe34b2e38ac8bcc222b19f58643229	2024-07-11 02:40:38 -07:00
Gergely Szilvasy	3d32330e3d	add use_raft to knn_gpu (torch) (#3509 ) Summary: Add support for `use_raft` in the torch version of `knn_gpu`. The numpy version already has this support, see https://github.com/facebookresearch/faiss/blob/main/faiss/python/gpu_wrappers.py#L59 Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3509 Reviewed By: mlomeli1, junjieqi Differential Revision: D58489851 Pulled By: algoriddle fbshipit-source-id: cfad722fefd4809b135b765d0d43587cfd782d0e	2024-06-12 19:19:23 -07:00
Kumar Saurabh Arora	22304340d2	Adding buck target for experiment bench_fw_ivf (#3423 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3423 Adding small fixes to run experiments from fbcode. 1. Added buck target 2. Full import path of faiss bench_fw modules 3. new dataset path to run tests locally as we can't use an existing directory ./data in fbcode. Reviewed By: algoriddle, junjieqi Differential Revision: D57235092 fbshipit-source-id: f78a23199e619b640a19ca37f8b52ff0abdd8298	2024-05-31 14:30:39 -07:00
Kumar Saurabh Arora	0beecb4c85	sys.big_endian to sys.byteorder (#3422 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3422 Found vec_io failing when running some benchmarking. There is no such field named big_endian in sys. So, reverting it to original field byteorder Reviewed By: algoriddle Differential Revision: D56718607 fbshipit-source-id: 553f1d2d6bc967581142a92282e534f3f164e8f9	2024-05-30 09:27:55 -07:00
Alexandr Guzhva	6a94c67a2f	QT_bf16 for scalar quantizer for bfloat16 (#3444 ) Summary: mdouze Please let me know if any additional unit tests are needed Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3444 Reviewed By: algoriddle Differential Revision: D57665641 Pulled By: mdouze fbshipit-source-id: 9bec91306a1c31ea4f1f1d726c9d60ac6415fdfc	2024-05-23 02:59:15 -07:00
Matthijs Douze	783e044a2d	support big-endian machines (#3361 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3361 Fix a few issues in the PR. Normally all tests should pass on a little-endian machine Reviewed By: junjieqi Differential Revision: D56003181 fbshipit-source-id: 405dec8c71898494f5ddcd2718c35708a1abf9cb	2024-04-24 05:40:49 -07:00
Aditya Vidyadhar Kamath	67574aabbc	Fix the endianness issue in AIX while running the benchmark. (#3345 ) Summary: This pull request is for issue https://github.com/facebookresearch/faiss/issues/3330. This patch makes sure that packed code arrays are in big endian format. Kindly let us know if we need any changes or if we can have a better approach. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3345 Reviewed By: junjieqi Differential Revision: D55957630 Pulled By: mdouze fbshipit-source-id: f728f9563f6b942af9d8899b54662d7ceb811206	2024-04-24 05:40:49 -07:00
Kumar Saurabh Arora	da9f292a4b	Support of skip_ids in merge_from_multiple function of OnDiskInvertedLists (#3327 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3327 Context 1. [Issue 2621](https://github.com/facebookresearch/faiss/issues/2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard. 2. [Issue 2876](https://github.com/facebookresearch/faiss/issues/2876) provides usecase of shifting ids when merging invls from different shards. In this diff, 1. To address #1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class. why so? To continue to allow merge invl from one index to ondiskinvl from other index. 2. To address #2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards. Reviewed By: mdouze Differential Revision: D55482518 fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083	2024-04-03 10:36:56 -07:00
Ramil Bakhshyiev	8274c38f27	Remove TypedStorage usage when working with torch_utils (#3301 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3301 In `torch_utils.py`, changed `storage()` references to `untyped_storage()`. Reviewed By: junjieqi Differential Revision: D55167842 fbshipit-source-id: 911eda1c22f10595663fb4416ab992903390d457	2024-03-21 10:30:44 -07:00
Maria Lomeli	420d25f51c	Index pretransform support in search_preassigned (#3225 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3225 This diff fixes issue [#3113](https://github.com/facebookresearch/faiss/issues/3113), e.g. introduces support for index pretransform in `search_preassigned`. Reviewed By: mdouze Differential Revision: D53188584 fbshipit-source-id: 8189c0a59f957a2606391f22cf3fdc8874110a6e	2024-01-30 09:20:07 -08:00
Gergely Szilvasy	c4b91a54d1	Replace pickle serialization to address security vulnerability Summary: This diff replaces the use of pickle serialization with json to address a security vulnerability. Adding a warning message that this code is for demonstration purposes only. Reviewed By: mdouze Differential Revision: D52777650 fbshipit-source-id: d9d6a00fd341b29ac854adcbf675d2cd303d2f29	2024-01-25 07:24:04 -08:00
Matthijs Douze	32f0e8cf92	Generalize ResultHandler, support range search for HNSW and Fast Scan (#3190 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3190 This diff adds more result handlers in order to expose them externally. This enables range search for HNSW and Fast Scan, and nprobe parameter support for FastScan. Reviewed By: pemazare Differential Revision: D52547384 fbshipit-source-id: 271da5ffea6411df3d8e50641abade18bd7b774b	2024-01-11 11:46:30 -08:00
Gergely Szilvasy	beef6107fc	faiss paper benchmarks (#3189 ) Summary: - IVF benchmarks: `bench_fw_ivf.py bench_fw_ivf.py bigann /checkpoint/gsz/bench_fw/ivf` - Codec benchmarks: `bench_fw_codecs.py contriever /checkpoint/gsz/bench_fw/codecs` and `bench_fw_codecs.py deep1b /checkpoint/gsz/bench_fw/codecs` - A range codec evaluation: `bench_fw_range.py ssnpp /checkpoint/gsz/bench_fw/range` - Visualize with `bench_fw_notebook.ipynb` - Support for running on a cluster Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3189 Reviewed By: mdouze Differential Revision: D52544642 Pulled By: algoriddle fbshipit-source-id: 21dcdfd076aef6d36467c908e6be78ef851b0e98	2024-01-05 09:27:04 -08:00
Maria Lomeli	be1242775a	Upstream changes to big batch search (#3170 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3170 Logging info, adding to heap and wait_in and out times. Reviewed By: algoriddle Differential Revision: D52034667 fbshipit-source-id: 8ab864c5c43d534d094c6e81bb810c74e20c9ac2	2023-12-12 09:51:05 -08:00
Gergely Szilvasy	9519a19f42	benchmark refactor Summary: 1. Support for index construction parameters outside of the factory string (arbitrary depth of quantizers). 2. Refactor that provides an index wrapper which is a prereq for the optimizer, which will generate indices from pre-optimized components (particularly quantizers) Reviewed By: mdouze Differential Revision: D51427452 fbshipit-source-id: 014d05dd798d856360f2546963e7cad64c2fcaeb	2023-12-04 05:53:17 -08:00
Matthijs Douze	b109d086a2	Search and return codes (#3143 ) Summary: This PR adds a functionality where an IVF index can be searched and the corresponding codes be returned. It also adds a few functions to compress int arrays into a bit-compact representation. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3143 Test Plan: ``` buck test //faiss/tests/:test_index_composite -- TestSearchAndReconstruct buck test //faiss/tests/:test_standalone_codec -- test_arrays ``` Reviewed By: algoriddle Differential Revision: D51544613 Pulled By: mdouze fbshipit-source-id: 875f72d0f9140096851592422570efa0f65431fc	2023-11-25 13:57:25 -08:00
Matthijs Douze	9db182460c	Relax IVFFlatDedup test (#3077 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3077 This diff relaxes some IVFFlatDedup tests where distances are slightly different over runs. Should fix https://app.circleci.com/pipelines/github/facebookresearch/faiss/4709/workflows/8c8213bf-8fe0-4c4e-9a7d-991f44bf1010/jobs/25551 https://app.circleci.com/pipelines/github/facebookresearch/faiss/4709/workflows/8c8213bf-8fe0-4c4e-9a7d-991f44bf1010/jobs/25547 Reviewed By: algoriddle Differential Revision: D49732349 fbshipit-source-id: 728b9885c6b7d6ba697ccb6bacc0abd0ee2b0679	2023-09-29 01:16:59 -07:00
Matthijs Douze	687457b2f4	Access graph structure for NSG (#2984 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2984 It is not entirely trivial to access the NSG graph structure from Python (although it is a fixed size N-by-K matrix of vector ids). This diff adds an inspect_tools function to do that. Reviewed By: algoriddle Differential Revision: D48026775 fbshipit-source-id: 94cd7be7f656bcd333d62586531f287ea8e052e5	2023-08-04 06:55:24 -07:00
Matthijs Douze	07fe2b622f	Binary cloning and GPU range search (#2916 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2916 Overall better support for binary indexes: - cloning (to CPU and GPU), only for BinaryFlat for now - fix bug in reconstruct_n - range_search_max_results Reviewed By: algoriddle Differential Revision: D46755778 fbshipit-source-id: 777ad90aff5c54a77f9685ed6512247a922c6ef5	2023-06-19 06:05:14 -07:00
Gergely Szilvasy	092606b293	bbs producer/consumer threading (#2901 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2901 This diff allows each GPU to work independently, a hot centroid (eg. out-of-distribution queries that hit a centroid heavily) will only block the one GPU that is processing it, others will continue to pick up work independently. Reviewed By: mdouze Differential Revision: D46521298 fbshipit-source-id: 171cb06cce8b2d16b7bd744799b105b3cd525be3	2023-06-14 07:58:44 -07:00
Matthijs Douze	6800ebef83	Support independent IVF coarse quantizer Summary: In the IndexIVFIndependentQuantizer, the coarse quantizer is applied on the input vectors, but the encoding is performed on a vector-transformed version of the database elements. Reviewed By: alexanderguzhva Differential Revision: D45950970 fbshipit-source-id: 30f6cf46d44174b1d99a12384b7d5e2d475c1f88	2023-05-26 02:59:01 -07:00
Matthijs Douze	b9ea339617	support range search from GPU (#2860 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2860 Optimized range search function where the GPU computes by default and falls back on the CPU for queries where there are too many results. Parallelize the CPU to GPU cloning, it seems to work. Support range_search_preassigned in Python. Fix long-standing issue with SWIG exposed functions that did not release the GIL (in particular the MapLong2Long). Adds a MapInt64ToInt64 that is more efficient than MapLong2Long. Reviewed By: algoriddle Differential Revision: D45672301 fbshipit-source-id: 2e77397c40083818584dbafa5427149359a2abfd	2023-05-16 00:27:53 -07:00
Matthijs Douze	2d8886cd4f	IVF sorting routine (#2846 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2846 Adds a function to ivf_contrib to sort the inverted lists by size without changing the results. Also moves big_batch_search to its own module. Reviewed By: algoriddle Differential Revision: D45565880 fbshipit-source-id: 091a1c1c074f860d6953bf20d04523292fb55e1a	2023-05-04 09:59:06 -07:00
Matthijs Douze	3704bbe4a7	Add GIST1M to datasets Summary: GIST1M is on the fair cluster but was not added to the datasets.py Reviewed By: alexanderguzhva Differential Revision: D45276664 fbshipit-source-id: 8db41d61b78983f5d01dedca1790618f80f6bc78	2023-04-26 02:07:11 -07:00
Matthijs Douze	016aa04602	make balanced clusters the default (#2796 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2796 This diff makes balanced clusters the default for 2-level clustering. This seems to improve a bit over the default uniform clusters, see https://github.com/fairinternal/faiss_improvements/blob/master/better_coarse_quantizer/two_level_clustering.ipynb Warning: the nc2 argument of two_level_clustering becomes the total number of clusters. Reviewed By: algoriddle Differential Revision: D44421222 fbshipit-source-id: 951b7fc043be4a41762a7e6f7a6fcfb71e303832	2023-03-28 07:23:30 -07:00
Matthijs Douze	581760302f	evaluation script + IVF block search (#2781 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2781 This is a benchmarking script for keypoint matching with labelled ground-truth. Reviewed By: alexanderguzhva Differential Revision: D44036091 fbshipit-source-id: d9d7c089c4d172b66f33dc968c00713a1b79c2d1	2023-03-24 13:54:08 -07:00
Matthijs Douze	0200d131fc	fix windows test (#2775 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2775 Reviewed By: algoriddle Differential Revision: D44210010 fbshipit-source-id: b9b620a4b0a874e09ee2f6082ff0f9463716fdf4	2023-03-21 05:34:50 -07:00
Matthijs Douze	2d7dd5b0a6	support checkpointing in big batch search Summary: Big batch search can be running for hours so it's useful to have a checkpointing mechanism in case it's run on a best-effort cluster queue. Reviewed By: algoriddle Differential Revision: D44059758 fbshipit-source-id: 5cb5e80800c6d2bf76d9f6cb40736009cd5d4b8e	2023-03-14 11:11:50 -07:00
Iurii Makarov	8ccd800986	Implemented Jaccard distance (#2684 ) Summary: Summary Implemented Jaccard distance requested in this issue: https://github.com/facebookresearch/faiss/issues/1299 Test plan Run: make -C build test Output: 100% tests passed, 0 tests failed out of 174 Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2684 Reviewed By: mdouze Differential Revision: D43398833 Pulled By: lvoursl fbshipit-source-id: b38cf27a7858842efe967bcb1033977863716a76	2023-02-27 07:49:42 -08:00
Matthijs Douze	a80c96c0de	Evaluation script for hybrid CPU / GPU search Summary: Implementation of various combinations of coarse quantization / scanning code on CPU and GPU. Used to generate the results of https://github.com/facebookresearch/faiss/wiki/Hybrid-CPU-GPU-search-and-multiple-GPUs Reviewed By: alexanderguzhva Differential Revision: D43041802 fbshipit-source-id: 12608812ab351d60d4a6dc45be1ca493f76d4375	2023-02-15 12:55:06 -08:00
Matthijs Douze	8fc3775472	building blocks for hybrid CPU / GPU search (#2638 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2638 This diff is a more streamlined way of searching IVF indexes with precomputed clusters. This will be used for experiments with hybrid CPU / GPU search. Reviewed By: algoriddle Differential Revision: D41301032 fbshipit-source-id: a1d645fd0f2bf806454dfd04971edc0a6200d20d	2023-01-12 13:34:44 -08:00
Jeff Johnson	590f6fb47d	Faiss pytorch bridge: revert to TypedStorage (#2631 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2631 The pytorch in fbcode complains about `storage()` saying it is deprecated and we need to move to UntypedStorage `_storage()`, while github CI is using an older version of pytorch where `_storage()` doesn't exist. As it is only a warning not an error in fbcode, revert to the old form, but we'll likely have to change to `_storage()` eventually. Reviewed By: alexanderguzhva Differential Revision: D42107029 fbshipit-source-id: 699c15932e6ae48cd1c60ebb7212dcd9b47626f6	2022-12-16 15:38:08 -08:00
Jeff Johnson	4bb7aa4b77	Faiss + Torch fixes, re-enable k = 2048 Summary: This diff fixes four separate issues: - Using the pytorch bridge produces the following deprecation warning. We switch to `_storage()` instead. ``` torch_utils.py:51: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor._storage() instead of tensor.storage() x.storage().data_ptr() + x.storage_offset() * 4) ``` - The `storage_offset` for certain types was wrong, but this would only affect torch tensors that were a view into a storage that didn't begin at the beginning. - The `reconstruct_n` numpy pytorch bridge function allowed passing `-1` for `ni` which indicated that all vectors should be reconstructed. The torch bridge didn't follow this and threw an error: ``` TypeError: torch_replacement_reconstruct_n() missing 2 required positional arguments: 'n0' and 'ni' ``` - Choosing values in the range (1024, 2048] for `k` or `nprobe` was broken in D37777979; this is now fixed again. Reviewed By: alexanderguzhva Differential Revision: D42041239 fbshipit-source-id: c7d9b4aba63db8ac73e271c8ef34e231002963d9	2022-12-14 16:21:22 -08:00
Matthijs Douze	fa53e2c941	Implementation of big-batch IVF search (single machine) (#2567 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2567 Intuitively, it should be easier to handle big-batch searches because all distance computations for a set of queries can be done locally within each inverted list. This benchmark implements this in pure python (but should be close to optimal in terms of speed), on CPU for IndexIVFFlat, IndexIVFPQ and IndexIVFScalarQuantizer. GPU is also supported. The results are not systematically better, see https://docs.google.com/document/d/1d3YuV8uN7hut6aOATCOMx8Ut-QEl_oRnJdPgDBRF1QA/edit?usp=sharing Reviewed By: algoriddle Differential Revision: D41098338 fbshipit-source-id: 479e471b0d541f242d420f581775d57b708a61b8	2022-12-09 08:53:13 -08:00
Matthijs Douze	a996a4a052	Put idx_t in the faiss namespace (#2582 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2582 A few more or less cosmetic improvements * Index::idx_t was in the Index object, which does not make much sense, this diff moves it to faiss::idx_t * replace multiprocessing.dummy with multiprocessing.pool * add Alexandr as a core contributor of Faiss in the README ;-) ``` for i in $( find . -name \.cu -o -name \.cuh -o -name \.h -o -name \.cpp ) ; do sed -i s/Index::idx_t/idx_t/ $i done ``` For the fbcode deps: ``` for i in $( fbgs Index::idx_t --exclude fbcode/faiss -l ) ; do sed -i s/Index::idx_t/idx_t/ $i done ``` Reviewed By: algoriddle Differential Revision: D41437507 fbshipit-source-id: 8300f2a3ae97cace6172f3f14a9be3a83999fb89	2022-11-30 08:25:30 -08:00
Jeff Johnson	e3d12c7133	Faiss GPU: add device specifier for bfKnn (#2584 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2584 The `bfKnn` C++ function and `knn_gpu` Python functions for running brute-force k-NN on the GPU did not have a way to specify the GPU device on which the search should run, as it simply used the current thread-local `cudaGetDevice(...)` setting in the CUDA runtime API. This is unlike the GPU index classes which takes a device argument in the index config struct. Now, both the C++ and Python interface to bfKnn have an optional argument to specify the device. Default behavior is the current behavior; if the `device` is -1 then the current CUDA thread-local device is used, otherwise we perform the work on the desired device. Reviewed By: mdouze Differential Revision: D41448254 fbshipit-source-id: a63c68c12edbe4d725b9fc2a749d5dc935574e12	2022-11-21 18:20:32 -08:00
Matthijs Douze	dd814b5f14	IVF filtering based on IDSelector (no init split) (#2483 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2483 This diff changes the following: 1. all search functions now take a `SearchParameters` argument that overrides the internal search parameters 2. the default implementation for most classes throws when the params argument is non-nullptr / non-None 3. the IndexIVF and IndexHNSW classes have functioning SearchParameters 4. the SearchParameters includes an IDSelector that can search only in a subset of the index based on a defined subset of ids There is also some refactoring: the IDSelector was moved to its own .h/.cpp and python/__init__.py is split in parts. The diff is quite bulky because the search function prototypes need to be changed in all index classes. Things to fix in subsequent diffs: - support SearchParameters for more index types (Flat variants) - better sub-object ownership for SearchParams (with std::unique_ptr?) - special handling of IDSelectorRange to make it faster Reviewed By: alexanderguzhva Differential Revision: D39852589 fbshipit-source-id: 4988bdb5b9bee1207cd327d3f80bf5e0e2467fe1	2022-09-30 06:40:03 -07:00
François Travais	b74ac352d3	Fix RPC lib logging (#2433 ) Summary: I couldn't run the client-server implementation because of the logging. Indeed `LOG.info('Connected by', addr, end=' ')` raised an exception (`end` is not recognised as a valid argument). Other warnings are also showing up. This PR cleans things up a bit and fixes the client-server. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2433 Reviewed By: alexanderguzhva Differential Revision: D39167576 Pulled By: mdouze fbshipit-source-id: 6f74d582f14e353e04029e6465bd6e488a865289	2022-09-08 01:05:52 -07:00
Ryan Russell	d2806286d2	docs: Improve readability (#2378 ) Summary: Signed-off-by: Ryan Russell <git@ryanrussell.org> Various readability fixes focused on `.md` files: - Grammar - Fix some incorrect command references to `distributed_kmeans.py` - Styling the markdown bash code snippets sections so they format Attempted to put a lot of little things into one PR and commit; let me know if any mods are needed! Best, Ryan Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2378 Reviewed By: alexanderguzhva Differential Revision: D37717671 Pulled By: mdouze fbshipit-source-id: 0039192901d98a083cd992e37f6b692d0572103a	2022-07-08 09:19:07 -07:00
Matthijs Douze	b8fe92dfee	contrib clustering module (#2217 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2217 This diff introduces a new Faiss contrib module that contains: - generic k-means implemented in python (was in distributed_ondisk) - the two-level clustering code, including a simple function that runs it on a Faiss IVF index. - sparse clustering code (new) The main idea is that that code is often re-used so better have it in contrib. Reviewed By: beauby Differential Revision: D34170932 fbshipit-source-id: cc297cc56d241b5ef421500ed410d8e2be0f1b77	2022-02-28 14:18:47 -08:00
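
Note on commit 0beecb4c85 (sys.big_endian to sys.byteorder): the message points out that Python's `sys` module has no `big_endian` field and that `sys.byteorder` is the one to use. The snippet below is a minimal standard-library sketch of that check, not the code from `vec_io`. The helper names `is_big_endian` and `pack_uint32_big_endian` are hypothetical and chosen only for illustration.

```python
import struct
import sys


def is_big_endian() -> bool:
    """Return True when the host stores the most significant byte first."""
    # sys.byteorder is the string "little" or "big"; sys has no big_endian field.
    return sys.byteorder == "big"


def pack_uint32_big_endian(value: int) -> bytes:
    """Serialize an unsigned 32-bit integer in big-endian order (hypothetical helper)."""
    # The ">" prefix forces big-endian layout regardless of the host byte order.
    return struct.pack(">I", value)


if __name__ == "__main__":
    print(sys.byteorder, is_big_endian())
    print(pack_uint32_big_endian(1).hex())  # 00000001
```

Forcing an explicit byte order when serializing, rather than relying on the host layout, is one way to keep packed data portable between little-endian and big-endian machines.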

77 Commits