Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3383
In this diff, I am fixing minor issues in bench_fw where certain fields are not accessible when the index is built from a codec. It also requires the index to be discovered via its codec alias, since an index factory string is not always available.
A subsequent diff, internal to Meta, will add a test case that exercises this path.
Reviewed By: algoriddle
Differential Revision: D56444641
fbshipit-source-id: b7af7e7bb47b20bbb5515a66f41dd24f42459d52
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3309
Make sure that the HNSW search stats work, and remove stats for deprecated functionality.
Remove code from the "link and code" paper that is no longer supported.
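For reference, a minimal sketch of reading the global HNSW search stats from Python; it assumes the `ndis` counter survives this cleanup, so verify the field names against your faiss version:
```python
import faiss
import numpy as np

d = 64
xb = np.random.rand(10000, d).astype("float32")
xq = np.random.rand(100, d).astype("float32")

index = faiss.IndexHNSWFlat(d, 32)   # HNSW, 32 neighbors per node
index.add(xb)

stats = faiss.cvar.hnsw_stats        # global stats object
stats.reset()
index.search(xq, 10)
print("distance computations during search:", stats.ndis)
```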
Reviewed By: kuarora, junjieqi
Differential Revision: D55247802
fbshipit-source-id: 03f176be092bff6b2db359cc956905d8646ea702
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3154
Using the benchmark to find Pareto optimal indices, in this case on BigANN as an example.
Separately optimize the coarse quantizer and the vector codec and use Pareto optimal configurations to construct IVF indices, which are then retested at various scales. See `optimize()` in `optimize.py` as the main function driving the process.
The results can be interpreted with `bench_fw_notebook.ipynb`, which allows:
* filtering by maximum code size
* filtering by maximum search time
* filtering by minimum accuracy
* keeping only space- or time-Pareto-optimal options
and visualizes the results and outputs them as a table (a small sketch of the filtering idea follows below).
This version is intentionally limited to IVF(Flat|HNSW),PQ|SQ indices...
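To make the filtering concrete, here is a small illustrative sketch (not the notebook's actual code) of selecting space/time Pareto-optimal operating points from benchmark results; the result dicts and field names are made up for the example:
```python
import numpy as np

def pareto_optimal(points):
    """points: rows of (search_time, 1 - accuracy); lower is better on both axes.
    Returns a boolean mask of the Pareto-optimal rows."""
    points = np.asarray(points, dtype=float)
    keep = np.ones(len(points), dtype=bool)
    for i, p in enumerate(points):
        # p is dominated if another point is no worse on all axes and better on one
        dominated_by = np.all(points <= p, axis=1) & np.any(points < p, axis=1)
        if dominated_by.any():
            keep[i] = False
    return keep

# made-up results; filter by max code size and min accuracy, then keep the Pareto front
results = [
    {"name": "IVF1024,PQ32", "code_size": 32,  "time_ms": 0.5, "recall": 0.61},
    {"name": "IVF4096,PQ64", "code_size": 64,  "time_ms": 0.9, "recall": 0.83},
    {"name": "IVF4096,SQ8",  "code_size": 128, "time_ms": 1.4, "recall": 0.95},
]
candidates = [r for r in results if r["code_size"] <= 64 and r["recall"] >= 0.5]
mask = pareto_optimal([(r["time_ms"], 1 - r["recall"]) for r in candidates])
for r, is_opt in zip(candidates, mask):
    print(r["name"], "pareto-optimal" if is_opt else "dominated")
```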
Reviewed By: mdouze
Differential Revision: D51781670
fbshipit-source-id: 2c0f800d374ea845255934f519cc28095c00a51f
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3190
This diff adds more result handlers in order to expose them externally.
This enables range search for HNSW and Fast Scan, and adds nprobe parameter support for Fast Scan.
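For context, a minimal sketch of what a range search on an HNSW index looks like from Python; the dataset and radius are arbitrary, and the call assumes an index type that implements `range_search` (IndexHNSWFlat does in recent faiss):
```python
import faiss
import numpy as np

d = 64
xb = np.random.rand(10000, d).astype("float32")
xq = np.random.rand(5, d).astype("float32")

index = faiss.IndexHNSWFlat(d, 32)      # HNSW with 32 neighbors per node
index.add(xb)

# range_search returns results in CSR-like form:
# lims[i]:lims[i + 1] delimits the matches of query i in D and I
radius = 8.0                            # squared-L2 threshold, tune for your data
lims, D, I = index.range_search(xq, radius)
for i in range(len(xq)):
    print(f"query {i}: {lims[i + 1] - lims[i]} results within radius {radius}")
```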
Reviewed By: pemazare
Differential Revision: D52547384
fbshipit-source-id: 271da5ffea6411df3d8e50641abade18bd7b774b
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3144
Visualize the results of running the benchmark with Pareto-optimal filtering:
1. per index or across indices
2. for space, time or space & time
3. knn or range search, the latter @ specific precision
Reviewed By: mdouze
Differential Revision: D51552775
fbshipit-source-id: d4f29e3d46ef044e71b54439b3972548c86af5a7
Summary:
1. Support for index construction parameters outside of the factory string (arbitrary depth of quantizers).
2. A refactor that provides an index wrapper, a prerequisite for the optimizer, which will generate indices from pre-optimized components (particularly quantizers).
Reviewed By: mdouze
Differential Revision: D51427452
fbshipit-source-id: 014d05dd798d856360f2546963e7cad64c2fcaeb
Summary:
This PR adds functionality where an IVF index can be searched and the corresponding codes returned. It also adds a few functions to compress int arrays into a bit-compact representation.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3143
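To illustrate what "bit-compact representation" means (this is only the idea, not the new faiss functions themselves), here is a small numpy sketch that packs an int array whose values fit in `nbit` bits:
```python
import numpy as np

def pack_ints(values, nbit):
    # lay out the nbit low-order bits of each value, LSB first, then pack to bytes
    values = np.asarray(values, dtype=np.uint64)
    bits = ((values[:, None] >> np.arange(nbit, dtype=np.uint64)) & 1).astype(np.uint8)
    return np.packbits(bits.ravel(), bitorder="little")

def unpack_ints(buf, nbit, n):
    bits = np.unpackbits(buf, count=n * nbit, bitorder="little").reshape(n, nbit)
    return (bits.astype(np.uint64) << np.arange(nbit, dtype=np.uint64)).sum(axis=1)

codes = np.array([3, 7, 1, 6, 2], dtype=np.uint64)   # values fit in 3 bits
packed = pack_ints(codes, nbit=3)                     # 2 bytes instead of 40
assert np.array_equal(unpack_ints(packed, nbit=3, n=len(codes)), codes)
```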
Test Plan:
```
buck test //faiss/tests/:test_index_composite -- TestSearchAndReconstruct
buck test //faiss/tests/:test_standalone_codec -- test_arrays
```
Reviewed By: algoriddle
Differential Revision: D51544613
Pulled By: mdouze
fbshipit-source-id: 875f72d0f9140096851592422570efa0f65431fc
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3097
A framework for evaluating indices offline.
Long term objectives:
1. Generate offline similarity index performance data with test datasets both for existing indices and automatically generated alternatives. That is, given a dataset and some constraints this workflow should automatically discover optimal index types and parameter choices as well as evaluate the performance of existing production indices and their parameters.
2. Allow researchers, platform owners (Laser, Unicorn) and product teams to understand how different index types perform on their datasets and make optimal choices wrt their objectives. Longer term, the goal is to enable automatic decision-making/auto-tuning.
Constraints, design choices:
1. I want to run the same evaluation on Meta-internal (fblearner, data from hive and manifold) or the local machine + research cluster (data on local disk or NFS) via OSS Faiss. Via fblearner, I want this to work in a way that it can be turned into a service and plugged into Unicorn or Laser, while the core Faiss part can be used/referred to in our research and to update the wiki with the latest results/recommendations for public datasets.
2. It must support a range of metrics for KNN and range search, and it should be easy to add new ones. Cost metrics need to be fine-grained to allow extrapolation.
3. It should automatically sweep all query-time params (e.g. nprobe, polysemous code hamming distance, params of quantizers), using `OperatingPointsWithRanges` to cut down the optimal param search space. (For now, it sweeps nprobe only.)
4. [FUTURE] It will generate/sweep index creation hyperparams (factory strings, quantizer sizes, quantizer params), using heuristics.
5. [FUTURE] It will sweep the dataset size: start small test with e.g. 100K db vectors and go up to millions, billions potentially, while narrowing down the index+param choices at each step.
6. [FUTURE] Extrapolate perf metrics (cost and accuracy)
7. Intermediate results must be saved (to disk, to manifold) throughout, and reused as much as possible to cut down on overall runtime and enable faster iteration during development.
For range search, this diff supports the metric proposed in https://docs.google.com/document/d/1v5OOj7kfsKJ16xzaEHuKQj12Lrb-HlWLa_T2ct0LJiw/edit?usp=sharing
I also added support for the classical case where the scoring function steps from 1 to 0 at some arbitrary threshold.
For KNN, I added knn_intersection, but other metrics, particularly recall@1 will also be interesting. I also added the distance_ratio metric, which we previously discussed as an interesting alternative, since it shows how much the returned results approximate the ground-truth nearest-neighbours in terms of distances.
In the test case, I evaluated three current production indices for VCE with 1M vectors in the database and 10K queries. Each index is tested at various operating points (nprobes), which are shown on the charts. The results are not extrapolated to the true scale of these indices.
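As a simplified, standalone illustration of the query-time parameter sweep (point 3 above) and the knn_intersection metric, here is a sketch using plain faiss and numpy rather than the benchmark framework itself; the dataset sizes and factory string are arbitrary:
```python
import time

import faiss
import numpy as np

d, nb, nq, k = 64, 100000, 1000, 10
xb = np.random.rand(nb, d).astype("float32")
xq = np.random.rand(nq, d).astype("float32")

# brute-force ground truth
gt_index = faiss.IndexFlatL2(d)
gt_index.add(xb)
_, gt = gt_index.search(xq, k)

index = faiss.index_factory(d, "IVF1024,PQ16")
index.train(xb)
index.add(xb)

for nprobe in (1, 4, 16, 64):
    index.nprobe = nprobe
    t0 = time.time()
    _, I = index.search(xq, k)
    dt = time.time() - t0
    # knn_intersection: average fraction of the true top-k found at this operating point
    inter = np.mean([len(set(I[i]) & set(gt[i])) / k for i in range(nq)])
    print(f"nprobe={nprobe:3d}  knn_intersection={inter:.3f}  search time={dt:.3f}s")
```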
Reviewed By: yonglimeta
Differential Revision: D49958434
fbshipit-source-id: f7f567b299118003955dc9e2d9c5b971e0940fc5
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2918
The HammingComputer class is optimized for several vector sizes. So far it has been the caller's responsibility to instantiate the relevant optimized version.
This diff introduces a `dispatch_HammingComputer` function that can be called with a template class that is instantiated for all existing optimized HammingComputers.
Reviewed By: algoriddle
Differential Revision: D46858553
fbshipit-source-id: 32c31689bba7c0b406b309fc8574c95fa24022ba
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2782
Add a separate branch for ARM Hamming distance computations. Also improves a benchmark for the Hamming computer.
Reviewed By: mdouze
Differential Revision: D44397463
fbshipit-source-id: 1e44e8e7dd1c5b92e95e8afc754170b501d0feed
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2797
This is the last code instance of setNumProbes.
Removing it because some people still seem to run into errors due to it.
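For reference, the now-preferred way to set the number of probes is the plain `nprobe` field; a minimal sketch, assuming a CUDA-enabled faiss build and a recent version where the field is functional on GPU indexes:
```python
import faiss
import numpy as np

d = 64
xb = np.random.rand(100000, d).astype("float32")

res = faiss.StandardGpuResources()
gpu_index = faiss.GpuIndexIVFFlat(res, d, 1024, faiss.METRIC_L2)
gpu_index.train(xb)
gpu_index.add(xb)

gpu_index.nprobe = 32     # instead of the removed gpu_index.setNumProbes(32)
```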
Reviewed By: algoriddle
Differential Revision: D44421600
fbshipit-source-id: fbc1a9d49a0175ddf24c32dab5c1bdb5f1bbbac6
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2638
This diff is a more streamlined way of searching IVF indexes with precomputed clusters.
This will be used for experiments with hybrid CPU / GPU search.
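A minimal sketch of the intended flow, assuming the pythonic `search_preassigned` wrapper on IndexIVF (argument order may differ across faiss versions):
```python
import faiss
import numpy as np

d, nb, nq, k, nprobe = 64, 100000, 100, 10, 8
xb = np.random.rand(nb, d).astype("float32")
xq = np.random.rand(nq, d).astype("float32")

index = faiss.index_factory(d, "IVF1024,Flat")
index.train(xb)
index.add(xb)
index.nprobe = nprobe

# step 1: run the coarse quantizer separately -- this is the part that could be
# offloaded, e.g. to a GPU, in a hybrid CPU / GPU setup
Dq, Iq = index.quantizer.search(xq, nprobe)

# step 2: search only the precomputed lists, skipping coarse quantization
D, I = index.search_preassigned(xq, k, Iq, Dq)
```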
Reviewed By: algoriddle
Differential Revision: D41301032
fbshipit-source-id: a1d645fd0f2bf806454dfd04971edc0a6200d20d
Summary:
In ```cmp_with_scann.py```, we save npy files for the base vectors, the query vectors, and the ground truth. However, we only do this when the lib is faiss; if we run this script directly with the scann lib, it complains that the files do not exist.
Therefore, the code is refactored to save the npy files from the beginning so that nothing goes wrong.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2573
Reviewed By: mdouze
Differential Revision: D42338435
Pulled By: algoriddle
fbshipit-source-id: 9227f95e1ff79f5329f6206a0cb7ca169185fdb3
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2567
Intuitively, it should be easier to handle big-batch searches because all distance computations for a set of queries can be done locally within each inverted list.
This benchmark implements this in pure python (but should be close to optimal in terms of speed), on CPU for IndexIVFFlat, IndexIVFPQ and IndexIVFScalarQuantizer. GPU is also supported.
The results are not systematically better, see https://docs.google.com/document/d/1d3YuV8uN7hut6aOATCOMx8Ut-QEl_oRnJdPgDBRF1QA/edit?usp=sharing
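To make the idea concrete, here is a toy numpy-only illustration (not the benchmark's code) that loops over inverted lists instead of over queries, so all distance computations for a list are handled in one local block; the crude coarse quantizer and the sizes are made up for the example:
```python
import numpy as np

rng = np.random.default_rng(0)
d, nb, nq, k, nlist, nprobe = 32, 20000, 500, 5, 64, 4
xb = rng.random((nb, d), dtype=np.float32)
xq = rng.random((nq, d), dtype=np.float32)
centroids = xb[rng.choice(nb, nlist, replace=False)]     # crude "coarse quantizer"

def topk_lists(x, c, n):   # n closest centroids for each row of x
    dist = ((x[:, None, :] - c[None, :, :]) ** 2).sum(-1)
    return np.argsort(dist, axis=1)[:, :n]

assign_db = topk_lists(xb, centroids, 1)[:, 0]            # list of each db vector
probe = topk_lists(xq, centroids, nprobe)                 # lists probed per query

best_d = np.full((nq, k), np.inf)
best_i = np.full((nq, k), -1, dtype=np.int64)
for list_no in range(nlist):
    db_ids = np.where(assign_db == list_no)[0]            # vectors stored in this list
    q_ids = np.where((probe == list_no).any(axis=1))[0]   # queries probing this list
    if len(db_ids) == 0 or len(q_ids) == 0:
        continue
    # all distance computations for this list are done locally, in one block
    dist = ((xq[q_ids, None, :] - xb[None, db_ids, :]) ** 2).sum(-1)
    cand_d = np.concatenate([best_d[q_ids], dist], axis=1)
    cand_i = np.concatenate([best_i[q_ids], np.broadcast_to(db_ids, dist.shape)], axis=1)
    order = np.argsort(cand_d, axis=1)[:, :k]
    best_d[q_ids] = np.take_along_axis(cand_d, order, axis=1)
    best_i[q_ids] = np.take_along_axis(cand_i, order, axis=1)

print(best_i[:3])   # approximate top-k ids for the first three queries
```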
Reviewed By: algoriddle
Differential Revision: D41098338
fbshipit-source-id: 479e471b0d541f242d420f581775d57b708a61b8
Summary:
Adds:
- a sparse update function to the heaps
- bucket sort functions
- an IndexRandom index to serve as a dummy coarse quantizer for testing
Reviewed By: algoriddle
Differential Revision: D41804055
fbshipit-source-id: 9402b31c37c367aa8554271d8c88bc93cc1e2bda
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2582
A few more or less cosmetic improvements:
* Index::idx_t was scoped inside the Index object, which does not make much sense; this diff moves it to faiss::idx_t
* replace multiprocessing.dummy with multiprocessing.pool
* add Alexandr as a core contributor of Faiss in the README ;-)
```
for i in $( find . -name \*.cu -o -name \*.cuh -o -name \*.h -o -name \*.cpp ) ; do
sed -i s/Index::idx_t/idx_t/ $i
done
```
For the fbcode deps:
```
for i in $( fbgs Index::idx_t --exclude fbcode/faiss -l ) ; do
sed -i s/Index::idx_t/idx_t/ $i
done
```
Reviewed By: algoriddle
Differential Revision: D41437507
fbshipit-source-id: 8300f2a3ae97cace6172f3f14a9be3a83999fb89
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2562
Introduce a table of transposed centroids in ProductQuantizer that significantly speeds up the ProductQuantizer::compute_codes() call for certain PQ parameters, and thus speeds up search queries.
* the ::sync_transposed_centroids() call is used to fill the table
* the ::clear_transposed_centroids() call clears the table, so that the original baseline code is used for ::compute_codes()
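A minimal usage sketch from Python; the method names follow this diff, so verify them against your faiss version:
```python
import faiss
import numpy as np

d, M, nbits = 64, 8, 8
xt = np.random.rand(10000, d).astype("float32")
xb = np.random.rand(1000, d).astype("float32")

pq = faiss.ProductQuantizer(d, M, nbits)
pq.train(xt)

pq.sync_transposed_centroids()      # fill the transposed table: faster compute_codes()
codes = pq.compute_codes(xb)

pq.clear_transposed_centroids()     # back to the baseline code path
codes_baseline = pq.compute_codes(xb)
print("codes identical:", np.array_equal(codes, codes_baseline))
```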
Reviewed By: mdouze
Differential Revision: D40763338
fbshipit-source-id: 87b40e5dd2f8c3cadeb94c1cd9e8a4a5b6ffa97d
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2483
This diff changes the following:
1. all search functions now take a `SearchParameters` argument that overrides the internal search parameters
2. the default implementation for most classes throws when the params argument is non-nullptr / non-None
3. the IndexIVF and IndexHNSW classes have functioning SearchParameters
4. the SearchParameters includes an IDSelector that can search only in a subset of the index based on a defined subset of ids
There is also some refactoring: the IDSelector was moved to its own .h/.cpp, and python/__init__.py is split into parts.
The diff is quite bulky because the search function prototypes need to be changed in all index classes.
Things to fix in subsequent diffs:
- support SearchParameters for more index types (Flat variants)
- better sub-object ownership for SearchParams (with std::unique_ptr?)
- special handling of IDSelectorRange to make it faster
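To illustrate points 3 and 4 above, a minimal Python sketch on an IVF index, assuming a recent faiss build where the keyword-argument constructor for SearchParameters classes is available; the sizes and the id subset are arbitrary:
```python
import faiss
import numpy as np

d, nb, nq, k = 64, 100000, 10, 5
xb = np.random.rand(nb, d).astype("float32")
xq = np.random.rand(nq, d).astype("float32")

index = faiss.index_factory(d, "IVF256,Flat")
index.train(xb)
index.add(xb)

# search only among an arbitrary subset of ids, overriding nprobe for this call only
subset = np.arange(0, nb, 2).astype("int64")               # keep even ids only
sel = faiss.IDSelectorBatch(subset)
params = faiss.SearchParametersIVF(sel=sel, nprobe=32)
D, I = index.search(xq, k, params=params)
assert np.all((I % 2 == 0) | (I == -1))                     # only even ids come back
```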
Reviewed By: alexanderguzhva
Differential Revision: D39852589
fbshipit-source-id: 4988bdb5b9bee1207cd327d3f80bf5e0e2467fe1
Summary:
For search requests with few queries or a single query, this PR adds the ability to run threads over both the queries and the different clusters of the IVF. For applications where latency is important, this can **dramatically reduce latency for single-query requests**.
A new implementation (https://github.com/facebookresearch/faiss/issues/14) is added. It could be merged into implementation 12, but for simplicity in this PR I created a separate function.
Tests are added to cover the new implementation, including tests that specifically cover the single-query case.
In my benchmarks, a very good reduction in latency is observed for single-query requests.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2380
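A minimal sketch of opting into the new implementation from Python on a fast-scan IVF index; the `implem` value follows the issue and test names above, and the factory string and sizes are arbitrary:
```python
import faiss
import numpy as np

d, nb = 64, 100000
xb = np.random.rand(nb, d).astype("float32")
xq = np.random.rand(1, d).astype("float32")       # single-query, latency-sensitive case

index = faiss.index_factory(d, "IVF1024,PQ32x4fs")
index.train(xb)
index.add(xb)
index.nprobe = 64

index.implem = 14     # thread over the probed lists as well, not only over queries
D, I = index.search(xq, 10)
```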
Test Plan:
```
buck test //faiss/tests/:test_fast_scan_ivf -- implem14
buck test //faiss/tests/:test_fast_scan_ivf -- implem15
```
Reviewed By: alexanderguzhva
Differential Revision: D38074577
Pulled By: mdouze
fbshipit-source-id: e7a20b6ea2f9216e0a045764b5d7b7f550ea89fe
Summary:
Signed-off-by: Ryan Russell <git@ryanrussell.org>
Various readability fixes focused on `.md` files:
- Grammar
- Fix some incorrect command references to `distributed_kmeans.py`
- Style the markdown bash code snippet sections so they format correctly
Attempted to put a lot of little things into one PR and commit; let me know if any mods are needed!
Best,
Ryan
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2378
Reviewed By: alexanderguzhva
Differential Revision: D37717671
Pulled By: mdouze
fbshipit-source-id: 0039192901d98a083cd992e37f6b692d0572103a
Summary:
This diff added ProductAdditiveQuantizer.
A simple description of the algorithm:
1. Divide the vector space into several orthogonal sub-spaces, just like PQ does.
2. Quantize each sub-space by an independent additive quantizer.
Usage:
Construct a ProductAdditiveQuantizer object:
- `d`: dimensionality of the input vectors
- `nsplits`: number of sub-spaces the vector is divided into
- `Msub`: `M` of each additive quantizer
- `nbits`: `nbits` of each additive quantizer
```python
d = 128
nsplits = 2
Msub = 4
nbits = 8
plsq = faiss.ProductLocalSearchQuantizer(d, nsplits, Msub, nbits)
prq = faiss.ProductResidualQuantizer(d, nsplits, Msub, nbits)
```
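Continuing the example, training and encoding work like for any other quantizer; `xt` and `xb` below are assumed float32 numpy arrays of dimension `d`:
```python
import numpy as np

xt = np.random.rand(10000, d).astype("float32")   # training vectors (assumed)
xb = np.random.rand(1000, d).astype("float32")    # database vectors (assumed)

plsq.train(xt)
codes = plsq.compute_codes(xb)     # shape (len(xb), plsq.code_size), dtype uint8
recons = plsq.decode(codes)        # approximate reconstruction of xb
print("MSE:", ((xb - recons) ** 2).sum() / len(xb))
```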
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2286
Test Plan:
```
buck test //faiss/tests/:test_local_search_quantizer -- TestProductLocalSearchQuantizer
buck test //faiss/tests/:test_residual_quantizer -- TestProductResidualQuantizer
```
Reviewed By: alexanderguzhva
Differential Revision: D35907702
Pulled By: mdouze
fbshipit-source-id: 7428a196e6bd323569caa585c57281dd70e547b1
Summary:
Start migration of existing benchmarks to Google's Benchmark library + register benchmark to servicelab.
The benchmark should be automatically registered to servicelab once this diff lands according to https://www.internalfb.com/intern/wiki/ServiceLab/Use_Cases/Benchmarks_(C++)/#servicelab-job.
Reviewed By: mdouze
Differential Revision: D35397782
fbshipit-source-id: 317db2527f12ddde0631cacc3085c634afdd0e37
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2217
This diff introduces a new Faiss contrib module that contains:
- generic k-means implemented in python (was in distributed_ondisk)
- the two-level clustering code, including a simple function that runs it on a Faiss IVF index.
- sparse clustering code (new)
The main idea is that this code is often re-used, so it is better to have it in contrib.
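As a rough sketch of the two-level idea using plain `faiss.Kmeans` (the contrib module wraps this more conveniently; nothing below is its actual API), with made-up sizes:
```python
import faiss
import numpy as np

d, n, nc1, nc2 = 32, 100000, 32, 1024     # 32 coarse clusters, ~1024 final centroids
x = np.random.rand(n, d).astype("float32")

# level 1: coarse clustering of the whole dataset
km1 = faiss.Kmeans(d, nc1, niter=10)
km1.train(x)
_, assign1 = km1.index.search(x, 1)
assign1 = assign1.ravel()

# level 2: cluster each coarse bucket independently, proportionally to its size
centroids = []
for c in range(nc1):
    xc = x[assign1 == c]
    if len(xc) == 0:
        continue
    nc_local = min(len(xc), max(1, round(nc2 * len(xc) / n)))
    km2 = faiss.Kmeans(d, nc_local, niter=10)
    km2.train(xc)
    centroids.append(km2.centroids)
centroids = np.vstack(centroids)          # roughly nc2 centroids overall
print(centroids.shape)
```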
Reviewed By: beauby
Differential Revision: D34170932
fbshipit-source-id: cc297cc56d241b5ef421500ed410d8e2be0f1b77
Summary:
## Description
This PR added support for LSQ on GPU. Only the encoding part runs on the GPU; the rest still runs on the CPU.
Multi-GPU is also supported.
## Usage
``` python
lsq = faiss.LocalSearchQuantizer(d, M, nbits)
ngpus = faiss.get_num_gpus()
lsq.icm_encoder_factory = faiss.GpuIcmEncoderFactory(ngpus) # we use all gpus
lsq.train(xt)
codes = lsq.compute_codes(xb)
decoded = lsq.decode(codes)
```
## Performance on SIFT1M
On 1 GPU:
```
===== lsq-gpu:
mean square error = 17337.878528
training time: 40.9857234954834 s
encoding time: 27.12640070915222 s
```
On 2 GPUs:
```
===== lsq-gpu:
mean square error = 17364.658176
training time: 25.832106113433838 s
encoding time: 14.879548072814941 s
```
On CPU:
```
===== lsq:
mean square error = 17305.880576
training time: 152.57522344589233 s
encoding time: 110.01779270172119 s
```
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1978
Test Plan: buck test mode/dev-nosan //faiss/gpu/test/:test_gpu_index_py -- TestLSQIcmEncoder
Reviewed By: wickedfoo
Differential Revision: D29609763
Pulled By: mdouze
fbshipit-source-id: b6ffa2a3c02bf696a4e52348132affa0dd838870
Summary:
## Description
The process of updating the codebooks in LSQ may be unstable if the data is not zero-centered. This diff fixes it by using `double` instead of `float` during codebook updating. This does not affect performance since the update process is quite fast.
Users can switch back to `float` mode by setting `update_codebooks_with_double = False`.
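A minimal sketch of the new switch; the field name comes from this diff:
```python
import faiss

d, M, nbits = 64, 8, 8
lsq = faiss.LocalSearchQuantizer(d, M, nbits)
# codebooks are now updated in double precision by default
lsq.update_codebooks_with_double = False   # opt back into the float32 path
```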
## Changes
1. Support `double` during codebook updating.
2. Add a unit test.
3. Add `__init__.py` under `contrib/` to avoid warnings.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1975
Reviewed By: wickedfoo
Differential Revision: D29565632
Pulled By: mdouze
fbshipit-source-id: 932d7932ae9725c299cd83f87495542703ad6654
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1906
This PR implemented LSQ/LSQ++, a vector quantization technique described in the following two papers:
1. Revisiting additive quantization
2. LSQ++: Lower running time and higher recall in multi-codebook quantization
Here is a benchmark running on SIFT1M for 64 bits encoding:
```
===== lsq:
mean square error = 17335.390208
training time: 312.729779958725 s
encoding time: 244.6277096271515 s
===== pq:
mean square error = 23743.004672
training time: 1.1610801219940186 s
encoding time: 2.636141061782837 s
===== rq:
mean square error = 20999.737344
training time: 31.813055515289307 s
encoding time: 307.51959800720215 s
```
Changes:
1. Add LocalSearchQuantizer object
2. Fix an out of memory bug in ResidualQuantizer
3. Add a benchmark for evaluating quantizers
4. Add tests for LocalSearchQuantizer
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1862
Test Plan:
```
buck test //faiss/tests/:test_lsq
buck run mode/opt //faiss/benchs/:bench_quantizer -- lsq pq rq
```
Reviewed By: beauby
Differential Revision: D28376369
Pulled By: mdouze
fbshipit-source-id: 2a394d38bf75b9de0a1c2cd6faddf7dd362a6fa8
Summary:
## Description:
This diff implements the Navigating Spreading-out Graph (NSG), which accepts a KNN graph as input.
Here is the interface of building an NSG graph:
``` c++
void IndexNSG::build(idx_t n, const float *x, idx_t *knn_graph, int GK);
```
where `GK` is the nb of neighbors per node and `knn_graph[i * GK + j]` is the j-th neighbor of node i.
The `add` method is not implemented yet.
The unit tests could be found in `tests/test_nsg.cpp`.
mdouze beauby Maybe I need some advice on how to design the interface and support python.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1707
Test Plan: buck test //faiss/tests/:test_index -- TestNSG
Reviewed By: beauby
Differential Revision: D26748498
Pulled By: mdouze
fbshipit-source-id: 3280f705fb1b5f9c8cc5efeba63b904c3b832544