Commit Graph

13 Commits (1e4586a5a0d7ceb8a538c4878d08b0623a676a08)

Author SHA1 Message Date
Check Deng 838f85cb52 Implement search methods for ProductAdditiveQuantizer (#2336)
Summary:
Work in progress.

This PR implements the following search methods for ProductAdditiveQuantizer, including index factory and I/O support (a usage sketch follows the list):

- [x] IndexProductAdditiveQuantizer
- [x] IndexIVFProductAdditiveQuantizer
- [x] IndexProductAdditiveQuantizerFastScan
- [x] IndexIVFProductAdditiveQuantizerFastScan
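
A minimal usage sketch with random data; the `PRQ{nsplits}x{Msub}x{nbits}` factory string is an assumption based on the existing RQ/LSQ naming convention, not taken from this PR:

```python
import faiss
import numpy as np

d = 64
xb = np.random.rand(10000, d).astype('float32')
xq = np.random.rand(5, d).astype('float32')

# flat index over a product residual quantizer: 2 splits, each a 4x8 additive quantizer
index = faiss.index_factory(d, "PRQ2x4x8")
index.train(xb)
index.add(xb)
D, I = index.search(xq, 10)

# I/O should work like for any other index
faiss.write_index(index, "prq.index")
index = faiss.read_index("prq.index")
```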

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2336

Test Plan:
buck test //faiss/tests/:test_fast_scan
buck test //faiss/tests/:test_fast_scan_ivf
buck test //faiss/tests/:test_local_search_quantizer
buck test //faiss/tests/:test_residual_quantizer

Reviewed By: alexanderguzhva

Differential Revision: D37172745

Pulled By: mdouze

fbshipit-source-id: 6ff18bfc462525478c90cd42e21805ab8605bd0f
2022-07-27 05:32:15 -07:00
Matthijs Douze f2a9324359 make tests cheaper
Summary:
Many of the additive quantizer tests are flagged as flaky because they time out in non-optimized stress mode.
This is probably because they do not import

https://www.internalfb.com/code/fbsource/fbcode/faiss/tests/common_faiss_tests.py

which sets the number of threads to 4. This diff fixes that and additionally declares the tests as "heavyweight" so that fewer of them are spawned in parallel in stress mode.

https://www.internalfb.com/intern/wiki/TAE/tpx/Timeouts_and_Sharded_Bundled_mode/#degree-of-parallelism

Hopefully this fixes the flaky tests.
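
For reference, a minimal sketch of capping FAISS's OpenMP parallelism from Python; the value 4 matches what the helper is described as setting, but this is not the helper's actual code:

```python
import faiss

# limit FAISS/OpenMP parallelism, e.g. at the top of a test module
faiss.omp_set_num_threads(4)
```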

Reviewed By: alexanderguzhva

Differential Revision: D38111820

fbshipit-source-id: 7dd7c72e7e92b82384a170743cfd5c4aaf9a6960
2022-07-25 06:58:39 -07:00
Check Deng 9b1982262a Add ProductAdditiveQuantizer (#2286)
Summary:
This diff adds ProductAdditiveQuantizer.

A short description of the algorithm:

1. Divide the vector space into several orthogonal sub-spaces, just like PQ does.
2. Quantize each sub-space by an independent additive quantizer.

Usage:

Construct a ProductAdditiveQuantizer object:
- `d`: dimensionality of the input vectors
- `nsplits`: number of sub-spaces the vector space is divided into
- `Msub`: `M` (number of codebooks) of each additive quantizer
- `nbits`: `nbits` (bits per codebook) of each additive quantizer

```python
d = 128
nsplits = 2
Msub = 4
nbits = 8
plsq = faiss.ProductLocalSearchQuantizer(d, nsplits, Msub, nbits)
prq = faiss.ProductResidualQuantizer(d, nsplits, Msub, nbits)
```
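
Continuing the snippet above, a hedged sketch of training and encoding, assuming these classes expose the usual quantizer `train` / `compute_codes` / `decode` wrappers:

```python
import numpy as np

xt = np.random.rand(5000, d).astype('float32')   # training data
x = np.random.rand(100, d).astype('float32')     # vectors to encode

plsq.train(xt)
codes = plsq.compute_codes(x)     # shape (100, plsq.code_size), dtype uint8
x_rec = plsq.decode(codes)        # approximate reconstruction
recons_err = ((x - x_rec) ** 2).sum(axis=1).mean()
```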

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2286

Test Plan:
```
buck test //faiss/tests/:test_local_search_quantizer -- TestProductLocalSearchQuantizer
buck test //faiss/tests/:test_residual_quantizer -- TestProductResidualQuantizer
```

Reviewed By: alexanderguzhva

Differential Revision: D35907702

Pulled By: mdouze

fbshipit-source-id: 7428a196e6bd323569caa585c57281dd70e547b1
2022-05-05 15:14:07 -07:00
Matthijs Douze 291353c5a9 Generalize DistanceComputer for flat indexes (#2255)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2255

The `DistanceComputer` object is derived from an Index (obtained with `get_distance_computer()`). It maintains a current query and quickly computes distances from that query to any item in the database. This is useful, eg. for IndexHNSW and IndexNSG, which rely on query-to-point comparisons within the dataset.

This diff introduces `FlatCodesDistanceComputer`, which inherits from `DistanceComputer`, for flat indexes. In addition to the distance-to-item function, it adds a `distance_to_code` method that computes the distance from the current query to any code, even one that is not stored in the index.

This is implemented for all FlatCode indexes (IndexFlat, IndexPQ, IndexScalarQuantizer and IndexAdditiveQuantizer).

In the process, the two classes were extracted to their own header file, `impl/DistanceComputer.h`.

Reviewed By: beauby

Differential Revision: D34863609

fbshipit-source-id: 39d8c66475e55c3223c4a6a210827aa48bca292d
2022-03-20 23:43:33 -07:00
Ivan Sopin d50211a38f Break distance ties in `heap_replace_top()` by ID (#2245)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2245

This changeset makes the `heap_replace_top()` function of the FAISS heap implementation break distance ties by the element's ID, according to the heap's min/max property.

Reviewed By: mdouze

Differential Revision: D34669542

fbshipit-source-id: 0db24fd12442eedeee917fbb3e811ba4a070ce0f
2022-03-09 10:23:48 -08:00
Matthijs Douze 07a874d5b1 Post-training refinement of residual quantizer codebooks (#2166)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2166

RQ training is done progressively from one quantizer to the next, maintaining a current set of codes and quantization centroids.
However, for RQ, as for any additive quantizer, there is a closed-form solution for the centroids that minimizes the quantization error for fixed codes.
This diff offers the option to estimate that codebook at the end of the optimization. The estimation is performed iteratively, ie. several rounds of code computation and codebook refinement.

A pure python implementation + results is here:
https://github.com/fairinternal/faiss_improvements/blob/dbcc746/decoder/refine_aq_codebook.ipynb
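
For intuition, a toy numpy sketch of that closed-form update with made-up shapes (not the FAISS implementation): for fixed codes, stacking the one-hot code indicators into a design matrix B turns the codebook update into a least-squares problem.

```python
import numpy as np

n, d, M, K = 1000, 32, 4, 256       # vectors, dim, quantizers, centroids per quantizer
x = np.random.rand(n, d).astype('float32')
codes = np.random.randint(K, size=(n, M))

# B[i] concatenates the M one-hot indicators of the codes of x[i]
B = np.zeros((n, M * K), dtype='float32')
B[np.arange(n)[:, None], codes + np.arange(M) * K] = 1.0

# centroids minimizing ||x - B C||^2 for the fixed codes
C, *_ = np.linalg.lstsq(B, x, rcond=None)
codebooks = C.reshape(M, K, d)
```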

Reviewed By: wickedfoo

Differential Revision: D33309409

fbshipit-source-id: 55c13425292e73a1b05f00e90f4dcfdc8b3549e8
2022-01-05 00:59:16 -08:00
Chengqi Deng 26abede812 Non-uniform quantization of vector norms (#2037)
Summary:
This diff implements non-uniform quantization of vector norms in additive quantizers. index_factory and I/O are supported.

index_factory: `XXX_Ncqint{nbits}`, where `nbits` is the number of bits used to quantize the vector norm.

With an 8-bit code it is almost the same as 8-bit uniform quantization; it slightly improves accuracy when the code size is less than 8 bits.
```
RQ4x8_Nqint8:  R@1 0.1116
RQ4x8_Ncqint8: R@1 0.1117

RQ4x8_Nqint4:  R@1 0.0901
RQ4x8_Ncqint4: R@1 0.0989
```
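
A minimal sketch of building such an index through the factory, with random data for illustration:

```python
import faiss
import numpy as np

d = 64
xb = np.random.rand(20000, d).astype('float32')
xq = np.random.rand(10, d).astype('float32')

# "RQ4x8" = 4-step, 8-bit residual quantizer; "_Ncqint8" requests the
# non-uniform (k-means) 8-bit quantizer for the vector norms
index = faiss.index_factory(d, "RQ4x8_Ncqint8")
index.train(xb)
index.add(xb)
D, I = index.search(xq, 5)
```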

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2037

Test Plan:
buck test //faiss/tests/:test_clustering -- TestClustering1D
buck test //faiss/tests/:test_lsq -- test_index_accuracy_cqint
buck test //faiss/tests/:test_residual_quantizer -- test_norm_cqint
buck test //faiss/tests/:test_residual_quantizer -- test_search_L2

Reviewed By: beauby

Differential Revision: D31083476

Pulled By: mdouze

fbshipit-source-id: f34c3dafc4eb1c6f44a63e68137158911aa4a2f4
2021-10-11 14:13:16 -07:00
Matthijs Douze 151e3d7be5 fix centroids_norms storage for ResidualCoarseQuantizer (#2018)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2018

The centroid norms table was not reconstructed correctly after being stored in the RCQ.

Reviewed By: Sugoshnr

Differential Revision: D30484389

fbshipit-source-id: 9f618a3939c99dc987590c07eda8e76e19248b08
2021-08-25 06:37:33 -07:00
Matthijs Douze 760cce7f3a Support for additive quantizer search (#1961)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1961

This diff implements LUT-based search for additive quantizers.
It also further merges the code paths for LSQ and the ResidualQuantizer.

The documentation + evaluation is on github:

https://github.com/facebookresearch/faiss/wiki/Additive-quantizers

Reviewed By: wickedfoo

Differential Revision: D29395079

fbshipit-source-id: b8a24a647bbdc4cda2a699e791ffdb2a12bfa9c6
2021-08-20 01:00:10 -07:00
Matthijs Douze 8eab15eca3 LUT based search for additive quantizers (#1908)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1908

To search for the best combination of codebook entries, the method implemented so far is a beam search.

It is possible to make this faster for a query vector q by precomputing look-up tables in the form of

LUT_m = <q, cent_m>

where cent_m is the set of centroids for quantizer m=0..M-1.

The LUT can then be used as

inner_prod = sum_m LUT_m[c_m]

and

L2_distance = norm_q + norm_db - 2 * inner_prod

This diff implements this computation by:

- adding the LUT precomputation

- storing an exhaustive table of all centroid norms (when using L2)

This is only practical for small additive quantizers, eg. when a residual vector quantizer is used as a coarse quantizer (ResidualCoarseQuantizer).

This diff is based on the AdditiveQuantizer diff because it applies equally to other additive quantizers (eg. LSQ).
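
A toy numpy sketch of this computation (made-up sizes; not the actual FAISS kernels):

```python
import numpy as np

M, K, d = 4, 256, 32                               # codebooks, centroids per codebook, dim
cent = np.random.rand(M, K, d).astype('float32')   # additive quantizer codebooks
codes = np.random.randint(K, size=(1000, M))       # database codes, one entry per codebook
q = np.random.rand(d).astype('float32')            # query

# precomputed once: decoded database vectors and their norms
xb = cent[np.arange(M), codes].sum(axis=1)
norm_db = (xb ** 2).sum(axis=1)

# per query: LUT_m = <q, cent_m>
LUT = cent @ q                                     # shape (M, K)

# inner_prod = sum_m LUT_m[c_m]
inner_prod = LUT[np.arange(M), codes].sum(axis=1)

# L2_distance = norm_q + norm_db - 2 * inner_prod
L2_distance = (q ** 2).sum() + norm_db - 2 * inner_prod
```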

Reviewed By: sc268

Differential Revision: D28467746

fbshipit-source-id: 82611fe1e4908c290204d4de866338c622ae4148
2021-05-25 01:54:53 -07:00
Matthijs Douze 441ccebbff Make Residual quantizer more memory efficient (#1865)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1865

This diff chunks the vectors to encode, which makes encoding more memory efficient.

Reviewed By: sc268

Differential Revision: D28234424

fbshipit-source-id: c1afd2aaff953d4ecd339800d5951ae1cae4789a
2021-05-07 02:12:27 -07:00
Matthijs Douze bb3c52a057 IndexResidual codec
Summary:
This diff adds the following to bring residual quantizer support on par with PQ:
- IndexResidual can be built with index factory, serialized and used as a Faiss codec.
- ResidualCoarseQuantizer can be used as a coarse quantizer for inverted files.

The factory string looks like "RQ1x16_6x8", which means a first 16-bit quantizer followed by six 8-bit ones. For IVF it is "IVF4096(RQ2x6),Flat".
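
A hedged sketch of both uses, reusing the factory strings above with random data for illustration (the standalone-codec calls `sa_encode` / `sa_decode` come from the generic Index codec API, not from this diff):

```python
import faiss
import numpy as np

d = 64
# a 16-bit first stage needs at least 2^16 training vectors
xt = np.random.rand(1 << 17, d).astype('float32')
x = np.random.rand(100, d).astype('float32')

# residual quantizer used as a standalone Faiss codec
codec = faiss.index_factory(d, "RQ1x16_6x8")
codec.train(xt)
codes = codec.sa_encode(x)        # compact codes, one row per vector
x_rec = codec.sa_decode(codes)    # approximate reconstruction

# ResidualCoarseQuantizer as the coarse quantizer of an IVF index
ivf = faiss.index_factory(d, "IVF4096(RQ2x6),Flat")
ivf.train(xt)
ivf.add(xt)
D, I = ivf.search(x, 10)
```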

Reviewed By: sc268

Differential Revision: D27865612

fbshipit-source-id: f9f11d29e9f89d3b6d4cd22e9a4f9222422d5f26
2021-04-26 20:26:43 -07:00
Matthijs Douze 7559cf5c5b add ResidualQuantizer
Summary:
This diff includes:
- progressive dimension k-means.
- the ResidualQuantizer object (a usage sketch follows this list)
- GpuProgressiveDimIndexFactory so that it can be trained on GPU
- corresponding tests
- a reference Python implementation of the same in scripts/matthijs/LCC_encoding
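
A minimal sketch of the ResidualQuantizer object itself, assuming the standard `(d, M, nbits)` constructor and the usual quantizer `train` / `compute_codes` / `decode` wrappers:

```python
import faiss
import numpy as np

d, M, nbits = 64, 4, 8                      # 4 codebooks of 8 bits each
xt = np.random.rand(10000, d).astype('float32')
x = np.random.rand(100, d).astype('float32')

rq = faiss.ResidualQuantizer(d, M, nbits)
rq.train(xt)
codes = rq.compute_codes(x)                 # (100, rq.code_size) uint8 codes
x_rec = rq.decode(codes)                    # approximate reconstruction
```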

Reviewed By: wickedfoo

Differential Revision: D27608029

fbshipit-source-id: 9a8cf3310c8439a93641961ca8b042941f0f4249
2021-04-14 13:11:54 -07:00