Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2132
This diff adds the class IndexFlatCodes, which becomes the parent of all "flat" encodings:
- IndexPQ
- IndexFlat
- IndexAdditiveQuantizer
- IndexScalarQuantizer
- IndexLSH
- Index2Layer
The other changes are:
- for IndexFlat, there is no vector<float> with the data anymore. It is replaced with a `get_xb()` function. This broke quite a few external code bases, which this diff also attempts to fix.
- I/O functions needed to be adapted. This is done without changing the I/O format for any index.
- added a small contrib function to get the data from the IndexFlat (see the sketch below)
- the functionality has been made uniform; for example, remove_ids and add are now in the parent class.
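A minimal sketch of what this looks like from Python (the helper name `get_flat_data` and its location in `contrib.inspect_tools` are assumptions based on the description above; `vector_to_array` is the standard binding utility):
```python
import faiss
import numpy as np
from faiss.contrib.inspect_tools import get_flat_data  # assumed contrib helper

d = 32
xb = np.random.rand(1000, d).astype('float32')
index = faiss.IndexFlat(d)
index.add(xb)

# IndexFlat now stores its data as bytes in IndexFlatCodes::codes;
# vector_to_array copies them out and a view reinterprets them as float32
raw = faiss.vector_to_array(index.codes).view('float32').reshape(index.ntotal, d)
xb2 = get_flat_data(index)
assert np.allclose(xb, raw) and np.allclose(xb, xb2)
```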
Eventually, we may support generic storage for flat indexes, similar to `InvertedLists`, eg to memmap the data, but this will again require a big change.
Reviewed By: wickedfoo
Differential Revision: D32646769
fbshipit-source-id: 04a1659173fd51b130ae45d345176b72183cae40
Summary:
It looks like loading FAISS codecs that have the flag set fails in dev mode, since it builds with ASAN and nothing in the code checks for that.
The error is not present in opt mode; however, the sandcastle test fails.
As a best practice, this diff introduces combinations of train_type_t flags as enums.
Reviewed By: mdouze
Differential Revision: D32746904
fbshipit-source-id: f20820350e0b07b35e04c965dee01b790194e6f3
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2123
One of the encodings used by LCC is based on a RCQ coarse quantizer and a "payload" of ITQ. The codes are compared with Hamming distances.
The index type `IndexIVFSpectralHash` can be repurposed to perform this type of indexing.
This diff contains a small Python demo script, demo_rcq_itq, that shows how:
* the RCQ + ITQ are trained
* the RCQ + ITQ index add and search work (with a very inefficient python implementation)
* they can be transferred to an `IndexIVFSpectralHash`
* the python implementation and `IndexIVFSpectralHash` give the same results
The advantage of using an `IndexIVFSpectralHash` is that in C++ it offers an `InvertedListScanner` object whose `distance_to_code` method can be used to compute query-to-code distances. This is generic and will generalize to other types of encodings and coarse quantizers.
What is missing is an index_factory to make instantiation easier.
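A hedged sketch of the main ingredients (constructor arguments such as the `period` value 1.0 are illustrative assumptions; the actual transfer of the trained RCQ and ITQ into the index is shown in the demo script):
```python
import faiss
import numpy as np

d, nbits = 64, 32
xt = np.random.rand(20000, d).astype('float32')

# RCQ coarse quantizer: 2 residual stages of 5 bits -> 2**10 = 1024 virtual lists
rcq = faiss.ResidualCoarseQuantizer(d, 2, 5)
rcq.train(xt)

# ITQ "payload" producing nbits-bit binary codes
# (assumption: trained on raw data here for brevity; the demo trains it on residuals)
itq = faiss.ITQTransform(d, nbits)
itq.train(xt)

# the repurposed index: spectral-hash codes compared with Hamming distances
index = faiss.IndexIVFSpectralHash(rcq, d, 1 << 10, nbits, 1.0)
```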
Reviewed By: sc268
Differential Revision: D32642900
fbshipit-source-id: 284f3029d239b7946bbca44a748def4e058489bd
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2117
This supports 2 concatenated codecs. It is based on IndexRefine, which already does this but did not have a standalone codec interface.
The main use case for now is a residual quantizer + ITQ.
The test below demonstrates how to instantiate that.
The advantage is that the index_factory parser already exists.
The IndexRefine decoder just uses the second index's decoder, which is supposed to be more accurate than the first.
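A hedged sketch of what this could look like (the factory strings and the codec ordering are illustrative assumptions, not the test from this diff):
```python
import faiss
import numpy as np

d = 64
xt = np.random.rand(10000, d).astype('float32')

rq = faiss.index_factory(d, "RQ2x5")        # first codec: residual quantizer
itq = faiss.index_factory(d, "ITQ,LSHt")    # second codec: ITQ + binarization
idx = faiss.IndexRefine(rq, itq)
idx.train(xt)                               # trains both sub-indexes

codes = idx.sa_encode(xt[:10])   # concatenation of both codecs' codes
x_rec = idx.sa_decode(codes)     # decoding relies on the second codec
```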
Reviewed By: beauby
Differential Revision: D32569997
fbshipit-source-id: 3fe9cd02eaa7d1cfe23b0f1168cc034821f1c362
Summary:
The results returned by `NSG::search` are already sorted. Calling `maxheap_reorder` makes the results unordered.
Fixed https://github.com/facebookresearch/faiss/issues/2081.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2086
Test Plan: buck test //faiss/tests/:test_index -- test_order
Reviewed By: beauby
Differential Revision: D32593924
Pulled By: mdouze
fbshipit-source-id: 794b94681610657bd2f305f7e3d6cd5d25c6bdba
Summary:
PCA whitening multiplies the eigenvectors by 1/sqrt(singular values of the covariance matrix). The singular values are sometimes 0 (because the vector subspace is not full-rank) or negative (because of numerical issues).
Therefore, this diff adds an epsilon to the denominator above (default 0).
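A hedged usage sketch (the field name `epsilon` follows the description above; the value 1e-6 is an arbitrary illustration):
```python
import faiss
import numpy as np

d_in, d_out = 64, 32
xt = np.random.rand(10000, d_in).astype('float32')

pca = faiss.PCAMatrix(d_in, d_out, -0.5)  # eigen_power = -0.5 -> whitening
pca.epsilon = 1e-6                        # regularizes the whitening denominator (default 0)
pca.train(xt)
xw = pca.apply_py(xt)
```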
Reviewed By: edpizzi
Differential Revision: D31725075
fbshipit-source-id: dae68bda9f7452220785d76e30ce4b2ac8582413
Summary:
This diff implements non-uniform quantization of vector norms in additive quantizers. Both index_factory and I/O are supported.
index_factory: `XXX_Ncqint{nbits}` where `nbits` is the number of bits used to quantize the vector norm.
With 8-bit codes, it performs almost the same as 8-bit uniform quantization. It slightly improves accuracy when the code size is less than 8 bits.
```
RQ4x8_Nqint8: R@1 0.1116
RQ4x8_Ncqint8: R@1 0.1117
RQ4x8_Nqint4: R@1 0.0901
RQ4x8_Ncqint4: R@1 0.0989
```
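A minimal sketch of building such an index through the factory (the 128-dim data and training sizes are illustrative):
```python
import faiss
import numpy as np

d = 128
xt = np.random.rand(20000, d).astype('float32')

index = faiss.index_factory(d, "RQ4x8_Ncqint8", faiss.METRIC_L2)
index.train(xt)     # trains the RQ codebooks and the 8-bit norm quantizer
index.add(xt)
D, I = index.search(xt[:5], 10)
```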
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2037
Test Plan:
buck test //faiss/tests/:test_clustering -- TestClustering1D
buck test //faiss/tests/:test_lsq -- test_index_accuracy_cqint
buck test //faiss/tests/:test_residual_quantizer -- test_norm_cqint
buck test //faiss/tests/:test_residual_quantizer -- test_search_L2
Reviewed By: beauby
Differential Revision: D31083476
Pulled By: mdouze
fbshipit-source-id: f34c3dafc4eb1c6f44a63e68137158911aa4a2f4
Summary:
Following up on issue https://github.com/facebookresearch/faiss/issues/2054, it seems that this code crashes Faiss (instead of just leaking memory).
Findings:
- when running in multi-threaded mode, each search in an IndexFlat used as a coarse quantizer consumes some memory
- this memory consumption does not appear in single-thread mode or with few threads
- in gdb it appears that even when the number of queries is 1, each search spawns max_threads threads (80 on the test machine)
This diff:
- adds a C++ test that checks how much mem is used when repeatedly searching a vector
- adjusts the number of search threads to the number of query vectors; this is especially useful for single-vector queries (see the sketch below)
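For context, a user-side mitigation before this fix was to cap the OpenMP pool explicitly; a minimal sketch:
```python
# cap the OpenMP thread pool so a 1-query search does not spawn
# max_threads threads (faiss.omp_set_num_threads is a standard binding)
import faiss

faiss.omp_set_num_threads(1)
```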
Reviewed By: beauby
Differential Revision: D31142383
fbshipit-source-id: 134ddaf141e7c52a854cea398f5dbf89951a7ff8
Summary:
I am correcting a minor typo in the error checking.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2055
Reviewed By: beauby, wickedfoo
Differential Revision: D31052509
Pulled By: mdouze
fbshipit-source-id: 36063107503afa5c37a98a0ae0d62df8bc8832e8
Summary:
## Description
This PR adds support for LSQ on GPU. Only the encoding part runs on the GPU; the other stages still run on the CPU.
Multi-GPU is also supported.
## Usage
``` python
lsq = faiss.LocalSearchQuantizer(d, M, nbits)
ngpus = faiss.get_num_gpus()
lsq.icm_encoder_factory = faiss.GpuIcmEncoderFactory(ngpus) # we use all gpus
lsq.train(xt)
codes = lsq.compute_codes(xb)
decoded = lsq.decode(codes)
```
## Performance on SIFT1M
On 1 GPU:
```
===== lsq-gpu:
mean square error = 17337.878528
training time: 40.9857234954834 s
encoding time: 27.12640070915222 s
```
On 2 GPUs:
```
===== lsq-gpu:
mean square error = 17364.658176
training time: 25.832106113433838 s
encoding time: 14.879548072814941 s
```
On CPU:
```
===== lsq:
mean square error = 17305.880576
training time: 152.57522344589233 s
encoding time: 110.01779270172119 s
```
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1978
Test Plan: buck test mode/dev-nosan //faiss/gpu/test/:test_gpu_index_py -- TestLSQIcmEncoder
Reviewed By: wickedfoo
Differential Revision: D29609763
Pulled By: mdouze
fbshipit-source-id: b6ffa2a3c02bf696a4e52348132affa0dd838870
Summary:
I want to invoke norm computations via CGO, but some functions that are implemented in C++ are not exported in the C API, so this PR exports them.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2036
Reviewed By: beauby
Differential Revision: D30762172
Pulled By: mdouze
fbshipit-source-id: 097b32f29658c1864bd794734daaef0dd75d17ef
Summary:
This is required for the renaming of the default branch from `master` to `main`, in accordance with the new Facebook OSS guidelines.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2029
Reviewed By: mdouze
Differential Revision: D30672862
Pulled By: beauby
fbshipit-source-id: 0b6458a4ff02a12aae14cf94057e85fdcbcbff96
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2018
The table of centroid norms was not reconstructed correctly after being stored in the RCQ.
Reviewed By: Sugoshnr
Differential Revision: D30484389
fbshipit-source-id: 9f618a3939c99dc987590c07eda8e76e19248b08
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1983
Add clone_index support to ResidualCoarseQuantizer to enable GPU training. Similar to D28614996.
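A minimal sketch of what now works (parameters are illustrative):
```python
import faiss
import numpy as np

d = 64
rcq = faiss.ResidualCoarseQuantizer(d, 2, 4)   # 2 residual stages of 4 bits
rcq.train(np.random.rand(5000, d).astype('float32'))

rcq2 = faiss.clone_index(rcq)  # previously raised "clone not supported"
```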
Reviewed By: mdouze
Differential Revision: D29605169
fbshipit-source-id: bf9cc32b60061a42310506058ebb45d5f2cea8d8
Summary:
## Description
The process of updating the codebooks in LSQ may be unstable if the data is not zero-centered. This diff fixes it by using `double` instead of `float` during the codebook update. This does not affect performance since the update step is quite fast.
Users can switch back to `float` mode by setting `update_codebooks_with_double = False`.
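A minimal usage sketch (parameters are illustrative; the default is the stable double-precision path):
```python
import faiss
import numpy as np

d, M, nbits = 32, 4, 4
xt = np.random.rand(5000, d).astype('float32')

lsq = faiss.LocalSearchQuantizer(d, M, nbits)
lsq.update_codebooks_with_double = False  # opt back into float updates
lsq.train(xt)
```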
## Changes
1. Support `double` during codebook updating.
2. Add a unit test.
3. Add `__init__.py` under `contrib/` to avoid warnings.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1975
Reviewed By: wickedfoo
Differential Revision: D29565632
Pulled By: mdouze
fbshipit-source-id: 932d7932ae9725c299cd83f87495542703ad6654
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1972
This fixes a few issues that I ran into and adds tests:
- range_search_max_results with IP search
- a few missing downcasts for VectorTransforms
- ResultHeap supports max IP search (see the sketch below)
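A hedged sketch of the max-IP usage (the `keep_max` flag name and the top-level `faiss.ResultHeap` location are assumptions based on the description):
```python
import faiss
import numpy as np

nq, k, d, nb = 8, 5, 16, 100
xq = np.random.rand(nq, d).astype('float32')
xb = np.random.rand(nb, d).astype('float32')

rh = faiss.ResultHeap(nq, k, keep_max=True)  # keep the k largest IP scores
ids = np.tile(np.arange(nb), (nq, 1)).astype('int64')
rh.add_result(xq @ xb.T, ids)                # feed one chunk of scores + ids
rh.finalize()
print(rh.D, rh.I)                            # top-k inner products per query
```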
Reviewed By: wickedfoo
Differential Revision: D29525093
fbshipit-source-id: d4ff0aff1d83af9717ff1aaa2fe3cda7b53019a3
Summary:
Currently, CI jobs using conda fail due to conflicting packages.
This PR fixes this:
- use a newer `numpy` to build `faiss-cpu`
- install `pytorch` when testing `faiss-cpu`
- to find the correct `pytorch` package, the `pytorch` channel is set at `conda build` time
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1884
Reviewed By: mdouze
Differential Revision: D28777447
Pulled By: beauby
fbshipit-source-id: 82a1ce076abe6bbbba9415e8935ed57b6104b6c3
Summary:
This should fix the GPU nightlies.
The rationale for the cp is that there is a file shared between the CPU and GPU tests.
Ideally, this file should probably be moved to contrib at some point.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1901
Reviewed By: beauby
Differential Revision: D28680898
Pulled By: mdouze
fbshipit-source-id: b9d0e1969103764ecb6f1e047c9ed4bd4a76aaba
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1908
To search for the best combination of codebook entries, the method implemented so far is a beam search.
It is possible to make this faster for a query vector q by precomputing look-up tables in the form of
LUT_m = <q, cent_m>
where cent_m is the set of centroids for quantizer m=0..M-1.
The LUT can then be used as
inner_prod = sum_m LUT_m[c_m]
and
L2_distance = norm_q + norm_db - 2 * inner_prod
This diff implements this computation by:
- adding the LUT precomputation
- storing an exhaustive table of all centroid norms (when using L2)
This is only practical for small additive quantizers, e.g. when a residual vector quantizer is used as a coarse quantizer (ResidualCoarseQuantizer).
This diff is based on the AdditiveQuantizer diff because it applies equally to other quantizers (e.g. the LSQ).
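A hedged numpy sketch of the LUT computation described above (shapes are illustrative):
```python
import numpy as np

M, K, d = 4, 256, 32          # M quantizers with K centroids each
cent = np.random.rand(M, K, d).astype('float32')
q = np.random.rand(d).astype('float32')

LUT = cent @ q                # LUT[m, c] = <q, cent_m[c]>, shape (M, K)

code = np.random.randint(K, size=M)          # one centroid id per quantizer
inner_prod = LUT[np.arange(M), code].sum()   # sum_m LUT_m[c_m]

# with the stored norm of the database vector, the L2 distance follows as
# L2_distance = norm_q + norm_db - 2 * inner_prod
```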
Reviewed By: sc268
Differential Revision: D28467746
fbshipit-source-id: 82611fe1e4908c290204d4de866338c622ae4148
Summary:
Moving an index from CPU to GPU fails with the error message `RuntimeError: Error in virtual faiss::Index *faiss::Cloner::clone_Index(const faiss::Index *) at faiss/clone_index.cpp:144: clone not supported for this type of Index`.
This diff supports cloning IndexResidual and unblocks GPU training.
Reviewed By: sc268, mdouze
Differential Revision: D28614996
fbshipit-source-id: 9b1e5e7c5dd5da6d55f02594b062691565a86f49
Summary: This is necessary to share a `SQDistanceComputer` instance among multiple threads, when the codes are not stored in a faiss index. The function is `const` and thread-safe.
Reviewed By: philippv, mdouze
Differential Revision: D28623897
fbshipit-source-id: e527d98231bf690dc01191dcc597ee800b5e57a9
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1906
This PR implements LSQ/LSQ++, a vector quantization technique described in the following two papers:
1. Revisiting additive quantization
2. LSQ++: Lower running time and higher recall in multi-codebook quantization
Here is a benchmark running on SIFT1M for 64 bits encoding:
```
===== lsq:
mean square error = 17335.390208
training time: 312.729779958725 s
encoding time: 244.6277096271515 s
===== pq:
mean square error = 23743.004672
training time: 1.1610801219940186 s
encoding time: 2.636141061782837 s
===== rq:
mean square error = 20999.737344
training time: 31.813055515289307 s
encoding time: 307.51959800720215 s
```
Changes:
1. Add the LocalSearchQuantizer object (see the usage sketch below)
2. Fix an out-of-memory bug in ResidualQuantizer
3. Add a benchmark for evaluating quantizers
4. Add tests for LocalSearchQuantizer
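A minimal usage sketch of the new quantizer (parameters are illustrative; the API mirrors ResidualQuantizer):
```python
import faiss
import numpy as np

d, M, nbits = 64, 8, 8       # 8 sub-codes of 8 bits -> 64-bit codes
xt = np.random.rand(10000, d).astype('float32')
xb = np.random.rand(1000, d).astype('float32')

lsq = faiss.LocalSearchQuantizer(d, M, nbits)
lsq.train(xt)
codes = lsq.compute_codes(xb)          # uint8 array, one row per vector
x_rec = lsq.decode(codes)
mse = ((xb - x_rec) ** 2).sum(axis=1).mean()
```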
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1862
Test Plan:
```
buck test //faiss/tests/:test_lsq
buck run mode/opt //faiss/benchs/:bench_quantizer -- lsq pq rq
```
Reviewed By: beauby
Differential Revision: D28376369
Pulled By: mdouze
fbshipit-source-id: 2a394d38bf75b9de0a1c2cd6faddf7dd362a6fa8
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1905
This PR adds some tests to make sure that building with AVX2 works as expected on Linux.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1792
Test Plan: buck test //faiss/tests/:test_fast_scan -- test_PQ4_speed
Reviewed By: beauby
Differential Revision: D27435796
Pulled By: mdouze
fbshipit-source-id: 901a1d0abd9cb45ccef541bd7a570eb2bd8aac5b
Summary:
related: https://github.com/facebookresearch/faiss/issues/1815, https://github.com/facebookresearch/faiss/issues/1880
The ARM NEON intrinsics `vshl` / `vshr` require an immediate (compile-time constant) value as the shift parameter.
However, GCC's implementations of those intrinsics also accept runtime values.
The current faiss implementation depends on this, so strictly conforming compilers like Clang cannot build faiss for aarch64.
This PR fixes the issue, so that faiss can be built with Clang for aarch64 machines such as the M1 Mac.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1882
Reviewed By: beauby
Differential Revision: D28465563
Pulled By: mdouze
fbshipit-source-id: e431dfb3b27c9728072f50b4bf9445a3f4a5ac43
Summary:
Also remove support for deprecated compute capabilities 3.5 and 5.2 in
CUDA 11.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1899
Reviewed By: mdouze
Differential Revision: D28539826
Pulled By: beauby
fbshipit-source-id: 6e8265f2bfd991ff3d14a6a5f76f9087271f3f75
Summary:
The current `faiss` code base contains code that triggers compiler warnings when options like `-Wall -Wextra` are added.
IMHO, warnings in `.cpp` and `.cu` files do not need to be fixed if the policy of this project tolerates warnings.
However, I think it is better to fix the code in `.h` and `.cuh` files, which are also included by `faiss` users.
Currently, `#include`-ing some faiss headers, e.g. `#include <faiss/IndexHNSW.h>`, produces an error when compiling with `-pedantic-errors`.
This PR fixes that problem.
For the reasons above, only the `-Wpedantic` issues in `.h` files are fixed in this PR.
This PR does not change `faiss` behavior.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1888
Reviewed By: wickedfoo
Differential Revision: D28506963
Pulled By: beauby
fbshipit-source-id: cbdf0506a95890c9c1b829cb89ee60e69cf94a79
Summary:
This diff fixes a serious bug in the range search implementation.
During range search in a flat index (exhaustive_L2sqr_seq and exhaustive_inner_product_seq), when running in multiple threads, the per-thread results are collected into RangeSearchPartialResult structures.
When the computation is finished, they are aggregated into a RangeSearchResult. In the previous version of the code, this aggregation was nested inside a second loop used to check for KeyboardInterrupts, so at each iteration the results were overwritten.
The fix removes the outer loop. It was most likely useless anyway, because the sequential code is called only for a small number of queries; for larger numbers the BLAS version is used.
Reviewed By: wickedfoo
Differential Revision: D28486415
fbshipit-source-id: 89a52b17f6ca1ef68fc5e758f0e5a44d0df9fe38
Summary:
In the current `faiss` implementation for x86, `fvec_L2sqr`, `fvec_inner_product`, and `fvec_norm_L2sqr` are [optimized for any dimensionality](e86bf8cae1/faiss/utils/distances_simd.cpp (L404-L432)).
On the other hand, the functions for aarch64 are optimized [**only** if `d` is a multiple of 4](e86bf8cae1/faiss/utils/distances_simd.cpp (L583-L584)); thus, they are not very fast for vectors with `d % 4 != 0`.
This PR accelerates the above three functions for any input size on aarch64.
- Evaluated on an AWS EC2 ARM instance (c6g.4xlarge)
- sift1m127 is the sift1m dataset with the trailing element of each vector dropped
- Therefore, the vector length of sift1m127 is 127, which is not a multiple of 4
- "optimized" runs 2.45-2.77 times faster than "original" on sift1m127
- The two methods, "original" and "optimized", are expected to achieve the same level of performance on sift1m
- And indeed there is almost no significant difference
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1878
Reviewed By: beauby
Differential Revision: D28376329
Pulled By: mdouze
fbshipit-source-id: c68f13b4c426e56681d81efd8a27bd7bead819de
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1865
This diff chunks the vectors to encode, which makes encoding more memory-efficient.
Reviewed By: sc268
Differential Revision: D28234424
fbshipit-source-id: c1afd2aaff953d4ecd339800d5951ae1cae4789a