Summary:
This should fix the GPU nightlies.
The rationale for the cp is that there is a shared file between the CPU and GPU tests.
Ideally, this file should probably be moved to contrib at some point.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1901
Reviewed By: beauby
Differential Revision: D28680898
Pulled By: mdouze
fbshipit-source-id: b9d0e1969103764ecb6f1e047c9ed4bd4a76aaba
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1908
To search for the best combination of codebooks, the method implemented so far is a beam search.
It is possible to make this faster for a query vector q by precomputing look-up tables of the form
LUT_m = <q, cent_m>
where cent_m is the set of centroids for quantizer m=0..M-1.
The LUT can then be used as
inner_prod = sum_m LUT_m[c_m]
and
L2_distance = norm_q + norm_db - 2 * inner_prod
This diff implements this computation by:
- adding the LUT precomputation
- storing an exhaustive table of all centroid norms (when using L2)
This is only practical for small additive quantizers, e.g. when a residual vector quantizer is used as a coarse quantizer (ResidualCoarseQuantizer).
This diff is based on the AdditiveQuantizer diff because it applies equally to other quantizers (e.g. the LSQ).
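The LUT trick above can be sketched in numpy. This is a toy illustration with assumed sizes, not the Faiss implementation; `cent`, `codes`, and the shapes are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, d = 4, 16, 32                         # assumed toy sizes, not Faiss defaults

# random codebooks for an additive quantizer: M codebooks of K centroids each
cent = rng.standard_normal((M, K, d))

q = rng.standard_normal(d)                  # query vector
codes = rng.integers(0, K, size=M)          # one code c_m per quantizer

# decoded database vector: sum of the selected centroids
x = cent[np.arange(M), codes].sum(axis=0)

# precomputed look-up tables: LUT_m[k] = <q, cent_m[k]>
LUT = cent @ q                              # shape (M, K)

# inner_prod = sum_m LUT_m[c_m]
inner_prod = LUT[np.arange(M), codes].sum()

# L2_distance = norm_q + norm_db - 2 * inner_prod
l2_lut = q @ q + x @ x - 2.0 * inner_prod
l2_direct = ((q - x) ** 2).sum()            # reference: direct computation
```

Here `x @ x` plays the role of the precomputed database norm; in the diff these norms come from the stored table of centroid norms.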
Reviewed By: sc268
Differential Revision: D28467746
fbshipit-source-id: 82611fe1e4908c290204d4de866338c622ae4148
Summary:
Moving an index from CPU to GPU fails with the error message `RuntimeError: Error in virtual faiss::Index *faiss::Cloner::clone_Index(const faiss::Index *) at faiss/clone_index.cpp:144: clone not supported for this type of Index`
This diff adds clone support for IndexResidual and unblocks GPU training.
Reviewed By: sc268, mdouze
Differential Revision: D28614996
fbshipit-source-id: 9b1e5e7c5dd5da6d55f02594b062691565a86f49
Summary: This is necessary to share a `SQDistanceComputer` instance among multiple threads, when the codes are not stored in a faiss index. The function is `const` and thread-safe.
Reviewed By: philippv, mdouze
Differential Revision: D28623897
fbshipit-source-id: e527d98231bf690dc01191dcc597ee800b5e57a9
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1906
This PR implements LSQ/LSQ++, a vector quantization technique described in the following two papers:
1. Revisiting additive quantization
2. LSQ++: Lower running time and higher recall in multi-codebook quantization
Here is a benchmark run on SIFT1M with 64-bit encoding:
```
===== lsq:
mean square error = 17335.390208
training time: 312.729779958725 s
encoding time: 244.6277096271515 s
===== pq:
mean square error = 23743.004672
training time: 1.1610801219940186 s
encoding time: 2.636141061782837 s
===== rq:
mean square error = 20999.737344
training time: 31.813055515289307 s
encoding time: 307.51959800720215 s
```
Changes:
1. Add LocalSearchQuantizer object
2. Fix an out of memory bug in ResidualQuantizer
3. Add a benchmark for evaluating quantizers
4. Add tests for LocalSearchQuantizer
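The mean-square-error metric reported by the benchmark can be sketched as follows. This is a hedged illustration: `encode` and `decode` are stand-ins for a quantizer's encode/decode round trip, not the actual Faiss API, and the toy rounding quantizer is made up for the example:

```python
import numpy as np

def quantizer_mse(xs, encode, decode):
    # Reconstruct each vector through the quantizer, then average the
    # squared reconstruction error over the dataset.
    recon = decode(encode(xs))
    return ((xs - recon) ** 2).sum(axis=1).mean()

# toy "quantizer": round each component to the nearest integer
xs = np.array([[0.25, 1.75]])
mse = quantizer_mse(xs, np.round, lambda codes: codes)
```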
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1862
Test Plan:
```
buck test //faiss/tests/:test_lsq
buck run mode/opt //faiss/benchs/:bench_quantizer -- lsq pq rq
```
Reviewed By: beauby
Differential Revision: D28376369
Pulled By: mdouze
fbshipit-source-id: 2a394d38bf75b9de0a1c2cd6faddf7dd362a6fa8
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1905
This PR adds some tests to make sure that building with AVX2 works as expected on Linux.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1792
Test Plan: buck test //faiss/tests/:test_fast_scan -- test_PQ4_speed
Reviewed By: beauby
Differential Revision: D27435796
Pulled By: mdouze
fbshipit-source-id: 901a1d0abd9cb45ccef541bd7a570eb2bd8aac5b
Summary:
related: https://github.com/facebookresearch/faiss/issues/1815, https://github.com/facebookresearch/faiss/issues/1880
`vshl` / `vshr` of ARM NEON require an immediate (compile-time constant) value as the shift parameter.
However, the GCC implementations of those intrinsics can accept a runtime value.
The current faiss implementation depends on this, so some correctly behaving compilers like Clang can't build faiss for aarch64.
This PR fixes the issue; thus, faiss with this PR applied can be built with Clang for aarch64 machines like the M1 Mac.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1882
Reviewed By: beauby
Differential Revision: D28465563
Pulled By: mdouze
fbshipit-source-id: e431dfb3b27c9728072f50b4bf9445a3f4a5ac43
Summary:
Also remove support for deprecated compute capabilities 3.5 and 5.2 in
CUDA 11.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1899
Reviewed By: mdouze
Differential Revision: D28539826
Pulled By: beauby
fbshipit-source-id: 6e8265f2bfd991ff3d14a6a5f76f9087271f3f75
Summary:
The current `faiss` code base contains some code that triggers compiler warnings when compile options like `-Wall -Wextra` are added.
IMHO, warnings in `.cpp` and `.cu` files don't need to be fixed if the policy of this project allows warnings.
However, I think it is better to fix the code in `.h` and `.cuh` files, which are also included by `faiss` users.
Currently, `#include`-ing some faiss headers like `#include <faiss/IndexHNSW.h>` causes an error when compiling with `-pedantic-errors`.
This PR fixes this problem.
For the reasons above, this PR fixes `-Wpedantic` issues only in `.h` files.
This PR doesn't change `faiss` behavior.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1888
Reviewed By: wickedfoo
Differential Revision: D28506963
Pulled By: beauby
fbshipit-source-id: cbdf0506a95890c9c1b829cb89ee60e69cf94a79
Summary:
This diff fixes a serious bug in the range search implementation.
During range search in a flat index (exhaustive_L2sqr_seq and exhaustive_inner_product_seq), when running in multiple threads, the per-thread results are collected into RangeSearchPartialResult structures.
When the computation is finished, they are aggregated into a RangeSearchResult. In the previous version of the code, this loop was nested inside a second loop used to check for KeyboardInterrupts. Thus, at each iteration, the results were overwritten.
The fix removes the outer loop. It is most likely useless anyway because the sequential code is called only for a small number of queries; for larger numbers the BLAS version is used.
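The aggregation the fix restores can be sketched in plain Python. This is illustrative only; the names mirror, but do not reproduce, the C++ structures:

```python
# Per-thread partial results, as (id, distance) pairs within the radius.
# Each inner list plays the role of one RangeSearchPartialResult.
partials = [
    [(0, 0.1), (3, 0.4)],   # found by thread 0
    [(1, 0.2)],             # found by thread 1
    [(2, 0.3), (4, 0.5)],   # found by thread 2
]

def merge(partials):
    # Aggregate all partial results into the final result in a SINGLE pass.
    # The bug was that this pass was nested inside an interrupt-polling loop,
    # so the final result was rebuilt (and effectively overwritten) each
    # iteration instead of exactly once.
    result = []             # plays the role of RangeSearchResult
    for p in partials:
        result.extend(p)
    return result

final = merge(partials)
```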
Reviewed By: wickedfoo
Differential Revision: D28486415
fbshipit-source-id: 89a52b17f6ca1ef68fc5e758f0e5a44d0df9fe38
Summary:
In the current `faiss` implementation for x86, `fvec_L2sqr` , `fvec_inner_product` , and `fvec_norm_L2sqr` are [optimized for any dimensionality](e86bf8cae1/faiss/utils/distances_simd.cpp (L404-L432)).
On the other hand, the functions for aarch64 are optimized [**only** if `d` is a multiple of 4](e86bf8cae1/faiss/utils/distances_simd.cpp (L583-L584)); thus, they are not very fast for vectors with `d % 4 != 0` .
This PR accelerates the above three functions for any input size on aarch64.
- Evaluated on an AWS EC2 ARM instance (c6g.4xlarge)
- sift1m127 is the sift1m dataset with the trailing element of each vector dropped
- Therefore, the vector length of sift1m127 is 127, which is not a multiple of 4
- "optimized" runs 2.45-2.77 times faster than "original" on sift1m127
- The two methods, "original" and "optimized", are expected to achieve the same level of performance on sift1m
- And indeed there is almost no significant difference
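The usual way to handle `d % 4 != 0` can be sketched in plain Python: a main loop that consumes 4 elements per iteration (standing in for one 128-bit NEON lane) plus a scalar tail loop. This is a scalar stand-in for the idea, not the actual SIMD implementation:

```python
def fvec_L2sqr(x, y):
    """Squared L2 distance between two equal-length vectors."""
    d = len(x)
    acc = 0.0
    i = 0
    # main loop: 4 elements per iteration (one 128-bit lane of 4 floats)
    while i + 4 <= d:
        for j in range(i, i + 4):
            diff = x[j] - y[j]
            acc += diff * diff
        i += 4
    # scalar tail: the remaining d % 4 elements
    while i < d:
        diff = x[i] - y[i]
        acc += diff * diff
        i += 1
    return acc
```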
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1878
Reviewed By: beauby
Differential Revision: D28376329
Pulled By: mdouze
fbshipit-source-id: c68f13b4c426e56681d81efd8a27bd7bead819de
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1865
This diff chunks the vectors to encode, making encoding more memory-efficient.
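The chunking idea can be sketched as follows; the names here are hypothetical, not the Faiss API, and `encode_fn` stands in for a quantizer's encoding routine:

```python
def encode_in_chunks(vectors, encode_fn, chunk_size=1024):
    # Encode `vectors` in slices of `chunk_size` so that the peak memory
    # used by the encoder is bounded by one chunk instead of the whole set.
    codes = []
    for i in range(0, len(vectors), chunk_size):
        codes.extend(encode_fn(vectors[i:i + chunk_size]))
    return codes
```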
Reviewed By: sc268
Differential Revision: D28234424
fbshipit-source-id: c1afd2aaff953d4ecd339800d5951ae1cae4789a
Summary:
Need to add an SSH key to CircleCI to be able to debug.
For my own ref, how to connect to the job:
```
[matthijs@matthijs-mbp /Users/matthijs/Desktop/faiss_github/circleci_keys] ssh -p 54782 38.39.188.110 -i id_ed25519
```
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1849
Reviewed By: wickedfoo
Differential Revision: D28234897
Pulled By: mdouze
fbshipit-source-id: 6827fa45f24b3e4bf586315bd38f18608d07ecf9
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1840
This diff is related to
https://github.com/facebookresearch/faiss/issues/1762
The ResultHandler introduced for FlatL2 and FlatIP was not multithreaded. This diff attempts to fix that. Whether it is indeed faster remains to be verified.
Reviewed By: wickedfoo
Differential Revision: D27939173
fbshipit-source-id: c85f01a97d4249fe0c6bfb04396b68a7a9fe643d
Summary:
This diff adds the following to bring the residual quantizer support on-par with PQ:
- IndexResidual can be built with index factory, serialized and used as a Faiss codec.
- ResidualCoarseQuantizer can be used as a coarse quantizer for inverted files.
The factory string looks like "RQ1x16_6x8" which means a first 16-bit quantizer then 6 8-bit ones. For IVF it's "IVF4096(RQ2x6),Flat".
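How such a factory string decomposes can be sketched with a toy parser. This is illustrative only; the real parsing lives in Faiss's index_factory, and this helper is made up for the example:

```python
import re

def parse_rq(spec):
    # Decompose an "RQ1x16_6x8"-style string into (count, bits) pairs:
    # "RQ1x16_6x8" -> one 16-bit quantizer followed by six 8-bit ones.
    m = re.fullmatch(r"RQ((?:\d+x\d+_?)+)", spec)
    if not m:
        raise ValueError("not an RQ factory string: %r" % spec)
    return [tuple(map(int, part.split("x"))) for part in m.group(1).split("_")]
```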
Reviewed By: sc268
Differential Revision: D27865612
fbshipit-source-id: f9f11d29e9f89d3b6d4cd22e9a4f9222422d5f26
Summary:
related: https://github.com/facebookresearch/faiss/issues/1812
This PR improves the performance of `IndexPQFastScan` and `IndexIVFPQFastScan` on aarch64 devices, e.g., 60x faster on an AWS Arm instance with the SIFT1M dataset.
The contents of this PR are below:
- Add `simdlib_neon.h`
- `simdlib_neon.h` has `simdlib` compatible API, and they are implemented with Arm NEON intrinsics.
- `simdlib.h` includes `simdlib_neon.h` if `__aarch64__` is defined.
- Move `geteven` , `getodd` , `getlow128` , and `gethigh128` from `distances_simd.cpp` to `simdlib_avx2.h` .
- Port `geteven` , `getodd` , `getlow128` , and `gethigh128` for non-AVX2 environments.
- These functions were implemented with AVX2 intrinsics, which prevented implementing `compute_PQ_dis_tables_dsub2` for non-AVX2 environments.
- Now `simdlib_avx2.h` , `simdlib_emulated.h` , and `simdlib_neon.h` all have those functions.
- Enable `compute_PQ_dis_tables_dsub2` on aarch64
- The above change makes `compute_PQ_dis_tables_dsub2` independent from `geteven` and friends.
- `compute_PQ_dis_tables_dsub2` implemented with `simdlib_neon.h` is a little faster than the current implementation, so it is enabled.
- In contrast, `compute_PQ_dis_tables_dsub2` implemented with `simdlib_emulated.h` is slower than the current implementation, so it is not enabled in this PR.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1815
Reviewed By: beauby
Differential Revision: D27760259
Pulled By: mdouze
fbshipit-source-id: 5df6168ac35ae0174bedf04508dbaf19f11fab3f
Summary:
related: https://github.com/facebookresearch/faiss/issues/1812
This PR improves the performance of contents in `simdlib_emulated.h` .
`IndexPQFastScan` and `IndexIVFPQFastScan` will become faster on non-AVX2 environments, e.g., 4x faster on SIFT1M.
This PR contains below changes:
- Use `template` instead of `std::function` for the arguments of `unary_func` and `binary_func`
- Because `std::function` hinders some optimizations like function inlining
- Use `const T&` instead of `T` for vector classes like `simd16uint16` in function arguments
- The vector classes in `simdlib_emulated.h` store their data members as arrays, so the runtime cost of copying is not negligible.
- Passing by const lvalue reference avoids the copy.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1814
Reviewed By: beauby
Differential Revision: D27760072
Pulled By: mdouze
fbshipit-source-id: cbc5a14658d1960b24ce55a395e71c80998742dc
Summary:
This diff includes:
- progressive dimension k-means.
- the ResidualQuantizer object
- GpuProgressiveDimIndexFactory so that it can be trained on GPU
- corresponding tests
- reference Python implementation of the same in scripts/matthijs/LCC_encoding
Reviewed By: wickedfoo
Differential Revision: D27608029
fbshipit-source-id: 9a8cf3310c8439a93641961ca8b042941f0f4249
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1817
There were instantiations of the k-selection templates that operated on float16 data. These are no longer needed, as Faiss now processes all data in float32 (though input data can still be in float16), so they are removed to speed up compilation.
Reviewed By: beauby
Differential Revision: D27742889
fbshipit-source-id: a3cf72a10df15f335d18d1e7709ffe269024121d
Summary:
This diff implements brute-force all-pairwise distances between two different sets of vectors using any of the Faiss supported metrics on the GPU (L2, IP, L1, Lp, Linf, etc).
It is implemented using the same C++ interface as `bfKnn`, except that when `k == -1`, all pairwise distances are returned (no k-selection is made). A restriction exists at present: the entire output data must be able to reside on the same GPU. This may be lifted at a subsequent point.
This interface is available in python via `faiss.pairwise_distance_gpu(res, xq, xb, D, metric)` with both numpy and pytorch support which will return all of the distances in D.
Also cleaned up CUDA stream usage a little bit in Distance.cu/Distance.cuh in the C++ implementation.
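The semantics of the `k == -1` path for L2 can be stated as a small numpy reference computing all pairwise squared distances; this is a CPU sketch of what the result contains, not the GPU implementation:

```python
import numpy as np

def pairwise_L2sqr(xq, xb):
    # All pairwise squared L2 distances, shape (len(xq), len(xb)),
    # via the expansion ||q - b||^2 = ||q||^2 + ||b||^2 - 2 <q, b>.
    nq = (xq ** 2).sum(axis=1)[:, None]     # query norms, column
    nb = (xb ** 2).sum(axis=1)[None, :]     # database norms, row
    return nq + nb - 2.0 * xq @ xb.T
```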
Reviewed By: mdouze
Differential Revision: D27686773
fbshipit-source-id: 8de6a699cda5d7077f0ab583e9ce76e630f0f687
Summary:
After initial positive feedback to the idea in https://github.com/facebookresearch/faiss/issues/1741 from mdouze, here are the patches I currently have as a basis for discussion.
Matthijs suggests not bothering with the deprecation warnings at all, which is fine with me
as well, though I would normally still advocate providing users with _some_ advance notice
before removing parts of an interface.
Fixes https://github.com/facebookresearch/faiss/issues/1741
PS. The deprecation warning is only shown once per session (per class)
PPS. I have tested in https://github.com/conda-forge/faiss-split-feedstock/pull/32 that the respective
classes remain available both through `import faiss` and `from faiss import *`.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1742
Reviewed By: mdouze
Differential Revision: D26978886
Pulled By: beauby
fbshipit-source-id: b52e2b5b5b0117af7cd95ef5df3128e9914633ad
Summary:
## Description
This PR adds NSG to the index factory. Here are the supported index strings:
1. `NSG{0}` or `NSG{0},Flat`: Create an IndexNSGFlat with `R = {0}`.
2. `IVF{0}_NSG{1},{2}`: Create an IndexIVF using NSG as a coarse quantizer where `ncentroids = {0}`, `R = {1}` and `{2}` is the second level quantizer.
These two types of indexes may be the most useful ones. Other composite indexes could be supported in the future.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1758
Test Plan: buck test //faiss/tests/:test_factory
Reviewed By: beauby
Differential Revision: D27189479
Pulled By: mdouze
fbshipit-source-id: b60000f985c490ef2e7bc561b4e209f9f61c3cc8