Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3011
After Alexandr's optimizations the ResidualQuantizer code has become harder to read. Split off the quantization code to a separate .h / .cpp to make it clearer.
Reviewed By: pemazare
Differential Revision: D48448614
fbshipit-source-id: c90d572ea3afe12a7a7e5092f88710e8eceaa2d1
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3030
Added default arguments to the .h file (for some reason I forgot this file when migrating default args).
Also log a hash value in MatrixStats, which is useful to check whether two runs really operate on the same matrix.
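As a rough illustration of the idea only (this is not the MatrixStats code), a digest of the matrix contents can be logged and compared across runs. The helper name `matrix_hash`, the use of SHA-1 and the float32 packing are all assumptions made for this sketch.

```python
import hashlib
import struct


def matrix_hash(rows):
    """Return a short hex digest of a matrix given as a list of rows of floats.

    Values are packed as little-endian float32, so matrices with the same
    shape and the same float32 contents produce the same digest.
    """
    h = hashlib.sha1()
    # Include the shape so that a reshaped copy of the same data hashes differently.
    n = len(rows)
    d = len(rows[0]) if n else 0
    h.update(struct.pack("<qq", n, d))
    for row in rows:
        h.update(struct.pack("<%df" % len(row), *row))
    return h.hexdigest()[:16]
```

Two runs that log the same digest very likely saw the same matrix; differing digests mean the inputs differ.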
Reviewed By: pemazare
Differential Revision: D48834343
fbshipit-source-id: 7c1948464e66ada1f462f4486f7cf3159bbf9dfd
Summary:
This is a minor bug that comes with a perf impact. The classic FAISS `FlatIndex` always uses the expanded form of the distance computation, even though an `exactDistances` argument is provided. `RaftFlatIndex` was using this argument to determine whether the computation should be exhaustive.
This PR includes one additional change to eagerly initialize the `cublas_handle` on the `device_resources` instance when it's created.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3021
Reviewed By: pemazare
Differential Revision: D48739660
Pulled By: mdouze
fbshipit-source-id: a361334eb243df86c169c69d24bb10fed8876ee9
Summary: Python3 makes the use of `(object)` in class inheritance unnecessary. Let's modernize our code by eliminating this.
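A minimal sketch of why the change is safe (the class names are hypothetical): in Python 3 both spellings produce a class whose only base is `object`.

```python
class OldStyle(object):  # Python 2 era spelling
    pass


class NewStyle:  # equivalent spelling in Python 3
    pass


# Both classes have object as their single base class.
print(OldStyle.__bases__ == NewStyle.__bases__ == (object,))
```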
Reviewed By: palmje
Differential Revision: D48718370
fbshipit-source-id: 6794156f7dd835cca8e12b65067f95b6991a218c
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3007
There is a complicated interaction between SWIG and the Python wrappers where the ownership of ParameterSpace arguments was stolen from Python.
This diff adds a test, fixes that behavior, and fixes the referenced_objects construction.
Reviewed By: mlomeli1
Differential Revision: D48404252
fbshipit-source-id: 8afa9e6c15d11451c27864223e33ed1187817224
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2984
It is not entirely trivial to access the NSG graph structure from Python (although it is a fixed-size N-by-K matrix of vector ids).
This diff adds an inspect_tools function to do that.
Reviewed By: algoriddle
Differential Revision: D48026775
fbshipit-source-id: 94cd7be7f656bcd333d62586531f287ea8e052e5
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2986
A NaN vector is a vector with at least one NaN (not-a-number) entry.
After discussion in the Faiss team we decided that:
- training should throw an exception on NaN vectors
- added NaN vectors should be ignored (never returned)
- searched NaN vectors should return only -1s
This diff implements this for a few common index types + adds relevant tests.
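The three rules above can be sketched with a toy brute-force index. This is not Faiss code: `ToyFlatL2` is a hypothetical class, and the choice that an added NaN vector still occupies an id is an assumption of the sketch.

```python
import math


class ToyFlatL2:
    """Toy brute-force L2 index; illustrates the NaN policy only."""

    def __init__(self, d):
        self.d = d
        self.vectors = []  # stored vectors; NaN vectors keep their slot (assumption)

    @staticmethod
    def has_nan(v):
        return any(math.isnan(x) for x in v)

    def train(self, xs):
        # Rule 1: training raises on NaN vectors.
        if any(self.has_nan(v) for v in xs):
            raise ValueError("NaN vector in training set")

    def add(self, xs):
        # Rule 2: NaN vectors are stored but skipped at search time.
        self.vectors.extend(list(v) for v in xs)

    def search(self, q, k):
        # Rule 3: a NaN query returns only -1 ids.
        if self.has_nan(q):
            return [-1] * k
        cands = []
        for i, v in enumerate(self.vectors):
            if self.has_nan(v):
                continue  # never return a NaN vector
            dist = sum((a - b) ** 2 for a, b in zip(q, v))
            cands.append((dist, i))
        cands.sort()
        ids = [i for _, i in cands[:k]]
        # Pad with -1 when fewer than k valid results exist.
        return ids + [-1] * (k - len(ids))
```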
Reviewed By: algoriddle
Differential Revision: D48031390
fbshipit-source-id: 99e7786582e91950e3a53c1d8bcffdd00b6afd24
Summary:
More efficient code for SQ8 for AVX2.
For clang-15, this improves the number of instructions per cycle (IPC) from 2.49 to 3.20.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2942
Reviewed By: algoriddle
Differential Revision: D47946167
Pulled By: mdouze
fbshipit-source-id: da864bac8d452f2eb111ca356e54a8a69cd03dbf
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2949
A more scalable alternative to `np.unique` for deduping large datasets with a quantized code.
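For illustration only, one way to dedupe quantized codes without sorting is a single pass over a hash table. The function name `dedup_codes` and its return values are assumptions of this sketch; it is not the function added by this PR, and unlike `np.unique` it keeps first-occurrence order rather than sorted order.

```python
def dedup_codes(codes):
    """Dedupe a sequence of byte codes in one pass.

    Returns (unique, first_index, inverse):
      unique      - distinct codes in order of first occurrence
      first_index - index in `codes` of the first occurrence of each unique code
      inverse     - for each input code, its position in `unique`
    """
    seen = {}  # code -> position in `unique`
    unique, first_index, inverse = [], [], []
    for i, code in enumerate(codes):
        key = bytes(code)
        j = seen.get(key)
        if j is None:
            j = len(unique)
            seen[key] = j
            unique.append(key)
            first_index.append(i)
        inverse.append(j)
    return unique, first_index, inverse
```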
Reviewed By: mlomeli1
Differential Revision: D47443953
fbshipit-source-id: 4a1554d4d4200b5fa657e9d8b7395bba9856a8e3
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2940
This test fails on some occasions.
After investigation, it turns out this is due to non-reproducible behavior of IndexIVFFastScan::search_implem_14 with a parallel loop when there are ties in the results (i.e. the resulting distances are the same but the ids are not).
As a workaround I relaxed the test slightly.
This diff also includes a fix in the checksum function.
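A tie-tolerant comparison of this kind could look like the sketch below. It is not the actual test code; `same_results_up_to_ties` is a hypothetical helper that requires identical distances but compares ids only as sets within each group of equal distances.

```python
def same_results_up_to_ties(dis_a, ids_a, dis_b, ids_b):
    """Compare two result lists for one query, ignoring id order within ties.

    Distances must match exactly and in order. Ids are compared as sets per
    distance value, except for the last distance: a tie there may be cut
    differently at rank k, so ids at that distance are not checked.
    """
    if list(dis_a) != list(dis_b):
        return False
    if not dis_a:
        return True
    last = dis_a[-1]
    groups_a, groups_b = {}, {}
    for d, i in zip(dis_a, ids_a):
        groups_a.setdefault(d, set()).add(i)
    for d, i in zip(dis_b, ids_b):
        groups_b.setdefault(d, set()).add(i)
    return all(groups_a[d] == groups_b[d] for d in groups_a if d != last)
```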
Reviewed By: algoriddle
Differential Revision: D47229086
fbshipit-source-id: 55e53bcfe47cf33041cc7fd5691b5de65067ce0f
Summary: Useful info on GitHub test runs is buried in spurious logging. Avoid this.
Reviewed By: mlomeli1
Differential Revision: D47209139
fbshipit-source-id: b5111c91e2b94f0c3678d599197f8e7094993df1
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2922
This parallelizes kernel compilation by taking a template function from much deeper in the stack than was previously the case and generating 128 compilation units rather than the original 8.
Reviewed By: mdouze
Differential Revision: D46674315
fbshipit-source-id: 830eeaf43dee2c081f735be47c809b28aa3a05f6
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2918
The HammingComputer class is optimized for several vector sizes. So far it has been the caller's responsibility to instantiate the relevant optimized version.
This diff introduces a `dispatch_HammingComputer` function that can be called with a template class that is instantiated for all existing optimized HammingComputers.
Reviewed By: algoriddle
Differential Revision: D46858553
fbshipit-source-id: 32c31689bba7c0b406b309fc8574c95fa24022ba
Summary:
Moving the raft build to a nightly, to remove the noise from the PR contbuilds.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2926
Reviewed By: mlomeli1
Differential Revision: D47016318
Pulled By: algoriddle
fbshipit-source-id: 3c60aa382b9aa68dcadb929e0e4afade13c9123e
Summary:
With a very long name for a `ParameterRange`, the `snprintf` call from `combination_name` can end up with a negative second parameter, causing a buffer overflow, which can lead to a serious security issue.
We now check that the second parameter is always >= 0 and throw an exception if it is not.
See the new GTEST.
Reviewed By: mdouze
Differential Revision: D46856956
fbshipit-source-id: 91c657ec028c462d4b808b595811342034e00133
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2914
The macros are part of a system to reduce compilation time via separate compilation units.
Unfortunately, the parallelization is across C++ template functions instead of NVCC invocations on kernel compilation, which would be much more effective.
This diff removes the preprocessor macros and expands them into templates.
Compilation time after this diff is given by [this buck2 output](https://www.internalfb.com/buck2/ae9e6b28-a1bd-4d46-8af8-2895e6f182c8) with 1,043s through impl/scan/IVFInterleaved2048.cu
Reviewed By: mdouze
Differential Revision: D46549341
fbshipit-source-id: 5c3457876fd649e03ebeac89e4d1713f091ee9f5
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2916
Overall better support for binary indexes:
- cloning (to CPU and GPU), only for BinaryFlat for now
- fix bug in reconstruct_n
- range_search_max_results
Reviewed By: algoriddle
Differential Revision: D46755778
fbshipit-source-id: 777ad90aff5c54a77f9685ed6512247a922c6ef5
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2901
This diff allows each GPU to work independently: a hot centroid (e.g. out-of-distribution queries that hit a centroid heavily) will only block the one GPU that is processing it, while the others continue to pick up work independently.
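The scheduling idea can be sketched with CPU threads pulling from a shared queue, so that one slow task only occupies the worker that picked it up. This is an analogy under stated assumptions, not the GPU code: `run_workers` and `process` are hypothetical names, and tasks are assumed to be hashable.

```python
import queue
import threading


def run_workers(tasks, n_workers, process):
    """Process tasks with n_workers threads that each pull from one shared queue."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            out = process(task)  # a slow task blocks only this worker
            with lock:
                results[task] = (worker_id, out)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a static split of the tasks across workers, a single slow task would delay everything assigned behind it; with a shared queue the other workers keep draining the remaining work.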
Reviewed By: mdouze
Differential Revision: D46521298
fbshipit-source-id: 171cb06cce8b2d16b7bd744799b105b3cd525be3
Summary:
This version is more concise and doesn't need a new scope to reduce visibility of local variable `i`.
Created from CodeHub with https://fburl.com/edit-in-codehub
Reviewed By: mdouze
Differential Revision: D46431189
fbshipit-source-id: 5bbe8df6014d8e25aeb8d5d15145b703e9651327
Summary:
- Use an elementwise operation and a single reduction instead of two across-vector comparison operations
- Use already implemented supporting functions
- Unify the semantics of `operator==` with those of `simd16uint16`
  - `operator==` of `simd8uint32` and `simd8float32` was implemented in https://github.com/facebookresearch/faiss/issues/2568, but it does not have the same semantics as that of `simd16uint16` (which was implemented a long time ago). To get vector equality as a `bool`, the `is_same_as` member function should now be used.
- Change `is_same_as` to accept any vector type as argument for `simdlib_neon`
  - `is_same_as` already supports any vector type on `simdlib_avx2` and `simdlib_emulated`
- Remove the unused function `simd16uint16::is_same` on `simdlib_avx2`
  - It may be a typo of `is_same_as`; in any case it seems unlikely to be used
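The distinction between the two comparison styles can be sketched in plain Python. `ToyVec` is a hypothetical stand-in for a SIMD register type, and the 16-bit all-ones mask value is an assumption of the sketch: `==` compares lane by lane and returns a mask vector, while `is_same_as` reduces to a single `bool`.

```python
class ToyVec:
    """Toy fixed-width integer vector mimicking the two comparison styles."""

    def __init__(self, lanes):
        self.lanes = list(lanes)

    def __eq__(self, other):
        # Elementwise: each lane is an all-ones mask where equal, 0 elsewhere.
        return ToyVec(0xFFFF if a == b else 0
                      for a, b in zip(self.lanes, other.lanes))

    def is_same_as(self, other):
        # Reduction: a single bool for whole-vector equality.
        return self.lanes == other.lanes
```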
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2885
Reviewed By: mdouze
Differential Revision: D46330666
Pulled By: alexanderguzhva
fbshipit-source-id: 0ea14f8e9a8bda78f24a655219dffe3e07fc110f