Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2772
Resolves errors from ambiguous overloaded operators:
```
faiss/utils/partitioning.cpp:283:34: error: ISO C++20 considers use of overloaded operator '==' (with operand types 'faiss::simd16uint16' and 'faiss::simd16uint16') to be ambiguous despite there being a unique best viable function [-Werror,-Wambiguous-reversed-operator]
```
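For context, a minimal illustration of this warning class (not the actual faiss code): in C++20, `a == b` also considers the reversed candidate `b == a`, so a non-const member `operator==` makes the forward and reversed candidates differ only in qualification, which clang flags.
```
struct V {
    // Non-const member operator==: for `a == b`, C++20 considers both
    // a.operator==(b) and the reversed candidate b.operator==(a), which
    // clang reports under -Wambiguous-reversed-operator.
    bool operator==(const V&) { return true; }

    // The usual fix is to make the operator const (or a symmetric friend),
    // so the forward and reversed candidates coincide:
    // bool operator==(const V&) const { return true; }
};

int main() {
    V a, b;
    return a == b ? 0 : 1; // warns with -std=c++20 under clang
}
```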
Reviewed By: alexanderguzhva, meyering
Differential Revision: D44186458
fbshipit-source-id: 0257fa0aaa4fe74c056bef751591f5f7e5357c9d
Summary: Big batch search can run for hours, so it's useful to have a checkpointing mechanism in case it is run on a best-effort cluster queue.
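A minimal C++ sketch of the idea, with a hypothetical chunked driver (the function name, `chunk`/`start` parameters, and checkpoint storage are illustrative, not the faiss API): search in chunks and persist partial results plus a resume offset after each chunk.
```
#include <faiss/Index.h>

#include <algorithm>
#include <cstddef>

// Hypothetical helper: resumes from `start`, which a real implementation
// would load from the last checkpoint on disk.
void checkpointed_search(
        const faiss::Index& index, size_t nq, const float* xq, int k,
        float* D, faiss::idx_t* I, size_t chunk, size_t start) {
    for (size_t i0 = start; i0 < nq; i0 += chunk) {
        size_t n = std::min(chunk, nq - i0);
        index.search(n, xq + i0 * index.d, k, D + i0 * k, I + i0 * k);
        // Persist D[i0*k .. (i0+n)*k), I[...] and the offset i0 + n here,
        // so an interrupted run restarts from the last completed chunk.
    }
}
```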
Reviewed By: algoriddle
Differential Revision: D44059758
fbshipit-source-id: 5cb5e80800c6d2bf76d9f6cb40736009cd5d4b8e
Summary:
https://github.com/facebookresearch/faiss/issues/2727
Implements the search_with_params function in the c_api for Index.
Also implements c_api equivalents of SearchParameters and SearchParametersIVF.
My C/C++ is pretty rusty, so I imagine the arguments to the new functions for each search parameter type could be refined. Happy to take suggestions :)
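As a reference point, a minimal sketch of the C++ API that these c_api entry points mirror (parameter values are illustrative):
```
#include <faiss/IndexIVF.h>

void search_with_ivf_params(
        const faiss::IndexIVF& index, size_t nq, const float* xq, int k,
        float* distances, faiss::idx_t* labels) {
    faiss::SearchParametersIVF params;
    params.nprobe = 16; // per-call override; the index itself is not mutated
    index.search(nq, xq, k, distances, labels, &params);
}
```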
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2732
Reviewed By: alexanderguzhva
Differential Revision: D43917264
Pulled By: mdouze
fbshipit-source-id: c9eaf0d96ec0fad4862528aac9b5946294f5e444
Summary:
Adds support for an IDSelector that takes two IDSelectors and performs a boolean operation on their is_member outcomes.
The current implementation is fairly naive and does not attempt any optimizations based on the types of the combined IDSelectors.
Test cases are also still lacking, but more can be added once the approach is agreed upon.
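A minimal usage sketch, assuming the combined selector is wired in through `SearchParameters::sel` like the existing selectors:
```
#include <faiss/Index.h>
#include <faiss/impl/IDSelector.h>

void search_with_combined_filter(
        const faiss::Index& index, size_t nq, const float* xq, int k,
        float* D, faiss::idx_t* I) {
    faiss::idx_t allowed[] = {7, 12, 42};
    faiss::IDSelectorRange range(0, 1000);    // ids in [0, 1000)
    faiss::IDSelectorBatch batch(3, allowed); // explicit id list
    // is_member() of the combined selector is the AND of the two outcomes
    faiss::IDSelectorAnd both(&range, &batch);

    faiss::SearchParameters params;
    params.sel = &both;
    index.search(nq, xq, k, D, I, &params);
}
```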
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2742
Reviewed By: algoriddle
Differential Revision: D43904855
Pulled By: mdouze
fbshipit-source-id: bbe687800a19b418ca30c9257fb0334c64ab5f52
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2737
IVFPQ with more than 8 bits per sub-quantizer appears to be acceptable in Faiss, so the comments were updated and additional unit tests were added.
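For instance, a sketch of building an IVFPQ index with 10 bits per sub-quantizer (2^10 = 1024 centroids each; all sizes illustrative):
```
#include <faiss/IndexFlat.h>
#include <faiss/IndexIVFPQ.h>

int main() {
    int d = 64;
    faiss::IndexFlatL2 quantizer(d);
    // M = 8 sub-quantizers, 10 bits each: 1024 centroids per sub-quantizer
    faiss::IndexIVFPQ index(
            &quantizer, d, /*nlist=*/256, /*M=*/8, /*nbits=*/10);
}
```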
Reviewed By: mdouze
Differential Revision: D43706459
fbshipit-source-id: 45d0cc6f43ec0198aa95d025f07b75a9c33e4db7
Summary:
The Doxyfile is used to generate the Faiss documentation on faiss.ai. In its current config it skips important classes like ProductQuantizer, which live in the impl/ subdirectories. Hence this diff, which forces it to process them.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2741
Reviewed By: mlomeli1
Differential Revision: D43732475
Pulled By: mdouze
fbshipit-source-id: cf968238838051000fa31b4388e1f2beb7f451db
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2726
Old versions of clang (prior to 14) do not generate FMAs on x86 unless this option is used. On the other hand, ARM does not support the needed pragma flag.
Reviewed By: DenisYaroshevskiy
Differential Revision: D43503265
fbshipit-source-id: a024dd221288e44d4e2ade2f5db2c4402e26ff3d
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2725
fvec_inner_product_ref() is no longer needed; fvec_inner_product() is now used instead in all cases.
fvec_norm_L2sqr_ref() is no longer needed; fvec_norm_L2sqr() is now used instead in all cases.
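For reference, the scalar computations these entry points perform (illustrative re-implementations, not the faiss code; the optimized SIMD paths now sit behind the single functions):
```
#include <cstddef>

// scalar equivalent of what fvec_inner_product() computes
float inner_product_ref(const float* x, const float* y, size_t d) {
    float s = 0;
    for (size_t i = 0; i < d; i++)
        s += x[i] * y[i];
    return s;
}

// scalar equivalent of what fvec_norm_L2sqr() computes
float norm_l2sqr_ref(const float* x, size_t d) {
    float s = 0;
    for (size_t i = 0; i < d; i++)
        s += x[i] * x[i];
    return s;
}
```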
Reviewed By: algoriddle
Differential Revision: D43472582
fbshipit-source-id: 2d99a022d420092aed2adfd57e12be5b3e652944
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2721
FAISS_PRAGMA_IMPRECISE_* macros were modified (usage sketched after the list):
* Disabled them for clang on ARM, because it does not support `_Pragma("float_control(precise, off)")`.
* Added the missing pragma for the GCC compiler.
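A sketch of how these macros are typically applied, scoping imprecise float optimizations to a single function:
```
#include <faiss/impl/platform_macros.h>

#include <cstddef>

// allow contraction/reassociation of float ops inside this function only
FAISS_PRAGMA_IMPRECISE_FUNCTION_BEGIN
float sum_squares(const float* x, size_t n) {
    float s = 0;
    FAISS_PRAGMA_IMPRECISE_LOOP
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}
FAISS_PRAGMA_IMPRECISE_FUNCTION_END
```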
Reviewed By: alexanderguzhva
Differential Revision: D43437450
fbshipit-source-id: cec8042c3c8c7147ae7e2ffa1ac9e2232c8f1a92
Summary:
Pragmas:
it turns out GCC does not support `#pragma float_control`.
I have no way to test this locally, but I tried it on godbolt: https://godbolt.org/z/zTzf7jd7c
I will move these pragmas into folly later, but for now let's just solve the problem.
Test:
The diff D43353199, which produced slightly different floating-point results, broke this test.
After looking into it, I don't believe the original test data had any meaning; it was just a dump of the output.
According to mdouze, the previous test data was questionable and random numbers would be preferable, so that is what the test uses now.
Reviewed By: alexanderguzhva
Differential Revision: D43393684
fbshipit-source-id: 691397302b58a98e1ccbe32feffa7aaeee96accd
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2696
Add the fvec_L2sqr_ny_y_transposed() call, which is used by ProductQuantizer when pq.sync_transposed_centroids() is enabled.
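A scalar sketch of the transposed computation (names and signature illustrative, not the faiss kernel): with centroids stored component-major, the inner loop streams contiguous memory, and the identity ||x - y||^2 = ||x||^2 - 2<x, y> + ||y||^2 lets the ||y_i||^2 terms be precomputed.
```
#include <cstddef>

// y_t[j * ny + i] holds component j of centroid i (transposed layout);
// y_sqlen[i] caches ||y_i||^2
void l2sqr_ny_y_transposed_ref(
        float* dis, const float* x, const float* y_t, const float* y_sqlen,
        size_t d, size_t ny) {
    float x_sqlen = 0;
    for (size_t j = 0; j < d; j++)
        x_sqlen += x[j] * x[j];
    for (size_t i = 0; i < ny; i++) {
        float dp = 0;
        for (size_t j = 0; j < d; j++)
            dp += x[j] * y_t[j * ny + i]; // contiguous in i per component
        dis[i] = x_sqlen - 2 * dp + y_sqlen[i];
    }
}
```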
Reviewed By: mdouze
Differential Revision: D43057940
fbshipit-source-id: b82a0153fb10f1558c7db04e32d690b2ee7b9144
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2693
A new x64 kernel for fvec_op_ny_D4 that reorganizes the read operations.
This diff also introduces transpose microkernels for D2 and D8 on AVX2 for future needs.
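In scalar form, the reorganization amounts to processing y vectors in blocks of 8 and accumulating per-component, so each inner loop maps onto one 8-lane AVX2 operation (a sketch of the idea, not the actual kernel):
```
#include <cstddef>

// L2 distances from one 4-dim x to ny contiguous 4-dim y vectors
void l2sqr_ny_d4_blocked(
        float* dis, const float* x, const float* y, size_t ny) {
    size_t i = 0;
    for (; i + 8 <= ny; i += 8) {
        float acc[8] = {0};
        // component-major accumulation over a block of 8 y vectors;
        // each k-loop corresponds to one 8-lane SIMD operation
        for (size_t j = 0; j < 4; j++)
            for (size_t k = 0; k < 8; k++) {
                float q = x[j] - y[(i + k) * 4 + j];
                acc[k] += q * q;
            }
        for (size_t k = 0; k < 8; k++)
            dis[i + k] = acc[k];
    }
    for (; i < ny; i++) { // scalar tail
        float s = 0;
        for (size_t j = 0; j < 4; j++) {
            float q = x[j] - y[i * 4 + j];
            s += q * q;
        }
        dis[i] = s;
    }
}
```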
Reviewed By: mdouze
Differential Revision: D43057318
fbshipit-source-id: 37b2e18139905fcdac69e4603038a15725a730dd
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2682
IndexShards normally sees its sub-indexes as opaque, so there is no way to factorize the coarse quantizer.
This diff introduces IndexIVFShards, which handles IVF indexes with a common quantizer so that the coarse quantization is computed only once.
Reviewed By: alexanderguzhva
Differential Revision: D42781513
fbshipit-source-id: 441316eff4c1ba0468501c456af9194ea5f042d6
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2683
Adds a small function and a test demonstrating how to crop an RCQ (ResidualCoarseQuantizer) to its first quantizers.
Also adds a SearchParameters subclass to set the beam_factor of an RCQ.
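A usage sketch, assuming the new parameter struct follows the usual SearchParameters pattern for ResidualCoarseQuantizer (check the header for the exact struct name):
```
#include <faiss/IndexAdditiveQuantizer.h>

void search_rcq(
        const faiss::ResidualCoarseQuantizer& rcq, size_t nq,
        const float* xq, int k, float* D, faiss::idx_t* I) {
    faiss::SearchParametersResidualCoarseQuantizer params;
    params.beam_factor = 4.0f; // per-call beam factor, overriding the default
    rcq.search(nq, xq, k, D, I, &params);
}
```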
Reviewed By: alexanderguzhva
Differential Revision: D42842378
fbshipit-source-id: b522269ce983cddc50c3d0e18b541954c5d7001d
Summary: Added the required terms of use, privacy policy and copyright statement for Meta OSS projects.
Reviewed By: algoriddle
Differential Revision: D43056233
fbshipit-source-id: 8541edd587107bf3129ff7f8b29ec26c016f779f
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2685
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2623
This diff is an attempt at dealing with limitations dating from when I originally wrote the Faiss GPU code back in 2016. Namely, 64-bit integer arithmetic was really slow on the GPU (this was also the impetus for my adding the 32 vs 64 bit indexing logic to PyTorch GPU), and GPUs at the time were very memory limited.
However, there have been enough cases of people attempting to use Faiss GPU on larger datasets, especially > 2 GB flat indexes, which sometimes worked (the total number of elements in a tensor being < 2^31 - 1, even though the byte size could be larger) and sometimes didn't (when considering byte offsets rather than word sizes). This needed fixing.
This diff changes the `Tensor`, `HostTensor` and `DeviceTensor` classes to default to using `faiss::idx_t` as the index type (64 bit), and also changes many other places in the GPU code to follow the same data type. Much of the code being changed follows from the type propagation coming from the `Tensor` class and sub-classes.
The `GpuIndexIVF.setNumProbes` and `GpuIndexIVF.getNumProbes` functions have been removed; just set/get `nprobe` on the object as needed (like on the CPU side).
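A small sketch of the resulting usage, mirroring the CPU side (header paths and sizes illustrative):
```
#include <faiss/gpu/GpuIndexIVFFlat.h>
#include <faiss/gpu/StandardGpuResources.h>

int main() {
    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuIndexIVFFlat index(
            &res, /*dims=*/64, /*nlist=*/1024, faiss::METRIC_L2);
    index.nprobe = 32; // replaces the removed setNumProbes(32)
}
```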
The net result is that flat indexes can contain > 2 GB of data, individual IVF lists can similarly contain > 2 GB of data, and any tensor dimension can exceed 2 G (except the vector dimension; see limitations below).
Notes:
- performance for large `k` (>= 512) or large `nprobe` (>= 512) deteriorates due to increased register pressure in the k-selection kernels (we now maintain all indices as int64 rather than int32, requiring double the registers), which lowers occupancy. However, based on tests this costs no more than 10-20% or so at these large sizes. Small `k`/`nprobe` sees no significant effect, and in some cases is even faster (depending upon whether int64 indices previously had to be produced while computations were done internally using int32).
- `k`, like with CPU Faiss, is an int parameter everywhere still.
- `d`, like with CPU Faiss, is an int parameter everywhere except for places where dimensionality is inferred from a tensor dimension (in which case it is naturally `idx_t`).
- `nprobe`, like `IndexIVF`, is `size_t` at the top level (`GpuIndexIVF` class member) but internally to the code it is either `int` or `idx_t` depending upon the context (e.g., if `nprobe` is extracted from a tensor dimension, it is already `idx_t`).
In the Faiss GPU code, however, `k` and `nprobe` are still limited to a maximum of 2048 at present.
Reviewed By: alexanderguzhva
Differential Revision: D40494090
fbshipit-source-id: 8581aff86ff25091b47e59f749adebebf2d3b160
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2691
`faiss::gpu::DeviceVector<T>` (the GPU analogue of `std::vector`) is used in Faiss GPU to store IVF list data, among other things. `thrust::device_vector` could not be used at the time of the library's authorship due to lack of certain features (stream usage from what I recall).
Like `std::vector`, it uses a heuristic to determine the new capacity needed to accommodate a requested number of elements. It previously always rounded allocation sizes up to the next power of 2, so that incremental appends to a DeviceVector amortize the cost of memory (re-)allocation and only infrequently trigger a full re-allocation.
However, this doubling behavior was unbounded. At small individual IVF list sizes, doubling is not much of an issue (and indices have a reclaim-memory function to release the unused capacity left over from this amortization). But when operating with datasets close to GPU memory limits, doubling is dangerous: attempting to populate, say, a GpuIndexFlat with 4.x GB of vectors on an 8 GB GPU could run out of memory.
Now we introduce a new heuristic (sketched after the list):
- below 4 M list sizes, we always double (as before);
- otherwise, below 128 M list sizes, we only increase by 1.25x;
- otherwise, the allocation made by DeviceVector is exactly sized to the size requested by the user.
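A simplified sketch of the policy (helper name illustrative; "4 M" / "128 M" interpreted as multiples of 2^20, as described above):
```
#include <cstddef>

size_t new_capacity(size_t requested) {
    const size_t M = 1024 * 1024;
    if (requested <= 4 * M) {
        // small: round up to the next power of 2, as before
        size_t cap = 1;
        while (cap < requested)
            cap *= 2;
        return cap;
    } else if (requested <= 128 * M) {
        // medium: over-allocate by only 1.25x (rounded up)
        return (requested * 5 + 3) / 4;
    }
    // large: allocate exactly what was requested
    return requested;
}
```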
Reviewed By: mdouze
Differential Revision: D42951571
fbshipit-source-id: 462d8af582725869ab5642b81e27db0e71774c6e
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2676
This is a cosmetic diff where default values for scalar fields are moved to the .h file where the object is declared, rather than being set in the constructor. The advantage is that the default value of a parameter can be read directly from the class declaration.
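An illustration with a hypothetical class:
```
// before: the default was only visible in the constructor definition
//   ExampleIndex::ExampleIndex() : nprobe(1), verbose(false) {}
// after: the default can be read directly from the declaration
struct ExampleIndex {
    size_t nprobe = 1;
    bool verbose = false;
};
```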
Reviewed By: alexanderguzhva
Differential Revision: D42722205
fbshipit-source-id: 9d9bc4641068f6d6233f60f0a3a16ab793c94bb8
Summary:
IVF GPU indexes now support CPU quantizers.
In order for the auto-tuning to access the parameters of these quantizers, GpuAutoTune needs to be adapted.
Reviewed By: algoriddle
Differential Revision: D42633348
fbshipit-source-id: ef472855aa882ccde9d878ae09782204045e38c5
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2638
This diff provides a more streamlined way of searching IVF indexes with precomputed clusters.
This will be used for experiments with hybrid CPU / GPU search.
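A sketch of the pattern using the existing C++ API: run the coarse quantization separately (possibly on another device) and feed the assignments to the IVF index.
```
#include <faiss/IndexIVF.h>

#include <vector>

void preassigned_search(
        const faiss::IndexIVF& index, size_t nq, const float* xq, int k,
        float* D, faiss::idx_t* I) {
    size_t nprobe = index.nprobe;
    std::vector<faiss::idx_t> assign(nq * nprobe);
    std::vector<float> centroid_dis(nq * nprobe);
    // coarse quantization: this step could run elsewhere, e.g. on a GPU
    index.quantizer->search(
            nq, xq, nprobe, centroid_dis.data(), assign.data());
    index.search_preassigned(
            nq, xq, k, assign.data(), centroid_dis.data(), D, I,
            /*store_pairs=*/false);
}
```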
Reviewed By: algoriddle
Differential Revision: D41301032
fbshipit-source-id: a1d645fd0f2bf806454dfd04971edc0a6200d20d
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2647
The `max_codes` search parameter for IVF indexes limits the number of distance computations that are performed.
Previously, the number of distance computations could exceed max_codes because inverted lists were scanned completely.
This diff changes the behavior to scan only the beginning of the last inverted list, so that `max_codes` is reached exactly.
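A usage sketch (values illustrative):
```
#include <faiss/IndexIVF.h>

void capped_search(
        const faiss::IndexIVF& index, size_t nq, const float* xq, int k,
        float* D, faiss::idx_t* I) {
    faiss::SearchParametersIVF params;
    params.nprobe = 64;
    params.max_codes = 10000; // now hit exactly, not merely approximately
    index.search(nq, xq, k, D, I, &params);
}
```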
Reviewed By: alexanderguzhva
Differential Revision: D42367593
fbshipit-source-id: 67c88b93a407ab271397e913c5fa17104f4274c3
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2650
LocalSearchQuantizer::icm_encode_step was optimized:
* The loop orders were changed
* Memory allocations were optimized
* A different access order for the 'binaries' tensor, exploiting its symmetry
* SIMD for the lookup of the best code.
Results are unchanged.
Also fixes an incorrect test in test_local_search_quantizer.py.
Reviewed By: mdouze
Differential Revision: D42352124
fbshipit-source-id: bf7e349f2123e6ee99e0776cf15ad5fc1cf2439a
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2651
Modify the approximate heap facility to maintain the same order as if the input elements had been pushed into the heap sequentially.
Reviewed By: mdouze
Differential Revision: D42373532
fbshipit-source-id: 477dc8acd2567157e2b99076a566326821021c8c
Summary:
In `cmp_with_scann.py`, the base vectors, query vectors, and ground truth are saved as .npy files, but only when the lib being run is faiss; running the script directly with the scann lib fails because the files do not exist.
Therefore, the code is refactored to save the .npy files from the beginning, so that either lib can be run.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2573
Reviewed By: mdouze
Differential Revision: D42338435
Pulled By: algoriddle
fbshipit-source-id: 9227f95e1ff79f5329f6206a0cb7ca169185fdb3