Summary: add getstate / setstate to serialize indexes. Seems to work properly with object ownership etc.
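As a rough illustration of the pattern (not the actual Faiss code), the sketch below routes pickling through a byte-level serializer via `__getstate__` / `__setstate__`. `ToyIndex`, `serialize_toy` and `deserialize_toy` are hypothetical names invented for this example.

```python
import pickle
import struct


def serialize_toy(d, vectors):
    """Hypothetical serializer: pack the dimension, count and flat float data."""
    flat = [x for v in vectors for x in v]
    return struct.pack(f"<II{len(flat)}f", d, len(vectors), *flat)


def deserialize_toy(buf):
    """Inverse of serialize_toy: returns (d, list of vectors)."""
    d, n = struct.unpack_from("<II", buf, 0)
    flat = struct.unpack_from(f"<{d * n}f", buf, 8)
    return d, [list(flat[i * d:(i + 1) * d]) for i in range(n)]


class ToyIndex:
    """Stand-in for an index whose state is not kept in plain Python attributes."""

    def __init__(self, d):
        self.d = d
        self.vectors = []

    def add(self, v):
        assert len(v) == self.d
        self.vectors.append(list(v))

    def __getstate__(self):
        # pickle stores this bytes object instead of the instance __dict__
        return serialize_toy(self.d, self.vectors)

    def __setstate__(self, state):
        # rebuild an independent object that owns its own data
        self.d, self.vectors = deserialize_toy(state)


index = ToyIndex(2)
index.add([1.0, 2.0])
index.add([3.0, 4.0])
clone = pickle.loads(pickle.dumps(index))
```

The clone owns a fresh copy of the data, which is the ownership property the summary refers to.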
Reviewed By: wickedfoo
Differential Revision: D26521228
fbshipit-source-id: ebbe08cfe2c15af2aa5b7ea1fc1bf87546066c23
Summary:
Trying to compile windows for AVX2 in https://github.com/conda-forge/faiss-split-feedstock/pull/27
(after https://github.com/facebookresearch/faiss/issues/1600) surfaced a bunch of things (https://github.com/facebookresearch/faiss/issues/1680, https://github.com/facebookresearch/faiss/issues/1681, https://github.com/facebookresearch/faiss/issues/1682), but the most voluminous problem
was MSVC being much worse at dealing with operator overloads and casts around `__m128` / `__m256`.
This led to loads of errors that looked as follows:
```
[...]\faiss\utils\distances_simd.cpp(411): error C2676: binary '+=': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(440): error C2676: binary '-': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(441): error C2676: binary '*': 'const __m256' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(446): error C2676: binary '+=': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(451): error C2676: binary '-': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(452): error C2676: binary '*': 'const __m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(459): error C2676: binary '-': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(460): error C2676: binary '*': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(471): error C2440: '<function-style-cast>': cannot convert from '__m256i' to '__m256'
```
I've followed https://software.intel.com/sites/landingpage/IntrinsicsGuide/ to try to replace everything correctly,
but this will surely require close review, because I'm not sure how well these code-paths are checked by the
test suite.
In any case, with the commits from https://github.com/facebookresearch/faiss/issues/1600, https://github.com/facebookresearch/faiss/pull/1666, https://github.com/facebookresearch/faiss/issues/1680, https://github.com/facebookresearch/faiss/issues/1681 and https://github.com/facebookresearch/faiss/issues/1682, I was able to build `libfaiss` & `faiss`
for AVX2 on windows (while remaining "green" on linux/osx, both with & without AVX2).
Sidenote: the issues in the last commit (26fc7cf139)
were uncovered by adding the `__SSE3__` compat macros in https://github.com/facebookresearch/faiss/issues/1681.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1684
Test Plan: buck test //faiss/tests/...
Reviewed By: beauby
Differential Revision: D26454443
Pulled By: mdouze
fbshipit-source-id: 70df0818e357f1ecea6a056d619618df0236e0eb
Summary:
Upstreaming patches from https://github.com/conda-forge/faiss-split-feedstock/pull/27, follow-up (sorta) to https://github.com/facebookresearch/faiss/issues/1600.
Not sure if there are more CMake-native tricks to use here, but given that the flags don't have
an equivalent on the MSVC side, I think this approach is reasonable.
Without this patch, we would get:
```
cl : Command line warning D9002: ignoring unknown option '-mavx2'
cl : Command line warning D9002: ignoring unknown option '-mfma'
cl : Command line warning D9002: ignoring unknown option '-mf16c'
cl : Command line warning D9002: ignoring unknown option '-mpopcnt'
```
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1680
Reviewed By: wickedfoo
Differential Revision: D26484347
Pulled By: beauby
fbshipit-source-id: 2803132f2d81fe37dc494fc4c824b6e240ae973b
Summary:
Compute capability 86 is only available from CUDA 11.1 onwards, for
which Anaconda does not have a `cudatoolkit` package yet.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1694
Reviewed By: wickedfoo
Differential Revision: D26482788
Pulled By: beauby
fbshipit-source-id: c0c84e0433ea9d9b04a1572001bd7c0d2ee82988
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1690
The GPU and CPU were trained separately in the failing test, leading to fairly different PQ centroids. Instead, just train on the GPU and copy to the CPU like other tests.
Also silences the not enough centroids warnings.
Reviewed By: beauby
Differential Revision: D26470199
fbshipit-source-id: 1f7c036671c03ed4a97c8c4a44d3c5b9767019cb
Summary:
## Description
Fix the bug mentioned in https://github.com/facebookresearch/faiss/issues/1010. When `nprobe` is greater than `nlist` in `IndexIVF`, the program will crash because the index will ask the quantizer to return more centroids than it owns.
## Changes:
1. Set `nprobe` as `nlist` if it is greater than `nlist` during searching.
2. Add one test to detect this bug.
3. Fix typo in `IndexPQ.cpp`.
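
The clamping in change 1 can be sketched as follows. This is a toy 1-D model, not the `IndexIVF` code; `effective_nprobe` and `search_coarse` are hypothetical helpers introduced only for illustration.

```python
def effective_nprobe(nprobe, nlist):
    """Number of inverted lists to visit: never more than the quantizer holds."""
    return min(nprobe, nlist)


def search_coarse(centroids, query, nprobe):
    """Toy 1-D coarse quantizer: indices of the nprobe closest centroids."""
    # Explicit clamp, mirroring the fix: ask for at most nlist centroids.
    nprobe = effective_nprobe(nprobe, len(centroids))
    order = sorted(range(len(centroids)), key=lambda i: abs(centroids[i] - query))
    return order[:nprobe]


centroids = [0.0, 10.0, 20.0]  # nlist = 3
probes = search_coarse(centroids, query=9.0, nprobe=8)  # nprobe > nlist
```

With `nprobe=8` and only three centroids, the search visits all three lists instead of requesting eight.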
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1673
Reviewed By: wickedfoo
Differential Revision: D26454420
Pulled By: mdouze
fbshipit-source-id: d1d0949e30802602e975a94ba873f9db29abd5ab
Summary: GCC flags a few more warnings than clang.
Reviewed By: wickedfoo
Differential Revision: D26421696
fbshipit-source-id: 3706ede205c03352667c1e483f014ca498666878
Summary: Copy construction of Aligned table was wrong, which crashed cloning of IVFPQ.
Reviewed By: wickedfoo
Differential Revision: D26426400
fbshipit-source-id: 1d43ea6309d0a56eb592f9d6c5b52282f494e653
Summary:
This diff exposes the ProductQuantizer `pq` object to the user for manipulation in Python just as `IndexIVFPQ` does.
If no clustering index object is provided in `pq`, we create a `GpuIndexFlatL2` in order to perform the PQ training on the GPU as well.
Also raises the error threshold a bit in some tests, as the previous ones seem to be triggered on a V100 GPU.
Fixes an issue with AddException + (CUDA 11 and/or V100 GPUs) as well, where a `cudaMalloc` failure now seems to set state that is returned by `cudaGetLastError`. This we now clear before continuing.
Fixes an issue (possible cuBLAS bug, following up with Nvidia):
cublasSgemmEx in libcublas.so.11.1.0.229 returning CUBLAS_STATUS_NOT_SUPPORTED but would work fine in CUDA 9.2 (V100 GPU)
cublasSgemmEx(handle, CUBLAS_OP_T, CUBLAS_OP_N,
64, 8, 64,
&alpha,
A, CUDA_R_16F, 64,
B, CUDA_R_16F, 64,
&beta,
C, CUDA_R_32F, 64);
Using cublasGemmEx with CUBLAS_COMPUTE_32F and CUBLAS_GEMM_DEFAULT would also fail, but using CUBLAS_COMPUTE_32F_PEDANTIC with cublasGemmEx succeeds. Using PEDANTIC for CUDA 11 + f16 arguments for now.
Reviewed By: mdouze
Differential Revision: D26331887
fbshipit-source-id: c65448c4c79b58dd49b0220b393056e431ef53c0
Summary:
This will allow us to support compute capabilities 8.0 and 8.6 (for
Ampere devices) with CUDA 11.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1671
Reviewed By: mdouze
Differential Revision: D26338700
Pulled By: beauby
fbshipit-source-id: f023e7a37504d79ab78a45319e5a9cb825e7604a
Summary: The IndexBinaryHash and IndexBinaryMultiHash knn search functions returned results in a random order. This diff fixes that to use the standard decreasing Hamming distance order and adds a test for it. I noticed this in a notebook from sc268.
Reviewed By: sc268
Differential Revision: D26324795
fbshipit-source-id: 1444e26950e24bfac297f34f3d481d902d8ee769
Summary:
When new GPU compute capabilities were released, DeviceDefs.cuh had to be manually updated to expect them, as we statically compile the warp size (32 in all of Nvidia's current GPUs) into kernel code.
In order to avoid having to change this header for each new GPU generation (e.g., the new RTX devices which are CC 8.6), instead we just assume the warp size is 32, but when we initialize a GPU device and its resources in StandardGpuResources, we check to make sure that the GPU has a warp size of 32 as expected. Much code would have to change for a non-32 warp size (e.g., 64, as seen in AMD GPUs), so this is a hard assert. It is likely that Nvidia will never change this anyways for this reason.
Also, as part of the PQ register change, I noticed that temporary memory allocations were only being aligned to 16 bytes. This could cause inefficiencies in terms of excess gmem transactions. Instead, we bump this up to 256 bytes as the guaranteed alignment for all temporary memory allocations, which is the same guarantee that cudaMalloc provides.
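The alignment arithmetic can be sketched as below. `BumpAllocator` is a hypothetical stack-style allocator used only to show the rounding; it is not the Faiss temporary memory manager.

```python
ALIGNMENT = 256  # bytes, the guarantee named in the summary above


def round_up(size, alignment=ALIGNMENT):
    """Smallest multiple of `alignment` that is >= size."""
    return (size + alignment - 1) // alignment * alignment


class BumpAllocator:
    """Hypothetical temporary allocator handing out aligned offsets."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0

    def alloc(self, size):
        start = self.head  # always a multiple of ALIGNMENT
        end = start + round_up(size)
        if end > self.capacity:
            raise MemoryError("temporary memory exhausted")
        self.head = end
        return start


allocator = BumpAllocator(capacity=4096)
offsets = [allocator.alloc(n) for n in (1, 300, 256)]
```

Every returned offset is a multiple of 256, at the cost of some padding after small requests.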
Reviewed By: mdouze
Differential Revision: D26259976
fbshipit-source-id: 10b5fc708fffc9433683e85b9fd60da18fa9ed28
Summary:
While preparing https://github.com/conda-forge/faiss-split-feedstock/pull/26, I grepped for the expected headers based on the files in the repo, à la:
```
>ls faiss/invlists/ | grep -E "h$"
BlockInvertedLists.h
DirectMap.h
InvertedLists.h
InvertedListsIOHook.h
OnDiskInvertedLists.h
```
Doing so uncovered that there were some headers missing (apparently) in `CMakeLists.txt`, namely:
```
faiss/impl/ResultHandler.h
faiss/gpu/impl/IVFInterleaved.cuh
faiss/gpu/impl/InterleavedCodes.h
faiss/gpu/utils/WarpPackedBits.cuh
```
It's possible that they were left out intentionally, but I didn't see anything that would suggest so, e.g. in [`ResultHandler.h`](https://github.com/facebookresearch/faiss/blob/master/faiss/impl/ResultHandler.h).
While I was at it, I decided to order the filenames consistently (alphabetically, except for the increasing bit-sizes for blockselect/warpselect, as is already the case for `impl/scan/IVFInterleaved<x>.cu`), but of course, those commits could easily be dropped.
By reviewing the commits separately, it should be clear (for the first two) from the equal number of deletions/insertions (and the simple diff) that this is just a reshuffle. The only additions are in the last commit.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1666
Reviewed By: wickedfoo
Differential Revision: D26248038
Pulled By: mdouze
fbshipit-source-id: 4add4959446deb16126c59b2d1e3f0305e6236c1
Summary: The order of xb and xq was different between `faiss.knn` and `faiss.knn_gpu`. Also the metric argument was called distance_type. This diff fixes both. Hopefully not too much external code depends on it.
Reviewed By: wickedfoo
Differential Revision: D26222853
fbshipit-source-id: b43e143d64d9ecbbdf541734895c13847cf2696c
Summary:
The fp16 scalar quantizer is supported via IndexFlat with the float16 option.
This diff also splits the python GPU tests in 2 files.
Reviewed By: wickedfoo
Differential Revision: D26221563
fbshipit-source-id: c08fce27e6acedc486478b37ef77ccebcefb3dc0
Summary:
In the context of https://github.com/conda-forge/faiss-split-feedstock/issues/23, I discussed with some of the conda-folks how we should support AVX2 (and potentially other builds) for faiss. In the meantime, we'd like to follow the model that faiss itself is using (i.e. build with AVX2 and without and then load the corresponding library at runtime depending on CPU capabilities).
Since windows support for this is missing (and the other stuff is also non-portable in `loader.py`), I chased down `numpy.distutils.cpuinfo`, which is pretty outdated, and opened: https://github.com/numpy/numpy/issues/18058
While the [private API](https://github.com/numpy/numpy/issues/18058#issuecomment-749852711) is obviously something that _could_ change at any time, I still think it's better than platform-dependent shenanigans.
Opening this here to ideally upstream this right away, rather than carrying patches in the conda-forge feedstock.
TODO:
* [ ] adapt conda recipe for windows in this repo to also build avx2 version
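
A minimal sketch of the "build both, pick at runtime" model described above. The module names, the `"AVX2"` feature key and the `importer` parameter are assumptions made for this example; how the CPU feature set is obtained is deliberately left to the caller.

```python
import importlib


def pick_module_name(cpu_features, base="swigfaiss"):
    """Return the optimized module name when AVX2 is reported, else the generic one."""
    if "AVX2" in cpu_features:
        return base + "_avx2"
    return base


def load_best(cpu_features, importer=importlib.import_module, base="swigfaiss"):
    """Try the preferred build first; fall back to the generic build on ImportError."""
    preferred = pick_module_name(cpu_features, base)
    try:
        return importer(preferred)
    except ImportError:
        if preferred == base:
            raise  # nothing left to fall back to
        return importer(base)
```

Passing the importer in keeps the selection logic testable on machines where neither build is installed.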
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1600
Reviewed By: beauby
Differential Revision: D25994705
Pulled By: mdouze
fbshipit-source-id: 9986bcfd4be0f232a57c0a844c72ec0e308fff19
Summary:
Fast-scan tests were disabled on windows because of a heap corruption. This diff enables them because the free_aligned bug was fixed in the meantime.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1663
Reviewed By: beauby
Differential Revision: D26201040
Pulled By: mdouze
fbshipit-source-id: 8d6223b4e42ccb1ce2da6e2c51d9e0833199bde7
Summary:
This adds some more functions to the C API, under a new DeviceUtils_c.h module. Resolves https://github.com/facebookresearch/faiss/issues/1414.
- `faiss_get_num_gpus`
- `faiss_gpu_profiler_start`
- `faiss_gpu_profiler_stop`
- `faiss_gpu_sync_all_devices`
The only minor issue right now is that building this requires basing it against an older version of Faiss until the building system is updated to use CMake (https://github.com/facebookresearch/faiss/issues/1390). I have provided a separate branch with the same contribution which is based against a version that works and builds OK: [`imp/c_api/add_gpu_device_utils`](https://github.com/Enet4/faiss/tree/imp/c_api/add_gpu_device_utils)
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1613
Reviewed By: wickedfoo
Differential Revision: D25942933
Pulled By: mdouze
fbshipit-source-id: 5b73a86b0c1702dfb7b9e56bd741f72495aac2fd
Summary:
The general array transposition kernel for the GPU in Faiss had two issues.
One, there was a typo (`+` instead of `*`) which did not cause a correctness bug but had been a severe performance issue ever since the general transposition kernel was written in 2016/2017. This was causing large slowdowns with precomputed code usage that I noticed while profiling over IVFPQ issues.
Two, the general transposition code was written for the most generic case. The transposition that we care about/use the most in Faiss is a transposition of outermost dimensions, say transposing an array [s1 s2 s3] -> [s2 s1 s3], where there are one or more innermost dimensions which are still contiguous in the new layout. A separate kernel has been written to cover this transposition case.
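The index arithmetic of that special case can be sketched on the CPU as below. This is a plain Python illustration over a flat row-major list, not the GPU kernel; `transpose_outer` is a hypothetical name.

```python
def transpose_outer(data, s1, s2, s3):
    """Transpose a flat row-major [s1, s2, s3] array into [s2, s1, s3].

    Each innermost run of s3 elements stays contiguous, so it moves as one slice.
    """
    assert len(data) == s1 * s2 * s3
    out = [None] * len(data)
    for i in range(s1):
        for j in range(s2):
            src = (i * s2 + j) * s3  # start of run (i, j, :) in the input
            dst = (j * s1 + i) * s3  # start of run (j, i, :) in the output
            out[dst:dst + s3] = data[src:src + s3]
    return out


data = list(range(2 * 3 * 2))  # shape [2, 3, 2]
result = transpose_outer(data, 2, 3, 2)
```

Because whole runs of `s3` elements are copied together, reads and writes both stay contiguous along the innermost dimension, which is what makes a dedicated kernel for this case worthwhile.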
Also updates the code to use `uint32_t` and `uint64_t` instead of `unsigned int` and `unsigned long`.
D25703821 (removing serialize tags for GPU tests) is reverted in this as well, as that change prevents all GPU tests from being run locally on devservers; RE might have implicit serialization, but local execution doesn't.
Reviewed By: beauby
Differential Revision: D25929892
fbshipit-source-id: 66ddfc56189305f698a85c44abdeb64eb95ffe6b
Summary: When running in a heavily parallelized env, the test becomes very slow and causes timeouts. Here we reduce the nb of threads.
Reviewed By: wickedfoo, Ben0mega
Differential Revision: D25921771
fbshipit-source-id: 1e0aacbb3e4f6e8f33ec893984b343eb5a610424
Summary: Fixes 2 bugs spotted by ASAN in the demo.
Reviewed By: wenjieX
Differential Revision: D25897053
fbshipit-source-id: fd2bed13faded42426cefc5ebe9d027adec78015
Summary: For range search evaluation, this diff adds optimized functions for ground-truth generation (on GPU).
Reviewed By: beauby
Differential Revision: D25822716
fbshipit-source-id: c5278dfad0510d24c2a5c87c1f8c81161fa8c5d3
Summary:
The IndexRefineFlat with pre-populated indexes could not be used because of the order of construction of the parent class. This diff fixes it. This addresses https://github.com/facebookresearch/faiss/issues/1604.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1607
Reviewed By: wickedfoo
Differential Revision: D25801869
Pulled By: mdouze
fbshipit-source-id: 6098065e497dff39f4dd7474fa7136ba3610ef7e
Summary:
Fixes a small compilation issue with gcc 7.3.0 that does not appear with 7.4.0.
Also updated the readme.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1593
Reviewed By: beauby
Differential Revision: D25705885
Pulled By: mdouze
fbshipit-source-id: 920b35264463cdd6ad10bbb09e07cf483fcaa724
Summary:
These hooks, along with the creation of the `gh-pages` branch with a Sphinx-powered website ([preview](https://beauby.github.io/faiss/)) will ensure automatic rebuild of the C++ API (doxygen) docs upon modification on `master`.
Moreover, direct changes to the `gh-pages` branch will trigger a rebuild of the website as well.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1599
Reviewed By: wickedfoo
Differential Revision: D25723322
Pulled By: beauby
fbshipit-source-id: fafeed64b393dce8c569f9fd1f5bc3b004706589
Summary:
This avoids triggering the following warnings:
```
tests/test_ondisk_ivf.cpp:36:24: warning: 'tempnam' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of tempnam(3), it is highly recommended that you use mkstemp(3) instead. [-Wdeprecated-declarations]
char *cfname = tempnam (nullptr, prefix);
^
tests/test_merge.cpp:34:24: warning: 'tempnam' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of tempnam(3), it is highly recommended that you use mkstemp(3) instead. [-Wdeprecated-declarations]
char *cfname = tempnam (nullptr, prefix);
```
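The warning recommends `mkstemp(3)`. As a Python analogue of that recommendation (not the C++ test code itself), the sketch below creates the file atomically instead of only generating a name; `make_temp_file` and the `faiss_tmp_` prefix are hypothetical.

```python
import os
import tempfile


def make_temp_file(prefix):
    """Create a uniquely named temporary file and return its path.

    mkstemp creates and opens the file in one step, avoiding the gap between
    choosing a name and creating the file that makes tempnam(3) unsafe.
    """
    fd, path = tempfile.mkstemp(prefix=prefix)
    os.close(fd)  # only the path is needed here; the file already exists
    return path


path = make_temp_file("faiss_tmp_")
created = os.path.exists(path)
os.remove(path)  # clean up, as a test would after using the file
```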
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1596
Reviewed By: wickedfoo
Differential Revision: D25710654
Pulled By: beauby
fbshipit-source-id: 2aa027c3b32f6cf7f41eb55360424ada6d200901
Summary:
CMake's SWIG module does not track dependencies to header files by
default, and they have to be stated manually.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1591
Reviewed By: wickedfoo
Differential Revision: D25705493
Pulled By: beauby
fbshipit-source-id: faf70415efb0db677ea3ee8e38495d9ed39432d7