Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2860
Optimized the range search function: the GPU computes by default and falls back on the CPU for queries that return too many results.
Parallelized the CPU-to-GPU cloning; it appears to work.
Support range_search_preassigned in Python.
Fix a long-standing issue with SWIG-exposed functions that did not release the GIL (in particular MapLong2Long).
Adds a MapInt64ToInt64 that is more efficient than MapLong2Long.
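For context, faiss range search returns its results in a CSR-like `(lims, D, I)` layout, which the GPU pass and the CPU fallback must both produce. A minimal brute-force NumPy sketch of that layout (hypothetical helper name, L2 metric assumed):

```python
import numpy as np

def range_search_flat_l2(xq, xb, radius):
    """Brute-force range search with faiss-style output: the results of
    query i are D[lims[i]:lims[i+1]] / I[lims[i]:lims[i+1]]."""
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    lims, D, I = [0], [], []
    for row in d2:
        ids = np.flatnonzero(row < radius)  # all ids within the radius
        I.extend(ids.tolist())
        D.extend(row[ids].tolist())
        lims.append(len(I))  # prefix sums delimit each query's results
    return np.array(lims), np.array(D), np.array(I, dtype=np.int64)
```

The variable-length per-query result is what makes a pure GPU implementation awkward: a fixed-size output buffer works for most queries, and only the overflowing ones need the CPU fallback.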
Reviewed By: algoriddle
Differential Revision: D45672301
fbshipit-source-id: 2e77397c40083818584dbafa5427149359a2abfd
Summary:
Update the INSTALL to provide a recipe "that works" for installing GPU faiss (which is not obvious).
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2853
Reviewed By: algoriddle
Differential Revision: D45652590
Pulled By: mdouze
fbshipit-source-id: fededbdfda33e7a44d279e6bf16da316cff6fcb0
Summary:
This fixes the build, except for MacOS, where there's a problem with cmake + OpenMP. We can fix it separately.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2856
Reviewed By: mlomeli1
Differential Revision: D45704458
Pulled By: algoriddle
fbshipit-source-id: 0c09036ae5fa34ab114b857f407a35603986613a
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2848
Add selector support for IDMap wrapped indices.
Caveat: this requires wrapping the IDSelector in another selector. Since the params are const, the const is cast away.
This is a problem if the same params are used from multiple execution threads with different selectors; however, that seems rare enough to take the risk.
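The wrapping can be pictured with a pure-Python sketch (hypothetical class names; the real code wraps `faiss::IDSelector` objects in C++). The IDMap index stores sequential internal ids plus an `id_map` table to external ids, so the user's selector must be consulted with the translated id:

```python
class KeepSet:
    """Hypothetical user-supplied selector keeping a set of external ids."""
    def __init__(self, ids):
        self.ids = set(ids)

    def is_member(self, i):
        return i in self.ids

class IDSelectorTranslated:
    """Hypothetical wrapper: translates the internal id through the IDMap's
    id_map table before asking the user's selector."""
    def __init__(self, id_map, sel):
        self.id_map = id_map
        self.sel = sel

    def is_member(self, internal_id):
        return self.sel.is_member(self.id_map[internal_id])
```

For example, with `id_map = [100, 200, 300]`, internal id 2 is a member exactly when the user selector accepts external id 300.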
Reviewed By: alexanderguzhva
Differential Revision: D45598823
fbshipit-source-id: ec23465c13f1f8273a6a46f9aa869ccae2cdb79c
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2846
Adds a function to ivf_contrib to sort the inverted lists by size without changing the results. Also moves big_batch_search to its own module.
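The idea behind the sorting can be sketched in NumPy (hypothetical helper, not the actual contrib API): clusters are permuted by decreasing list size, and because the centroids move together with their lists, assignments and search results are unchanged.

```python
import numpy as np

def sort_ivf_by_list_size(centroids, invlists):
    # permute clusters by decreasing inverted-list size; centroids and their
    # lists move together, so coarse assignment and search are unaffected
    sizes = np.array([len(l) for l in invlists])
    perm = np.argsort(-sizes, kind="stable")
    return centroids[perm], [invlists[i] for i in perm]
```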
Reviewed By: algoriddle
Differential Revision: D45565880
fbshipit-source-id: 091a1c1c074f860d6953bf20d04523292fb55e1a
Summary:
I've broken out the FlatIndex / Distances changes from https://github.com/facebookresearch/faiss/issues/2521 into a separate PR to make things a little easier to review. This also includes a couple of other minor changes to the IVF Flat index which I used to make it easier to dispatch to the RAFT version. I can revert that change too if desired.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2707
Reviewed By: wickedfoo
Differential Revision: D44758912
Pulled By: algoriddle
fbshipit-source-id: b2544990b4e941a2558f5004bceec4af4fa1ad09
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2835
I don't think this provides much extra signal on top of Linux arm64, and we will still have the nightly. It's by far the most expensive to run, so let's save the $$$ and the planet.
Reviewed By: mlomeli1
Differential Revision: D45389825
fbshipit-source-id: 63fa6b37f3f7505f118c75f03605c065f1ad51f1
Summary: GIST1M is on the FAIR cluster but was not added to datasets.py
Reviewed By: alexanderguzhva
Differential Revision: D45276664
fbshipit-source-id: 8db41d61b78983f5d01dedca1790618f80f6bc78
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2771
Test Plan:
A DEFINITE BUG. Fixing it improves the recall rate.
Reproduce with the following code:
```
// MinimaxHeap (the HNSW helper heap) with capacity 1
MinimaxHeap heap(1);
heap.push(1, 1.0);
float v1 = 0;
heap.pop(&v1);     // the heap should now be empty
heap.push(1, 1.0); // push the same element again
assert(heap.nvalid == 1); // fails without the fix
```
Baseline:
```
[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (064c246e0)]$ taskset -c 72-95 ./bench_hnsw.par 16 hnsw
load data
Testing HNSW Flat
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
max_level = 4
Adding 1 elements at level 4
Adding 26 elements at level 3
Adding 951 elements at level 2
Adding 30276 elements at level 1
Adding 968746 elements at level 0
Done in 14893.718 ms
search
efSearch 16 bounded queue True 0.007 ms per query, R@1 0.8823, missing rate 0.0000
efSearch 16 bounded queue False 0.008 ms per query, R@1 0.9120, missing rate 0.0000
efSearch 32 bounded queue True 0.012 ms per query, R@1 0.9565, missing rate 0.0000
efSearch 32 bounded queue False 0.011 ms per query, R@1 0.9641, missing rate 0.0000
efSearch 64 bounded queue True 0.018 ms per query, R@1 0.9889, missing rate 0.0000
efSearch 64 bounded queue False 0.019 ms per query, R@1 0.9896, missing rate 0.0000
efSearch 128 bounded queue True 0.036 ms per query, R@1 0.9970, missing rate 0.0000
efSearch 128 bounded queue False 0.037 ms per query, R@1 0.9970, missing rate 0.0000
efSearch 256 bounded queue True 0.062 ms per query, R@1 0.9991, missing rate 0.0000
efSearch 256 bounded queue False 0.067 ms per query, R@1 0.9991, missing rate 0.0000
[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (fc6e9b938|remote/fbsource/stable...)]$ taskset -c 72-95 ./bench_hnsw.par 4 hnsw_sq
load data
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
max_level = 5
Adding 1 elements at level 5
Adding 15 elements at level 4
Adding 194 elements at level 3
Adding 3693 elements at level 2
Adding 58500 elements at level 1
Adding 937597 elements at level 0
Done in 8900.962 ms
search
efSearch 16 0.003 ms per query, R@1 0.7365, missing rate 0.0000
efSearch 32 0.006 ms per query, R@1 0.8712, missing rate 0.0000
efSearch 64 0.011 ms per query, R@1 0.9415, missing rate 0.0000
efSearch 128 0.018 ms per query, R@1 0.9778, missing rate 0.0000
efSearch 256 0.036 ms per query, R@1 0.9917, missing rate 0.0000
```
Candidate:
```
[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (064c246e0)]$ taskset -c 72-95 ./bench_hnsw.par 16 hnsw
load data
Testing HNSW Flat
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
max_level = 4
Adding 1 elements at level 4
Adding 26 elements at level 3
Adding 951 elements at level 2
Adding 30276 elements at level 1
Adding 968746 elements at level 0
Done in 14243.637 ms
search
efSearch 16 bounded queue True 0.006 ms per query, R@1 0.9122, missing rate 0.0000
efSearch 16 bounded queue False 0.006 ms per query, R@1 0.9122, missing rate 0.0000
efSearch 32 bounded queue True 0.011 ms per query, R@1 0.9643, missing rate 0.0000
efSearch 32 bounded queue False 0.011 ms per query, R@1 0.9644, missing rate 0.0000
efSearch 64 bounded queue True 0.018 ms per query, R@1 0.9880, missing rate 0.0000
efSearch 64 bounded queue False 0.020 ms per query, R@1 0.9880, missing rate 0.0000
efSearch 128 bounded queue True 0.036 ms per query, R@1 0.9969, missing rate 0.0000
efSearch 128 bounded queue False 0.035 ms per query, R@1 0.9969, missing rate 0.0000
efSearch 256 bounded queue True 0.064 ms per query, R@1 0.9994, missing rate 0.0000
efSearch 256 bounded queue False 0.062 ms per query, R@1 0.9994, missing rate 0.0000
[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (6de3a2d76)]$ taskset -c 72-95 ./bench_hnsw.par 4 hnsw_sq
load data
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
max_level = 5
Adding 1 elements at level 5
Adding 15 elements at level 4
Adding 194 elements at level 3
Adding 3693 elements at level 2
Adding 58500 elements at level 1
Adding 937597 elements at level 0
Done in 8451.601 ms
search
efSearch 16 0.004 ms per query, R@1 0.8025, missing rate 0.0000
efSearch 32 0.006 ms per query, R@1 0.8925, missing rate 0.0000
efSearch 64 0.011 ms per query, R@1 0.9480, missing rate 0.0000
efSearch 128 0.019 ms per query, R@1 0.9793, missing rate 0.0000
efSearch 256 0.035 ms per query, R@1 0.9919, missing rate 0.0000
```
Reviewed By: mdouze
Differential Revision: D44815702
Pulled By: alexanderguzhva
fbshipit-source-id: ca7c7e83a6560316af543bde125ac703bf2e1dac
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2810
A faster implementation of fvec_L2sqr_ny_nearest_Dx() (used in ProductQuantizer.compute_codes()) for the case where transposed centroids are not enabled.
It uses in-register transposing rather than gather ops.
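As a scalar reference for what the kernel computes (a sketch only; the actual code is SIMD C++), the routine returns the index and squared L2 distance of the nearest centroid:

```python
import numpy as np

def nearest_centroid_l2(x, centroids):
    # scalar reference for fvec_L2sqr_ny_nearest: the SIMD version vectorizes
    # this loop, here using in-register transposes instead of gathers
    d2 = ((centroids - x) ** 2).sum(axis=1)
    i = int(np.argmin(d2))
    return i, float(d2[i])
```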
Reviewed By: mdouze
Differential Revision: D44631863
fbshipit-source-id: 026fdb4c133e238271c65a766d8eaa20be777033
Summary:
RAFT requires cmake 3.23.1, pulling it from conda-forge. We continue to keep the dependency on conda-forge minimal, hence the ordering of the channels and the pinning of sysroot to a specific version.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2808
Test Plan:
contbuild
Imported from GitHub, without a `Test Plan:` line.
Reviewed By: mlomeli1
Differential Revision: D44746827
Pulled By: algoriddle
fbshipit-source-id: ad576b11b257203bd0cafd57c2c2e7fd8d10ca98
Summary:
1. GPU builds use CircleCI base image, no docker
2. Switched to CUDA 11.4 (used to be 11.3)
3. Merged all build jobs into two parameterized targets: `build_cmake` and `build_conda`.
4. Cleaned up test execution, fixed bug of Python GPU tests not running on PRs
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2803
Reviewed By: mlomeli1
Differential Revision: D44541714
Pulled By: algoriddle
fbshipit-source-id: aa09ae638ecb6ef556d42f27a4bfaddad7355e50
Summary:
# Refactor
- Extract executors and merge all conda builds (that run on diffs) and conda package builds (that run nightly and on release) into two parameterized targets (`build_conda`, `deploy_conda`), except for GPU builds (at least for now)
- Similarly, introduce a `build_cmake` target that can be parameterized with executor, although run it for Linux x86_64 only (for now)
- Keep GPU targets separate (both conda package build and cmake) for now. Introduce "v2" targets that will eventually replace the current GPU build targets (we need to resolve GPU test failures).
- Removed `beauby/faiss-circleci:cpu` docker container, use the miniconda docker for Linux and machine images everywhere else. v2 GPU targets use the latest circleci images (see https://discuss.circleci.com/t/cuda-11-8-gpu-cuda-image-any-plans/47240/3)
# New/changed functionality
- Dropped CUDA 10
- Support for Linux arm64 conda packages
- Workflows have a consistent naming scheme, `OSX arm64 (conda)`, `Linux x86_64 (cmake)` etc.
- No cmake build for Linux or OSX arm64, replaced both with a conda build target only. We can reintroduce arm64 cmake workflows for both if needed (via additional parameterized build_cmake workflows), but it seemed unnecessary to me.
# Next steps
- Make v2 GPU builds work, deprecate v1, get rid of all docker stuff
- Merge GPU builds into cmake/conda build targets
- Possibly further unify package build and conda build targets
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2798
Test Plan: contbuild
Reviewed By: mlomeli1
Differential Revision: D44469783
Pulled By: algoriddle
fbshipit-source-id: 8489942fb7a4e4de1dd2d4466790e550191d15a1
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2787
Approximate top-k facilities allow speeding up IndexBinaryFlat search / hamming_knn_hc() search a bit if the number k of needed nearest neighbors is small (no more than NB * BD for `ApproxTopK_mode_t::APPROX_TOPK_BUCKETS_NB_BD`; otherwise an exception is thrown).
Use case:
```
# approx_mode: int = 0, 1, 2, 3 or 4
subq = faiss.downcast_IndexBinary(index.quantizer)
subq.approx_topk_mode = approx_mode
```
Please reference `faiss/utils/approx_topk/mode.h` for more detailed explanation of modes.
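The bucketed variant can be sketched in NumPy (hypothetical helper; the real code is SIMD C++): split the distance array into NB buckets, keep the BD best per bucket, then run an exact top-k over the NB * BD candidates, which is why k must not exceed NB * BD.

```python
import numpy as np

def approx_topk_buckets(dist, k, nb=8, bd=2):
    # sketch of APPROX_TOPK_BUCKETS_NB_BD: per-bucket bests, then exact top-k
    assert k <= nb * bd, "k must be no more than NB * BD"
    n = len(dist)
    cand = []
    for b in range(nb):
        lo, hi = b * n // nb, (b + 1) * n // nb
        best = np.argsort(dist[lo:hi])[:bd]  # bd best within this bucket
        cand.extend(lo + best)
    cand = np.array(cand)
    return cand[np.argsort(dist[cand])[:k]]  # exact top-k over candidates
```

The result is approximate in general: a global top-k element is missed when more than BD of them fall into the same bucket.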
Reviewed By: mdouze
Differential Revision: D44398992
fbshipit-source-id: aac7c35c88d567ba73c88614a1c86f3e8b254ef3
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2782
Adds a separate branch for ARM Hamming distance computations. Also improves the benchmark for the Hamming computer.
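For reference, the computation being specialized is just a popcount of the XOR of two binary codes; a scalar sketch (the patch adds a dedicated ARM branch for this inner loop):

```python
def hamming_distance(a, b):
    # scalar reference: popcount of XOR over equal-length byte strings
    assert len(a) == len(b)
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```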
Reviewed By: mdouze
Differential Revision: D44397463
fbshipit-source-id: 1e44e8e7dd1c5b92e95e8afc754170b501d0feed
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2797
This is the last code instance of setNumProbes.
Removing it because some people still seem to run into errors due to it.
Reviewed By: algoriddle
Differential Revision: D44421600
fbshipit-source-id: fbc1a9d49a0175ddf24c32dab5c1bdb5f1bbbac6
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2791
Removed building for 3.7 (D44373285 - not supported for M1), adding 3.10 with this diff.
The only change that matters is in `conda_build_config.yaml`, the others are about making the configs consistent between CPU and GPU.
Reviewed By: mlomeli1
Differential Revision: D44405573
fbshipit-source-id: ad933e08834593e55a35075c602e5f509a813e73
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2781
This is a benchmarking script for keypoint matching with labelled ground-truth.
Reviewed By: alexanderguzhva
Differential Revision: D44036091
fbshipit-source-id: d9d7c089c4d172b66f33dc968c00713a1b79c2d1
Summary:
This PR adds missing info about building the C API to the installation guide
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2706
Reviewed By: alexanderguzhva
Differential Revision: D44366748
Pulled By: mdouze
fbshipit-source-id: 0553571b7187ee9712c8dc73cc66e8839345912d
Summary:
Currently, the SIMD histogram subroutines are written with `simdlib` **and AVX2 intrinsics**.
This PR adds some functions to `simdlib` and removes the AVX2 intrinsics from the SIMD histogram subroutines, so that faiss with this PR can execute histograms using ARM SIMD on aarch64.
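The kernel being ported is a small-alphabet histogram; a scalar reference sketch (the SIMD versions keep the 16 counters in vector registers):

```python
import numpy as np

def histogram_16(codes):
    # scalar reference for the simdlib histogram kernels:
    # count occurrences of each 4-bit code value
    hist = np.zeros(16, dtype=np.int64)
    for c in codes:
        hist[int(c) & 0xF] += 1
    return hist
```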
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2447
Reviewed By: alexanderguzhva
Differential Revision: D39461155
Pulled By: mdouze
fbshipit-source-id: 63a664e3e2ed462b451acc346ea58a2532f294c9