1065 Commits

Author SHA1 Message Date
zh Wang
f3577ab8c0 Fix inconsistency of parameter naming (#2542)
Summary:
There are two different names for the same thing: the function declaration uses `M2` while the function definition uses `nsq`. This inconsistency may cause confusion, so it's better to unify them.

Signed-off-by: zh Wang <rekind133@outlook.com>

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2542

Reviewed By: mdouze

Differential Revision: D42338110

Pulled By: algoriddle

fbshipit-source-id: d386e00fb8d8904051dd676bae1e7f4702a172d9
2023-01-04 02:07:35 -08:00
zh Wang
60c850e296 Fix hnsw benchmark (#2591)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2591

Reviewed By: mdouze

Differential Revision: D42337854

Pulled By: algoriddle

fbshipit-source-id: 222885fe0e1562deddd0f37c0dbedd1963c885e5
2023-01-04 01:49:09 -08:00
Alexandr Guzhva
b9bf2490a3 Add facilities for approximate evaluation of min-k distances via heap. Affects RQ / PRQ / RQ_LUT / PRQ_LUT (#2633)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2633

The core idea: instead of putting every element of the dataset into a max-heap, split the dataset into buckets and let every bucket track its min-1, min-2 or min-3 distances.

Applied to ResidualQuantizer class for vector codec purposes.

An example:
```
rq.approx_topk_mode = faiss.APPROX_TOPK_BUCKETS_B16_D2
```
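
The bucketing idea can be sketched in plain Python. This is a hedged illustration of the concept only, not Faiss's SIMD implementation, and the helper name below is hypothetical:

```python
import heapq
import random

def approx_min_k(dists, k, n_buckets, per_bucket=2):
    """Approximate the k smallest values by tracking only the
    `per_bucket` smallest values of each bucket, then merging."""
    n = len(dists)
    candidates = []
    for b in range(n_buckets):
        bucket = dists[b * n // n_buckets : (b + 1) * n // n_buckets]
        candidates.extend(heapq.nsmallest(per_bucket, bucket))
    return sorted(candidates)[:k]

random.seed(0)
d = [random.random() for _ in range(1000)]
approx = approx_min_k(d, k=10, n_buckets=16)
# The global minimum is always found: each bucket tracks its own minimum.
assert approx[0] == min(d)
```

The result is approximate because a bucket may hold more than `per_bucket` of the true top-k, but the global minimum is always recovered.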

Reviewed By: mdouze

Differential Revision: D42044398

fbshipit-source-id: 43169026476650442806a31d1c1aa2d5d5028e65
2023-01-03 14:39:11 -08:00
Gergely Szilvasy
05a9d52833 increase circleci no_output_timeout (#2644)
Summary:
Linux builds are regularly failing with “Too long with no output (exceeded 10m0s): context deadline exceeded”

https://app.circleci.com/pipelines/github/facebookresearch/faiss/3343/workflows/a6357953-bbaa-438c-acfa-2507ceb008e8/jobs/16680?invite=true#step-103-467

Applying the fix suggested in https://support.circleci.com/hc/en-us/articles/360045268074-Build-Fails-with-Too-long-with-no-output-exceeded-10m0s-context-deadline-exceeded-

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2644

Reviewed By: alexanderguzhva

Differential Revision: D42313426

Pulled By: algoriddle

fbshipit-source-id: 8e56c23f5f600974820c198d50562e043c909ce1
2023-01-03 13:22:30 -08:00
chasingegg
954fb0f802 Remove unused code (#2603)
Summary:
Fix https://github.com/facebookresearch/faiss/issues/2602

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2603

Reviewed By: mlomeli1

Differential Revision: D42315147

Pulled By: algoriddle

fbshipit-source-id: b92cfb2595b767ec317369546d34361a8ff5ea99
2023-01-03 12:40:38 -08:00
Matthijs Douze
74ee67aefc CodePacker for non-contiguous code layouts (#2625)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2625

This diff introduces a new abstraction for the code layouts that are not simply flat one after another.

The packed codes are assumed to be packed together in fixed-size blocks. Hence, code `#i` is stored at offset `i % nvec` of block `floor(i / nvec)`. Each block has size `block_size`.

The `CodePacker` object takes care of the translation between packed and flat codes. The packing / unpacking functions are virtual functions now, but they could as well be inlined for performance.

The `CodePacker` object makes it possible to do manipulations on arrays of codes (including inverted lists) in a uniform way, for example merging / adding / updating / removing / converting to/from CPU.

In this diff, the only non-trivial CodePacker implemented is for the FastScan code. The new functionality supported is merging IVFFastScan indexes.
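
The block/offset arithmetic described above can be sketched as follows (a minimal illustration; the real `CodePacker` also handles the byte-level layout inside each block, e.g. the FastScan interleaving):

```python
def flat_to_packed(i, nvec):
    """Map flat code index i to (block number, offset within block)."""
    return i // nvec, i % nvec

def packed_to_flat(block, offset, nvec):
    """Inverse mapping: recover the flat code index."""
    return block * nvec + offset

nvec = 32  # codes per fixed-size block
for i in range(100):
    b, o = flat_to_packed(i, nvec)
    assert packed_to_flat(b, o, nvec) == i
```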

Reviewed By: alexanderguzhva

Differential Revision: D42072972

fbshipit-source-id: d1f8bdbcf7ab0f454b5d9c37ba2720fd191833d0
2022-12-21 11:06:53 -08:00
Alexandr Guzhva
eaa67d8acf Speedup ResidualQuantizer sa_encode(), LUT=1 by avoiding r/w to a temporary buffer (#2599)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2599

Reviewed By: mdouze

Differential Revision: D41563612

fbshipit-source-id: 111f30e4aa8962709d1ef83cb78994e6e9639167
2022-12-19 11:41:01 -08:00
Alexandr Guzhva
f163e39726 Further improve exhaustive_L2sqr_blas (#2626)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2626

Improves the performance of fused kernels used in exhaustive_L2sqr_blas() call. The code parameters are tweaked to utilize concurrently working ALU and LOAD/STORE units in a single CPU core.

The parameter values were tweaked for the AVX2 and AVX512 kernels, but the ARM ones remain unchanged because there was no way for me to benchmark the changes. Please feel free to alter them if you have access to an ARM machine.

Reviewed By: mdouze

Differential Revision: D42079875

fbshipit-source-id: f1f7dc1759dbad57827d1c1b1b2b399322b33df0
2022-12-19 08:38:19 -08:00
Jeff Johnson
590f6fb47d Faiss pytorch bridge: revert to TypedStorage (#2631)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2631

The pytorch in fbcode complains about `storage()` saying it is deprecated and we need to move to UntypedStorage `_storage()`, while github CI is using an older version of pytorch where `_storage()` doesn't exist.

As it is only a warning not an error in fbcode, revert to the old form, but we'll likely have to change to `_storage()` eventually.

Reviewed By: alexanderguzhva

Differential Revision: D42107029

fbshipit-source-id: 699c15932e6ae48cd1c60ebb7212dcd9b47626f6
2022-12-16 15:38:08 -08:00
Jeff Johnson
4bb7aa4b77 Faiss + Torch fixes, re-enable k = 2048
Summary:
This diff fixes four separate issues:

- Using the pytorch bridge produces the following deprecation warning. We switch to `_storage()` instead.
```
torch_utils.py:51: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor._storage() instead of tensor.storage()
  x.storage().data_ptr() + x.storage_offset() * 4)
```
- The `storage_offset` for certain types was wrong, but this would only affect torch tensors that were a view into a storage that didn't begin at the beginning.
- The numpy pytorch bridge function `reconstruct_n` allowed passing `-1` for `ni`, which indicated that all vectors should be reconstructed. The torch bridge didn't follow this and threw an error:
```
TypeError: torch_replacement_reconstruct_n() missing 2 required positional arguments: 'n0' and 'ni'
```
- Choosing values in the range (1024, 2048] for `k` or `nprobe` was broken in D37777979; this is now fixed again.
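
The `ni = -1` convention for `reconstruct_n` can be sketched with a toy index (names here are hypothetical; the real bridge wraps the SWIG-generated Faiss functions):

```python
class DummyIndex:
    """Stand-in for a Faiss index: stores vectors, reconstructs by id."""
    def __init__(self, vectors):
        self.vectors = vectors
        self.ntotal = len(vectors)
    def reconstruct(self, i):
        return self.vectors[i]

def reconstruct_n(index, n0=0, ni=-1):
    """Mirror the numpy-bridge convention: ni == -1 reconstructs
    all vectors starting at n0."""
    if ni == -1:
        ni = index.ntotal - n0
    return [index.reconstruct(n0 + i) for i in range(ni)]

idx = DummyIndex([[float(i)] for i in range(5)])
assert reconstruct_n(idx) == [[0.0], [1.0], [2.0], [3.0], [4.0]]
assert reconstruct_n(idx, 3) == [[3.0], [4.0]]
```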

Reviewed By: alexanderguzhva

Differential Revision: D42041239

fbshipit-source-id: c7d9b4aba63db8ac73e271c8ef34e231002963d9
2022-12-14 16:21:22 -08:00
Alexandr Guzhva
eee58b3319 Speedup tests for cppcontrib_sadecode kernels (#2620)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2620

Reduce the number of points to match the number of clusters needed for every index, so that a clustering procedure would skip doing actual clustering.

Reviewed By: mdouze

Differential Revision: D41964901

fbshipit-source-id: 8be8b3fda8f07a66b18b85072e1f745483cdd956
2022-12-12 11:59:05 -08:00
Matthijs Douze
240e6dda08 Fix test timeouts (#2618)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2618

The Faiss tests run in dev mode are very slow.
The PQ polysemous training is particularly sensitive to this with the default settings.
This diff adds a "np" suffix to two PQ factory strings to disable polysemous training, fixing tests that were detected as flaky because they occasionally time out.

Reviewed By: alexanderguzhva

Differential Revision: D41955699

fbshipit-source-id: b1e0382a0142a3ed28b498c5ea6f5499de2c1b3f
2022-12-12 09:04:43 -08:00
Matthijs Douze
fa53e2c941 Implementation of big-batch IVF search (single machine) (#2567)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2567

Intuitively, it should be easier to handle big-batch searches because all distance computations for a set of queries can be done locally within each inverted list.

This benchmark implements this in pure python (but should be close to optimal in terms of speed), on CPU for IndexIVFFlat, IndexIVFPQ and IndexIVFScalarQuantizer. GPU is also supported.
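
The inverted-loop idea can be sketched in plain Python (concept only; the actual benchmark batches the distance computations with matrix operations):

```python
def big_batch_ivf_search(queries, inv_lists, probes, k):
    """Big-batch IVF search sketch: loop over inverted lists, not queries,
    so all distance computations for one list happen together.
    inv_lists: list_id -> [(vector_id, vector)]
    probes:    query_id -> probed list ids
    """
    # Invert the assignment: which queries probe each list.
    probers = {l: [] for l in inv_lists}
    for q, probed in enumerate(probes):
        for l in probed:
            probers[l].append(q)
    results = [[] for _ in queries]
    for l, entries in inv_lists.items():
        for q in probers[l]:
            for vid, v in entries:
                d = sum((a - b) ** 2 for a, b in zip(queries[q], v))
                results[q].append((d, vid))
    return [sorted(r)[:k] for r in results]

# With nprobe == nlist this degenerates to exact search.
inv_lists = {0: [(0, (0.0, 0.0)), (1, (1.0, 0.0))],
             1: [(2, (5.0, 5.0)), (3, (6.0, 5.0))]}
queries = [(0.9, 0.1), (5.1, 5.0)]
probes = [[0, 1], [0, 1]]
res = big_batch_ivf_search(queries, inv_lists, probes, k=1)
assert [r[0][1] for r in res] == [1, 2]
```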

The results are not systematically better, see https://docs.google.com/document/d/1d3YuV8uN7hut6aOATCOMx8Ut-QEl_oRnJdPgDBRF1QA/edit?usp=sharing

Reviewed By: algoriddle

Differential Revision: D41098338

fbshipit-source-id: 479e471b0d541f242d420f581775d57b708a61b8
2022-12-09 08:53:13 -08:00
Matthijs Douze
9f13e43486 Building blocks for big batch IVF search
Summary:
Adds:
-  a sparse update function to the heaps
- bucket sort functions
- an IndexRandom index to serve as a dummy coarse quantizer for testing

Reviewed By: algoriddle

Differential Revision: D41804055

fbshipit-source-id: 9402b31c37c367aa8554271d8c88bc93cc1e2bda
2022-12-08 09:34:16 -08:00
Matthijs Douze
96c868f50b move invertedlists splitting to InvertedLists.h (#2611)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2611

Moves the invlist splitting code so that it can be used independently from the IndexIVF.

Add a simple test for the splitting code.

Fix a bug in the IndexShards implementation.

Reviewed By: alexanderguzhva

Differential Revision: D41807025

fbshipit-source-id: 3f53afc5f81744343597bdfcfa90daa4f324a673
2022-12-08 01:58:22 -08:00
Matthijs Douze
a996a4a052 Put idx_t in the faiss namespace (#2582)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2582

A few more or less cosmetic improvements
* Index::idx_t was in the Index object, which does not make much sense; this diff moves it to faiss::idx_t
* replace multiprocessing.dummy with multiprocessing.pool
* add Alexandr as a core contributor of Faiss in the README ;-)

```
for i in $( find . -name \*.cu -o -name \*.cuh -o -name \*.h -o -name \*.cpp ) ; do
  sed -i s/Index::idx_t/idx_t/ $i
done
```

For the fbcode deps:
```
for i in $( fbgs Index::idx_t --exclude fbcode/faiss -l ) ; do
   sed -i s/Index::idx_t/idx_t/ $i
done
```

Reviewed By: algoriddle

Differential Revision: D41437507

fbshipit-source-id: 8300f2a3ae97cace6172f3f14a9be3a83999fb89
2022-11-30 08:25:30 -08:00
Alexandr Guzhva
699a9712b9 Speedup ResidualQuantizer sa_encode() by pooling memory allocations (#2598)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2598

Significantly speeds up sa_encode() for RQ and PRQ both LUT=0 and LUT=1 versions by preallocating the needed buffers.

Reviewed By: mdouze

Differential Revision: D41320670

fbshipit-source-id: fa0bbe251013def2c961eb9d19f8408630831e9e
2022-11-28 13:42:54 -08:00
Gergely Szilvasy
d83600b164 fix Windows build (#2594)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2594

Bumping `mkl` to 2021 and installing `mkl-devel` in the build environment to fix the Windows nightly build.

Reviewed By: mlomeli1

Differential Revision: D41534391

fbshipit-source-id: 7c681f530a1efe649cd176135a23ebb0fb44d70f
2022-11-25 11:58:33 -08:00
Philip Pronin
2647516421 remove unnecessary nbits_per_idx check
Summary:
`IndexIVFPQ` doesn't seem to depend on single-byte code words, and
encoding / decoding is abstracted away from it.

There is some reachable logic that supports only `pq.nbits == 8`
(polysemous training is the only one I found), but those cases are themselves
gated, and the missing support is an implementation detail (also, the
existing `<=` is wrong even if the intention was to gate on polysemous
training support).

Reviewed By: luciang

Differential Revision: D41483414

fbshipit-source-id: 06f471c25293e01242d7bab37ff54e709edc710b
2022-11-22 16:42:28 -08:00
Jeff Johnson
80e6bb9db3 Split out Faiss multi gpu tests to use different RE worker (#2585)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2585

Tests that absolutely require multiple GPUs to execute must use a different RE worker, as the default RE worker only has 1 gpu.

See https://fb.workplace.com/groups/regpu/posts/664180131728460 for context.

Reviewed By: mdouze

Differential Revision: D41390866

fbshipit-source-id: 117b9717631de80182bc061f4b985b80aed1aafd
2022-11-21 22:23:26 -08:00
Jeff Johnson
e3d12c7133 Faiss GPU: add device specifier for bfKnn (#2584)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2584

The `bfKnn` C++ function and `knn_gpu` Python functions for running brute-force k-NN on the GPU did not have a way to specify the GPU device on which the search should run, as it simply used the current thread-local `cudaGetDevice(...)` setting in the CUDA runtime API.

This is unlike the GPU index classes which takes a device argument in the index config struct. Now, both the C++ and Python interface to bfKnn have an optional argument to specify the device.

Default behavior is the current behavior; if the `device` is -1 then the current CUDA thread-local device is used, otherwise we perform the work on the desired device.

Reviewed By: mdouze

Differential Revision: D41448254

fbshipit-source-id: a63c68c12edbe4d725b9fc2a749d5dc935574e12
2022-11-21 18:20:32 -08:00
Gergely Szilvasy
a099c74269 add clone for IndexRefine and others + tests (#2539)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2539

Adding `clone_index()` for `IndexRefine` and `IndexRefineFlat` and several others index types.

https://github.com/facebookresearch/faiss/issues/2517

Note the change in `index_factory.cpp`, `RFlat` now constructs an `IndexRefineFlat`.

Reviewed By: mdouze

Differential Revision: D40511409

fbshipit-source-id: e55852eacb1f7d08be8c18005a82802939b4a6d9
2022-11-21 04:43:39 -08:00
Alexandr Guzhva
0b74765cca Speedup exhaustive_L2sqr_blas for AVX2, ARM NEON and AVX512 (#2568)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2568

Add a fused kernel for exhaustive_L2sqr_blas() call that combines a computation of dot product and the search for the nearest centroid. As a result, no temporary dot product values are written and read in RAM.

Speeds up the training of PQx[1] indices for dsub = 1, 2, 4, 8, and the effect is higher for higher values of [1]. AVX512 version provides additional overloads for dsub = 12, 16.

The speedup is also beneficial for higher values of pq.cp.max_points_per_centroid (which is 256 by default).

Speeds up IVFPQ training as well.

The AVX512 kernel is not enabled, but I've seen it run the training twice as fast as the AVX2 version. So, please feel free to use it by enabling AVX512 manually.
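
The fused kernel builds on the standard decomposition ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>, which lets the dot products go through BLAS while the centroid norms are precomputed once. A minimal sketch of the decomposition (illustration only; the real code fuses the norm combination with the argmin in SIMD):

```python
def l2sqr_naive(x, y):
    """Direct squared L2 distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def l2sqr_via_dot(x, y, y_norm_sqr):
    """||x - y||^2 = ||x||^2 + ||y||^2 - 2 * <x, y>."""
    x_norm_sqr = sum(a * a for a in x)
    dot = sum(a * b for a, b in zip(x, y))
    return x_norm_sqr + y_norm_sqr - 2.0 * dot

x = [0.5, -1.0, 2.0]
ys = [[1.0, 0.0, 0.0], [0.0, 2.0, -1.0]]
norms = [sum(b * b for b in y) for y in ys]  # precomputed once per centroid
for y, n in zip(ys, norms):
    assert abs(l2sqr_naive(x, y) - l2sqr_via_dot(x, y, n)) < 1e-9
```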

Reviewed By: mdouze

Differential Revision: D41166766

fbshipit-source-id: 443014e2e59396b3a90b9171fec8c8191052bcf4
2022-11-14 17:01:52 -08:00
Jeff Johnson
ab13122669 Faiss GPU IVF large query batch fix (#2572)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2572

This is a fix to https://github.com/facebookresearch/faiss/issues/2561

namely, GpuIndexIVFFlat didn't work for query batch sizes larger than 65536, due to that being the maximum grid Y dimension.

This code fixes the IVFFlat code to instead perform a grid loop over the queries as needed for >65536. In other places where we pass an array size as a grid Y parameter (not currently violated by large batch queries), I added asserts to catch this in the future.

Added two tests for IVFPQ and IVFFlat for the large batch case. The IVFPQ large batch test passed before, but IVFFlat reproduced the same issue as seen in GH issue 2561.

Reviewed By: alexanderguzhva

Differential Revision: D41184878

fbshipit-source-id: 92a87bd095319d6dcd73c76ba9044c019b8ca71c
2022-11-10 10:14:23 -08:00
Alexandr Guzhva
0a622d2d78 Update docs for benchmarks in benchs/ directory (#2565)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2565

Reviewed By: mdouze

Differential Revision: D40856253

fbshipit-source-id: 78f549bb37cdb3e6f562d877f5e33fa1c20834dc
2022-11-08 08:44:42 -08:00
Maria
19f7696dee Updated changelog with implemented features for 1.7.3 release (#2564)
Summary:
This PR adds the features that AbdelrahmanElmeniawy worked on during his internship and the speedups by alexanderguzhva in the CHANGELOG, ahead of the 1.7.3 release

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2564

Reviewed By: algoriddle

Differential Revision: D41119343

Pulled By: mlomeli1

fbshipit-source-id: b41ce354440dea2a6f8f214bf6654ff453ef10e7
v1.7.3
2022-11-08 03:44:51 -08:00
Alexandr Guzhva
771b1a8e37 Introduce transposed centroid table to speedup ProductQuantizer::compute_codes() (#2562)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2562

Introduce a table of transposed centroids in ProductQuantizer that significantly speeds up ProductQuantizer::compute_codes() call for certain PQ parameters, so speeds up search queries.

* the ::sync_transposed_centroids() call is used to fill the table
* the ::clear_transposed_centroids() call clears the table, so that the original baseline code is used for ::compute_codes()
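
Why the transposed layout helps can be sketched in plain Python: with centroids stored dimension-major, the inner loop scans one dimension across all centroids, which maps well to SIMD. Both layouts produce identical codes (illustration only; the function names below are hypothetical):

```python
def code_baseline(sub, centroids):
    """centroids[j][k]: component k of centroid j (row-major baseline)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2
                                 for a, b in zip(sub, centroids[j])))

def code_transposed(sub, centroids_t):
    """centroids_t[k][j]: same data transposed; the inner loop scans
    one dimension across all centroids, accumulating partial distances."""
    n = len(centroids_t[0])
    dist = [0.0] * n
    for k, row in enumerate(centroids_t):
        for j in range(n):
            diff = sub[k] - row[j]
            dist[j] += diff * diff
    return min(range(n), key=dist.__getitem__)

cents = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
cents_t = [list(col) for col in zip(*cents)]
for sub in ([0.2, 0.1], [1.4, 0.9], [1.9, -0.2]):
    assert code_baseline(sub, cents) == code_transposed(sub, cents_t)
```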

Reviewed By: mdouze

Differential Revision: D40763338

fbshipit-source-id: 87b40e5dd2f8c3cadeb94c1cd9e8a4a5b6ffa97d
2022-11-06 08:32:54 -08:00
Maria
02ef6b6c09 Prepare for v1.7.3 release (#2560)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2560

Reviewed By: mdouze

Differential Revision: D40947377

Pulled By: mlomeli1

fbshipit-source-id: b7c6f0ea85b5c3b2005e96af3d504edd137e42fc
2022-11-04 09:48:11 -07:00
Maria
f81097c6e6 Fix osx nightly (#2559)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2559

Reviewed By: algoriddle

Differential Revision: D40943198

Pulled By: mlomeli1

fbshipit-source-id: 9339d3211fcfa37674a35d52b9678c2d912f2529
2022-11-03 01:57:23 -07:00
Alexandr Guzhva
e11a3cf292 Benchmark for SADecodeKernels (#2554)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2554

Reviewed By: mdouze

Differential Revision: D40484008

fbshipit-source-id: c5f9b3c1a42a4ff4ff565d7c3f96af58c967b599
2022-10-31 14:55:08 -07:00
Matthijs Douze
1c4cb67855 Support more indexes for merge (#2533)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2533

Implements merge_from for IndexIDMap[2] and IndexPreTransform. In the process, splits off IndexIDMap into its own .h/.cpp files.

Reviewed By: alexanderguzhva

Differential Revision: D40420373

fbshipit-source-id: 1570a460706dd3fbc1447f9fcc0e2721eab869bb
2022-10-31 11:10:42 -07:00
Matthijs Douze
ed4f5cf331 raise for range_search on IVFFastScan
Summary:
Previously, range_search on IVFFastScan crashed because it used the range_search implementation in IndexIVF, which tries to obtain an InvertedListsScanner; this raises an exception that is not propagated properly through OpenMP to Python.
This diff just throws an exception right away.

Reviewed By: mlomeli1

Differential Revision: D40853406

fbshipit-source-id: e594a3af682b79868233e32a94aea80579378fc0
2022-10-31 09:42:35 -07:00
Alexandr Guzhva
d738fe071b more comments for SADecodeKernels.h (#2553)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2553

Reviewed By: mdouze

Differential Revision: D40477061

fbshipit-source-id: 88f26a8c1385c4706dbe2ae1d252449b4bb4774d
2022-10-31 09:27:11 -07:00
Gergely Szilvasy
13a0b68b27 fix for conda inspect failure (#2552)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2552

The `conda inspect` commands in the `test` section fail without `conda-build` in the `test` environment.

Reviewed By: mlomeli1

Differential Revision: D40793051

fbshipit-source-id: 184418cfa8d0efd6af6b0c806f7bddbeba176732
2022-10-28 04:25:37 -07:00
Jeff Johnson
3b352f4e69 GpuIndex::search_and_reconstruct, GpuFlatIndex::reserve fixes (#2550)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2550

This diff contains two fixes:

- `GpuIndex::search_and_reconstruct` was implemented in D37777979 (the default `faiss::Index` implementation never worked on GPU if given GPU input data), but the number of vectors passed to reconstruct was wrong in that diff. This fixes that, and includes a test for `search_and_reconstruct` as well.

- `GpuFlatIndex::reserve` only worked properly if you were calling `add` afterwards. If not, this could leave the index in a bad state. This bug has existed since 2016 in GPU Faiss.

Also implemented a test for a massive `GpuIndexFlat` index (more than 4 GB of data). Proper implementation of large (>2 GB) indexes via 64 bit indexing arithmetic will be done in a followup diff which touches most of the GPU code.

Reviewed By: alexanderguzhva

Differential Revision: D40765397

fbshipit-source-id: 7eb4368e7588aea144bc5bcc53fd11b1e70f33ea
2022-10-27 15:07:13 -07:00
Paul Saab
f5b34e256c Don't use #pragma once for the include headers. (#2544)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2544

Don't use #pragma once for the include headers.

Reviewed By: rahulg

Differential Revision: D40544318

fbshipit-source-id: 129e6de27d569fd46ccc460a262de3b991f568bc
2022-10-20 13:50:41 -07:00
Gergely Szilvasy
a85aea9aee bumping gtest to 1.12.1 (#2538)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2538

Due to https://github.com/google/googletest/issues/3219 gtest doesn't compile with gcc 11 - Ubuntu 22.04 has GCC 11.2 currently.

The fix was https://github.com/google/googletest/pull/3024 so I'm bumping gtest to latest.

Reviewed By: alexanderguzhva

Differential Revision: D40512746

fbshipit-source-id: 75f3c3c7f8a117af8430c2f74a7f8d164ca9877b
2022-10-20 03:23:20 -07:00
Alexandr Guzhva
ce94df4ea8 Speedup IndexRowwiseMinMax::sa_decode() (#2536)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2536

Allocate temporary buffers of the correct sizes.

Reviewed By: mdouze

Differential Revision: D40439826

fbshipit-source-id: 97087953ce9c1c98b4ab38cab2223c7191ea7025
2022-10-17 15:22:46 -07:00
Alexandr Guzhva
8ff1bc259d Additional C++ templates for fast sa_decode: add 8x compression level for AVX2 inline code. (#2532)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2532

Add 8x compression level, such as 'PQ64np' for 128-dim data. Prior to this change, only higher compression rates were supported.
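
The compression arithmetic behind the example (a sketch; the "np" suffix suppresses polysemous training and does not affect the code size):

```python
dim = 128                  # vector dimensionality
raw_bytes = dim * 4        # float32 storage: 512 bytes per vector
pq_m = 64                  # 'PQ64np': 64 sub-quantizers, 8 bits each
code_bytes = pq_m * 8 // 8 # 64 bytes per encoded vector
assert raw_bytes // code_bytes == 8  # the 8x compression level
```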

Reviewed By: mdouze

Differential Revision: D40312821

fbshipit-source-id: 7dba4e9b8d432f5f7be618c0e7ef50dac2f88497
2022-10-17 11:14:32 -07:00
Jeff Johnson
f39e0c1bd1 Fix GpuIndexFlat float16 memory bloat
Summary:
D37777979 included a change in order to allow usage of CPU index types for the IVF coarse quantizer. For residual computation, the centroids of the coarse quantizer IVF cells needed to be on the GPU in float32. If a GPUIndexFlat is used as the coarse quantizer for an IVF index, and if that index was float16, or if a CPU index was used as a coarse quantizer, a shadow copy of the centroids was made in float32 for IVF usage.

However, this shadow copy is only needed if a GPU float16 flat index is used as an IVF coarse quantizer. Previously we were always duplicating this data whether a GpuIndexFlat was used in an IVF index or not.

This diff restricts the construction of the shadow float32 data to only cases where we are using the GpuIndexFlat in an IVF index. Otherwise, the GpuIndexFlat, if float16, will only retain float16 data.

This should prevent the problem with memory bloat with massive float16 flat indexes.

Ideally the shadow float32 values for GPU coarse indices shouldn't be needed at all, but this will require updating the IVFPQ code to allow usage of float16 IVF centroids. This is something I will pursue in a less time-limited diff.

This diff also changes the GpuIndexFlat reconstruct methods to use kernels explicitly designed for operating on float16 and float32 data as needed, rather than having access to the entire matrix of float32 values.

Also added some additional assertions in order to track down issues.

An additional problem as seen with N2630278 post-D37777979 is that calling reconstruct on a large flat index (one where there are more than 2^31 scalar elements in the index) results in int32 overflow error in the reconstruct kernel that would be called for a single vector or a contiguous range of vectors. Previously, this use case was handled by `cudaMemcpyAsync` using `size_t` etc. calculation, but now in order to handle float16 and float32 in the same manner, there is an explicit kernel to do the copy and conversion if needed, avoiding a separate copy then conversion. The error as seen in that notebook was a fault in the reconstruct by range kernel.

This kernel has been temporarily fixed to not have the int32 indexing problems. Since when Faiss GPU was written in 2016, GPU memories have become a lot larger and it now seems the time to support (u)int64 indexing everywhere. I am adding this minimal change for now to fix this fault but early next week I will do a pass over the entire Faiss GPU code to update to using `Index::idx_t` as the indexing type everywhere, which should remove problems in dealing with large datasets.

Reviewed By: mdouze

Differential Revision: D40355184

fbshipit-source-id: 78f8b5d5aebcba610d3cd46f2cb2d26276e0ff15
2022-10-14 19:00:40 -07:00
Abdelrahman Elmeniawy
47a9953a35 add remove and merge features for IndexFastScan (#2497)
Summary:
* Modify pq4_get_packed_element so it no longer depends on an auxiliary table
* Create pq4_set_packed_element, which sets a single element of the codes in packed format
(These methods are used by merge and remove for IndexFastScan; the get method is also used in FastScan indices for reconstruction)
* Add remove feature for IndexFastScan
* Add merge feature for IndexFastScan

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2497

Test Plan:
```
cd build && make -j
make test
cd faiss/python && python setup.py build && cd ../../..
PYTHONPATH="$(ls -d ./build/faiss/python/build/lib*/)" pytest tests/test_*.py
```

Reviewed By: mdouze

Differential Revision: D39927403

Pulled By: mdouze

fbshipit-source-id: 45271b98419203dfb1cea4f4e7eaf0662523a5b5
2022-10-11 04:14:29 -07:00
Jeff Johnson
16d5ec755f GPU IVF indices now allow arbitrary quantizer instances + search_preassigned (#2519)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2519

From the beginning, the CPU IVF index design allowed rather arbitrary index instances to be used as the coarse (level 1) quantizer. On the GPU, IVF indices automatically constructed a FlatIndex (internal object that a GpuIndexFlat wraps) to be the coarse quantizer, and did not allow substituting anything else.

This diff allows for the GPU to function like the CPU IndexIVF classes, namely that there is a `quantizer` instance that can be arbitrarily substituted assuming that the coarse quantizer has the same number of vectors as the IVF nlist. Also, we now support any CPU or GPU index as the coarse quantizer for a GpuIndexIVF.

We detect internally if the IVF quantizer instance is a GPU index instance, in which case we can avoid d2h/h2d data copies as needed and pass device data directly to the plugged GPU coarse quantizer. If the plugged coarse quantizer is a CPU instance, then proper d2h/h2d copies for data are inserted as needed.

As some GPU IVF indices operate on the residual with respect to the coarse quantizer, it is necessary that the IVF index has access to the coarse centroids, even if the index is a CPU index. When a CPU index is used as a coarse quantizer, a reconstruction and copy of all of the coarse centroids to the GPU is performed. If the user changes the quantizer instance, or otherwise modifies the quantizer, in order for the GPU IVF index to recognize this change, a function `GpuIndexIVF::updateQuantizer()` must be called to update this cached state. If the coarse quantizer instance is a `GpuIndexFlat` then no separate cached copy is made as we can have direct access to the `FlatIndex` centroid storage.

Additionally, the `IndexIVF::search_preassigned` interface has been added to all GPU IVF instances via `GpuIndexIVF`. Conversion as needed from CPU arrays to GPU is done based on the address space of the passed inputs.

Other additional changes:
- Removed the `storeTransposed` functionality of `GpuIndexFlat`, as specified by `GpuIndexFlatConfig::storeTransposed`. This was a feature added back in 2016 to potentially accelerate coarse quantizer lookups by avoiding a transposition during matrix multiplication. This feature was not much used, and was an internal implementation detail, and supporting it with the new pluggable functionality wasn't worth it, so this transposition functionality was removed from the code but the parameter in `GpuIndexFlatConfig` still remained.

- This change also required updating index handling code to be `Index::idx_t` (64 bit) based instead of 32 bit in many instances, as any CPU and non-flat index GPU instances will be reporting IVF cells via `Index::idx_t`. 32 bit indices were used in much of the original Faiss due to the poor performance of 64 bit integers versus 32 bit integers.

- Refactored and deleted some redundant code between the `GPUIndexIVF` subclasses (`GpuIndexIVFFlat`, `GpuIndexIVFPQ`, `GpuIndexIVFScalarQuantizer`) for `search` and `search_preassigned`. This is now done by adding an interface to the IVFBase class which contains GPU-specific state (and is a CUDA file/header so is hidden behind an opaque pointer) and virtual functions to provide dispatch to the IVFFlat (which also implements IVFSQ) or IVFPQ classes in gpu/impl.

- Some of the `GpuIndexIVF` subclasses didn't have a default metric parameter (`METRIC_L2`), unlike the CPU versions. Added this default parameter to the header.

- Updated the check for the passed-in coarse quantizer in `GpuIndexIVF` to more closely correspond to the CPU version. Previously it was throwing an error if the coarse quantizer had ntotal not equal to nlist.

- Moved code that sets the proper GPU device (`DeviceScope` etc) to the very top of the functions that need it. It is critical that this is called, and nice to be able to visually verify that it is being set in these functions. Some functions buried it deep within (though not after the code that actually needed that scope to be set).

Reviewed By: mdouze

Differential Revision: D37777979

fbshipit-source-id: 517f611c2afdae87e79258bf1b3a92be406ade86
2022-10-11 00:14:24 -07:00
generatedunixname89002005325676
2c780f4477 Daily arc lint --take CLANGFORMAT
Reviewed By: ivanmurashko

Differential Revision: D40204059

fbshipit-source-id: ed3b6a7b754bd920fe7563a14370a0bb3ea239b3
2022-10-10 01:57:38 -07:00
Matthijs Douze
af25054e2d support IDSelector in more classes (#2509)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2509

Adds support for:
- IDSelector for Flat and SQ
- search_type in SearchParametersPQ
- IDSelectors implemented in Python (slow but good for testing)

Start optimization of IDSelectorRange and IDSelectorArray for IndexFlat and IDSelectorRange for IndexIVF
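
The IDSelector semantics can be sketched in plain Python (an illustration of the concept only; the real `faiss.IDSelectorRange` is a C++ class applied inside the index scanners):

```python
class IDSelectorRange:
    """Selects ids in [imin, imax), mirroring the range-selector semantics."""
    def __init__(self, imin, imax):
        self.imin, self.imax = imin, imax
    def is_member(self, i):
        return self.imin <= i < self.imax

def search_subset(query, vectors, k, sel=None):
    """Brute-force k-NN restricted to the ids the selector accepts."""
    cands = []
    for i, v in enumerate(vectors):
        if sel is not None and not sel.is_member(i):
            continue
        d = sum((a - b) ** 2 for a, b in zip(query, v))
        cands.append((d, i))
    return [i for _, i in sorted(cands)[:k]]

vecs = [(float(i),) for i in range(10)]
assert search_subset((0.0,), vecs, k=2) == [0, 1]
assert search_subset((0.0,), vecs, k=2, sel=IDSelectorRange(5, 10)) == [5, 6]
```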

Reviewed By: alexanderguzhva

Differential Revision: D40037795

fbshipit-source-id: 61e01acb43c6aa39fea2c3b67a8bba9072383b74
2022-10-06 23:03:23 -07:00
Matthijs Douze
a64c76fadd Fix sub-object ownership of python interface of IVFSpectralHash
Summary: Code would crash when deallocating the coarse quantizer for an IVFSpectralHash.

Reviewed By: algoriddle

Differential Revision: D40053030

fbshipit-source-id: 6a2987a6983f0e5fc5c5b6296d9000354176af83
2022-10-04 07:54:00 -07:00
Matthijs Douze
c5b49b79df split __init__.py into subsections (#2508)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2508

the Faiss python module was in a monolithic __init__.py.
This diff splits it into several sub-modules.
The tricky thing is to make the inter-dependencies work.

Reviewed By: alexanderguzhva

Differential Revision: D39969794

fbshipit-source-id: 6e7f896a4b35a7c1a0a1f3a986daa32a00bfae6b
2022-10-03 11:45:41 -07:00
Matthijs Douze
df9c49c335 fix windows compilation and test (#2505)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2505

the SearchParameters made the swig wrapper too long. This diff attempts to work around this.

Reviewed By: alexanderguzhva

Differential Revision: D39998713

fbshipit-source-id: 6938b5ca1c64bdc748899407909f7e59f62c0de3
2022-10-03 01:42:16 -07:00
Matthijs Douze
dd814b5f14 IVF filtering based on IDSelector (no init split) (#2483)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2483

This diff changes the following:
1. all search functions now take a `SearchParameters` argument that overrides the internal search parameters
2. the default implementation for most classes throws when the params argument is non-nullptr / non-None
3. the IndexIVF and IndexHNSW classes have functioning SearchParameters
4. the SearchParameters includes an IDSelector that can search only in a subset of the index based on a defined subset of ids

There is also some refactoring: the IDSelector was moved to its own .h/.cpp and python/__init__.py is split into parts.

The diff is quite bulky because the search function prototypes need to be changed in all index classes.

Things to fix in subsequent diffs:

- support SearchParameters for more index types (Flat variants)

- better sub-object ownership for SearchParams (with std::unique_ptr?)

- special handling of IDSelectorRange to make it faster

Reviewed By: alexanderguzhva

Differential Revision: D39852589

fbshipit-source-id: 4988bdb5b9bee1207cd327d3f80bf5e0e2467fe1
2022-09-30 06:40:03 -07:00
Lucas Hosseini
19147f241e Fix OSX CI. (#2482)
Summary:
Fixes OSX CI by pinning pytorch version for interop tests. The "real" fix is already landed in pytorch but has not been released yet.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2482

Reviewed By: alexanderguzhva

Differential Revision: D39891113

Pulled By: beauby

fbshipit-source-id: fa79bf9de1c93e056260ea64613e37625edfecc3
2022-09-28 11:22:43 -07:00
Lucas Hosseini
384dc32031 Fix Windows packages. (#2496)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2496

Reviewed By: alexanderguzhva

Differential Revision: D39859408

Pulled By: beauby

fbshipit-source-id: b3bd06374bc0a815d297d972e0277d56c5789a66
2022-09-28 07:05:09 -07:00