1065 Commits

Author SHA1 Message Date
Maria Lomeli
c09992bc8a Back out "Better NaN handling" (#3006)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3006

Original commit changeset: 99e7786582e9

Original Phabricator Diff: D48031390

Reviewed By: algoriddle

Differential Revision: D48353221

fbshipit-source-id: fd326f2a45d20f68507ca39a33a325528651b37d
2023-08-15 09:32:01 -07:00
Fernando Gasperi
e3deb71cdb Enable for faiss tests (#3002)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3002

title

Reviewed By: jbardini

Differential Revision: D48266242

fbshipit-source-id: b53e186f1954916a90dc8dbba67963f40d0aead7
2023-08-14 08:03:40 -07:00
Gergely Szilvasy
ef7e945b4d remove avx2 from raft cmake contbuild
Summary: Unnecessary for contbuild and doubles the build time.

Reviewed By: mlomeli1

Differential Revision: D48148734

fbshipit-source-id: ca44a1e328ce6980c8a867a33ce311fe6eeb90e0
2023-08-08 11:44:14 -07:00
Matthijs Douze
687457b2f4 Access graph structure for NSG (#2984)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2984

It is not entirely trivial to access the NSG graph structure from Python (although it is a fixed size N-by-K matrix of vector ids).
This diff adds an inspect_tools function to do that.

Reviewed By: algoriddle

Differential Revision: D48026775

fbshipit-source-id: 94cd7be7f656bcd333d62586531f287ea8e052e5
2023-08-04 06:55:24 -07:00
Gergely Szilvasy
da16d9d3ca simplify raft build (#2983)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2983

Reviewed By: mdouze

Differential Revision: D48063550

Pulled By: algoriddle

fbshipit-source-id: c67e13cec97f4de8cc30cae47186593dbe0bdadb
2023-08-04 06:52:07 -07:00
Matthijs Douze
a3fbf2d61c Better NaN handling (#2986)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2986

A NaN vector is a vector with at least one NaN (not-a-number) entry.
After discussion in the Faiss team we decided that:
- training should throw an exception on NaN vectors
- added NaN vectors should be ignored (never returned)
- searched NaN vectors should return only -1s

This diff implements this for a few common index types + adds relevant tests.
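
The three rules above can be illustrated with a toy brute-force index. This is only a sketch of the stated policy, not the Faiss implementation: `ToyFlatIndex`, `has_nan`, and the choice to keep assigning sequential ids to ignored vectors are assumptions made for the example.

```python
import math


def has_nan(vec):
    """True if at least one entry of the vector is NaN."""
    return any(math.isnan(x) for x in vec)


class ToyFlatIndex:
    """Hypothetical brute-force L2 index, used only to illustrate the policy."""

    def __init__(self):
        self.ids = []
        self.vectors = []
        self.ntotal = 0  # assumption: ignored NaN vectors still consume an id

    def train(self, xs):
        # Rule 1: training throws on NaN vectors.
        if any(has_nan(x) for x in xs):
            raise ValueError("training set contains a NaN vector")

    def add(self, xs):
        # Rule 2: added NaN vectors are never stored, so they are never returned.
        for x in xs:
            if not has_nan(x):
                self.ids.append(self.ntotal)
                self.vectors.append(list(x))
            self.ntotal += 1

    def search(self, q, k):
        # Rule 3: a NaN query returns only -1 labels.
        if has_nan(q):
            return [-1] * k
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, v)), i)
            for i, v in zip(self.ids, self.vectors)
        )
        labels = [i for _, i in dists[:k]]
        return labels + [-1] * (k - len(labels))
```

Padding with -1 when fewer than `k` valid vectors exist mirrors the usual convention for missing results.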

Reviewed By: algoriddle

Differential Revision: D48031390

fbshipit-source-id: 99e7786582e91950e3a53c1d8bcffdd00b6afd24
2023-08-04 06:51:06 -07:00
generatedunixname89002005325676
a4ddb18605 Daily arc lint --take CLANGFORMAT
Reviewed By: 0x1eaf

Differential Revision: D47985815

fbshipit-source-id: 47bbe26ec689ac5521fe94ab52d174c60ded2ba5
2023-08-02 07:34:56 -07:00
Maria
35dac924d1 Added version to nightly install (#2982)
Summary:
The gpu nightly package install command did not install v1.7.4, see [P801820926](https://www.internalfb.com/intern/paste/P801820926)

Adding the version fixes this issue, see [P801849181](https://www.internalfb.com/intern/paste/P801849181)

Funnily enough, faiss-cpu nightly command works fine, see [P801848411](https://www.internalfb.com/intern/paste/P801848411)

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2982

Reviewed By: mdouze

Differential Revision: D47952190

Pulled By: mlomeli1

fbshipit-source-id: 2185197e0a513c7da441d791c0b373f06f570f62
2023-08-01 12:14:35 -07:00
Alexandr Guzhva
5a95d47858 Upgrade AVX2 code for SQ8 (#2942)
Summary:
More efficient code for SQ8 for AVX2.
With clang-15, this improves the number of instructions per cycle (IPC) from 2.49 to 3.20.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2942

Reviewed By: algoriddle

Differential Revision: D47946167

Pulled By: mdouze

fbshipit-source-id: da864bac8d452f2eb111ca356e54a8a69cd03dbf
2023-08-01 06:08:44 -07:00
youcheng huang
0aae4d3eec fix hnsw shrink_neighbor_list comment (#2980)
Summary:
This PR fixes the issue https://github.com/facebookresearch/faiss/issues/2978

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2980

Reviewed By: mdouze

Differential Revision: D47950592

Pulled By: mlomeli1

fbshipit-source-id: 32ef06c3775f7234a5a4bb4dab36c176edea2d1f
2023-08-01 05:01:30 -07:00
Corey J. Nolet
7bf714928c Adding libraft dependency to speed up compile times with USE_RAFT (#2958)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2958

Reviewed By: mlomeli1, mdouze

Differential Revision: D47678341

Pulled By: algoriddle

fbshipit-source-id: 2ab2d0e8349498faa0fc59ac9800da29a201c766
2023-07-31 07:37:27 -07:00
Gergely Szilvasy
726143d056 install libraft for cmake build (#2968)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2968

Reviewed By: mlomeli1, mdouze

Differential Revision: D47677660

Pulled By: algoriddle

fbshipit-source-id: 8fad8323ea3c0a264149c76fc9519d9c63346d00
2023-07-31 07:37:27 -07:00
Gergely Szilvasy
821a401ae9 CodeSet for deduping large datasets (#2949)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2949

A more scalable alternative to `np.unique` for deduping large datasets with a quantized code.
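
The general idea can be sketched as a hash set of codes that is fed batch by batch, so the whole dataset never has to be sorted in memory at once. The function below is a hypothetical illustration; the commit message does not show the actual `CodeSet` interface, and `dedup_codes` is a name invented for this example.

```python
def dedup_codes(codes, seen=None):
    """Return (indices of codes first seen in this batch, updated seen set).

    `codes` is an iterable of fixed-size byte strings (quantized codes).
    `seen` carries the codes observed in earlier batches.
    """
    if seen is None:
        seen = set()
    keep = []
    for i, code in enumerate(codes):
        key = bytes(code)  # hashable copy of the code
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep, seen
```

Passing the returned `seen` set into the next call lets the deduplication span several batches.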

Reviewed By: mlomeli1

Differential Revision: D47443953

fbshipit-source-id: 4a1554d4d4200b5fa657e9d8b7395bba9856a8e3
2023-07-19 10:05:46 -07:00
Matthijs Douze
43d86e3073 Relax IVF AQ FastScan (#2940)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2940

This test fails on some occasions.
After investigation, it turns out this is due to non-reproducible behavior of IndexIVFFastScan::search_implem_14 with a parallel loop when there are ties in the results (i.e. the resulting distances are the same but the ids are not).
As a workaround I relaxed the test slightly.
This diff also includes a fix in the checksum function.
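
One way to relax such a test is to require equal distances but allow ids to be permuted within a run of tied distances. The helper below is a sketch of that idea, not the check used in the Faiss test suite; `tie_groups`, `same_up_to_ties`, and the tolerance are assumptions.

```python
def tie_groups(dists, tol):
    """Split ranks into runs whose consecutive distances differ by at most tol."""
    groups, start = [], 0
    for i in range(1, len(dists) + 1):
        if i == len(dists) or abs(dists[i] - dists[i - 1]) > tol:
            groups.append((start, i))
            start = i
    return groups


def same_up_to_ties(dists_a, ids_a, dists_b, ids_b, tol=1e-6):
    """Compare two sorted result lists, tolerating id order within ties."""
    if len(dists_a) != len(dists_b):
        return False
    if any(abs(a - b) > tol for a, b in zip(dists_a, dists_b)):
        return False
    for start, end in tie_groups(dists_a, tol):
        if end == len(dists_a):
            # The last run may be cut off by k, so its ids can legitimately differ.
            continue
        if sorted(ids_a[start:end]) != sorted(ids_b[start:end]):
            return False
    return True
```

Skipping the last run is deliberately lenient: a tie that straddles the k-th rank can put different ids in each result.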

Reviewed By: algoriddle

Differential Revision: D47229086

fbshipit-source-id: 55e53bcfe47cf33041cc7fd5691b5de65067ce0f
2023-07-05 21:51:12 -07:00
Maria
a757806ae9 added blas=1.0=mkl to INSTALL (#2939)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2939

Reviewed By: algoriddle

Differential Revision: D47229098

Pulled By: mlomeli1

fbshipit-source-id: 91761499d9cd13ecafe12186ddbd80224c2e7410
2023-07-05 10:05:19 -07:00
Sid Jha
d48e777412 Fix import (#2936)
Summary:
The previous import does not exist.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2936

Reviewed By: mlomeli1

Differential Revision: D47221019

Pulled By: mdouze

fbshipit-source-id: 9ceeba229a10dd4b66da3483cc7695b198e1a8d8
2023-07-05 06:59:05 -07:00
Matthijs Douze
1c1d5c808f Make tests a little less verbose
Summary: Useful info on GitHub test runs is buried in spurious logging. Avoid this.

Reviewed By: mlomeli1

Differential Revision: D47209139

fbshipit-source-id: b5111c91e2b94f0c3678d599197f8e7094993df1
2023-07-04 07:02:53 -07:00
Richard Barnes
4bfdd4324f Parallelize kernel compilation in FAISS (#2922)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2922

This parallelizes kernel compilation by taking a template function from much deeper in the stack than was previously the case and generating 128 compilation units rather than the original 8.

Reviewed By: mdouze

Differential Revision: D46674315

fbshipit-source-id: 830eeaf43dee2c081f735be47c809b28aa3a05f6
2023-06-30 01:30:01 -07:00
Matthijs Douze
a91a2887fe use dispatcher function to call HammingComputer (#2918)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2918

The HammingComputer class is optimized for several vector sizes. So far it has been the caller's responsibility to instantiate the relevant optimized version.

This diff introduces a `dispatch_HammingComputer` function that can be called with a template class that is instantiated for all existing optimized HammingComputers.
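
The dispatch pattern can be mimicked at runtime in Python. This is only an analogy: the C++ function selects a template instantiation at compile time, and every name below (`hamming_generic`, `hamming_8`, `SPECIALIZED`, `dispatch_hamming`) is hypothetical.

```python
def hamming_generic(a, b):
    """Byte-by-byte Hamming distance, valid for any code size."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def hamming_8(a, b):
    """Specialized for 8-byte codes: a single 64-bit XOR and popcount."""
    return bin(int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).count("1")


# Table of optimized versions keyed by code size in bytes (illustrative only).
SPECIALIZED = {8: hamming_8}


def dispatch_hamming(code_size, consumer):
    """Pick the implementation for code_size and hand it to the consumer."""
    impl = SPECIALIZED.get(code_size, hamming_generic)
    return consumer(impl)
```

The caller supplies only the code that uses the computer; choosing the optimized variant happens in one place.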

Reviewed By: algoriddle

Differential Revision: D46858553

fbshipit-source-id: 32c31689bba7c0b406b309fc8574c95fa24022ba
2023-06-26 14:06:10 -07:00
Matthijs Douze
a27036aa72 add small benchmark for hamming computers
Summary: to measure impact of hamming computer diff

Reviewed By: algoriddle

Differential Revision: D46913890

fbshipit-source-id: 7b9850205885b9b7c5f394f17a79ba222e7b1e2e
2023-06-26 14:06:10 -07:00
Gergely Szilvasy
391601dc3f relax test_ivf_train_2level threshold (#2927)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2927

Reviewed By: mlomeli1

Differential Revision: D47017009

fbshipit-source-id: cfa1df4b9632b085d3a61b56d8617bebd7e5aad6
2023-06-26 05:02:47 -07:00
Gergely Szilvasy
1d7c05de5f raft nightly (#2926)
Summary:
Moving the raft build to a nightly, to remove the noise from the PR contbuilds.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2926

Reviewed By: mlomeli1

Differential Revision: D47016318

Pulled By: algoriddle

fbshipit-source-id: 3c60aa382b9aa68dcadb929e0e4afade13c9123e
2023-06-26 03:10:05 -07:00
Octavian Guzu
9126f863d4 Prevent snprintf vulnerability
Summary:
With a very long name for a `ParameterRange`, the `snprintf` call in `combination_name` can end up with a negative second parameter, causing a memory overflow, which can lead to a serious security issue.

We check that the second parameter is always >= 0 and throw an exception if it is not.

See the new GTEST.
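
The arithmetic behind the bug can be sketched in Python. The second parameter of `snprintf` is a `size_t`, so a negative remaining size becomes a huge unsigned value. The snippet assumes a 64-bit `size_t`; the buffer sizes and the helper names are illustrative and not taken from the Faiss source.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def as_size_t(n):
    """Value a signed integer takes after conversion to a 64-bit size_t."""
    return n & SIZE_MAX


def remaining_or_raise(buf_size, written):
    """Space left in the buffer; raises instead of passing a negative size on."""
    remaining = buf_size - written
    if remaining < 0:
        raise ValueError("name does not fit in the output buffer")
    return remaining
```

For example, `as_size_t(1000 - 1200)` is `2**64 - 200`, which is why an unchecked negative size effectively removes the bound on the write.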

Reviewed By: mdouze

Differential Revision: D46856956

fbshipit-source-id: 91c657ec028c462d4b808b595811342034e00133
2023-06-23 08:52:20 -07:00
Richard Barnes
8ac4e41983 Switch //faiss/gpu to use templates instead of macros (#2914)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2914

The macros are part of a system to reduce compilation time via separate compilation units.

Unfortunately, the parallelization is across C++ template functions instead of NVCC invocations on kernel compilation, which would be much more effective.

This diff removes the preprocessor macros and expands them into templates.

Compilation time after this diff is given by [this buck2 output](https://www.internalfb.com/buck2/ae9e6b28-a1bd-4d46-8af8-2895e6f182c8) with 1,043s through impl/scan/IVFInterleaved2048.cu

Reviewed By: mdouze

Differential Revision: D46549341

fbshipit-source-id: 5c3457876fd649e03ebeac89e4d1713f091ee9f5
2023-06-21 08:04:58 -07:00
Gergely Szilvasy
e0741ca5d7 fix for lib/jvm/languages/python/bin/conda no such file (#2917)
Summary:
environment: line 9: /opt/conda/lib/jvm/languages/python/bin/conda: No such file or directory

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2917

Reviewed By: mdouze

Differential Revision: D46841321

Pulled By: algoriddle

fbshipit-source-id: bdfbc16fbf422406c5195293dd4730f71a261e40
2023-06-21 00:29:51 -07:00
Gergely Szilvasy
f69b1db60a update installation instructions with notes about mkl and the nvidia channel
Reviewed By: mdouze

Differential Revision: D46844223

fbshipit-source-id: 1a0862c160f2c9656db68b80475712815ee81daa
2023-06-19 11:47:31 -07:00
Matthijs Douze
07fe2b622f Binary cloning and GPU range search (#2916)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2916

Overall better support for binary indexes:
- cloning (to CPU and GPU), only for BinaryFlat for now
- fix bug in reconstruct_n
- range_search_max_results

Reviewed By: algoriddle

Differential Revision: D46755778

fbshipit-source-id: 777ad90aff5c54a77f9685ed6512247a922c6ef5
2023-06-19 06:05:14 -07:00
Gergely Szilvasy
e153cac419 fix the osx nightly build (#2896)
Summary:
Based on comments in https://github.com/conda/conda-build/issues/4498

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2896

Reviewed By: mdouze

Differential Revision: D46802512

Pulled By: algoriddle

fbshipit-source-id: 7449b2f0db08fdd793770a44afb659d7ac28e3cd
2023-06-16 13:01:17 -07:00
Gergely Szilvasy
092606b293 bbs producer/consumer threading (#2901)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2901

This diff allows each GPU to work independently: a hot centroid (e.g. out-of-distribution queries that hit a centroid heavily) will only block the one GPU that is processing it, while the others continue to pick up work independently.
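
The scheduling idea can be sketched with a shared work queue from which each worker pulls its next job as soon as it is free. This is a simplified, hypothetical illustration using CPU threads: the jobs are queued up front rather than produced concurrently, and `run_workers` is not part of Faiss.

```python
import queue
import threading


def run_workers(jobs, n_workers, process):
    """Run process(job) for every job; a slow job only occupies one worker."""
    todo = queue.Queue()
    for job in jobs:
        todo.put(job)
    done = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                job = todo.get_nowait()  # take the next job, if any is left
            except queue.Empty:
                return
            out = process(job)
            with lock:
                done.append((worker_id, job, out))

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Because no worker waits on a barrier, a job that takes much longer than the rest delays only the worker that picked it up.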

Reviewed By: mdouze

Differential Revision: D46521298

fbshipit-source-id: 171cb06cce8b2d16b7bd744799b105b3cd525be3
2023-06-14 07:58:44 -07:00
I
d8a6350607 Update docs (C++11 -> C++17) (#2907)
Summary:
following https://github.com/facebookresearch/faiss/issues/2899

This PR doesn't affect the software behavior

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2907

Reviewed By: mdouze

Differential Revision: D46720499

Pulled By: algoriddle

fbshipit-source-id: 00b47baf526a94449e2b1c9ca5fcd4cf961f6f17
2023-06-14 05:06:15 -07:00
Gergely Szilvasy
6951466b43 raft enabled cmake build (#2898)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2898

Reviewed By: mdouze

Differential Revision: D46561295

Pulled By: algoriddle

fbshipit-source-id: b9806c0c52acf82124c3b2e0095b1c1979318dcd
2023-06-13 08:43:18 -07:00
Richard Barnes
27ffd14ae4 Use C++17 [[fallthrough]] in faiss/utils/distances_simd.cpp (#2913)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2913

Reviewed By: algoriddle

Differential Revision: D46603510

fbshipit-source-id: 374d530d79176ac553b40d5ad04bf83d4920b107
2023-06-12 15:07:08 -07:00
Richard Barnes
100beb8565 Use C++17 [[fallthrough]] in faiss/utils/hamming_distance/avx2-inl.h
Reviewed By: mdouze

Differential Revision: D46603512

fbshipit-source-id: fa4bab4d24f5c9e2a3506f2a67d3a7db2a01512f
2023-06-12 08:19:22 -07:00
Richard Barnes
463ffd8e28 Indicate that fallthrough is intentional in faiss (#2897)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2897

Reviewed By: algoriddle

Differential Revision: D46385243

fbshipit-source-id: f08b16c9db91edca53cdbf0932a990c8c1f9d0db
2023-06-08 12:22:11 -07:00
Taras Tsugrii
8ec166c9fd Simplify non-optimal points removal.
Summary:
This version is more concise and doesn't need a new scope to reduce visibility of local variable `i`.

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: mdouze

Differential Revision: D46431189

fbshipit-source-id: 5bbe8df6014d8e25aeb8d5d15145b703e9651327
2023-06-08 08:50:28 -07:00
Taras Tsugrii
f82298ffe5 Remove unused unordered_map include. (#2900)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2900

Unused includes make builds brittle and slow them down.

Reviewed By: algoriddle

Differential Revision: D46445595

fbshipit-source-id: 03a02e274922dd6215e467ead148890d79b3c2f8
2023-06-07 12:39:24 -07:00
Gergely Szilvasy
451f6cdbe5 c++ 17 (#2899)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2899

Reviewed By: mlomeli1

Differential Revision: D46521588

Pulled By: algoriddle

fbshipit-source-id: 6ac4b9d7590329317455d35256cab9dc820dfccf
2023-06-07 09:10:11 -07:00
I
9c884225c1 Some changes to simdlib (#2885)
Summary:
- Use an elementwise operation and a single reduction instead of two across-vector comparison operations
- Use supporting functions that are already implemented
- Unify the semantics of `operator==` with those of `simd16uint16`
    - `operator==` of `simd8uint32` and `simd8float32` was implemented in https://github.com/facebookresearch/faiss/issues/2568, but it did not have the same semantics as that of `simd16uint16` (which was implemented a long time ago). To get vector equality as a `bool`, we should now use the `is_same_as` member function.
- Change `is_same_as` to accept any vector type as its argument for `simdlib_neon`
    - `is_same_as` already supports any vector type on `simdlib_avx2` and `simdlib_emulated`
- Remove unused function `simd16uint16::is_same` on `simdlib_avx2`
    - Is it a typo of `is_same_as`? In any case, it seems unlikely to be used

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2885

Reviewed By: mdouze

Differential Revision: D46330666

Pulled By: alexanderguzhva

fbshipit-source-id: 0ea14f8e9a8bda78f24a655219dffe3e07fc110f
2023-06-01 07:39:02 -07:00
I
bbc95b1a6c Fix windows CI (#2889)
Summary:
https://github.com/facebookresearch/faiss/issues/2882 added [a for loop with an unsigned index, annotated with `#pragma omp parallel for`](https://github.com/facebookresearch/faiss/pull/2882/files#diff-5a89dcb99a1cce3f297c7f7dfc8e221306b281d4ced6dac1e0fc0fa54188195fR449-R452), but it seems that [MSVC doesn't support an unsigned index with `#pragma omp parallel for`](https://app.circleci.com/pipelines/github/facebookresearch/faiss/4220/workflows/ee72de05-6ead-42d9-8ec5-44772e9fd41b/jobs/22529?invite=true#step-104-333) (I think this does not conform to the OpenMP specification, but...)

I (finally) changed the loop to use a signed index. This change introduces the precondition `n <= std::numeric_limits<std::make_signed_t<std::size_t>>::max()`, but I think this usually holds, so I just documented the limitation in a comment instead of adding a `FAISS_ASSERT` or something similar.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2889

Reviewed By: wickedfoo

Differential Revision: D46325322

Pulled By: alexanderguzhva

fbshipit-source-id: c68f4c8be3db188ac067e053c6c716e2896f75c0
2023-05-31 13:00:30 -07:00
Matthijs Douze
90349f264b Large two-level clustering (#2882)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2882

A two level clustering version where the training data does not need to fit in RAM.

Reviewed By: algoriddle

Differential Revision: D44557021

fbshipit-source-id: 892d4fec4588eb33da6e7a82c15040f39426485e
2023-05-31 00:15:03 -07:00
Alexandr Guzhva
6fd0cb60be fix a typo (#2881)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2881

Reviewed By: algoriddle

Differential Revision: D46227909

fbshipit-source-id: 9af689947f003b1f9c1dcdedcb1783b78b4bd21a
2023-05-26 11:48:19 -07:00
Alexandr Guzhva
e8b7575e93 AVX2 version of faiss::HNSW::MinimaxHeap::pop_min() (#2874)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2874

Reviewed By: mdouze

Differential Revision: D46125506

fbshipit-source-id: 4099e5c95bfb168b2097a42f5308c4bea1f72ca8
2023-05-26 11:35:21 -07:00
Matthijs Douze
6800ebef83 Support independent IVF coarse quantizer
Summary: In the IndexIVFIndependentQuantizer, the coarse quantizer is applied on the input vectors, but the encoding is performed on a vector-transformed version of the database elements.

Reviewed By: alexanderguzhva

Differential Revision: D45950970

fbshipit-source-id: 30f6cf46d44174b1d99a12384b7d5e2d475c1f88
2023-05-26 02:59:01 -07:00
Alexandr Guzhva
a3296f42ad Use uint8_t instead of uint32_t for faiss::VisitedTable.visno (#2873)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2873

Reviewed By: mdouze

Differential Revision: D46125491

fbshipit-source-id: 9c48bb55e54eb361438521494093a3f9ab823857
2023-05-24 07:56:38 -07:00
Matthijs Douze
fd09e51316 move by_residual to IndexIVF (#2870)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2870

Factor out by_residual for all the classes that inherit from IndexIVF.
Some training code can be put in IndexIVF and `train_residual` is replaced with `train_encoder`.

This will be used for the IndependentQuantizer work.

Reviewed By: alexanderguzhva

Differential Revision: D45987304

fbshipit-source-id: 7310a687b556b2faa15a76456b1d9000e21b58ce
2023-05-23 09:56:19 -07:00
Gergely Szilvasy
1c1879b17c tiling bfKnn (#2865)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2865

Introduces a tiling version of `bfKnn` called `bfKnn_tiling`, which can break up both queries and vectors into tiles sized by `queriesMemoryLimit` and `vectorsMemoryLimit`, respectively.
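
The tiling scheme can be sketched in pure Python: process one (query tile, vector tile) pair at a time and merge each tile's candidates into a running top-k per query. This is an illustration of the technique, not the GPU code; `knn_tiled` is a hypothetical name, and the tile sizes here are counts of vectors rather than memory limits.

```python
import heapq


def l2_sqr(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_tiled(queries, vectors, k, query_tile, vector_tile):
    """Brute-force k-NN that only touches one pair of tiles at a time.

    Returns, for each query, a sorted list of (distance, vector id) pairs.
    """
    results = [[] for _ in queries]  # running top-k per query
    for q0 in range(0, len(queries), query_tile):
        for v0 in range(0, len(vectors), vector_tile):
            v_tile = vectors[v0:v0 + vector_tile]
            for qi in range(q0, min(q0 + query_tile, len(queries))):
                cand = [(l2_sqr(queries[qi], v), v0 + j) for j, v in enumerate(v_tile)]
                # Merge this tile's candidates into the running top-k.
                results[qi] = heapq.nsmallest(k, results[qi] + cand)
    return results
```

The result does not depend on the tile sizes, because the top-k of the full set equals the top-k of the merged per-tile candidates.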

Reviewed By: wickedfoo

Differential Revision: D45944524

fbshipit-source-id: f9cd4c14dbf2d43def773124f19e92d25c86fc5a
2023-05-23 00:38:43 -07:00
generatedunixname89002005325676
5c221edf57 Daily arc lint --take CLANGFORMAT
Reviewed By: ivanmurashko

Differential Revision: D46063974

fbshipit-source-id: 949e60d45ebb0c0e4e59c1adcc41cd43a65086df
2023-05-22 02:14:50 -07:00
Matthijs Douze
a878c79db3 Support RAFT from python (#2864)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2864

Adds use_raft to the cloner options.
Adds tests for the python interface.

Also continue cleanup of data structures to set default arguments.
Add flags GPU and NVIDIA_RAFT to get_compile_options()

Reviewed By: algoriddle

Differential Revision: D45943372

fbshipit-source-id: 3428b24d309e9facfb4ebcf0d2d108dccfb4ad01
2023-05-19 20:49:01 -07:00
Matthijs Douze
48d48a37ac fix windows test (#2862)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2862

Fix windows test introduced by range search diff

Reviewed By: algoriddle

Differential Revision: D45901726

fbshipit-source-id: 16259b7718f1409adef814ea4c2b5707304849ca
2023-05-17 03:17:48 -07:00
generatedunixname89002005325676
615e3fca7f Daily arc lint --take CLANGFORMAT
Reviewed By: jhpowell

Differential Revision: D45908420

fbshipit-source-id: 84dfedc4af9dea3887e27e79b53414afd0c1790d
2023-05-16 07:40:22 -07:00