1432 Commits

Author SHA1 Message Date
CodemodService Bot
b122987939 Fix CQS signal. Id] 88153895 -- readability-redundant-string-init in fbcode/faiss
Reviewed By: dtolnay

Differential Revision: D72700675
2025-04-09 08:14:34 -07:00
Satyendra Mishra
7eac0346f5 Add normalize_l2 boolean to distributed training API
Summary: Add normalize_l2 boolean to distributed training API. This only adds the field; the implementation will come in the next diff.

Reviewed By: junjieqi

Differential Revision: D72621956

fbshipit-source-id: 830794250837ff17e3adcd2f8f5c332601d2386f
2025-04-08 16:23:27 -07:00
Jaap Aarts
0dfb599eac Handle insufficient driver gracefully (#4271)
Summary:
Gracefully handle insufficient drivers (e.g., no driver available).

Resolves https://github.com/facebookresearch/faiss/issues/4251

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4271

Reviewed By: mnorris11

Differential Revision: D72351969

Pulled By: ramilbakhshyiev

fbshipit-source-id: de2b6f741087c59665e7f9f171ee6096c7eea39b
2025-04-03 00:27:37 -07:00
Alexandr Guzhva
d4e236b500 relax input params for IndexIVFRaBitQ::get_InvertedListScanner() (#4270)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4270

Reviewed By: mnorris11

Differential Revision: D72254929

Pulled By: junjieqi

fbshipit-source-id: 8354b58007d50d1daf06a3bfff4d2d05962c16af
2025-04-01 21:35:03 -07:00
Alexandr Guzhva
df9e2c48d6 Fix a placeholder for 'unimplemented' in mapped_io.cpp (#4268)
Summary:
This should fix a problem on macos compilation (just compilation), as discussed in https://github.com/facebookresearch/faiss/pull/4250#issuecomment-2767317033
mnorris11 please verify

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4268

Reviewed By: junjieqi

Differential Revision: D72215145

Pulled By: mnorris11

fbshipit-source-id: ccac8aedacaef330dbdc18888d16f870d008df0f
2025-04-01 21:29:26 -07:00
wwq
0d3aff9066 fix bug: IVFPQ of raft/cuvs does not require redundant check (#4241)
Summary:
The raft/cuVS IVFPQ does not need the PQ code length check that applies to Faiss' original implementation. This check needlessly restricts IVFPQ to fewer parameter choices than raft/cuVS supports.
The check of supported PQ code length here
df6a8f6b4e/faiss/gpu/impl/IVFPQ.cu (L80-L102)

is for Faiss' original CUDA implementation. Raft/cuVS supports more choices.

The Faiss wiki also describes the limitation (https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU#limitations), which needs to be updated, too. Raft/cuVS is not limited to those choices.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4241

Reviewed By: bshethmeta, gtwang01

Differential Revision: D72200376

Pulled By: mnorris11

fbshipit-source-id: 2b81e822a397f3ab4a7c691e38be0186535d129d
2025-04-01 13:31:48 -07:00
Kaival Parikh
a4401c13d8 Allow using custom index readers and writers (#4180)
Summary:
### Description

- Create custom readers and writers for index IO, which take function pointers as input
- Also expose these from the C_API

This is helpful for FFI use, where calling processes would pass upcall stubs for streamlined IO
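
A rough sketch of the idea, assuming a reader that wraps a C-style callback plus an opaque context pointer. The names `ReadFn`, `CallbackReader`, `MemorySource` and `read_from_memory` are hypothetical and are not the types added by this PR; the fread-like return convention is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical callback signature: copy up to `size * nitems` bytes into `ptr`
// and return the number of whole items actually read (fread-like convention).
using ReadFn = size_t (*)(void* context, void* ptr, size_t size, size_t nitems);

// Hypothetical reader that forwards every read to a caller-supplied function pointer.
struct CallbackReader {
    ReadFn read_fn;
    void* context;

    size_t operator()(void* ptr, size_t size, size_t nitems) const {
        assert(read_fn != nullptr);
        return read_fn(context, ptr, size, nitems);
    }
};

// Example upcall target: serves bytes from an in-memory buffer.
struct MemorySource {
    const unsigned char* data;
    size_t size;
    size_t pos;
};

size_t read_from_memory(void* context, void* ptr, size_t size, size_t nitems) {
    auto* src = static_cast<MemorySource*>(context);
    if (size == 0) {
        return 0;
    }
    size_t available_items = (src->size - src->pos) / size;
    size_t n = nitems < available_items ? nitems : available_items;
    std::memcpy(ptr, src->data + src->pos, n * size);
    src->pos += n * size;
    return n;
}
```

A foreign-language caller would pass its own upcall stub in place of `read_from_memory`, with `context` pointing at whatever state the stub needs.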

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4180

Reviewed By: gtwang01

Differential Revision: D71208266

Pulled By: mnorris11

fbshipit-source-id: ab82397d4780a2a07c7bfdc52329968377f42af4
2025-04-01 11:05:29 -07:00
Tarang Jain
636d95e8a4 Upgrade to libcuvs=25.04 (#4164)
Summary:
- [x] Upgrade cuVS version to 25.04 (nightly)
- [x] Update install docs; deprecate faiss-gpu-raft
- [x] CAGRA IVF-PQ Params as shared_ptr

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4164

Reviewed By: bshethmeta, gtwang01

Differential Revision: D72194928

Pulled By: mnorris11

fbshipit-source-id: ef5143760bebc2fcb2a3dc20ddc26b5d02a5c21d
2025-04-01 10:28:02 -07:00
Junjie Qi
7f523f0849 ignore regex (#4264)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4264

same as title

Reviewed By: bshethmeta

Differential Revision: D72179831

fbshipit-source-id: 9e77ef382312e843e68f388ee6360a6b26b032d4
2025-03-31 23:00:18 -07:00
Alexandr Guzhva
ccc2b33c88 fix a serialization problem in RaBitQ (#4261)
Summary:
It seems that 2937f94751 was not included in https://github.com/facebookresearch/faiss/pull/4235. This PR fixes this problem.

junjieqi

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4261

Reviewed By: gtwang01

Differential Revision: D72180816

Pulled By: junjieqi

fbshipit-source-id: 55e156a3499fda6f8cdbb99ed941a3cbdd721417
2025-03-31 22:00:35 -07:00
Kaival Parikh
13255a8bf0 Publish the C API to Conda (#4186)
Summary:
Addresses https://github.com/facebookresearch/faiss/issues/4181

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4186

Reviewed By: junjieqi

Differential Revision: D70328657

Pulled By: mnorris11

fbshipit-source-id: a8bda4f3342af557369b625e8ace13e9a1d92d65
2025-03-30 20:11:39 -07:00
Alexandr Guzhva
3a49130cec RaBitQ implementation (#4235)
Summary:
This is a reference implementation of https://arxiv.org/pdf/2405.12497
> Jianyang Gao, Cheng Long, "RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search".

The goal is to correctly set up the internals using Faiss.

Comments on the implementation:
* The code does not include the computations for the symmetric distance, because it is absent from the original article. This can be added later, though.
* The original `RaBitQ` includes random matrix rotation as a part of it, but I've decided to rely on the external `faiss::IndexPreTransform` and `faiss::RandomRotationMatrix` facilities.
* Certain features required internal changes in `faiss::IndexIVF`, but I made them as minimally invasive as possible, without breaking backward compatibility.
* Not sure about naming conventions; maybe certain classes and structures need to be renamed.
* `METRIC_INNER_PRODUCT` is supported as well.
* More unit tests are needed?
* I did not add any hardware-specific optimizations, because this is a reference implementation. Certain `simdlib` facilities may be added later, if needed.

Here's how to use IndexRaBitQ:
```Python
import faiss
from faiss.contrib import datasets

ds = datasets.SyntheticDataset(...)

index_rbq = faiss.IndexRaBitQ(ds.d, faiss.METRIC_L2)
index_rbq.qb = 8

# wrap with random rotations
rrot = faiss.RandomRotationMatrix(ds.d, ds.d)
rrot.init(rrot_seed)  # rrot_seed: an integer seed chosen by the caller

index_cand = faiss.IndexPreTransform(rrot, index_rbq)
index_cand.train(ds.get_train())
index_cand.add(ds.get_database())
```

Here's how to use IndexIVFRaBitQ:
```Python
import faiss
from faiss.contrib import datasets

ds = datasets.SyntheticDataset(...)

# nlist: the number of inverted lists, chosen by the caller
index_flat = faiss.IndexFlat(ds.d, faiss.METRIC_L2)
index_rbq = faiss.IndexIVFRaBitQ(index_flat, ds.d, nlist, faiss.METRIC_L2)
index_rbq.qb = 8

# wrap with random rotations
rrot = faiss.RandomRotationMatrix(ds.d, ds.d)
rrot.init(rrot_seed)  # rrot_seed: an integer seed chosen by the caller

index_cand = faiss.IndexPreTransform(rrot, index_rbq)
index_cand.train(ds.get_train())
index_cand.add(ds.get_database())
```

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4235

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

buck run 'fbcode//mode/dev' fbcode//faiss/tests:test_rabitq

Reviewed By: mdouze

Differential Revision: D71638302

Pulled By: junjieqi

fbshipit-source-id: de981a6aed91d296237d8accf337359de04a552e
2025-03-29 12:26:39 -07:00
Satyendra Mishra
c2fc549085 Pass row filters to Hive Reader to filter rows (#4256)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4256

Pass row filters to Hive Reader to filter rows. This is needed for filtering for is_high_priority=true for Unicorn dataset

Reviewed By: junjieqi

Differential Revision: D71874955

fbshipit-source-id: b8ab4d9fbc8493b0da44ada66fa03339aacba9f6
2025-03-27 18:32:33 -07:00
Mayank Bhatia
6116d36af7 Grammar fix in FlatIndexHNSW (#4253)
Summary:
Changed "with with" to "with"

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4253

Reviewed By: gtwang01

Differential Revision: D71857100

Pulled By: junjieqi

fbshipit-source-id: 6c11e3767cb0d244707c889206de10169fccd6bf
2025-03-26 01:26:39 -07:00
Matthijs Douze
1debb7d812 re-land mmap diff (#4250)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4250

This is an attempt to re-land the diff stack D69972250  D70982449

It was reverted because the bottom of the stack did not pass the tests.

The original code comes from Alexandr Guzhva's  https://github.com/facebookresearch/faiss/pull/4199

To the adsmarket steward: the diff was already accepted by your team (see D70982449), but reverted for an independent reason. So it should be easy to accept now.

Reviewed By: mengdilin

Differential Revision: D71614511

fbshipit-source-id: 94139b4a4d457afe0d37ac95342537414aa81e7a
2025-03-24 09:56:45 -07:00
Richard Barnes
0f2035cc83 Fix CUDA kernel index data type in faiss/gpu/impl/DistanceUtils.cuh +10 (#4246)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4246

CUDA kernel variables matching the type `(thread|block|grid).(Idx|Dim).(x|y|z)` [have the data type `uint`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#built-in-variables).

Many programmers mistakenly use implicit casts to turn these data types into `int`. In fact, the [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/) itself is inconsistent and incorrect in its use of data types in programming examples.

The result of these implicit casts is that our kernels may give unexpected results when exposed to large datasets, i.e., those exceeding ~2B items.

While we now have linters in place to prevent simple mistakes (D71236150), our codebase has many problematic instances. This diff fixes some of them.
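
As a plain C++ illustration (host code, not CUDA, and not the code changed in this diff), the sketch below uses `uint32_t` stand-ins for `blockIdx.x`, `blockDim.x` and `threadIdx.x`. It shows a global index computed in 32 bits wrapping around, while widening to 64 bits first keeps the full value. The function names are hypothetical. Storing the 32-bit result in an `int` narrows the usable range further, to roughly 2B.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit computation: unsigned arithmetic wraps modulo 2^32.
inline uint32_t global_index_32(uint32_t block_idx, uint32_t block_dim, uint32_t thread_idx) {
    assert(thread_idx < block_dim);
    return block_idx * block_dim + thread_idx;
}

// Widening one operand before multiplying keeps the full 64-bit value.
inline uint64_t global_index_64(uint32_t block_idx, uint32_t block_dim, uint32_t thread_idx) {
    assert(thread_idx < block_dim);
    return static_cast<uint64_t>(block_idx) * block_dim + thread_idx;
}
```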

Reviewed By: dtolnay

Differential Revision: D71355340

fbshipit-source-id: 77dac270e1d3415bfe7d5cc214006d5176508474
2025-03-19 13:19:34 -07:00
Alexandr Guzhva
1dcbb4af32 fix IVFPQFastScan::RangeSearch() on the ARM architecture (#4247)
Summary:
The problem happens if `radius - normalizers[2 * q + 1]` is negative. Thus, it is possible to provide reasonable parameters to `IVFPQFastScan::RangeSearch()` and get an empty result.

I have no idea WHY (hardware implementation, it seems), but the following code
```C++
#include <cstddef>
#include <cstdint>
#include <iostream>

int main() {
    float f = -25.5f;
    uint16_t t = f;
    std::cout << t << std::endl;
    return 0;
}
```
prints `65511` on `x86` and `0` on ARM on the same compiler.

Thus, the `float` value needs to be cast to `int` first to get a consistent conversion:
```C++
    uint16_t t = (int)f;
```

Who would have thought...

It would be useful to find C++ compiler command-line flags that turn such behavior into a compilation error.
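
For context: in C++, converting a floating-point value to an integer type that cannot represent the truncated value is undefined behavior, which is consistent with the two platforms disagreeing. A minimal sketch of the two-step cast as a helper, assuming a 32-bit `int`; the name `to_u16_wrapping` is hypothetical and not part of this PR.

```cpp
#include <cassert>
#include <cstdint>

// Convert a float to uint16_t by going through int first.
// float -> int truncates toward zero and is well defined while the value fits in int;
// int -> uint16_t is then reduced modulo 2^16 on every platform.
inline uint16_t to_u16_wrapping(float f) {
    assert(f > -2147483648.0f && f < 2147483648.0f); // must fit in a 32-bit int
    return static_cast<uint16_t>(static_cast<int>(f));
}
```

With this helper, `to_u16_wrapping(-25.5f)` yields `65511` regardless of the target architecture.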

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4247

Reviewed By: junjieqi, satymish

Differential Revision: D71427185

Pulled By: gtwang01

fbshipit-source-id: 3ff3a9d3bb523e48bb9512c380c042bb1c2decdb
2025-03-19 11:17:12 -07:00
Mengdi Lin
8bce244f1f fix integer overflow issue when calculating imbalance_factor (#4245)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4245

When the number of clustering embeddings exceeds the int32 maximum, calculating the imbalance factor from the Python side causes a "function overload not found" error.
```
[0]:[rank0]:     return faiss.imbalance_factor(len(assign), k, faiss.swig_ptr(assign))
[0]:[rank0]: NotImplementedError: Wrong number or type of arguments for overloaded function 'imbalance_factor'.
[0]:[rank0]:   Possible C/C++ prototypes are:
[0]:[rank0]:     faiss::imbalance_factor(int,int,int64_t const *)
[0]:[rank0]:     faiss::imbalance_factor(int,int const *)
```

Fixing it by changing the function signature in c++ land to support int64_t.
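
A minimal sketch of what a 64-bit-count version could look like, assuming the usual definition of the imbalance factor, `k * sum(h_j^2) / (sum h_j)^2` over the cluster histogram `h`. The name `imbalance_factor_sketch` and its exact signature are illustrative and are not the signature changed in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Imbalance factor of an assignment of n points to k clusters.
// 1.0 means perfectly balanced; k means everything landed in one cluster.
// n is int64_t so that more than 2^31 - 1 assignments can be counted.
double imbalance_factor_sketch(int64_t n, int k, const int64_t* assign) {
    assert(n > 0 && k > 0);
    std::vector<int64_t> hist(k, 0);
    for (int64_t i = 0; i < n; i++) {
        assert(assign[i] >= 0 && assign[i] < k);
        hist[assign[i]]++;
    }
    double tot = 0, uf = 0;
    for (int j = 0; j < k; j++) {
        tot += hist[j];
        uf += hist[j] * static_cast<double>(hist[j]);
    }
    return uf * k / (tot * tot);
}
```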

Reviewed By: bshethmeta

Differential Revision: D71130612

fbshipit-source-id: becbf464a9d3979269cc7f0cecc6b80a6f9e1199
2025-03-19 04:28:16 -07:00
Rohil Shah
5adab67efb Fix bug with metric_arg in IndexHNSW (#4239)
Summary:
Fix https://github.com/facebookresearch/faiss/issues/4224.

The issue is that `IndexHNSW`'s internal `Index* storage` doesn't inherit `metric_arg`. One solution is to set `metric_arg` in `IndexHNSW::add`, which is what I did. Not sure what the best place to do this would be.
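
The shape of the fix can be sketched with two stand-in structs. `StorageSketch` and `GraphIndexSketch` are hypothetical and only model the `metric_arg` hand-off, not the real `IndexHNSW` classes.

```cpp
#include <cassert>

// Stand-in for the internal storage index.
struct StorageSketch {
    float metric_arg = 0.0f;
    int ntotal = 0;
    void add(int n) { ntotal += n; }
};

// Stand-in for the outer index that delegates vector storage.
struct GraphIndexSketch {
    float metric_arg = 0.0f;
    StorageSketch* storage = nullptr;

    void add(int n) {
        assert(storage != nullptr);
        // Copy the argument down so the storage computes distances consistently.
        storage->metric_arg = metric_arg;
        storage->add(n);
    }
};
```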

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4239

Reviewed By: mdouze

Differential Revision: D71225749

Pulled By: gtwang01

fbshipit-source-id: b27a592febadea153b575252df0c8ef14f7705d2
2025-03-18 23:49:28 -07:00
Mengdi Lin
f2f7a66b50 Back out "test merge with internal repo" (#4244)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4244

Original commit changeset: cd8a17e6527d

Original Phabricator Diff: D71327673

Reviewed By: junjieqi, gtwang01

Differential Revision: D71348755

fbshipit-source-id: 98058e81d3ba4e1d1614cc346d1b455d1de6e635
2025-03-17 19:16:13 -07:00
Junjie Qi
caa5f24656 test merge with internal repo (#4242)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4242

Reviewed By: bshethmeta

Differential Revision: D71327673

Pulled By: mengdilin

fbshipit-source-id: cd8a17e6527d245adc6689708f94e2932324adf5
2025-03-17 14:17:34 -07:00
Richard Barnes
9e808d4ea1 Remove unused exception parameter from faiss/impl/ResultHandler.h (#4243)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4243

`-Wunused-exception-parameter` has identified an unused exception parameter. This diff removes it.

This:
```
try {
    ...
} catch (exception& e) {
    // no use of e
}
```
should instead be written as
```
} catch (exception&) {
```

If the code compiles, this is safe to land.
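
A small self-contained illustration of the same pattern; the function `parses_as_int` is made up for this example and is unrelated to `ResultHandler.h`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Returns true if the text starts with a parseable int. The exception object is
// never inspected, so the catch clause names only the type.
inline bool parses_as_int(const std::string& text) {
    try {
        (void)std::stoi(text);
        return true;
    } catch (const std::exception&) { // no identifier, so no unused-parameter warning
        return false;
    }
}
```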

Reviewed By: dtolnay

Differential Revision: D71290934

fbshipit-source-id: f5e47eed369a9a024cc1e16a23acafa49f75b651
2025-03-17 13:32:43 -07:00
Gustav von Zitzewitz
fec7ce96fb SearchParameters support for IndexBinaryFlat (#4055)
Summary:
Context issue: https://github.com/facebookresearch/faiss/issues/3503

We need search params support for binary flat index to be able to use it in RAG applications that support search with pre-filtering.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4055

Reviewed By: junjieqi

Differential Revision: D69538514

Pulled By: gtwang01

fbshipit-source-id: 4b6811fd8323b4c39e726b7fd33dfe0384dd57fc
2025-03-17 13:32:17 -07:00
George Wang
df6a8f6b4e Address compile errors and warnings (#4238)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4238

Nightly has been broken and PRs have been blocked: https://github.com/facebookresearch/faiss/actions/runs/13798181461/job/38595760879?pr=4055

There are compiler errors in Autotune.cpp and warnings in some other files that this diff seeks to address.

Reviewed By: r-barnes

Differential Revision: D71135388

fbshipit-source-id: b3daeff8c93dfb45144b266f3b0562164959710c
2025-03-13 16:20:56 -07:00
Saumya Agarwal
15491a1e4f Revert D69972250: Memory-mapping and Zero-copy deserializers
Differential Revision:
D69972250

Original commit changeset: 98a3f94d6884

Original Phabricator Diff: D69972250

fbshipit-source-id: 1bea8b8a26c14061a01f8b26b66f0c4e6a75f550
2025-03-11 11:43:17 -07:00
Saumya Agarwal
fbc7db2cce Revert D69984379: mem mapping and zero-copy python fixes
Differential Revision:
D69984379

Original commit changeset: 9437b4ad92ef

Original Phabricator Diff: D69984379

fbshipit-source-id: 3cb921fa79b6f20b6455b17e50acc3cb96bcbe7b
2025-03-11 11:43:17 -07:00
Matthijs Douze
631b0fde4f mem mapping and zero-copy python fixes (#4212)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4212

Add files to TARGETS
fix python

Reviewed By: mengdilin

Differential Revision: D69984379

fbshipit-source-id: 9437b4ad92ef49333a44ea37ec194364123fe825
2025-03-11 11:11:14 -07:00
Alexandr Guzhva
55a3c2aff4 Memory-mapping and Zero-copy deserializers (#4199)
Summary:
This PR introduces a backport of a combination of https://github.com/zilliztech/knowhere/pull/996 and https://github.com/zilliztech/knowhere/pull/1032 that allows memory-mapped and zero-copy indices.

The underlying idea is that we replace certain `std::vector<>` containers with a custom `faiss::MaybeOwnedVector<>` container, which may behave either as a `std::vector<>` or as a view of a certain pointer / descriptor. We don't replace all the instances of `std::vector<>`, only the largest ones.
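
To make the owned-or-view idea concrete, here is a heavily simplified sketch. `MaybeOwnedSketch` is a hypothetical class written for this note; it is not the `faiss::MaybeOwnedVector<>` implementation and omits almost all of the `std::vector<>` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Either owns its storage (like std::vector) or is a read-only view over memory
// owned by someone else, e.g. a memory-mapped file or a caller-provided buffer.
template <typename T>
class MaybeOwnedSketch {
  public:
    // Owning mode: take the data by value.
    explicit MaybeOwnedSketch(std::vector<T> values)
            : owned_(std::move(values)), view_(nullptr), view_size_(0), is_owner_(true) {}

    // View mode: the caller must keep `ptr` alive for the lifetime of this object.
    MaybeOwnedSketch(const T* ptr, size_t n)
            : view_(ptr), view_size_(n), is_owner_(false) {}

    bool is_owned() const { return is_owner_; }
    size_t size() const { return is_owner_ ? owned_.size() : view_size_; }
    const T* data() const { return is_owner_ ? owned_.data() : view_; }

    const T& operator[](size_t i) const {
        assert(i < size());
        return data()[i];
    }

    // Mutation only makes sense when the storage is owned.
    void push_back(const T& value) {
        assert(is_owner_);
        owned_.push_back(value);
    }

  private:
    std::vector<T> owned_;
    const T* view_;
    size_t view_size_;
    bool is_owner_;
};
```

In view mode no bytes are copied, which is what makes memory-mapped and zero-copy loading cheap; the price is that the backing memory has to outlive the container.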

This change affects `IndexFlatCodes`-based and `IndexHNSW` CPU indices.

(done) alter IVF lists as well.
(done) alter binary indices as well.

Memory-mapped index works like this:
```C++
std::unique_ptr<faiss::Index> index_mm(
            faiss::read_index(filename.c_str(), faiss::IO_FLAG_MMAP_IFC));
```
In theory, it should be ready to be used from Python. All the descriptor management should be working.

Zero-copy index works like this:
```C++
#include <faiss/impl/zerocopy_io.h>

faiss::ZeroCopyIOReader reader(buffer.data(), buffer.size());
std::unique_ptr<faiss::Index> index_zc(faiss::read_index(&reader));
```
All the pointer management for `faiss::ZeroCopyIOReader` should be handled manually.
I'm not sure how to plug this into Python yet; maybe some ref-counting is required.

(done) some refactoring

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4199

Reviewed By: mengdilin

Differential Revision: D69972250

Pulled By: mdouze

fbshipit-source-id: 98a3f94d6884814873d3534ee25f960892ef1076
2025-03-11 11:11:14 -07:00
Richard Barnes
653be59386 Use nullptr in faiss/gpu/StandardGpuResources.cpp (#4232)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4232

`nullptr` is preferable to `0` or `NULL`. Let's use it everywhere so we can enable `-Wzero-as-null-pointer-constant`.

 - If you approve of this diff, please use the "Accept & Ship" button :-)
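
One reason `nullptr` is preferable can be seen with overload resolution. The sketch below is a generic illustration with made-up function names, unrelated to `StandardGpuResources.cpp`.

```cpp
#include <cassert>
#include <cstddef>

inline int which_overload(int) { return 0; }         // integer overload
inline int which_overload(const char*) { return 1; } // pointer overload
```

Here `which_overload(0)` selects the integer overload even if a null pointer was intended, while `which_overload(nullptr)` can only select the pointer overload.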

Reviewed By: dtolnay

Differential Revision: D70818157

fbshipit-source-id: a46d64b6d80844f5246f7df236eb6ec54ce2886f
2025-03-09 11:24:20 -07:00
Lucian Grijincu
3d96ad56a4 faiss: fix non-templated hammings function (#4195)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4195

Non-templated `hammings` call produced incorrect values.

`hammings` is called from `hamming_distance_table`, which in turn is unused so no impact.
https://www.internalfb.com/code/fbsource/[85684614381d9bdfaaa0bb4a42e244296e350848]/fbcode/faiss/IndexPQ.cpp?lines=439-446

Reviewed By: gtwang01

Differential Revision: D69613329

fbshipit-source-id: 5d02a99b04492a61ebf0134f0c1719eac86fbb4f
2025-03-07 11:38:20 -08:00
Junjie Qi
4cd2f6e007 Support non-partition col and map in the embedding reader (#4229)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4229

same as title

Differential Revision: D70728870

fbshipit-source-id: aeb817d80b20e5671c81ba88cdd05797cb070d23
2025-03-06 19:01:59 -08:00
Junjie Qi
a22ec32dd3 Support cosine distance for training vectors (#4227)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4227

same as title

Differential Revision: D70724590

fbshipit-source-id: 943648d9002b38ba967c254c8c7014fdc7ab3de8
2025-03-06 18:07:21 -08:00
Richard Barnes
c109174198 Fix LLVM-19 compilation issue in faiss/AutoTune.cpp (#4220)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4220

LLVM-19 is incoming. This fixes an issue preventing it. Delays to previous platform upgrades cost $3M/week.

Reviewed By: dtolnay

Differential Revision: D70449926

fbshipit-source-id: 20e0882b9363670d6c010e1c7870cb04155a3a9d
2025-03-02 21:20:51 -08:00
Shuyao Qi
615c17ea27 Add missing #include in code_distance-sve.h (#4219)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4219

`code_distance-sve.h` references `PQDecoder8` but doesn't include the header. The issue is revealed by D68784260 which removed some includes from a header that indirectly included `ProductQuantizer.h`

```
headers/faiss/impl/code_distance/code_distance-sve.h:74:45: error: unknown type name 'PQDecoder8'; did you mean 'PQDecoderT'?
       74 | std::enable_if_t<std::is_same_v<PQDecoderT, PQDecoder8>, float> inline distance_single_code_sve(
          |                                             ^~~~~~~~~~
          |                                             PQDecoderT
```

Reviewed By: ddrcoder

Differential Revision: D70433576

fbshipit-source-id: 12945b16003a3d6a995b18ffe9798179ecf826f4
2025-02-28 22:09:17 -08:00
Tom Jackson
eab52af8ea Fix cloning and reverse index factory for NSG indices (#4151)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4151

Reviewed By: junjieqi, asadoughi

Differential Revision: D68784260

fbshipit-source-id: a715b02fd18a59c393be3ccc9aa1a7be8b196cc8
2025-02-28 15:13:56 -08:00
George Wang
1a295cd544 Remove python_abi to fix nightly (#4217)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4217

liblief seemed to be causing issues in nightly: https://github.com/facebookresearch/faiss/actions/runs/13560151536/job/37908376889
Removing the pin while pinning conda-build resolves the issue.

Reviewed By: mnorris11

Differential Revision: D70344910

fbshipit-source-id: c19bfcf187714fbe36e549bfb007eb9787a011b6
2025-02-27 16:20:26 -08:00
Shuyao Qi
4cea80b41c Make static method in header inline (#4214)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4214

Got build failure with flags `[-Werror,-Wunneeded-internal-declaration]`
```
faiss/impl/code_distance/code_distance-sve.h:199:13
error: 'static' function 'distance_four_codes_sve_for_small_m' declared in header file should be declared 'static inline' [-Werror,-Wunneeded-internal-declaration]
```

Reviewed By: vit-ka

Differential Revision: D70279069

fbshipit-source-id: 28b5cc8394a9a508e25f72777f74de685d242dc4
2025-02-26 22:09:54 -08:00
Michael Norris
835b3ea1bd Fix IVF quantizer centroid sharding so IDs are generated (#4197)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4197

Ivan and I discussed 2 problems:
1. We may want to try to offload/shard PQ or SQ table data if there is a big enough win (pending)
2. IDs seem to be random after sharding.

This diff solves 2.

Root cause is that we add to quantizer without IDs.

Instead, we wrap in IndexIDMap2 (which provides reconstruction, whereas IndexIDMap does not).

Laser's quantizers are Flat and HNSW, so we can wrap like this.

Reviewed By: ivansopin

Differential Revision: D69832788

fbshipit-source-id: 331b6d1cf52666f5dac61e2b52302d46b0a83708
2025-02-24 16:01:08 -08:00
Michael Norris
65222b3ed7 Pin lief to fix nightly (#4211)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4211

Reviewed By: gtwang01

Differential Revision: D70102429

fbshipit-source-id: 68e265699448a825b82467064ca95742bd4e49c3
2025-02-24 12:46:51 -08:00
lkuffo
7cb4556456 Fix Sapphire Rapids never loading in Python bindings (#4209)
Summary:
If both `avx512` and `avx512_spr` are compiled, Sapphire Rapids capabilities are never loaded when using the Python bindings, as the `avx512` import always overrides the `avx512_spr` one.

This very small PR solves the issue.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4209

Reviewed By: mengdilin

Differential Revision: D70015045

Pulled By: gtwang01

fbshipit-source-id: d3553a6c9048a534c0901ee29e7e2354de96e79f
2025-02-21 22:38:30 -08:00
Michael Norris
20c7ca35bb Upgrade openblas to 0.3.29 for ARM architectures (#4203)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4203

Related to issue: https://github.com/facebookresearch/faiss/issues/4202

Reviewed By: mengdilin

Differential Revision: D69933126

fbshipit-source-id: cafc5f34d0f91450c5067827756b1297684b0ce3
2025-02-21 17:46:25 -08:00
George Wang
55d022fbb0 Attempt to nightly fix (#4204)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4204

Fix for S492386

I found a slight difference between failing nightly: https://github.com/facebookresearch/faiss/actions/runs/13429138293/job/37523589618
And last succeeding nightly: https://github.com/facebookresearch/faiss/actions/runs/13301645334/job/37182266030

The mkl package in the last succeeding nightly is 2023.0.0, and it is 2023.2.0 in the failing nightly. Since mkl was recently causing trouble, I pin mkl to 2023.0.0 in this diff to match the last succeeding nightly.

Reviewed By: mnorris11

Differential Revision: D69937976

fbshipit-source-id: 0c4aba4322e26aa6a03bf3ea1dbee6ed7049092c
2025-02-21 02:37:15 -08:00
Navneet Verma
00ce0e2189 Add the support for IndexIDMap with Cagra index (#4188)
Summary:
## Description
Add the support for adding vectors with ids when IndexIDMap is used with Cagra Index.

Resolves issue: https://github.com/facebookresearch/faiss/issues/4107

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4188

Reviewed By: mnorris11

Differential Revision: D69812544

Pulled By: gtwang01

fbshipit-source-id: 3c12c930e5d10ce214b12e68dacd63a644011b79
2025-02-21 00:32:05 -08:00
Nicolas De Carli
1fe8b8b5f1 Remove unused variable (#4205)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4205

Removing unused variable.

This piece of code began to be compiled after armv9a was set as the default compilation profile.

Reviewed By: andrewjcg

Differential Revision: D69946389

fbshipit-source-id: f2b5e57585506eb7cecbf76bf71bc6a2b5cc7133
2025-02-20 20:54:22 -08:00
Divye Gala
6b652892ff Pass store_dataset argument along to cuVS CAGRA (#4173)
Summary:
This is required to enable lazy setting of a device copy of the training dataset to a cuVS CAGRA index.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4173

Reviewed By: mnorris11

Differential Revision: D69795662

Pulled By: gtwang01

fbshipit-source-id: 68cda198ed7983800b64d3e5fac1b77ff55ecd12
2025-02-20 19:30:30 -08:00
Michael Norris
d72d0cab6b Fix nightly by installing earlier version of lief (#4198)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4198

1. pins lief due to `AttributeError: type object 'CLASS' has no attribute 'CLASS64'` (just set it to last passing nightly version)
2. pins mkl in gpu builds due to it trying to pull in 2024.2.2 which conflicts with 2023 in the libfaiss.

Added nightlies to make sure they pass https://github.com/facebookresearch/faiss/actions/runs/13422430425/job/37498020894. Not all passed: I'm not sure the `build-pull-request / Linux x86_64 GPU w/ cuVS nightlies (CUDA 12.4.0)` nightly is actually broken, but this unblocks the PR builds for now.

Reviewed By: junjieqi

Differential Revision: D69860604

fbshipit-source-id: 2da623c71b03c22d581b78655253a863fbafd3ed
2025-02-19 21:44:03 -08:00
Bhavik Sheth
657c563604 Add bounds checking to hnsw nb_neighbors (#4185)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4185

Based on this user's comment, it seems we should do bounds checking: https://github.com/facebookresearch/faiss/issues/4177

Reviewed By: mnorris11

Differential Revision: D69497295

fbshipit-source-id: 97025cf29c464afb0f85aa98f4b303489b7fc989
2025-02-14 10:39:11 -08:00
George Wang
f0e3832986 Check for not completed
Summary:
Check for not completed rather than just in_progress, as runs can also be queued, waiting, etc.
This fixes a failed nightly that was not retried because the retry build found it was "queued" instead of "in_progress".

Failed nightly: https://github.com/facebookresearch/faiss/actions/runs/13301645334/attempts/1
Retry that didn't trigger: https://github.com/facebookresearch/faiss/actions/runs/13301647044/job/37144032841

Reviewed By: mengdilin

Differential Revision: D69610422

fbshipit-source-id: a7a9b998bba160e8d1ba13c7ae2426d99125a7e8
2025-02-13 15:49:45 -08:00
Michael Norris
aff6bfcd80 Add sharding convenience function for IVF indexes (#4150)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4150

Creates a sharding convenience function for IVF indexes.
- The __**centroids on the quantizer**__ are sharded based on the given sharding function. (not the data, as data sharding by ids is already implemented by copy_subset_to, https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVF.h#L408)
- The output is written to files based on the template filename generator param.
- The default sharding function is simply the ith vector mod the total shard count.
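
The default rule from the last bullet can be sketched as follows. `default_shard` and `shard_indices` are hypothetical helpers for illustration, not the functions added in this diff, and they only group indices rather than write files.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Default rule: the i-th vector goes to shard i mod shard_count.
inline int64_t default_shard(int64_t i, int64_t shard_count) {
    assert(i >= 0 && shard_count > 0);
    return i % shard_count;
}

// Group centroid indices 0..n-1 by the shard they would be written to.
inline std::vector<std::vector<int64_t>> shard_indices(int64_t n, int64_t shard_count) {
    assert(n >= 0 && shard_count > 0);
    std::vector<std::vector<int64_t>> shards(shard_count);
    for (int64_t i = 0; i < n; i++) {
        shards[default_shard(i, shard_count)].push_back(i);
    }
    return shards;
}
```
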

This would be called by Laser here: https://www.internalfb.com/code/fbsource/[ce1f2e028e79]/fbcode/fblearner/flow/projects/laser/laser_sim_search/knn_trainer.py?lines=295-296. This convenience function will do the file writing, and return the created file names.

There are a few key required changes in FAISS:
1. Allow `std::vector<std::string>` to be used. Updates swigfaiss.swig and array_conversions.py to accommodate. These have to be numpy dtype of `object` instead of the more correct `unicode`, because unicode dtype is fixed length. I couldn't figure out how to create a numpy array with each of the output file names where they have different dtypes. (Say the file names are like file1, file11, file111. The dtype would need to be U5, U6, U7 respectively, as the dtype for unicode contains the length). I tried structured arrays : this does not work either, as numpy makes it into a matrix instead: the `file1 file11 file111` example with explicit setting of U5, U6, U7 turns into `[[file1 file1 file1], [file1 file11 file11], [file1 file11 file111]]`, which we do not want. If someone knows the right syntax, please yell at me
2. Create Python callbacks for sharding and template filename: `PyCallbackFilenameTemplateGenerator` and `PyCallbackShardingFunction`. Users of this function would inherit from the FilenameTemplateGenerator or ShardingFunction in C++ to pass to `shard_ivf_index_centroids`. See the other examples in python_callbacks.cpp. This is required because Python functions cannot be passed through SWIG to C++ (i.e. no std::function or function pointers), so we have to use this approach. This approach allows it to be called from both C++ and Python. test_sharding.py shows the Python calling, test_utils.cpp shows the C++ calling.

Reviewed By: asadoughi

Differential Revision: D68534991

fbshipit-source-id: b857e20c6cc4249a2ab7792db4c93dd4fb8403fd
2025-02-07 11:39:59 -08:00
Kaival Parikh
1d8f3931a3 Handle plain SearchParameters in HNSW searches (#4167)
Summary:
Add ability to search HNSW indexes using a plain [`SearchParameters`](6c046992a7/faiss/Index.h (L64-L69)) object (i.e. only an [`IDSelector`](6c046992a7/faiss/Index.h (L66)))

Issue: Currently if a plain `SearchParameters` is used to query an HNSW index, [an error is thrown](6c046992a7/faiss/IndexHNSW.cpp (L251)) -- when the user's intent was only to filter some documents, and rely on index settings for remaining parameters (like `efSearch`, `check_relative_distance`, `search_bounded_queue`)

Motivation: Faiss provides an amazing [index factory](https://github.com/facebookresearch/faiss/wiki/The-index-factory) and [parameter setter](https://github.com/facebookresearch/faiss/wiki/Index-IO,-cloning-and-hyper-parameter-tuning) to abstract away internals of the index type and settings used, like:
```cpp
Index* index = index_factory(256, "HNSW32");
ParameterSpace().set_index_parameters(index, "efConstruction=200,efSearch=150");
```

Now if a user wants to perform a filtered search on this _opaque_ index using:
```cpp
SearchParameters parameters;
parameters.sel = new IDSelectorRange(10, 20);
index->search(nq, xq, k, d, id, &parameters);
```

they are met with an error:
```
faiss/IndexHNSW.cpp:251: Error: '!(params)' failed: params type invalid
```

An easy way to reproduce this issue is to replace `Flat` -> `HNSW` [here](6c046992a7/c_api/example_c.c (L60)) and run `example_c` like:
```
make -C build example_c
./build/c_api/example_c
```

This PR allows passing a plain `SearchParameters` to HNSW indexes, and uses index settings as a fallback
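
The fallback logic can be sketched roughly like this. All types below are hypothetical stand-ins for the real parameter and index classes, reduced to the `efSearch` field; the actual change in `IndexHNSW.cpp` may be structured differently.

```cpp
#include <cassert>

// Stand-in for the base parameter class (only carries a selector).
struct SearchParamsSketch {
    const void* sel = nullptr; // stand-in for the IDSelector pointer
    virtual ~SearchParamsSketch() = default;
};

// Stand-in for the HNSW-specific parameter class.
struct HNSWSearchParamsSketch : SearchParamsSketch {
    int efSearch = 16;
};

struct HNSWIndexSketch {
    int efSearch = 150; // index-level setting

    // Use the HNSW-specific value when the caller supplied one; otherwise
    // fall back to the index setting instead of raising an error.
    int effective_ef_search(const SearchParamsSketch* params) const {
        if (const auto* hnsw_params =
                    dynamic_cast<const HNSWSearchParamsSketch*>(params)) {
            return hnsw_params->efSearch;
        }
        return efSearch;
    }
};
```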

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4167

Reviewed By: asadoughi

Differential Revision: D69312175

Pulled By: mnorris11

fbshipit-source-id: 63cc1deb6cb6116850cb3f8f7866eaa3a911ee48
2025-02-07 11:39:49 -08:00