727 Commits

Author SHA1 Message Date
Greg Sinclair
59dc1d31cd Add sa_code_size, sa_encode, and sa_decode to the C_API (#2367)
Summary:
Exporting a few more functions to the C API

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2367

Reviewed By: alexanderguzhva

Differential Revision: D37480505

Pulled By: mdouze

fbshipit-source-id: 899baca8795e29b20e16b56ea3c0d13960e1ea37
2022-06-29 12:02:51 -07:00
Alexandr Guzhva
fb8193d151 Add sa_decode() to IndexIVFAdditiveQuantizer (#2362)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2362

Reviewed By: mdouze

Differential Revision: D37280450

fbshipit-source-id: 610b553e8219df9a9f52442ffc3942036f47284a
2022-06-20 10:54:11 -07:00
Alexandr Guzhva
3986ebffca fast C++ templates for sa_decode (#2354)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2354

A specialized code that provides 2x-3x faster Index::sa_decode for
* IVF256,PQ[1]x8np
* Residual[1]x8,PQ[2]x8

Reviewed By: mdouze

Differential Revision: D37092134

fbshipit-source-id: d848b6cf1aefa826a5ca01e41935aa5d46f5dcc7
2022-06-16 09:20:19 -07:00
Patrick Somaru
578fbc9a8e faiss 6bit benchmark config (#2329)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2329

Reviewed By: beauby

Differential Revision: D36003967

fbshipit-source-id: 1167d028477ab6f42fe8d3cfd2f198c274c0fe9a
2022-05-17 05:19:54 -07:00
Alexandr Guzhva
30e8561476 expose certain buffer sizes (#2327)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2327

Expose buffer sizes for:
* MultiIndexQuantizer::search
* IndexIVFPQ::add_core_o
* Index2Layer::sa_encode
* ProductQuantizer::compute_codes

These constants were introduced to handle the possible out-of-memory problem. Faiss performs certain operations in chunks. Increasing the chunk sizes reduces the OpenMP overhead and speeds up computations in certain cases at the cost of higher memory consumption.
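
The memory/overhead trade-off described above can be illustrated with a minimal pure-Python sketch. The names `encode_in_chunks` and `block_size` are hypothetical and are not Faiss identifiers; the toy "encoder" just rounds each value. The point is only that the result does not depend on the chunk size, while the size of the temporary buffer and the number of loop iterations do.

```python
def encode_in_chunks(values, block_size):
    """Encode `values` block by block; only one block of temporaries is alive at a time."""
    codes = []
    n_blocks = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Temporary buffer whose size is bounded by block_size
        tmp = [round(v) for v in block]
        codes.extend(tmp)
        n_blocks += 1
    return codes, n_blocks

data = [0.2 * i for i in range(10)]
# Small blocks: less temporary memory, more iterations (more per-block overhead)
small_codes, small_blocks = encode_in_chunks(data, block_size=3)
# Large blocks: more temporary memory, fewer iterations
large_codes, large_blocks = encode_in_chunks(data, block_size=8)
```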

Reviewed By: mdouze

Differential Revision: D36248391

fbshipit-source-id: 17b38f8b7f59748d5ff72c79938e66b1800983a9
2022-05-10 12:21:55 -07:00
Check Deng
9b1982262a Add ProductAdditiveQuantizer (#2286)
Summary:
This diff added ProductAdditiveQuantizer.

A Simple Algo description:

1. Divide the vector space into several orthogonal sub-spaces, just like PQ does.
2. Quantize each sub-space by an independent additive quantizer.

Usage:

Construct a ProductAdditiveQuantizer object:
- `d`: dimensionality of the input vectors
- `nsplits`: number of sub-spaces the vector space is divided into
- `Msub`: `M` of each additive quantizer
- `nbits`: `nbits` of each additive quantizer

```python
d = 128
nsplits = 2
Msub = 4
nbits = 8
plsq = faiss.ProductLocalSearchQuantizer(d, nsplits, Msub, nbits)
prq = faiss.ProductResidualQuantizer(d, nsplits, Msub, nbits)
```
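
The two-step algorithm above can be sketched in pure Python. This is a toy illustration, not the Faiss implementation: the helper names (`nearest`, `residual_encode`, `product_additive_encode`) and the tiny hand-written codebooks are assumptions made for the example, and a greedy residual quantizer stands in for the additive quantizer of each sub-space.

```python
def nearest(codebook, x):
    """Index of the codebook entry closest to x (squared L2)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)))

def residual_encode(codebooks, x):
    """Toy additive (residual) quantizer: one code per codebook."""
    codes, residual = [], list(x)
    for cb in codebooks:
        k = nearest(cb, residual)
        codes.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    return codes

def product_additive_encode(sub_quantizers, x):
    """Split x into equal sub-vectors and encode each one independently."""
    nsplits = len(sub_quantizers)
    dsub = len(x) // nsplits
    codes = []
    for s, codebooks in enumerate(sub_quantizers):
        codes.extend(residual_encode(codebooks, x[s * dsub:(s + 1) * dsub]))
    return codes

# d = 4, nsplits = 2, Msub = 2 codebooks per sub-space, 2 entries per codebook
cb_coarse = [[0.0, 0.0], [4.0, 4.0]]
cb_fine = [[0.0, 0.0], [1.0, 0.0]]
sub_quantizers = [[cb_coarse, cb_fine], [cb_coarse, cb_fine]]
codes = product_additive_encode(sub_quantizers, [5.0, 4.0, 0.1, 0.0])
```

The resulting code has `nsplits * Msub` entries, one per codebook.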

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2286

Test Plan:
```

buck test //faiss/tests/:test_local_search_quantizer -- TestProductLocalSearchQuantizer

buck test //faiss/tests/:test_residual_quantizer -- TestProductResidualQuantizer

```

Reviewed By: alexanderguzhva

Differential Revision: D35907702

Pulled By: mdouze

fbshipit-source-id: 7428a196e6bd323569caa585c57281dd70e547b1
2022-05-05 15:14:07 -07:00
Matthijs Douze
88f550deee Fix failures surfaced by platform010 migration (#2320)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2320

Checks are a bit stricter in platform010, so fix new CI errors here.
The errors corrected fall in 3 classes:

- `&vector[vector.size()]` now fails because `operator []` checks for array bounds even if only the address is manipulated

- `omp schedule(dynamic)` does not run the loop in the correct order.

- several threads calling omp loop seems to cause errors in the distributed Faiss code

Reviewed By: beauby

Differential Revision: D35895550

fbshipit-source-id: e9dcf5615158610a42870e6a41c77e4db6ebeea0
2022-05-05 13:50:08 -07:00
Richard Barnes
c70c87e62f Fix vector overrun access in kmeans1d.cpp to unblock faiss platform010 migration
Summary:
Fixes:
```
third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/stl_vector.h:1045: std::vector::reference std::vector<float>::operator[](std::vector::size_type) [_Tp = float, _Alloc = std::allocator<float>]: Assertion '__n < this->size()' failed.
```

Reviewed By: luciang

Differential Revision: D36045522

fbshipit-source-id: 1c447ad0c5461ef1cf1db06587ed1d870a0dc80f
2022-04-30 09:16:06 -07:00
Richard Barnes
162c1aa015 Fix vector overrun in IndexFlatCodes.cpp for platform010 migration of faiss
Summary:
Fixes
```
test_add_0_vecs (faiss.tests.test_index.TestHNSW) ... third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/stl_vector.h:1045: std::vector::reference std::vector<unsigned char>::operator[](std::vector::size_type) [_Tp = unsigned char, _Alloc = std::allocator<unsigned char>]: Assertion '__n < this->size()' failed.
```

Reviewed By: luciang

Differential Revision: D36045532

fbshipit-source-id: db6e2724e17913e4f86cc78e99d721ab0615f43f
2022-04-30 09:15:53 -07:00
CodemodService Bot
02c8452eb9 fbcode//faiss/tests
Reviewed By: aijanai

Differential Revision: D35890159

fbshipit-source-id: a7962b36ae6879543c5f950ef73a619343541328
2022-04-25 17:55:46 -07:00
Matthijs Douze
d77ebcaca9 fix includes in GPU demo
Summary:
Fixed the include file for the IVFPQ demo in the GPU index. Adds a targets entry for it as well.
Fixes
https://github.com/facebookresearch/faiss/issues/2293

Reviewed By: beauby

Differential Revision: D35775928

fbshipit-source-id: 15ea837e5a67a6d692e980d90195400936dac1e1
2022-04-20 03:44:09 -07:00
Matthijs Douze
f68ddd0564 fix test in test_contrib (#2294)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2294

there is a weird CI failure on one of the platforms occurring in the PR

https://github.com/facebookresearch/faiss/pull/2291

This diff makes the test a bit more robust, correcting inter_perf to compute the intersection measure. Hopefully this will make the bug go away.

Reviewed By: beauby

Differential Revision: D35558855

fbshipit-source-id: f5a926d9d8ebee975e538c65ac37b15d485798aa
2022-04-20 03:03:38 -07:00
spectaclehong
b13f47a4da Fix reconstruct bug when by_residual is false (#2298)
Summary:
When I reconstructed with by_residual turned off, the distance increased greatly.
This is because the reconstruct_from_offset function did not check whether the by_residual option was off.
I fixed this bug with a simple if statement.
(like this https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVFPQ.cpp#L365)
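
A minimal sketch of the kind of check described above, in pure Python. The function `reconstruct` and its arguments are hypothetical and only mirror the idea; they are not the signature of `reconstruct_from_offset`.

```python
def reconstruct(decoded, centroid, by_residual):
    """Return the reconstructed vector from a decoded code."""
    if by_residual:
        # The code stores x - centroid, so the centroid is added back
        return [d + c for d, c in zip(decoded, centroid)]
    # The code stores x itself; adding the centroid would shift the result
    return list(decoded)

centroid = [10.0, 20.0]
x = [11.0, 22.0]
with_residual = reconstruct([1.0, 2.0], centroid, by_residual=True)
without_residual = reconstruct([11.0, 22.0], centroid, by_residual=False)
```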

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2298

Reviewed By: alexanderguzhva

Differential Revision: D35746566

Pulled By: mdouze

fbshipit-source-id: 50f98c7cc97c7936507573fe41b65a79ecdbc4ca
2022-04-20 01:35:21 -07:00
Matthijs Douze
8ffed8c219 common ancestor for quantizer classes (#2295)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2295

Makes a common ancestor for quantizer classes.
As a first application, adds a clone_Quantizer function

Reviewed By: alexanderguzhva

Differential Revision: D35561960

fbshipit-source-id: 896a4f3fc4ab992511cdc0642689a440f170f683
2022-04-20 01:34:01 -07:00
Check Deng
992d494d4a Support reconstruct() in IndexFastScan (#2287)
Summary:
This diff implemented `reconstruct()` interface in `IndexFastScan`.

unit test: `python -m unittest test_fast_scan_ivf.TestReconstruct`

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2287

Test Plan: buck test //faiss/tests/:test_fast_scan_ivf -- TestReconstruct

Reviewed By: alexanderguzhva

Differential Revision: D35576662

Pulled By: mdouze

fbshipit-source-id: ac584128cd370807182dee06a57efedc23c0f7d4
2022-04-19 03:18:58 -07:00
Alexandr Guzhva
536aa99da7 speedup exhaustive_L2sqr_blas when a single nearest point needed (#2296)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2296

Reviewed By: mdouze

Differential Revision: D35602161

fbshipit-source-id: 5c9bb8a9444ab111708328d60d1080802756bb02
2022-04-13 13:39:15 -07:00
Lucas Hosseini
7c9d979d66 Enable servicelab regression testing.
Summary:
Start migration of existing benchmarks to Google's Benchmark library + register benchmark to servicelab.

The benchmark should be automatically registered to servicelab once this diff lands according to https://www.internalfb.com/intern/wiki/ServiceLab/Use_Cases/Benchmarks_(C++)/#servicelab-job.

Reviewed By: mdouze

Differential Revision: D35397782

fbshipit-source-id: 317db2527f12ddde0631cacc3085c634afdd0e37
2022-04-07 02:45:55 -07:00
Matthijs Douze
bb4c987b5c Demo of residual quantizer distance computer for LaserKNN (#2283)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2283

This is a demonstration for:

- how to use a distance computer to compute query-to-code distances with a residual quantizer

- how to construct a ResidualCoarseQuantizer that uses a prefix of residual quantizer codes

See related doc https://docs.google.com/document/d/1g97lrMXVYh5FcQzw23v_sUE22ybHfCFxtbHyFJwxKKE/edit?usp=sharing

Reviewed By: alexanderguzhva

Differential Revision: D34958088

fbshipit-source-id: edb06ee350de67f855e96ae57a3862fbf14f6e54
2022-04-06 12:42:24 -07:00
Matthijs Douze
1806c6af27 Automatic type conversions for Python API (#2274)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2274

All input matrices needed to be of the correct type and to be C-contiguous. This diff passes the main entry points of the api through `np.ascontiguousarray` so that the function parameters are transparently converted to the suitable format if needed.

We did not have this before because users need to be made aware of the performance impact, but usability seems to matter more.
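
The conversion can be sketched with NumPy alone. `to_faiss_input` is a hypothetical helper name, not part of the Faiss API; it only shows what `np.ascontiguousarray` does to an array that has the wrong dtype and layout, and that an already suitable array is passed through without a copy.

```python
import numpy as np

def to_faiss_input(x):
    """Hypothetical helper: return x as a C-contiguous float32 array."""
    return np.ascontiguousarray(x, dtype="float32")

# float64 and, after the transpose, not C-contiguous
a = np.arange(12, dtype="float64").reshape(3, 4).T
b = to_faiss_input(a)   # converted: new dtype and layout, so a copy is made
c = to_faiss_input(b)   # already suitable: no new buffer is allocated
```

The copy made for `a` is the performance cost mentioned above; callers who already pass C-contiguous `float32` arrays do not pay it.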

This diff is an alternative to

D35007365
https://github.com/facebookresearch/faiss/pull/2250

Reviewed By: beauby

Differential Revision: D35009612

fbshipit-source-id: fa0d5cfdfbff6b0916d47bd33c620e3ca9d5dd40
2022-03-30 05:42:08 -07:00
Alexandr Guzhva
b32abc95c2 ProductQuantizer::compute_code tracks the nearest vector index in a register rather than stores the distances in a buffer. (#2280)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2280

Add a new function called fvec_L2sqr_ny_nearest and a demonstration of its implementation for 4 bits
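
The idea in the commit title (tracking the nearest index while scanning instead of storing all distances first) can be sketched in pure Python. Both function names are hypothetical, and the sketch says nothing about the actual SIMD implementation.

```python
def nearest_with_buffer(x, centroids):
    """Baseline: store every distance in a buffer, then take the argmin."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    return min(range(len(dists)), key=dists.__getitem__)

def nearest_running(x, centroids):
    """Keep only the best distance and its index while scanning."""
    best_idx, best_dist = -1, float("inf")
    for i, c in enumerate(centroids):
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if d < best_dist:   # strict comparison keeps the first index on ties
            best_idx, best_dist = i, d
    return best_idx

centroids = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [1.0, 1.0]]
query = [1.2, 0.9]
idx_buffer = nearest_with_buffer(query, centroids)
idx_running = nearest_running(query, centroids)
```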

Reviewed By: mdouze

Differential Revision: D35189945

fbshipit-source-id: d1b2ba42851df195123c7e318a8dcf26f775eaba
2022-03-29 10:21:23 -07:00
Alexandr Guzhva
438b64cd8b IVFPQ AVX2 optimization for PQ, including polysemous filtering (#2277)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2277

* extend a specialized AVX2 version for IVFPQScannerT::scan_list_with_table to cover IVFPQScannerT::scan_list_polysemous_hc as well
* lower the comparison precision in test_lowlevel_ivf tests from EXPECT_EQ to EXPECT_FLOAT_EQ because of the AVX2 change in IVFPQScannerT::scan_list_polysemous_hc, otherwise tests fail

Reviewed By: mdouze

Differential Revision: D34964138

fbshipit-source-id: 1d304a8f6eda040fa4c626676b4d492f2c12f04f
2022-03-24 06:35:38 -07:00
Matthijs Douze
291353c5a9 Generalize DistanceComputer for flat indexes (#2255)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2255

The `DistanceComputer` object is derived from an Index (obtained with `get_distance_computer()`). It maintains a current query and quickly computes distances from that query to any item in the database. This is useful, eg. for the IndexHNSW and IndexNSG that rely on query-to-point comparisons in the datasets.

This diff introduces the `FlatCodesDistanceComputer`, that inherits from `DistanceComputer` for Flat indexes. In addition to the distance-to-item function, it adds a `distance_to_code` that computes the distance from any code to the current query, even if it is not stored in the index.

This is implemented for all FlatCode indexes (IndexFlat, IndexPQ, IndexScalarQuantizer and IndexAdditiveQuantizer).

In the process, the two classes were extracted to their own header file `impl/DistanceComputer.h`
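
The interface described above can be sketched with a toy Python class. `ToyFlatCodesDistanceComputer` is a stand-in written for this example, not the Faiss class: codes are plain lists of floats and the distance is squared L2.

```python
class ToyFlatCodesDistanceComputer:
    """Toy stand-in, not the Faiss class: codes are plain lists of floats."""

    def __init__(self, codes):
        self.codes = codes      # the "database" of stored codes
        self.query = None

    def set_query(self, q):
        self.query = list(q)

    def distance_to_code(self, code):
        """Squared L2 distance from the current query to an arbitrary code."""
        return sum((a - b) ** 2 for a, b in zip(self.query, code))

    def __call__(self, i):
        """Distance from the current query to stored item i."""
        return self.distance_to_code(self.codes[i])

dc = ToyFlatCodesDistanceComputer([[0.0, 0.0], [3.0, 4.0]])
dc.set_query([0.0, 0.0])
d_stored = dc(1)                                # item stored in the index
d_external = dc.distance_to_code([1.0, 1.0])    # code not stored anywhere
```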

Reviewed By: beauby

Differential Revision: D34863609

fbshipit-source-id: 39d8c66475e55c3223c4a6a210827aa48bca292d
2022-03-20 23:43:33 -07:00
Matthijs Douze
add3705c11 make fast scan tests cheaper (#2251)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2251

the fast_scan and fast_scan_ivf tests are irregularly timing out on the FB test infra

This diff:
- breaks down more tests into sub-tests
- makes tests cheaper by reducing the test dataset sizes
- corrects a nasty local variable binding bug that prevented all cases of `implem` from being covered.

I also tried to fix the polysemous tests that also time out but I could not reproduce the timeout.

https://www.internalfb.com/intern/test/562949978542309?ref_report_id=0

Reviewed By: beauby

Differential Revision: D34852254

fbshipit-source-id: b005ffb3723e7d9df75516a539540d9165249cea
2022-03-16 13:23:07 -07:00
Carter McClellan
e52101689f Summary: Add Binary Index i.o to c_api (#2233)
Summary:
Add Binary Index i.o to c_api

In reference to issue: https://github.com/facebookresearch/faiss/issues/2225

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2233

Test Plan: Only changes to the C API

Reviewed By: beauby

Differential Revision: D34815076

Pulled By: mdouze

fbshipit-source-id: 70e0bb15ff02044995460c274bd50465b44ccc5b
2022-03-16 03:28:46 -07:00
Alexandr Guzhva
8f2a72a8e6 AVX2 optimized IVFPQ scanning code (#2253)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2253

add a specialized AVX2 version for IVFPQScannerT::scan_list_with_table

Reviewed By: mdouze

Differential Revision: D34733503

fbshipit-source-id: a428de04548426b39bc5a092b9f6802eadbd184d
2022-03-15 17:35:11 -07:00
Alexandr Guzhva
80bf6a2bc6 Improve AVX2 distance computations (#2252)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2252

Improve fvec_op_ny_D4 for AVX2.

Reviewed By: mdouze

Differential Revision: D34729914

fbshipit-source-id: ac51eb3804a9df9c0ec56963f477ddc70e44d8fa
2022-03-15 16:51:04 -07:00
Alexandr Guzhva
b2712c2a40 AVX2 implementation of fvec_madd (#2254)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2254

AVX2 version of fvec_madd.

Reviewed By: mdouze

Differential Revision: D34764024

fbshipit-source-id: 66f694df0302c79f2c295ee10dcc7d564faa4a26
2022-03-15 16:46:19 -07:00
Jeff Johnson
31b0565e16 Fix GPU IVF serialization bug; fix stream issues (#2263)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2263

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2258

GPU IVF indices could not be properly deserialized or copied from CPU and then added to after the fact. This resulted in the following C++ assertion:

```
Faiss assertion 'indices->numVecs == oldNumVecs' failed in int faiss::gpu::IVFBase::addVectors(faiss::gpu::Tensor<float, 2, true>&, faiss::gpu::Tensor<long int, 1, true>&) at faiss/gpu/impl/IVFBase.cu:581
```

as the count of vectors present was not updated properly everywhere, as discovered internally by vtantia.

This diff fixes this issue by properly updating the count, as well as cleaning up stream usage in the IVF code. The problem is that the code was previously using `thrust::device_vector` which does not have a means to control on which stream copies or other work is performed. This is fixed by replacing all usage of `thrust::device_vector` with our own `DeviceVector` which was already used to store IVF data but not metadata. `DeviceVector` provides sufficient control over the proper CUDA stream usage.

Reviewed By: vtantia, mdouze

Differential Revision: D34886859

fbshipit-source-id: 70577bb386ff7dc0f4443ec4562d3ee80afc24e3
2022-03-15 13:38:10 -07:00
Ivan Sopin
d50211a38f Break distance ties in heap_replace_top() by ID (#2245)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2245

This changeset makes the `heap_replace_top()` function of the FAISS heap implementation break distance ties by the element's ID, according to the heap's min/max property.
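
Why tie-breaking by ID matters can be sketched with Python's `heapq`. This is an illustration under assumptions, not the Faiss heap: `k_smallest` is a hypothetical helper, and the convention used here (the smaller ID wins when distances are equal) is chosen for the example only. With a total order on `(distance, id)` pairs, the kept results no longer depend on the order in which candidates arrive.

```python
import heapq

def k_smallest(pairs, k):
    """Keep the k smallest (distance, id) pairs; ties on distance are broken by id."""
    heap = []   # max-heap emulated by negating both fields
    for dis, idx in pairs:
        item = (-dis, -idx)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # New pair is smaller than the current worst: replace the top
            heapq.heapreplace(heap, item)
    return sorted((-d, -i) for d, i in heap)

pairs = [(1.0, 7), (1.0, 3), (0.5, 9), (1.0, 5)]
forward = k_smallest(pairs, k=2)
backward = k_smallest(list(reversed(pairs)), k=2)
```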

Reviewed By: mdouze

Differential Revision: D34669542

fbshipit-source-id: 0db24fd12442eedeee917fbb3e811ba4a070ce0f
2022-03-09 10:23:48 -08:00
Jeff Johnson
878275f915 Faiss GPU: remove d2h copy from PQ MM debugging
Summary:
If the number of dimensions per sub-quantizer is not in the specialized list, it falls back to the generalized batch GEMM implementation.

When I implemented this, I had in a d2h copy so I could look at the computed distances. I removed the debugging code but not this copy.

Prior to this, PQ16 on 1024 dims was 6x slower than PQ32. Now, it is only 1.5x slower (it is slower because there is a higher number of dims per sub-q, despite there being more sub-qs).

Reviewed By: beauby

Differential Revision: D34526043

fbshipit-source-id: de6f70f0f0b91608eb6ae2a05da2af812546e4bc
2022-02-28 15:43:33 -08:00
Matthijs Douze
b8fe92dfee contrib clustering module (#2217)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2217

This diff introduces a new Faiss contrib module that contains:
- generic k-means implemented in python (was in distributed_ondisk)
- the two-level clustering code, including a simple function that runs it on a Faiss IVF index.
- sparse clustering code (new)

The main idea is that that code is often re-used so better have it in contrib.

Reviewed By: beauby

Differential Revision: D34170932

fbshipit-source-id: cc297cc56d241b5ef421500ed410d8e2be0f1b77
2022-02-28 14:18:47 -08:00
CodemodService FBSourceClangFormatLinterBot
647135ec47 Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D34412981

fbshipit-source-id: a7aa81c0c69bf731db37813f431d9f6ed6a6a355
2022-02-23 02:24:32 -08:00
Check Deng
41007232d6 AQ fastscan (#2169)
Summary:
Work in progress.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2169

Test Plan:
buck test mode/opt //faiss/tests/:test_fast_scan
buck test mode/opt //faiss/tests/:test_fast_scan_ivf

Reviewed By: beauby

Differential Revision: D34208813

Pulled By: mdouze

fbshipit-source-id: 74b72e07dc537667a7def403c4e46d3d05408c27
2022-02-22 15:24:31 -08:00
Check Deng
a03a1eba8b Add IndexNSGPQ and IndexNSGSQ (#2218)
Summary:
This diff added IndexNSGPQ and IndexNSGSQ, including index factory and I/O. And also fixed the ARM CI.

Fixed https://github.com/facebookresearch/faiss/issues/2128

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2218

Reviewed By: beauby

Differential Revision: D34276313

Pulled By: mdouze

fbshipit-source-id: a5014af8447800ad15bd89b4f87204b4b36866d2
2022-02-18 04:51:15 -08:00
Matthijs Douze
06ae6b8a59 Refresh github README (#2219)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2219

Update the readme and add KingLittleQ as an author

Reviewed By: beauby

Differential Revision: D34195947

fbshipit-source-id: f99a01f825e17ceba960063d10a9d93e324336fb
2022-02-14 02:01:20 -08:00
Matthijs Douze
eb8781557f Fix exhaustive search GT computation with IP distance (#2212)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2212

Fixes issue

https://github.com/facebookresearch/faiss/issues/2205

clear bug report
easy fix
easy to accept ;-)

Reviewed By: beauby

Differential Revision: D33975281

fbshipit-source-id: 088e1f3078dc79402563be7fac3530d76b197006
2022-02-07 19:36:21 -08:00
avk
36f2998a64 Allow to tune efConstruction HNSW parameter with the ParameterSpace object (#2160)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2160

Reviewed By: beauby

Differential Revision: D33975820

Pulled By: mdouze

fbshipit-source-id: aad8fd566171688213567dbb527cb80ec80d3b65
2022-02-03 09:47:54 -08:00
Check Deng
12d88a5fd8 Fix omp & memory leak (#2168)
Summary:
Fix an OMP bug and a memory leak. The first one could lead to non-deterministic results or worse.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2168

Test Plan: buck test //faiss/tests/:test_lsq -- test_deterministic

Reviewed By: beauby

Differential Revision: D33975589

Pulled By: mdouze

fbshipit-source-id: c1cf2589b0e718354ccf0221c3474633bcb8c7ee
2022-02-03 09:44:45 -08:00
Ben Mann
30abcd6a86 Add assertion to merge_ondisk.py (#2190)
Summary:
Fixes https://github.com/facebookresearch/faiss/issues/2188

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2190

Reviewed By: beauby

Differential Revision: D33975889

Pulled By: mdouze

fbshipit-source-id: 364eeac8de02f3ae00c9676198ed2ce27cfcd12b
2022-02-03 05:14:22 -08:00
pletessier
9747259e2f SWIG wrapper for InvertedListScanner (#2200)
Summary:
As discussed in https://github.com/facebookresearch/faiss/issues/2072, here is a PR to use InvertedListScanner with Python. It might be slower but it gives access to these features in Python for those who want to avoid C++.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2200

Test Plan: check that it compiles

Reviewed By: beauby

Differential Revision: D33975686

Pulled By: mdouze

fbshipit-source-id: dd731f90ce1609d555a17551fcc8c39eadf3fbd7
2022-02-03 03:46:52 -08:00
Jeff Johnson
04d31fac53 Faiss GPU: fix gpu 0 usage if gpu 0 is not used (GitHub issue 2178) (#2182)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2182

This diff fixes the issue in https://github.com/facebookresearch/faiss/issues/2178

namely that cudaHostAlloc does seem to perform some GPU initialization on the current device that is active.

Reviewed By: beauby

Differential Revision: D33480473

fbshipit-source-id: 4dc76c12b7be2caa96294bbb1aeaf9a44d030ae9
2022-01-10 10:31:03 -08:00
Matthijs Douze
07a874d5b1 Post-training refinement of residual quantizer codebooks (#2166)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2166

RQ training is done progressively from one quantizer to the next, maintaining a current set of codes and quantization centroids.
However, for RQ as for any additive quantizer, there is a closed form solution for the centroids that minimizes the quantization error for fixed codes.
This diff offers the option to estimate that codebook at the end of the optimization. It performs this estimation iteratively, ie. several rounds of code computation - codebook refinement are performed.
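
The closed-form step mentioned above is an ordinary least-squares problem. The notation here is an assumption made for illustration and is not taken from the diff: $X \in \mathbb{R}^{n \times d}$ holds the training vectors, $B \in \{0,1\}^{n \times MK}$ stacks the one-hot encodings of the $M$ codes of each vector (each code taking one of $K$ values), and $C \in \mathbb{R}^{MK \times d}$ stacks the $M$ codebooks.

```latex
\hat{C} \;=\; \arg\min_{C} \,\lVert X - B\,C \rVert_F^2 \;=\; B^{+} X
```

Here $B^{+}$ is the Moore-Penrose pseudo-inverse of $B$. One refinement round, in this notation, recomputes the codes $B$ for the new $\hat{C}$ and then solves for $\hat{C}$ again.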

A pure python implementation + results is here:
https://github.com/fairinternal/faiss_improvements/blob/dbcc746/decoder/refine_aq_codebook.ipynb

Reviewed By: wickedfoo

Differential Revision: D33309409

fbshipit-source-id: 55c13425292e73a1b05f00e90f4dcfdc8b3549e8
2022-01-05 00:59:16 -08:00
Lucas Hosseini
c08cbff1a4 Prepare for v1.7.2 release. (#2151)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2151

Reviewed By: mdouze

Differential Revision: D33157463

Pulled By: beauby

fbshipit-source-id: 8f8580a7ad953484f41fbbb2d001f3484fab4c3d
v1.7.2
2021-12-16 15:44:23 -08:00
Matthijs Douze
d68ff42195 Fix OPQ dimension parsing (#2147)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2147

There was a bug in the OPQ string parsing. This diff adds a test and fixes the error.

Reviewed By: aijanai

Differential Revision: D33020167

fbshipit-source-id: 32e43653849b258a3b6d0cfdc44a6c637433f2c8
2021-12-11 03:28:29 -08:00
Lucas Hosseini
a0de37bd18 Update CUDA driver on CircleCI. (#2146)
Summary:
A recent CUDA driver is required for building packages for CUDA 11.3.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2146

Reviewed By: wickedfoo

Differential Revision: D33020204

Pulled By: beauby

fbshipit-source-id: 01257b1dcb4987f4866cc058c22d1dd5977d76ce
2021-12-10 10:14:28 -08:00
Lucas Hosseini
7492d23354 Fix packaging (#2121)
Summary:
- Disable problematic tests on OSX.
- Ensure compiler compatibility with CUDA builds.
- Fix path for Python extension libraries.
- Use CentOS for CUDA packaging.
- Update CUDA versions in CI (10.2 and 11.3).

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2121

Reviewed By: mdouze

Differential Revision: D32921117

Pulled By: beauby

fbshipit-source-id: 588c18add8084b8228ff5abc651eaa4567919cc6
2021-12-07 13:12:30 -08:00
Lucas Hosseini
812e97daf4 Fix deadlock in HNSW. (#2143)
Summary:
IndexHNSW has a deadlock in the add() method, which is fixed by
temporarily releasing the lock on the current element while updating
its neighbors' adjacency lists.

This bug concerns multi-threaded insertion only, and seems to manifest
itself only with certain OpenMP configurations.
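
The locking pattern can be sketched in Python with `threading`. This is a simplified toy, not the IndexHNSW code: `ToyGraph` and `link` are hypothetical names, and the sketch never re-acquires the lock on the current element. What it shows is that a thread which gives up the lock on the current node before taking a neighbor's lock never holds two node locks at once, so two threads linking nodes to each other cannot block one another forever.

```python
import threading

class ToyGraph:
    """Toy adjacency lists with one lock per node (not the Faiss data structure)."""

    def __init__(self, n):
        self.neighbors = [[] for _ in range(n)]
        self.locks = [threading.Lock() for _ in range(n)]

    def link(self, cur, candidates):
        with self.locks[cur]:
            for c in candidates:
                if c not in self.neighbors[cur]:
                    self.neighbors[cur].append(c)
        # The lock on `cur` is released before any neighbor lock is taken,
        # so no thread ever holds two node locks at once.
        for nb in candidates:
            with self.locks[nb]:
                if cur not in self.neighbors[nb]:
                    self.neighbors[nb].append(cur)

graph = ToyGraph(2)
threads = [
    threading.Thread(target=lambda: [graph.link(0, [1]) for _ in range(1000)]),
    threading.Thread(target=lambda: [graph.link(1, [0]) for _ in range(1000)]),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=10)
finished = not any(t.is_alive() for t in threads)
```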

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2143

Reviewed By: mdouze

Differential Revision: D32919041

Pulled By: beauby

fbshipit-source-id: e515541c1b22bfcb79d29c0bde1843e63f5175fb
2021-12-07 09:15:44 -08:00
CodemodService FBSourceClangFormatLinterBot
b60722c4cd Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32910817

fbshipit-source-id: 60d0cb10412e1a37a0249bb223b75855c5596dbd
2021-12-07 08:10:49 -08:00
Matthijs Douze
a0b50e669f Re-factor factory string parsing (#2134)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2134

The old parsing was very complex and grew out of hand.
This diff just uses regex parsing.

Reviewed By: wickedfoo

Differential Revision: D32759110

fbshipit-source-id: 243029bba8a7fe70c71323f5edc7e2ce4e669757
2021-12-07 04:35:57 -08:00
Matthijs Douze
c0052c1533 IndexFlatCodes: a single parent for all flat codecs (#2132)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2132

This diff adds the class IndexFlatCodes that becomes the parent of all "flat" encodings.
IndexPQ
IndexFlat
IndexAdditiveQuantizer
IndexScalarQuantizer
IndexLSH
Index2Layer

The other changes are:
- for IndexFlat, there is no vector<float> with the data anymore. It is replaced with a `get_xb()` function. This broke quite a few external codes, that this diff also attempts to fix.
- I/O functions needed to be adapted. This is done without changing the I/O format for any index.
- added a small contrib function to get the data from the IndexFlat
- the functionality has been made uniform, for example remove_ids and add are now in the parent class.

Eventually, we may support generic storage for flat indexes, similar to `InvertedLists`, eg to memmap the data, but this will again require a big change.

Reviewed By: wickedfoo

Differential Revision: D32646769

fbshipit-source-id: 04a1659173fd51b130ae45d345176b72183cae40
2021-12-07 01:31:07 -08:00