Commit Graph

27 Commits (21dfdbaaa0e30f2e16ad98ae4f94c2952e7178ce)

Author SHA1 Message Date
Mengdi Lin c0b32d2821 fix ARM64 SVE CI due to openblas version bump (#3777)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3777

openblas version is bumped from 0.3.27 -> 0.3.28 in the last 3 days. This caused the below test to fail. Confirmed with algoriddle bumping nprobe is okay to do

Reviewed By: algoriddle

Differential Revision: D61536541

fbshipit-source-id: 1e83f75011517ba7b856520f11526e72a00494a5
2024-08-20 06:25:40 -07:00
Naveen Tatikonda 33c0ba5d00 Add SQ8bit signed quantization (#3501)
Summary:
### Description
Add new signed 8 bit scalar quantizer, `QT_8bit_direct_signed` to ingest signed 8 bit vectors ([-128 to 127]).

### Issues Resolved
https://github.com/facebookresearch/faiss/issues/3488

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3501

Reviewed By: mengdilin

Differential Revision: D58639363

Pulled By: mdouze

fbshipit-source-id: cf7f244fdbb7a34051d2b20c6f8086cd5628b4e0
2024-06-24 05:11:53 -07:00
Xiao Fu 5e452ed52a Cleaning up more unnecessary print (#3455)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3455

Code quality control by reducing the number of prints

Reviewed By: junjieqi

Differential Revision: D57502194

fbshipit-source-id: a6cd65ed4cc49590ce73d2978d41b640b5259c17
2024-05-17 16:59:36 -07:00
Gergely Szilvasy 0013c702f4 avx512 CI + conda packages (#3197)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3197

Reviewed By: mlomeli1

Differential Revision: D52689379

Pulled By: algoriddle

fbshipit-source-id: 54e27c6d310d6da14777ae10ae62f46e7076cacf
2024-01-11 08:26:33 -08:00
Matthijs Douze 1c1d5c808f Make tests a little less verbose
Summary: Useful info on github test runs is burried in spurious logging. Avoid this.

Reviewed By: mlomeli1

Differential Revision: D47209139

fbshipit-source-id: b5111c91e2b94f0c3678d599197f8e7094993df1
2023-07-04 07:02:53 -07:00
Matthijs Douze 74ee67aefc CodePacker for non-contiguous code layouts (#2625)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2625

This diff introduces a new abstraction for the code layouts that are not simply flat one after another.

The packed codes are assumed to be packed together in fixed-size blocks. Hence, code `#i` is stored at offset `i % nvec` of block `floor(i / nvec)`. Each block has size `block_size`.

The `CodePacker` object takes care of the translation between packed and flat codes. The packing / unpacking functions are virtual functions now, but they could as well be inlined for performance.

The `CodePacker` object makes it possible to do manipulations onarrays of codes (including inverted lists) in a uniform way, for example merging / adding / updating / removing / converting to&from CPU.

In this diff, the only non-trivial CodePacker implemnted is for the FastScan code. The new functionality supported is merging IVFFastScan indexes.

Reviewed By: alexanderguzhva

Differential Revision: D42072972

fbshipit-source-id: d1f8bdbcf7ab0f454b5d9c37ba2720fd191833d0
2022-12-21 11:06:53 -08:00
Ivan Sopin d50211a38f Break distance ties in `heap_replace_top()` by ID (#2245)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2245

This changeset makes the `heap_replace_top()` function of the FAISS heap implementation break distance ties by the element's ID, according to the heap's min/max property.

Reviewed By: mdouze

Differential Revision: D34669542

fbshipit-source-id: 0db24fd12442eedeee917fbb3e811ba4a070ce0f
2022-03-09 10:23:48 -08:00
Matthijs Douze c0052c1533 IndexFlatCodes: a single parent for all flat codecs (#2132)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2132

This diff adds the class IndexFlatCodes that becomes the parent of all "flat" encodings.
IndexPQ
IndexFlat
IndexAdditiveQuantizer
IndexScalarQuantizer
IndexLSH
Index2Layer

The other changes are:
- for IndexFlat, there is no vector<float> with the data anymore. It is replaced with a `get_xb()` function. This broke quite a few external codes, that this diff also attempts to fix.
- I/O functions needed to be adapted. This is done without changing the I/O format for any index.
- added a small contrib function to get the data from the IndexFlat
- the functionality has been made uniform, for example remove_ids and add are now in the parent class.

Eventually, we may support generic storage for flat indexes, similar to `InvertedLists`, eg to memmap the data, but this will again require a big change.

Reviewed By: wickedfoo

Differential Revision: D32646769

fbshipit-source-id: 04a1659173fd51b130ae45d345176b72183cae40
2021-12-07 01:31:07 -08:00
Matthijs Douze 2d380e992b Add manifold check for size 0 (#1867)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1867

Merging code for the 1T photodna index seems to fail at

https://www.internalfb.com/phabricator/paste/view/P412975011?lines=174

with
```
terminate called after throwing an instance of 'facebook::manifold::blobstore::StorageException'
  what():  [400] Begin offset and/or length were invalid -- Begin offset must be positive and length must be non-negative. Received: offset = 2642410612, length = 0
Aborted (core dumped)
```
traces back to

https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/manifold/blobstore/BlobstoreThriftHandler.cpp?lines=671%2C700%2C732

There is a single case where we don't check if the read or write size is 0. So let's try this fix.

In the process I realized that the Manifold tests were non functional due to a name collision on common.py. Also fix this in all dependent files.

Differential Revision: D28231710

fbshipit-source-id: 700ffa6ca0c82c49e7d1eae9e76549ec5ff16332
2021-05-09 22:30:31 -07:00
Check Deng b35103a138 Add NSG (#1707)
Summary:
## Description:
This diff implemented Navigating Spreading-out Graph (NSG) which accepts a KNN graph as input.
Here is the interface of building an NSG graph:
``` c++
void IndexNSG::build(idx_t n, const float *x, idx_t *knn_graph, int GK);
```
where `GK` is the nb of neighbors per node and `knn_graph[i * GK + j]` is the j-th neighbor of node i.

The `add` method is not implemented yet.

The unit tests could be found in `tests/test_nsg.cpp`.

mdouze beauby Maybe I need some advice on how to design the interface and support python.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1707

Test Plan: buck test //faiss/tests/:test_index -- TestNSG

Reviewed By: beauby

Differential Revision: D26748498

Pulled By: mdouze

fbshipit-source-id: 3280f705fb1b5f9c8cc5efeba63b904c3b832544
2021-03-10 15:03:00 -08:00
Matthijs Douze 189aecb224 Fix polysemous OOM
Summary: Polysemous training can OOM because it uses tables of size n^2 with n is 2**nbit of the PQ. This throws and exception when the table threatens to become too large. It also reduces the number of threads when this would make it possible to fit the computation within max_memory bytes.

Reviewed By: wickedfoo

Differential Revision: D26856747

fbshipit-source-id: bd98e60293494e2f4b2b6d48eb1200efb1ce683c
2021-03-06 00:40:05 -08:00
H. Vetinari 42c6175535 fix warning about deprecate assertEquals (#1738)
Summary:
There's an annoying warning on every test run that I'd like to fix
```
=============================== warnings summary ===============================
tests/test_index_accuracy.py::TestRefine::test_IP
tests/test_index_accuracy.py::TestRefine::test_L2
  $SRC_DIR/tests/test_index_accuracy.py:726: DeprecationWarning: Please use assertEqual instead.
    self.assertEquals(recall1, recall2)
```

I've tried sneaking this into https://github.com/facebookresearch/faiss/issues/1704 & https://github.com/facebookresearch/faiss/issues/1717 already, but the first needs more time and
in the second, beauby asked me to keep this separate, so here's a new PR. :)

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1738

Reviewed By: wickedfoo

Differential Revision: D26855644

Pulled By: mdouze

fbshipit-source-id: 1198a9d9b3a79dfeb1d69513a61229fb45924f89
2021-03-05 13:46:35 -08:00
Check Deng d6535a3d87 Add NNDescent to faiss (#1654)
Summary:
As discussed in https://github.com/facebookresearch/faiss/issues/685, I'm going to add an NSG index to faiss. This PR which adds an NNDescent index is the first step as I commented [here ](https://github.com/facebookresearch/faiss/issues/685#issuecomment-760608431).

**Changes:**
1. Add an `IndexNNDescent` and an `IndexNNDescentFlat` which allow users to construct a KNN graph on a million scale dataset using CPU and search NN on it. The implementation part is put under `faiss/impl`.
2. Add compilation entries to `CMakeLists.txt` for C++ and `swigfaiss.swig` for Python. `IndexNNDescentFlat` could be directly called by users in C++ and Python.
3. `VisitedTable` struct in `HNSW.h` is moved into `AuxIndexStructures.h`.
3. Add a demo `demo_nndescent.cpp` to demonstrate the effectiveness.

**TODO**
1. Support index factor.
2. Implement `IndexNNDescentPQ` and `IndexNNDescentSQ`
3. More comments in the code.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1654

Test Plan:
buck test //faiss/tests/:test_index_accuracy -- TestNNDescent

buck test //faiss/tests/:test_build_blocks -- TestNNDescentKNNG

Reviewed By: wickedfoo

Differential Revision: D26309716

Pulled By: mdouze

fbshipit-source-id: 2abade9708d29023f8bccbf77143e8eea14f66c4
2021-02-25 16:48:28 -08:00
shengjun.li 908812266c Add heap_replace_top to simplify heap_pop + heap_push (#1597)
Summary:
Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

Add heap_replace_top to simplify heap_pop + heap_push

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1597

Test Plan:
OMP_NUM_THREADS=1 buck run mode/opt //faiss/benchs/:bench_heap_replace
OMP_NUM_THREADS=8 buck run mode/opt //faiss/benchs/:bench_heap_replace

Reviewed By: beauby

Differential Revision: D25943140

Pulled By: mdouze

fbshipit-source-id: 66fe67779dd281a7753f597542c2e797ba0d7df5
2021-01-20 11:28:08 -08:00
Matthijs Douze c5975cda72 PQ4 fast scan benchmarks (#1555)
Summary:
Code + scripts for Faiss benchmarks around the  Fast scan codes.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1555

Test Plan: buck test //faiss/tests/:test_refine

Reviewed By: wickedfoo

Differential Revision: D25546505

Pulled By: mdouze

fbshipit-source-id: 902486b7f47e36221a2671d124df8c114f25db58
2020-12-16 01:18:58 -08:00
Matthijs Douze 6d0bc58db6 Implementation of PQ4 search with SIMD instructions (#1542)
Summary:
IndexPQ and IndexIVFPQ implementations with AVX shuffle instructions.

The training and computing of the codes does not change wrt. the original PQ versions but the code layout is "packed" so that it can be used efficiently by the SIMD computation kernels.

The main changes are:

- new IndexPQFastScan and IndexIVFPQFastScan objects

- simdib.h for an abstraction above the AVX2 intrinsics

- BlockInvertedLists for invlists that are 32-byte aligned and where codes are not sequential

- pq4_fast_scan.h/.cpp:  for packing codes and look-up tables + optmized distance comptuation kernels

- simd_result_hander.h: SIMD version of result collection in heaps / reservoirs

Misc changes:

- added contrib.inspect_tools to access fields in C++ objects

- moved .h and .cpp code for inverted lists to an invlists/ subdirectory, and made a .h/.cpp for InvertedListsIOHook

- added a new inverted lists type with 32-byte aligned codes (for consumption by SIMD)

- moved Windows-specific intrinsics to platfrom_macros.h

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1542

Test Plan:
```
buck test mode/opt  -j 4  //faiss/tests/:test_fast_scan_ivf //faiss/tests/:test_fast_scan
buck test mode/opt  //faiss/manifold/...
```

Reviewed By: wickedfoo

Differential Revision: D25175439

Pulled By: mdouze

fbshipit-source-id: ad1a40c0df8c10f4b364bdec7172e43d71b56c34
2020-12-03 10:06:38 -08:00
Lucas Hosseini cd38e82f0c
Facebook sync 2020-07-31 (#1308) 2020-08-03 22:15:02 +02:00
Lucas Hosseini 22b7876ef5
Facebook sync (2020-03-10) (#1136) 2020-03-10 14:24:07 +01:00
Lucas Hosseini 36ddba9196
Facebook sync (2019-09-10) (#943)
* Facebook sync (2019-09-10)

* Fix depends Makefile target.

* Add faiss symlink for new include directives.

* Fix missing header.

* Fix tests.

* Fix Makefile.

* Update depend.

* Fix include directives spacing.
2019-09-20 18:59:10 +02:00
Lucas Hosseini 3896b12c65
Facebook sync (Jun 2019) (#862)
Bugfixes:
- slow scanning of inverted lists (#836).

Features:
- add basic support for 6 new metrics in CPU `IndexFlat` and `IndexHNSW` (#848);
- add support for `IndexIDMap`/`IndexIDMap2` with binary indexes (#780).

Misc:
- throw python exception for OOM (#758);
- make `DistanceComputer` available for all random access indexes;
- gradually moving from `long` to `int64_t` for portability.
2019-06-19 15:59:06 +02:00
Lucas Hosseini a8118acbc5
Facebook sync (May 2019) + relicense (#838)
Changelog:

- changed license: BSD+Patents -> MIT
- propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas
- support for searching several inverted lists in parallel (parallel_mode != 0)
- better support for PQ codes where nbit != 8 or 16
- IVFSpectralHash implementation: spectral hash codes inside an IVF
- 6-bit per component scalar quantizer (4 and 8 bit were already supported)
- combinations of inverted lists: HStackInvertedLists and VStackInvertedLists
- configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch)
- more test and demo code compatible with Python 3 (print with parentheses)
- refactored benchmark code: data loading is now in a single file
2019-05-28 16:17:22 +02:00
Lucas Hosseini afe0fdc161
Facebook sync (Mar 2019) (#756)
Facebook sync (Mar 2019)

- MatrixStats object
- option to round coordinates during k-means optimization
- alternative option for search in HNSW
- moved stats and imbalance_factor of IndexIVF to InvertedLists object
- range search for IVFScalarQuantizer
- direct unit8 codec in ScalarQuantizer
- renamed IndexProxy to IndexReplicas and moved to main Faiss
- better support for PQ code assignment with external index
- support for IMI2x16 (4B virtual centroids!)
- support for k = 2048 search on GPU (instead of 1024)
- most CUDA mem alloc failures throw exceptions instead of terminating on an assertion
- support for renaming an ondisk invertedlists
- interrupt computations with ctrl-C in python
2019-03-29 16:32:28 +01:00
Lucas Hosseini f417a53628
Fix CI tests (#687)
* Fix test_transfer_invlists.cpp

* Fix relative imports.

* Fix test_index_accuracy.py.

* Use default OSX version.

* Allow osx gcc6 build to fail.
2019-01-08 17:52:36 +01:00
matthijs daf589d9d2 add bench_all_ivf 2018-12-20 05:43:36 -08:00
Lucas Hosseini 323dbf3be3
Facebook sync (Dec 2018). (#660)
* Add GpuIndexBinaryFlat
* Add IndexBinaryHNSW
2018-12-19 17:48:35 +01:00
Tom Forbes a91a24e77a Fix #558 - Make `M` an integer (#589)
* Fix #558 - Make `d` an integer

* Use int()
2018-09-18 22:08:31 +02:00
Lucas Hosseini 76bec0b500
Facebook sync (#573)
Features:

- automatic tracking of C++ references in Python
- non-intel platforms supported -- some functions optimized for ARM
- override nprobe for concurrent searches
- support for floating-point quantizers in binary indexes
Bug fixes:

- no more segfaults in python (I know it's the same as the first feature but it's important!)
- fix GpuIndexIVFFlat issues for float32 with 64 / 128 dims
- fix sharding of flat indexes on GPU with index_cpu_to_gpu_multiple
2018-08-30 19:38:50 +02:00