45 Commits

Author SHA1 Message Date
Alexandr Guzhva
6a94c67a2f QT_bf16 for scalar quantizer for bfloat16 (#3444)
Summary:
mdouze Please let me know if any additional unit tests are needed

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3444

Reviewed By: algoriddle

Differential Revision: D57665641

Pulled By: mdouze

fbshipit-source-id: 9bec91306a1c31ea4f1f1d726c9d60ac6415fdfc
2024-05-23 02:59:15 -07:00
Xiao Fu
5e452ed52a Cleaning up more unnecessary print (#3455)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3455

Code quality control by reducing the number of prints

Reviewed By: junjieqi

Differential Revision: D57502194

fbshipit-source-id: a6cd65ed4cc49590ce73d2978d41b640b5259c17
2024-05-17 16:59:36 -07:00
Matthijs Douze
32f0e8cf92 Generalize ResultHandler, support range search for HNSW and Fast Scan (#3190)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3190

This diff adds more result handlers in order to expose them externally.
This enables range search for HNSW and Fast Scan, and nprobe parameter support for FastScan.

Reviewed By: pemazare

Differential Revision: D52547384

fbshipit-source-id: 271da5ffea6411df3d8e50641abade18bd7b774b
2024-01-11 11:46:30 -08:00
Maria Lomeli
c09992bc8a Back out "Better NaN handling" (#3006)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3006

Original commit changeset: 99e7786582e9

Original Phabricator Diff: D48031390

Reviewed By: algoriddle

Differential Revision: D48353221

fbshipit-source-id: fd326f2a45d20f68507ca39a33a325528651b37d
2023-08-15 09:32:01 -07:00
Matthijs Douze
a3fbf2d61c Better NaN handling (#2986)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2986

A NaN vector is a vector with at least one NaN (not-a-number) entry.
After discussion in the Faiss team we decided that:
- training should throw an exception on NaN vectors
- added NaN vectors should be ignored (never returned)
- searched NaN vectors should return only -1s

This diff implements this for a few common index types + adds relevant tests.
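
The three rules above can be illustrated with a toy brute-force index. This is a minimal sketch, not the Faiss implementation: the class `NaNAwareFlatIndex` and its methods are hypothetical, and the choice to keep advancing ids past ignored vectors is an assumption.

```python
import math


def has_nan(vec):
    """A NaN vector has at least one NaN entry."""
    return any(math.isnan(v) for v in vec)


class NaNAwareFlatIndex:
    """Hypothetical toy index that follows the three NaN rules."""

    def __init__(self):
        self.vectors = []  # list of (id, vector) pairs that are searchable
        self.ntotal = 0    # number of vectors passed to add(), NaN ones included

    def train(self, xs):
        # Rule 1: training throws on NaN vectors.
        if any(has_nan(x) for x in xs):
            raise ValueError("NaN vector in training set")

    def add(self, xs):
        # Rule 2: NaN vectors are ignored, so they can never be returned.
        # Assumption: the id counter still advances for ignored vectors.
        for x in xs:
            if not has_nan(x):
                self.vectors.append((self.ntotal, list(x)))
            self.ntotal += 1

    def search(self, q, k):
        # Rule 3: a NaN query returns only -1 labels.
        if has_nan(q):
            return [-1] * k
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, v)), i) for i, v in self.vectors
        )
        ids = [i for _, i in scored[:k]]
        return ids + [-1] * (k - len(ids))  # pad when fewer than k results
```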

Reviewed By: algoriddle

Differential Revision: D48031390

fbshipit-source-id: 99e7786582e91950e3a53c1d8bcffdd00b6afd24
2023-08-04 06:51:06 -07:00
Matthijs Douze
b9ea339617 support range search from GPU (#2860)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2860

Optimized range search function where the GPU computes by default and falls back on the CPU for queries where there are too many results.
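
One way to picture this two-path strategy is a fast fixed-size search with an exhaustive fallback. The sketch below is an assumption about the general idea, not the code of this diff: `knn_fast_path` stands in for the accelerated search, `range_exhaustive` for the fallback, and all names are hypothetical.

```python
def sq_l2(a, b):
    """Squared L2 distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_fast_path(query, database, k):
    """Stand-in for the accelerated search: the k nearest (distance, id) pairs."""
    return sorted((sq_l2(query, v), i) for i, v in enumerate(database))[:k]


def range_exhaustive(query, database, radius):
    """Stand-in for the fallback: every (distance, id) pair within the radius."""
    hits = []
    for i, v in enumerate(database):
        d = sq_l2(query, v)
        if d < radius:
            hits.append((d, i))
    return sorted(hits)


def range_search_with_fallback(query, database, radius, k=4):
    """Return (results, used_fallback)."""
    top = knn_fast_path(query, database, k)
    if len(top) == k and top[-1][0] < radius:
        # All k slots are inside the radius, so the list may be truncated:
        # this query has "too many results" for the fast path.
        return range_exhaustive(query, database, radius), True
    return [(d, i) for d, i in top if d < radius], False
```

With a small `k` most queries stay on the fast path, and only the queries with dense neighborhoods pay for the exhaustive scan.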

Parallelize the CPU-to-GPU cloning, which seems to work.

Support range_search_preassigned in Python

Fix long-standing issue with SWIG exposed functions that did not release the GIL (in particular the MapLong2Long).

Adds a MapInt64ToInt64 that is more efficient than MapLong2Long.

Reviewed By: algoriddle

Differential Revision: D45672301

fbshipit-source-id: 2e77397c40083818584dbafa5427149359a2abfd
2023-05-16 00:27:53 -07:00
Alexandr Guzhva
5b172252ef HNSW speedup + Distance 4 points (#2841)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2841

* Add virtual void DistanceComputer::distances_to_four_indices()
* Add the infrastructure
* HNSW::search() uses DistanceComputer::distances_to_four_indices()
* Add IndexFlatL2::sync_l2norms() and IndexFlatL2::clear_l2norms(), which allow precomputing a cache of L2 norms for the stored vectors and computing L2 distances using dot products
* Add downcasting of IndexFlatL2 and IndexFlatIP in swig
* Add general-purpose prefetch utilities
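
The norm cache relies on the identity ||q - x||^2 = ||q||^2 + ||x||^2 - 2<q, x>. The following is a minimal sketch of that idea only: `FlatL2WithNormCache` is a hypothetical toy class whose method names merely echo the ones listed above, and it does not reproduce the C++ code.

```python
def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


class FlatL2WithNormCache:
    """Hypothetical toy store of vectors with an optional squared-norm cache."""

    def __init__(self, vectors):
        self.vectors = [list(v) for v in vectors]
        self.l2norms = None  # no cache until sync_l2norms() is called

    def sync_l2norms(self):
        # Precompute ||x||^2 once per stored vector.
        self.l2norms = [dot(v, v) for v in self.vectors]

    def clear_l2norms(self):
        self.l2norms = None

    def distance(self, q, i):
        """Squared L2 distance between query q and stored vector i."""
        v = self.vectors[i]
        if self.l2norms is None:
            # Direct evaluation, coordinate by coordinate.
            return sum((a - b) ** 2 for a, b in zip(q, v))
        # Cached evaluation: only one dot product depends on both q and x.
        return dot(q, q) + self.l2norms[i] - 2 * dot(q, v)
```

In floating point the two evaluations can differ by rounding error, so results computed with and without the cache are not guaranteed to be bit-identical.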

Reviewed By: mdouze

Differential Revision: D45427064

fbshipit-source-id: d23b34fe080dbff951d34cdc1323813bd3b828e0
2023-05-05 16:13:16 -07:00
Denis Yaroshevskiy
45b16d23a1 faiss: use autovectorization for inner product (#2712)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2712

Using autovectorization to get the proper unrolling.

Previous version timings are:

Before
```
faiss_ip_10000                        2.10us   475.62K
faiss_n2_10000                        4.23us   236.30K
```

After
```
faiss_ip_10000                         1.21us   827.16K
faiss_n2_10000                       640.68ns     1.56M
```

Reviewed By: alexanderguzhva

Differential Revision: D43353199

fbshipit-source-id: 8f73a34acd4b0368be6cdb05ba7a99a566c9ed83
2023-02-16 10:24:25 -08:00
Matthijs Douze
240e6dda08 Fix test timeouts (#2618)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2618

The Faiss tests run in dev mode are very slow.
The PQ polysemous training is particularly sensitive to this with the default settings.
This diff adds a "np" suffix to two PQ factory strings to disable polysemous training. The affected tests were detected as flaky because they occasionally time out.

Reviewed By: alexanderguzhva

Differential Revision: D41955699

fbshipit-source-id: b1e0382a0142a3ed28b498c5ea6f5499de2c1b3f
2022-12-12 09:04:43 -08:00
Matthijs Douze
9f13e43486 Building blocks for big batch IVF search
Summary:
Adds:
- a sparse update function to the heaps
- bucket sort functions
- an IndexRandom index to serve as a dummy coarse quantizer for testing

Reviewed By: algoriddle

Differential Revision: D41804055

fbshipit-source-id: 9402b31c37c367aa8554271d8c88bc93cc1e2bda
2022-12-08 09:34:16 -08:00
Check Deng
31a8ca163e IO support for IndexNNDescent (#2493)
Summary:
This PR added I/O methods for IndexNNDescent.

Closes https://github.com/facebookresearch/faiss/issues/2125

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2493

Test Plan: buck test //faiss/tests:test_index -- TestNNDescent

Reviewed By: mlomeli1

Differential Revision: D39883381

Pulled By: mdouze

fbshipit-source-id: 6fcb0c4e08e66c56750ae48ee22b0b4a958243ae
2022-09-28 06:16:11 -07:00
Check Deng
a03a1eba8b Add IndexNSGPQ and IndexNSGSQ (#2218)
Summary:
This diff added IndexNSGPQ and IndexNSGSQ, including index factory and I/O support. It also fixed the ARM CI.

Fixed https://github.com/facebookresearch/faiss/issues/2128

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2218

Reviewed By: beauby

Differential Revision: D34276313

Pulled By: mdouze

fbshipit-source-id: a5014af8447800ad15bd89b4f87204b4b36866d2
2022-02-18 04:51:15 -08:00
Lucas Hosseini
812e97daf4 Fix deadlock in HNSW. (#2143)
Summary:
IndexHNSW has a deadlock in the add() method, which is fixed by
temporarily releasing the lock on the current element while updating
its neighbors' adjacency lists.

This bug concerns multi-threaded insertion only, and seems to manifest
itself only with certain OpenMP configurations.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2143

Reviewed By: mdouze

Differential Revision: D32919041

Pulled By: beauby

fbshipit-source-id: e515541c1b22bfcb79d29c0bde1843e63f5175fb
2021-12-07 09:15:44 -08:00
Check Deng
6c99782f7c Fix unorder bug in NSG (#2086)
Summary:
The results returned by `NSG::search` are already sorted. Calling `maxheap_reorder` on them leaves the results unordered.

Fixed https://github.com/facebookresearch/faiss/issues/2081.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2086

Test Plan: buck test //faiss/tests/:test_index -- test_order

Reviewed By: beauby

Differential Revision: D32593924

Pulled By: mdouze

fbshipit-source-id: 794b94681610657bd2f305f7e3d6cd5d25c6bdba
2021-11-22 11:41:01 -08:00
Matthijs Douze
3eb82e32dc Range search bug
Summary:
This diff fixes a serious bug in the range search implementation.

During range search in a flat index, (exhaustive_L2sqr_seq and exhaustive_inner_product_seq) when running in multiple threads, the per-thread results are collected into RangeSearchPartialResult structures.

When the computation is finished, they are aggregated into a RangeSearchResult. In the previous version of the code, this loop was nested into a second loop that is used to check for KeyboardInterrupts. Thus, at each iteration, the results were overwritten.

The fix removes the outer loop. It is most likely useless anyways because the sequential code is called only for a small number of queries, for a larger number the BLAS version is used.
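
The aggregation step described above, run exactly once after all threads have finished, can be sketched as follows. This is an illustration under assumptions: each per-thread partial result is modeled as a dict from query number to a list of `(distance, id)` pairs, and the output uses a `lims` offset array with separate distance and label arrays. The function name is hypothetical.

```python
def merge_partial_results(partials, nq):
    """Merge per-thread partial results into (lims, distances, labels).

    Results of query q end up in positions lims[q] .. lims[q + 1] - 1.
    """
    # Pass 1: count the results of each query over all threads.
    counts = [0] * nq
    for pres in partials:
        for q, hits in pres.items():
            counts[q] += len(hits)

    # Prefix sum of the counts gives the output offsets.
    lims = [0] * (nq + 1)
    for q in range(nq):
        lims[q + 1] = lims[q] + counts[q]

    # Pass 2: copy each partial result to its final position.
    distances = [None] * lims[nq]
    labels = [None] * lims[nq]
    write_pos = lims[:nq]  # slice copy: next free slot per query
    for pres in partials:
        for q, hits in pres.items():
            for d, i in hits:
                p = write_pos[q]
                distances[p] = d
                labels[p] = i
                write_pos[q] += 1
    return lims, distances, labels
```

If this merge were repeated inside an outer loop, each repetition would rebuild the output from whatever partial results were visible at that point, which is the kind of overwrite the fix avoids.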

Reviewed By: wickedfoo

Differential Revision: D28486415

fbshipit-source-id: 89a52b17f6ca1ef68fc5e758f0e5a44d0df9fe38
2021-05-17 23:10:20 -07:00
Matthijs Douze
2d380e992b Add manifold check for size 0 (#1867)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1867

Merging code for the 1T photodna index seems to fail at

https://www.internalfb.com/phabricator/paste/view/P412975011?lines=174

with
```
terminate called after throwing an instance of 'facebook::manifold::blobstore::StorageException'
  what():  [400] Begin offset and/or length were invalid -- Begin offset must be positive and length must be non-negative. Received: offset = 2642410612, length = 0
Aborted (core dumped)
```
traces back to

https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/manifold/blobstore/BlobstoreThriftHandler.cpp?lines=671%2C700%2C732

There is a single case where we don't check if the read or write size is 0. So let's try this fix.

In the process I realized that the Manifold tests were non functional due to a name collision on common.py. Also fix this in all dependent files.

Differential Revision: D28231710

fbshipit-source-id: 700ffa6ca0c82c49e7d1eae9e76549ec5ff16332
2021-05-09 22:30:31 -07:00
Chengqi Deng
c62ab3a696 Use BLAS to compute sdc table (#1809)
Summary:
This PR used BLAS to compute sdc table in ProductQuantizer.

Here is the time of computing sdc tables:

```
nbits=8, d=128 (this commit)
M: 2, sdc: 0.0001361370086669922s
M: 4, sdc: 8.273124694824219e-05s
M: 8, sdc: 7.867813110351562e-05s
M: 16, sdc: 0.0001227855682373047s
M: 32, sdc: 0.0001697540283203125s
M: 64, sdc: 0.0007395744323730469s
```

```
nbits=8, d=128 (master)
M: 2,  sdc: 0.0055773258209228516s
M: 4,  sdc: 0.005366802215576172s
M: 8,  sdc: 0.0050809383392333984s
M: 16, sdc: 0.005639791488647461s
M: 32, sdc: 0.006036281585693359s
M: 64, sdc: 0.009720802307128906s
```

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1809

Reviewed By: beauby

Differential Revision: D27706249

Pulled By: mdouze

fbshipit-source-id: 102ae0c1c157e244e40557656934062f537b74d4
2021-04-16 00:17:51 -07:00
Check Deng
c37c2fa393 Support I/O and clone for NSG (#1766)
Summary:
This PR added IO and clone support to NSG.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1766

Test Plan: buck test //faiss/tests/:test_index -- TestNSG

Reviewed By: beauby

Differential Revision: D27189414

Pulled By: mdouze

fbshipit-source-id: c35c253d043c711d09a675f4ba5c3317b9423b5b
2021-03-23 09:18:15 -07:00
Check Deng
b35103a138 Add NSG (#1707)
Summary:
## Description:
This diff implemented Navigating Spreading-out Graph (NSG) which accepts a KNN graph as input.
Here is the interface of building an NSG graph:
``` c++
void IndexNSG::build(idx_t n, const float *x, idx_t *knn_graph, int GK);
```
where `GK` is the number of neighbors per node and `knn_graph[i * GK + j]` is the j-th neighbor of node i.
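
The flat, row-major layout of `knn_graph` can be illustrated in a few lines. This is only a sketch of the indexing convention: `build_flat_knn_graph` is a hypothetical brute-force helper, and excluding a node from its own neighbor list is an assumption.

```python
def sq_l2(a, b):
    """Squared L2 distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_flat_knn_graph(points, GK):
    """Brute-force kNN graph stored as one flat list of len(points) * GK ids."""
    flat = []
    for i, p in enumerate(points):
        # Assumption: a node is not listed among its own neighbors.
        order = sorted((sq_l2(p, q), j) for j, q in enumerate(points) if j != i)
        flat.extend(j for _, j in order[:GK])
    return flat


def neighbors_of(knn_graph, GK, i):
    """Row i of the flat graph: entries knn_graph[i * GK + j] for j in 0..GK-1."""
    return knn_graph[i * GK:(i + 1) * GK]
```

For four points on a line at 0, 1, 3 and 7 with `GK = 2`, the flat graph is `[1, 2, 0, 2, 1, 0, 2, 1]`, so the first neighbor of node 2 is `knn_graph[2 * 2 + 0]`, which is node 1.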

The `add` method is not implemented yet.

The unit tests could be found in `tests/test_nsg.cpp`.

mdouze beauby Maybe I need some advice on how to design the interface and support python.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1707

Test Plan: buck test //faiss/tests/:test_index -- TestNSG

Reviewed By: beauby

Differential Revision: D26748498

Pulled By: mdouze

fbshipit-source-id: 3280f705fb1b5f9c8cc5efeba63b904c3b832544
2021-03-10 15:03:00 -08:00
Dikpal Reddy
2b1194a3fa Ensure that invalid k/nprobe search input parameters to Faiss / Faiss GPU don't crash
Summary: Checks for invalid parameters (number of nearest neighbors and, where applicable, number of probes) in the indices and throws an exception instead of crashing. Includes unit tests.
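
The kind of check involved can be sketched as a small validation helper. The function name and the exact conditions are assumptions for illustration; the actual checks and error types in the indices may differ.

```python
def check_search_params(k, nprobe=None):
    """Raise ValueError on invalid search parameters instead of failing later.

    k: number of nearest neighbors requested.
    nprobe: number of probes, only for indices that have one.
    """
    if not isinstance(k, int) or k <= 0:
        raise ValueError(f"k must be a positive integer, got {k!r}")
    if nprobe is not None and (not isinstance(nprobe, int) or nprobe <= 0):
        raise ValueError(f"nprobe must be a positive integer, got {nprobe!r}")
```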

Reviewed By: wickedfoo

Differential Revision: D26582467

fbshipit-source-id: e345635d2f0f44ddcecc3f3314b2b9113359a787
2021-03-03 21:17:28 -08:00
Lucas Hosseini
6d51766607 Fix unused variables in python
Reviewed By: mdouze

Differential Revision: D26633983

fbshipit-source-id: 32b9f95ed9647716f65b93f2713a8d5bad6abe78
2021-02-24 11:52:18 -08:00
Matthijs Douze
c5975cda72 PQ4 fast scan benchmarks (#1555)
Summary:
Code + scripts for Faiss benchmarks around the Fast scan codes.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1555

Test Plan: buck test //faiss/tests/:test_refine

Reviewed By: wickedfoo

Differential Revision: D25546505

Pulled By: mdouze

fbshipit-source-id: 902486b7f47e36221a2671d124df8c114f25db58
2020-12-16 01:18:58 -08:00
Matthijs Douze
e1adde0d84 Faster brute force search (#1502)
Summary:
This diff streamlines the code that collects results for brute force distance computations for the L2 / IP and range search / knn search combinations.

It introduces a `ResultHandler` template class that abstracts what happens with the computed distances and ids. In addition to the heap result handler and the range search result handler, it introduces a reservoir result handler that improves the search speed for large k (>=100).
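
The difference between a heap handler and a reservoir handler can be sketched as follows. This illustrates the two strategies under assumptions and is not the C++ template: the class names, the `add_result`/`end` methods and the default capacity of `2 * k` are hypothetical.

```python
import heapq


class HeapResultHandler:
    """Keeps the k smallest distances in a max-heap (keys are negated)."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # entries are (-distance, id); heap[0] is the worst kept

    def add_result(self, dis, idx):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (-dis, idx))
        elif dis < -self.heap[0][0]:
            heapq.heapreplace(self.heap, (-dis, idx))

    def end(self):
        return sorted((-nd, i) for nd, i in self.heap)


class ReservoirResultHandler:
    """Appends candidates to a buffer and prunes it to k only when it is full."""

    def __init__(self, k, capacity=None):
        self.k = k
        self.capacity = capacity or 2 * k  # assumption: any value above k works
        self.threshold = float("inf")      # candidates at or above this are dropped
        self.buf = []

    def add_result(self, dis, idx):
        if dis < self.threshold:
            self.buf.append((dis, idx))    # cheap append, no heap maintenance
            if len(self.buf) >= self.capacity:
                self._shrink()

    def _shrink(self):
        # Keep the k best and tighten the threshold to the k-th best distance.
        self.buf = heapq.nsmallest(self.k, self.buf)
        self.threshold = self.buf[-1][0]

    def end(self):
        return sorted(self.buf)[: self.k]
```

The heap pays a logarithmic cost for each accepted candidate, while the reservoir pays an amortized cost for occasional pruning, which is one plausible reason it helps when k is large.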

Benchmark results (https://fb.quip.com/y0g1ACLEqJXx#OCaACA2Gm45) show that on small datasets (10k) search is 10-50% faster (improvements are larger for small k). There is room for improvement in the reservoir implementation, whose implementation is quite naive currently, but the diff is already useful in its current form.

Experiments on precomputed db vector norms for L2 distance computations were not very conclusive performance-wise, so the implementation is removed from IndexFlatL2.

This diff also removes IndexL2BaseShift, which was never used.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1502

Test Plan:
```
buck test //faiss/tests/:test_product_quantizer
buck test //faiss/tests/:test_index -- TestIndexFlat
```

Reviewed By: wickedfoo

Differential Revision: D24705464

Pulled By: mdouze

fbshipit-source-id: 270e10b19f3c89ed7b607ec30549aca0ac5027fe
2020-11-04 22:16:23 -08:00
Jeff Johnson
ef6e53f8ba Cleanup flag/data propagation for IndexShards and IndexReplicas
Summary:
This diff fixes https://github.com/facebookresearch/faiss/issues/1412

There were various inconsistencies in how the shard and replica wrappers updated their internal state as the sub-indices were updated. This makes the two container classes work in the same way with similar synchronization functionality.

Reviewed By: beauby

Differential Revision: D23974186

fbshipit-source-id: c688c0c9124f823e4239aa2ff617b007b4564859
2020-09-29 10:25:46 -07:00
Matthijs Douze
6d73c2ff69 Fix int64 for python tests in windows (#1381)
Summary:
`long` is 32 bits on windows and so is the default int type for numpy (eg. the one used for `np.arange`).
This diff explicitly specifies 64-bit ints for all occurrences where it matters.
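
The platform dependence can be observed with the standard library alone. The sketch below is an illustration, not code from this diff: it inspects the width of the C `long` type and builds an id array with an explicitly 64-bit element type.

```python
import ctypes
from array import array

# Width of the C `long` type on the current platform, in bits.
# This is 32 on Windows and 64 on most 64-bit Linux and macOS systems.
long_bits = 8 * ctypes.sizeof(ctypes.c_long)

# Typecode "q" is a C `long long`, which is 8 bytes on mainstream platforms,
# so the element width does not change between operating systems.
ids = array("q", range(5))

# A value above 2**31 - 1 fits in a 64-bit element on every platform.
big = array("q", [2**40])
```

In NumPy the equivalent precaution is to pass the dtype explicitly, for example `np.arange(n, dtype='int64')`, rather than relying on the platform default.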

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1381

Reviewed By: wickedfoo

Differential Revision: D23371232

Pulled By: mdouze

fbshipit-source-id: 220262cd70ee70379f83de93561a4eae71c94b04
2020-08-27 12:40:55 -07:00
Lucas Hosseini
24c4460dd2 Avoid leaking file descriptors in python tests. (#1353)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1353

Test Plan: Imported from OSS

Reviewed By: mdouze

Differential Revision: D23292456

Pulled By: beauby

fbshipit-source-id: 44458eb16d037883ff39827accf5edddb1b1bb89
2020-08-24 06:46:52 -07:00
Lucas Hosseini
a17a631dc3 Sync 20200323. (#1157)
* Sync 20200323.

* Bump version.

* Remove warning filter.
2020-03-24 14:06:48 +01:00
Lucas Hosseini
22b7876ef5 Facebook sync (2020-03-10) (#1136)
2020-03-10 14:24:07 +01:00
Lucas Hosseini
36ddba9196 Facebook sync (2019-09-10) (#943)
* Facebook sync (2019-09-10)

* Fix depends Makefile target.

* Add faiss symlink for new include directives.

* Fix missing header.

* Fix tests.

* Fix Makefile.

* Update depend.

* Fix include directives spacing.
2019-09-20 18:59:10 +02:00
Lucas Hosseini
a8118acbc5 Facebook sync (May 2019) + relicense (#838)
Changelog:

- changed license: BSD+Patents -> MIT
- propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas
- support for searching several inverted lists in parallel (parallel_mode != 0)
- better support for PQ codes where nbit != 8 or 16
- IVFSpectralHash implementation: spectral hash codes inside an IVF
- 6-bit per component scalar quantizer (4 and 8 bit were already supported)
- combinations of inverted lists: HStackInvertedLists and VStackInvertedLists
- configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch)
- more test and demo code compatible with Python 3 (print with parentheses)
- refactored benchmark code: data loading is now in a single file
2019-05-28 16:17:22 +02:00
Lucas Hosseini
afe0fdc161 Facebook sync (Mar 2019) (#756)
Facebook sync (Mar 2019)

- MatrixStats object
- option to round coordinates during k-means optimization
- alternative option for search in HNSW
- moved stats and imbalance_factor of IndexIVF to InvertedLists object
- range search for IVFScalarQuantizer
- direct uint8 codec in ScalarQuantizer
- renamed IndexProxy to IndexReplicas and moved to main Faiss
- better support for PQ code assignment with external index
- support for IMI2x16 (4B virtual centroids!)
- support for k = 2048 search on GPU (instead of 1024)
- most CUDA mem alloc failures throw exceptions instead of terminating on an assertion
- support for renaming an ondisk invertedlists
- interrupt computations with ctrl-C in python
2019-03-29 16:32:28 +01:00
Lucas Hosseini
f417a53628 Fix CI tests (#687)
* Fix test_transfer_invlists.cpp

* Fix relative imports.

* Fix test_index_accuracy.py.

* Use default OSX version.

* Allow osx gcc6 build to fail.
2019-01-08 17:52:36 +01:00
matthijs
daf589d9d2 add bench_all_ivf
2018-12-20 05:43:36 -08:00
Lucas Hosseini
323dbf3be3 Facebook sync (Dec 2018). (#660)
* Add GpuIndexBinaryFlat
* Add IndexBinaryHNSW
2018-12-19 17:48:35 +01:00
Lucas Hosseini
76bec0b500 Facebook sync (#573)
Features:

- automatic tracking of C++ references in Python
- non-intel platforms supported -- some functions optimized for ARM
- override nprobe for concurrent searches
- support for floating-point quantizers in binary indexes

Bug fixes:

- no more segfaults in python (I know it's the same as the first feature but it's important!)
- fix GpuIndexIVFFlat issues for float32 with 64 / 128 dims
- fix sharding of flat indexes on GPU with index_cpu_to_gpu_multiple
2018-08-30 19:38:50 +02:00
Lucas Hosseini
6880286ea0 Facebook sync (#504)
* Facebook sync

* Update swig wrappers.

* Fix comment.
2018-07-06 14:12:11 +02:00
Lucas Hosseini
6e40d6689f Move python tests back together with C++ tests. (#479)
2018-06-04 12:20:44 +02:00
Lucas Hosseini
cf18101f6d Refactor makefiles and add configure script (#466)
* Refactors Makefiles and add configure script.

* Give MKL higher priority in configure script.

* Clean up Linux example makefile.inc.

* Cleanup makefile.inc examples.

* Fix python clean Makefile target.

* Regen swig wrappers.

* Remove useless CUDAFLAGS variable.

* Fix python linking flags.

* Separate compile and link phase in python makefile.

* Add macro to look for swig.

* Add CUDA check in configure script.

* Cleanup make depend targets.

* Cleanup CUDA flags.

* Fix linking flags.

* Fix python GPU linking.

* Remove useless flags from python gpu module linking.

* Add check for cuda libs.

* Cleanup GPU targets.

* Clean up test target.

* Add cpu/gpu targets to python makefile.

* Clean up tutorial Makefile.

* Remove stale OS var from example makefiles.

* Clean up cuda example flags.
2018-06-02 08:35:30 +02:00
Matthijs Douze
0c482e54eb sync with FB version 2018-02-23 (#347)
- support on-disk IVF
2018-02-23 07:49:45 -08:00
matthijs
9933892ec9 sync with FB version 2017-01-09
- adding HNSW indexing method

- simultaneous search and reconstruction for IndexIVFPQ
2018-01-09 06:42:06 -08:00
matthijs
250a3d3f18 sync with FB version 2017-11-22
various bugfixes from github issues
k-means with some frozen centroids
GPU better tiling for large flat datasets
default AVX for vector ops
2017-11-22 05:11:28 -08:00
matthijs
a5ef16db89 sync with FB version 2017-08-09
2017-08-09 11:13:51 -07:00
matthijs
8e3dc6f2b0 changed license
2017-07-30 00:18:45 -07:00
matthijs
f7aedbdfc0 sync with FB version 2017-07-18
- implemented ScalarQuantizer (without IVF)
- implemented update for IndexIVFFlat
- implemented L2 normalization preproc
2017-07-18 02:51:27 -07:00
matthijs
784e2facd8 Synchronization with FB version 2017-06-21
* moved most FAISS_ASSERT calls to C++ exceptions, and adjusted
  memory allocation to avoid mem leaks

* added an IndexIVFScalarQuantizer type that offers an
  intermediate compression between IVFFlat and IVFPQ

* support removal of indices in IndexIDMap / IndexFlat combination

* various fixes in GPU code
2017-06-21 09:01:06 -07:00