31 Commits (039409d950790fa289f33e88db504d520084ba9e)

Author SHA1 Message Date
Gergely Szilvasy 2768fb38b2 faiss-gpu-raft package (#2992)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2992

Reviewed By: mdouze

Differential Revision: D48391366

Pulled By: algoriddle

fbshipit-source-id: 94b7f62afc8a09a9feaea47bf60e5358d89fcde5
2023-08-16 09:30:41 -07:00
Matthijs Douze 687457b2f4 Access graph structure for NSG (#2984)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2984

It is not entirely trivial to access the NSG graph structure from Python (although it is a fixed size N-by-K matrix of vector ids).
This diff adds an inspect_tools function to do that.

Reviewed By: algoriddle

Differential Revision: D48026775

fbshipit-source-id: 94cd7be7f656bcd333d62586531f287ea8e052e5
2023-08-04 06:55:24 -07:00
Gergely Szilvasy 821a401ae9 CodeSet for deduping large datasets (#2949)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2949

A more scalable alternative to `np.unique` for deduping large datasets with a quantized code.
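
As a rough illustration of the idea (not the CodeSet implementation itself), deduplication over quantized codes can be sketched with a hash set of byte strings. The name `dedup_codes` and the bytes-per-vector representation are assumptions made for this sketch.

```python
def dedup_codes(codes):
    """Return a keep-mask: True for the first occurrence of each code.

    `codes` is an iterable of bytes objects, one quantized code per vector.
    Hypothetical helper for illustration, not part of Faiss.
    """
    seen = set()
    mask = []
    for code in codes:
        is_new = code not in seen
        if is_new:
            seen.add(code)
        mask.append(is_new)
    return mask


codes = [b"\x01\x02", b"\x03\x04", b"\x01\x02", b"\x05\x06", b"\x03\x04"]
print(dedup_codes(codes))  # [True, True, False, True, False]
```

Unlike a sort-based approach, a set of seen codes can be filled batch by batch, so the whole dataset does not need to be held and sorted at once.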

Reviewed By: mlomeli1

Differential Revision: D47443953

fbshipit-source-id: 4a1554d4d4200b5fa657e9d8b7395bba9856a8e3
2023-07-19 10:05:46 -07:00
Gergely Szilvasy 391601dc3f relax test_ivf_train_2level threshold (#2927)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2927

Reviewed By: mlomeli1

Differential Revision: D47017009

fbshipit-source-id: cfa1df4b9632b085d3a61b56d8617bebd7e5aad6
2023-06-26 05:02:47 -07:00
Matthijs Douze 07fe2b622f Binary cloning and GPU range search (#2916)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2916

Overall better support for binary indexes:
- cloning (to CPU and GPU), only for BinaryFlat for now
- fix bug in reconstruct_n
- range_search_max_results

Reviewed By: algoriddle

Differential Revision: D46755778

fbshipit-source-id: 777ad90aff5c54a77f9685ed6512247a922c6ef5
2023-06-19 06:05:14 -07:00
Gergely Szilvasy 092606b293 bbs producer/consumer threading (#2901)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2901

This diff allows each GPU to work independently: a hot centroid (e.g. out-of-distribution queries that hit a centroid heavily) will only block the one GPU that is processing it, while the others continue to pick up work.
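
The producer/consumer pattern described here can be sketched with a shared work queue. This is a generic stdlib example, not the code of the diff: `run_workers`, `tasks` and `process` are illustrative names, with a task standing in for one centroid's work and a worker thread standing in for one GPU.

```python
import queue
import threading


def run_workers(tasks, n_workers, process):
    """Each worker pulls the next task as soon as it is free.

    A slow task only occupies the worker that picked it up; the other
    workers keep draining the queue.
    """
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()
            except queue.Empty:
                return  # no work left
            r = process(t)
            with lock:
                results[t] = r

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


out = run_workers(range(8), n_workers=3, process=lambda t: t * t)
print(sorted(out.items()))
```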

Reviewed By: mdouze

Differential Revision: D46521298

fbshipit-source-id: 171cb06cce8b2d16b7bd744799b105b3cd525be3
2023-06-14 07:58:44 -07:00
Matthijs Douze 6800ebef83 Support independent IVF coarse quantizer
Summary: In the IndexIVFIndependentQuantizer, the coarse quantizer is applied on the input vectors, but the encoding is performed on a vector-transformed version of the database elements.

Reviewed By: alexanderguzhva

Differential Revision: D45950970

fbshipit-source-id: 30f6cf46d44174b1d99a12384b7d5e2d475c1f88
2023-05-26 02:59:01 -07:00
Matthijs Douze b9ea339617 support range search from GPU (#2860)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2860

Optimized range search function where the GPU computes by default and falls back on the CPU for queries where there are too many results.

Parallelize the CPU to GPU cloning; it seems to work.

Support range_search_preassigned in Python

Fix long-standing issue with SWIG exposed functions that did not release the GIL (in particular the MapLong2Long).

Adds a MapInt64ToInt64 that is more efficient than MapLong2Long.

Reviewed By: algoriddle

Differential Revision: D45672301

fbshipit-source-id: 2e77397c40083818584dbafa5427149359a2abfd
2023-05-16 00:27:53 -07:00
Matthijs Douze 2d8886cd4f IVF sorting routine (#2846)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2846

Adds a function to ivf_contrib to sort the inverted lists by size without changing the results. Also moves big_batch_search to its own module.
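
Why sorting does not change the results can be seen in a small sketch: if the centroids are permuted together with their lists, every vector stays attached to the same centroid. The function below is a hypothetical stand-in using plain Python lists, and the increasing-size order is an assumption.

```python
def sort_lists_by_size(centroids, lists):
    """Reorder inverted lists by increasing size, permuting centroids alike.

    `centroids[i]` is the centroid of `lists[i]`. Both are reordered with
    the same permutation, so the centroid-to-list pairing is preserved.
    """
    order = sorted(range(len(lists)), key=lambda i: len(lists[i]))
    return [centroids[i] for i in order], [lists[i] for i in order]


centroids = ["c0", "c1", "c2"]
lists = [[1, 2, 3], [4], [5, 6]]
print(sort_lists_by_size(centroids, lists))
# (['c1', 'c2', 'c0'], [[4], [5, 6], [1, 2, 3]])
```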

Reviewed By: algoriddle

Differential Revision: D45565880

fbshipit-source-id: 091a1c1c074f860d6953bf20d04523292fb55e1a
2023-05-04 09:59:06 -07:00
Matthijs Douze 016aa04602 make balanced clusters the default (#2796)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2796

This diff makes balanced clusters the default for 2-level clustering. This seems to improve a bit over the default uniform clusters, see

https://github.com/fairinternal/faiss_improvements/blob/master/better_coarse_quantizer/two_level_clustering.ipynb

Warning: the nc2 argument of two_level_clustering becomes the *total* number of clusters.

Reviewed By: algoriddle

Differential Revision: D44421222

fbshipit-source-id: 951b7fc043be4a41762a7e6f7a6fcfb71e303832
2023-03-28 07:23:30 -07:00
Matthijs Douze 0200d131fc fix windows test (#2775)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2775

Reviewed By: algoriddle

Differential Revision: D44210010

fbshipit-source-id: b9b620a4b0a874e09ee2f6082ff0f9463716fdf4
2023-03-21 05:34:50 -07:00
Matthijs Douze 2d7dd5b0a6 support checkpointing in big batch search
Summary: Big batch search can be running for hours so it's useful to have a checkpointing mechanism in case it's run on a best-effort cluster queue.
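
A minimal sketch of such a checkpointing loop, assuming the work splits into independent steps. The names `run_with_checkpoint`, `step` and the pickle file layout are illustrative and not taken from the diff.

```python
import os
import pickle
import tempfile


def run_with_checkpoint(items, step, path):
    """Process `items` in order, saving progress to `path` after each one."""
    state = {"next": 0, "results": []}
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)  # resume where the previous run stopped
    for i in range(state["next"], len(items)):
        state["results"].append(step(items[i]))
        state["next"] = i + 1
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, path)  # rename so a half-written file is never read
    return state["results"]


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "state.pkl")
    print(run_with_checkpoint([1, 2, 3], lambda x: x * 10, ckpt))  # [10, 20, 30]
```

If the job is killed and restarted with the same `path`, already completed steps are skipped.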

Reviewed By: algoriddle

Differential Revision: D44059758

fbshipit-source-id: 5cb5e80800c6d2bf76d9f6cb40736009cd5d4b8e
2023-03-14 11:11:50 -07:00
Matthijs Douze fa53e2c941 Implementation of big-batch IVF search (single machine) (#2567)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2567

Intuitively, it should be easier to handle big-batch searches because all distance computations for a set of queries can be done locally within each inverted list.

This benchmark implements this in pure python (but should be close to optimal in terms of speed), on CPU for IndexIVFFlat, IndexIVFPQ and IndexIVFScalarQuantizer. GPU is also supported.
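
The core regrouping step can be sketched as follows: instead of looping over queries, the query-to-lists assignment is inverted so that each inverted list is visited once with all the queries that probe it. `group_queries_by_list` is a hypothetical helper, not the benchmark's code.

```python
from collections import defaultdict


def group_queries_by_list(probes):
    """Invert a query -> probed lists mapping into list -> queries.

    `probes[q]` holds the ids of the inverted lists visited by query q.
    """
    by_list = defaultdict(list)
    for q, list_ids in enumerate(probes):
        for list_id in list_ids:
            by_list[list_id].append(q)
    return dict(by_list)


probes = [[0, 2], [2], [1, 2]]
for list_id, queries in sorted(group_queries_by_list(probes).items()):
    # all distance computations for this list can now be done in one block
    print(list_id, queries)
```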

The results are not systematically better, see https://docs.google.com/document/d/1d3YuV8uN7hut6aOATCOMx8Ut-QEl_oRnJdPgDBRF1QA/edit?usp=sharing

Reviewed By: algoriddle

Differential Revision: D41098338

fbshipit-source-id: 479e471b0d541f242d420f581775d57b708a61b8
2022-12-09 08:53:13 -08:00
Matthijs Douze f68ddd0564 fix test in test_contrib (#2294)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2294

There is a weird CI failure on one of the platforms occurring in the PR

https://github.com/facebookresearch/faiss/pull/2291

This diff makes the test a bit more robust, correcting inter_perf to compute the intersection measure. Hopefully this will make the bug go away.
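
For reference, one common definition of an intersection measure between two sets of search results is sketched below. This is an assumption about what is meant, written as a hypothetical helper; it is not the test's actual code.

```python
def intersection_measure(ids_ref, ids_new):
    """Fraction of reference neighbors that also appear in the new results.

    Both arguments are lists with one list of neighbor ids per query.
    """
    total = sum(len(row) for row in ids_ref)
    common = sum(len(set(a) & set(b)) for a, b in zip(ids_ref, ids_new))
    return common / total


ref = [[1, 2, 3], [4, 5, 6]]
new = [[3, 2, 9], [4, 5, 6]]
print(intersection_measure(ref, new))  # 5 of the 6 reference ids are found
```

Comparing sets per query makes the measure insensitive to the order of the neighbors, which helps when ties are broken differently across platforms.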

Reviewed By: beauby

Differential Revision: D35558855

fbshipit-source-id: f5a926d9d8ebee975e538c65ac37b15d485798aa
2022-04-20 03:03:38 -07:00
Matthijs Douze b8fe92dfee contrib clustering module (#2217)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2217

This diff introduces a new Faiss contrib module that contains:
- generic k-means implemented in python (was in distributed_ondisk)
- the two-level clustering code, including a simple function that runs it on a Faiss IVF index.
- sparse clustering code (new)

The main idea is that this code is often re-used, so it is better to have it in contrib.

Reviewed By: beauby

Differential Revision: D34170932

fbshipit-source-id: cc297cc56d241b5ef421500ed410d8e2be0f1b77
2022-02-28 14:18:47 -08:00
Matthijs Douze eb8781557f Fix exhaustive search GT computation with IP distance (#2212)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2212

Fixes issue

https://github.com/facebookresearch/faiss/issues/2205

clear bug report
easy fix
easy to accept ;-)

Reviewed By: beauby

Differential Revision: D33975281

fbshipit-source-id: 088e1f3078dc79402563be7fac3530d76b197006
2022-02-07 19:36:21 -08:00
Matthijs Douze c0052c1533 IndexFlatCodes: a single parent for all flat codecs (#2132)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2132

This diff adds the class IndexFlatCodes that becomes the parent of all "flat" encodings.
IndexPQ
IndexFlat
IndexAdditiveQuantizer
IndexScalarQuantizer
IndexLSH
Index2Layer

The other changes are:
- for IndexFlat, there is no vector<float> with the data anymore. It is replaced with a `get_xb()` function. This broke quite a bit of external code, which this diff also attempts to fix.
- I/O functions needed to be adapted. This is done without changing the I/O format for any index.
- added a small contrib function to get the data from the IndexFlat
- the functionality has been made uniform, for example remove_ids and add are now in the parent class.

Eventually, we may support generic storage for flat indexes, similar to `InvertedLists`, eg to memmap the data, but this will again require a big change.

Reviewed By: wickedfoo

Differential Revision: D32646769

fbshipit-source-id: 04a1659173fd51b130ae45d345176b72183cae40
2021-12-07 01:31:07 -08:00
Matthijs Douze 1829aa92a1 three small fixes (#1972)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1972

This fixes a few issues that I ran into + adds tests:

- range_search_max_results with IP search

- a few missing downcasts for VectorTransforms

- ResultHeap supports max IP search
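
The difference between the two search modes when merging batched results can be sketched with the stdlib `heapq` module. `merge_topk` is a hypothetical helper for illustration and not the ResultHeap class itself.

```python
import heapq


def merge_topk(batches, k, keep_max=False):
    """Merge per-batch (score, id) results into a global top-k.

    keep_max=True keeps the largest scores (inner product search);
    keep_max=False keeps the smallest scores (L2 distances).
    """
    pick = heapq.nlargest if keep_max else heapq.nsmallest
    return pick(k, (pair for batch in batches for pair in batch))


batches = [[(0.9, 1), (0.1, 2)], [(0.5, 7), (0.8, 3)]]
print(merge_topk(batches, 2, keep_max=True))   # [(0.9, 1), (0.8, 3)]
print(merge_topk(batches, 2, keep_max=False))  # [(0.1, 2), (0.5, 7)]
```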

Reviewed By: wickedfoo

Differential Revision: D29525093

fbshipit-source-id: d4ff0aff1d83af9717ff1aaa2fe3cda7b53019a3
2021-07-01 16:08:45 -07:00
Matthijs Douze 2d380e992b Add manifold check for size 0 (#1867)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1867

Merging code for the 1T photodna index seems to fail at

https://www.internalfb.com/phabricator/paste/view/P412975011?lines=174

with
```
terminate called after throwing an instance of 'facebook::manifold::blobstore::StorageException'
  what():  [400] Begin offset and/or length were invalid -- Begin offset must be positive and length must be non-negative. Received: offset = 2642410612, length = 0
Aborted (core dumped)
```
traces back to

https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/manifold/blobstore/BlobstoreThriftHandler.cpp?lines=671%2C700%2C732

There is a single case where we don't check if the read or write size is 0. So let's try this fix.
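
The kind of guard meant here can be sketched as follows. `read_range` and `read_fn` are hypothetical names; the real fix lives in the Manifold I/O code, which is not shown in this log.

```python
def read_range(read_fn, offset, length):
    """Read `length` bytes at `offset`, skipping the backend for empty reads."""
    if length == 0:
        return b""  # nothing to read: do not send a zero-length request
    return read_fn(offset, length)


data = b"0123456789"
backend = lambda off, n: data[off:off + n]
print(read_range(backend, 2, 3))  # b'234'
print(read_range(backend, 2, 0))  # b''
```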

In the process I realized that the Manifold tests were non functional due to a name collision on common.py. Also fix this in all dependent files.

Differential Revision: D28231710

fbshipit-source-id: 700ffa6ca0c82c49e7d1eae9e76549ec5ff16332
2021-05-09 22:30:31 -07:00
Matthijs Douze 3f2ebf4b1c Add preassigned functions to contrib
Summary:
Adds the preassigned add and search python wrappers to contrib.
Adds the preassigned search for the binary case (was missing before).
Also adds a real test for that functionality.

Reviewed By: beauby

Differential Revision: D26560021

fbshipit-source-id: 330b715a9ed0073cfdadbfbcb1c23b10bed963a5
2021-02-25 11:39:07 -08:00
Matthijs Douze 5602724979 make calling conventions uniform between faiss.knn and faiss.knn_gpu
Summary: The order of xb and xq was different between `faiss.knn` and `faiss.knn_gpu`. Also the metric argument was called distance_type. This diff fixes both. Hopefully not too much external code depends on it.

Reviewed By: wickedfoo

Differential Revision: D26222853

fbshipit-source-id: b43e143d64d9ecbbdf541734895c13847cf2696c
2021-02-03 12:21:40 -08:00
Matthijs Douze 3dd7ba8ff9 Add range search accuracy evaluation
Summary:
Added a few functions in contrib to:
- run range searches by batches on the query or the database side
- emulate range search on GPU: search on GPU with k=1024, if the farthest neighbor is still within range, re-perform search on CPU
- provide reference implementations for precision-recall on range search datasets
- optimized code to plot precision-recall plots (ie. sweep over thresholds)
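
The emulation strategy in the second bullet can be sketched in pure Python. Here `knn` is a brute-force stand-in for the fast top-k search and the exhaustive scan stands in for the CPU fallback; all function names are illustrative and squared L2 distances are assumed.

```python
def sq_l2(a, b):
    """Squared L2 distance between two vectors given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(q, xb, k):
    """Stand-in for the fast top-k search: k nearest (distance, id) pairs."""
    return sorted((sq_l2(q, x), i) for i, x in enumerate(xb))[:k]


def range_search_via_knn(q, xb, radius, k):
    """Return ids within `radius` of `q`, using top-k search when it suffices."""
    top = knn(q, xb, k)
    if len(top) == k and top[-1][0] < radius:
        # The farthest of the k neighbors is still in range, so the result
        # may be truncated: redo the query with an exhaustive scan.
        return [i for i, x in enumerate(xb) if sq_l2(q, x) < radius]
    return [i for d, i in top if d < radius]


xb = [[0.0], [1.0], [2.0], [10.0]]
print(sorted(range_search_via_knn([0.0], xb, radius=5.0, k=2)))  # [0, 1, 2]
print(sorted(range_search_via_knn([0.0], xb, radius=5.0, k=4)))  # [0, 1, 2]
```

With k=2 the second neighbor is still in range, so the fallback path runs; with k=4 the top-k result already covers everything within the radius.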

The new functions are mainly in a new `evaluation.py`

Reviewed By: wickedfoo

Differential Revision: D25627619

fbshipit-source-id: 58f90654c32c925557d7bbf8083efbb710712e03
2020-12-17 17:17:09 -08:00
Matthijs Douze 6d0bc58db6 Implementation of PQ4 search with SIMD instructions (#1542)
Summary:
IndexPQ and IndexIVFPQ implementations with AVX shuffle instructions.

The training and computing of the codes does not change wrt. the original PQ versions but the code layout is "packed" so that it can be used efficiently by the SIMD computation kernels.
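
To give a feel for what packing 4-bit codes means, the sketch below stores two codes per byte. This is a deliberately simplified illustration with hypothetical helper names; the layout used by the SIMD kernels in this commit is more involved and is not reproduced here.

```python
def pack_4bit(codes):
    """Pack a list of 4-bit values (0..15) two per byte, low nibble first."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)  # pad to an even count
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_4bit(packed, n):
    """Inverse of pack_4bit: recover the first n 4-bit values."""
    out = []
    for b in packed:
        out.append(b & 0x0F)  # low nibble
        out.append(b >> 4)    # high nibble
    return out[:n]


packed = pack_4bit([1, 2, 15])
print(list(packed))            # [33, 15]
print(unpack_4bit(packed, 3))  # [1, 2, 15]
```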

The main changes are:

- new IndexPQFastScan and IndexIVFPQFastScan objects

- simdlib.h for an abstraction above the AVX2 intrinsics

- BlockInvertedLists for invlists that are 32-byte aligned and where codes are not sequential

- pq4_fast_scan.h/.cpp: for packing codes and look-up tables + optimized distance computation kernels

- simd_result_handlers.h: SIMD version of result collection in heaps / reservoirs

Misc changes:

- added contrib.inspect_tools to access fields in C++ objects

- moved .h and .cpp code for inverted lists to an invlists/ subdirectory, and made a .h/.cpp for InvertedListsIOHook

- added a new inverted lists type with 32-byte aligned codes (for consumption by SIMD)

- moved Windows-specific intrinsics to platform_macros.h

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1542

Test Plan:
```
buck test mode/opt  -j 4  //faiss/tests/:test_fast_scan_ivf //faiss/tests/:test_fast_scan
buck test mode/opt  //faiss/manifold/...
```

Reviewed By: wickedfoo

Differential Revision: D25175439

Pulled By: mdouze

fbshipit-source-id: ad1a40c0df8c10f4b364bdec7172e43d71b56c34
2020-12-03 10:06:38 -08:00
Matthijs Douze 92306e3a69 Synthetic dataset with inner product option
Summary: The synthetic dataset can now have IP groundtruth

Reviewed By: wickedfoo

Differential Revision: D24219860

fbshipit-source-id: 42e094479311135e932821ac0a97ed0fb237bf78
2020-10-20 03:46:26 -07:00
Lucas Hosseini 70eaa9b1a3 Add missing copyright headers. (#1460)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1460

Reviewed By: wickedfoo

Differential Revision: D24278804

Pulled By: beauby

fbshipit-source-id: 5ea96ceb63be76a34f1eb4da03972159342cd5b6
2020-10-13 11:15:59 -07:00
Matthijs Douze 8b05434a50 Remove useless function
Summary:
Removed an unused function that caused compile errors in some configurations.
Added contrib function (exhaustive_search.knn) to compute the k nearest neighbors without constructing an index.
Renamed the equivalent GPU function as exhaustive_search.knn_gpu (it does not make much sense to mention numpy in the name as all functions take numpy arguments by default).

Reviewed By: beauby

Differential Revision: D24215427

fbshipit-source-id: 6d8e1eafa7c57593304b7b76f83b3015e4d2a2bb
2020-10-09 07:57:04 -07:00
Matthijs Douze 65ee09484f Test GPU ground-truth computation (#1432)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1432

The contrib function knn_ground_truth does not provide exactly the same results on GPU and CPU (but relative accuracy is still 1e-7). This diff relaxes the constraint on CPU and adds a test on GPU.

Reviewed By: wickedfoo

Differential Revision: D24012199

fbshipit-source-id: aaa20dbdf42b876b3ed7da34028646dbb20833d3
2020-09-30 11:14:18 -07:00
Matthijs Douze f849680777 Dataset access in contrib
Summary:
This diff adds an object for a few useful datasets in faiss.contrib.
This includes synthetic datasets and the classic ones.
It is intended to work on:
- the FAIR cluster
- gluster
- manifold

Reviewed By: wickedfoo

Differential Revision: D23378763

fbshipit-source-id: 2437a7be9e712fd5ad1bccbe523cc1c936f7ab35
2020-08-27 19:19:33 -07:00
Lucas Hosseini e5d2defaae Disable contrib tests for python2. (#1364)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1364

Test Plan: Imported from OSS

Reviewed By: mdouze

Differential Revision: D23314732

Pulled By: beauby

fbshipit-source-id: 788465c353bbc65947a6c766e8509f35f35e4134
2020-08-25 16:58:24 -07:00
Lucas Hosseini a8e4c5e2d5 Move build to CMake (#1313)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1313

Reviewed By: mdouze

Differential Revision: D22948267

Pulled By: beauby

fbshipit-source-id: ec16fa0342f37672d46fb7886ecc55c7996011c4
2020-08-14 15:03:10 -07:00
Lucas Hosseini cd38e82f0c Facebook sync 2020-07-31 (#1308)
2020-08-03 22:15:02 +02:00