faiss

Commit Graph

Author	SHA1	Message	Date
spectaclehong	b13f47a4da	Fix reconstruct bug when by_residual is false (#2298 ) Summary: When I reconstructed with by_residual turned off, the distance increased greatly. This is because the reconstruct_from_offset function did not check whether the by_residual option was off. I fixed this bug with a simple if statement (like this: https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVFPQ.cpp#L365). Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2298 Reviewed By: alexanderguzhva Differential Revision: D35746566 Pulled By: mdouze fbshipit-source-id: 50f98c7cc97c7936507573fe41b65a79ecdbc4ca	2022-04-20 01:35:21 -07:00
Matthijs Douze	291353c5a9	Generalize DistanceComputer for flat indexes (#2255 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2255 The `DistanceComputer` object is derived from an Index (obtained with `get_distance_computer()`). It maintains a current query and quickly computes distances from that query to any item in the database. This is useful, eg. for the IndexHNSW and IndexNSG that rely on query-to-point comparisons in the datasets. This diff introduces the `FlatCodesDistanceComputer`, that inherits from `DistanceComputer` for Flat indexes. In addition to the distance-to-item function, it adds a `distance_to_code` that computes the distance from any code to the current query, even if it is not stored in the index. This is implemented for all FlatCode indexes (IndexFlat, IndexPQ, IndexScalarQuantizer and IndexAdditiveQuantizer). In the process, the two classes were extracted to their own header file `impl/DistanceComputer.h` Reviewed By: beauby Differential Revision: D34863609 fbshipit-source-id: 39d8c66475e55c3223c4a6a210827aa48bca292d	2022-03-20 23:43:33 -07:00
Matthijs Douze	c0052c1533	IndexFlatCodes: a single parent for all flat codecs (#2132 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2132 This diff adds the class IndexFlatCodes that becomes the parent of all "flat" encodings. IndexPQ IndexFlat IndexAdditiveQuantizer IndexScalarQuantizer IndexLSH Index2Layer The other changes are: - for IndexFlat, there is no vector<float> with the data anymore. It is replaced with a `get_xb()` function. This broke quite a few external codes, that this diff also attempts to fix. - I/O functions needed to be adapted. This is done without changing the I/O format for any index. - added a small contrib function to get the data from the IndexFlat - the functionality has been made uniform, for example remove_ids and add are now in the parent class. Eventually, we may support generic storage for flat indexes, similar to `InvertedLists`, eg to memmap the data, but this will again require a big change. Reviewed By: wickedfoo Differential Revision: D32646769 fbshipit-source-id: 04a1659173fd51b130ae45d345176b72183cae40	2021-12-07 01:31:07 -08:00
Matthijs Douze	b598f5558c	Add an epsilon to avoid numerical instability in PCA whitening Summary: PCA whitening requires multiplying the eigenvectors by 1/sqrt(singular values of covariance matrix). The singular values are sometimes 0 (because the vector subspace is not full-rank) or negative (because of numerical issues). Therefore, this diff adds an epsilon to the denominator above (default 0). Reviewed By: edpizzi Differential Revision: D31725075 fbshipit-source-id: dae68bda9f7452220785d76e30ce4b2ac8582413	2021-10-19 01:02:21 -07:00
Matthijs Douze	1829aa92a1	three small fixes (#1972 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1972 This fixes a few issues that I ran into + adds tests: - range_search_max_results with IP search - a few missing downcasts for VectorTransforms - ResultHeap supports max IP search Reviewed By: wickedfoo Differential Revision: D29525093 fbshipit-source-id: d4ff0aff1d83af9717ff1aaa2fe3cda7b53019a3	2021-07-01 16:08:45 -07:00
Matthijs Douze	2d380e992b	Add manifold check for size 0 (#1867 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1867 Merging code for the 1T photodna index seems to fail at https://www.internalfb.com/phabricator/paste/view/P412975011?lines=174 with ``` terminate called after throwing an instance of 'facebook::manifold::blobstore::StorageException' what(): [400] Begin offset and/or length were invalid -- Begin offset must be positive and length must be non-negative. Received: offset = 2642410612, length = 0 Aborted (core dumped) ``` traces back to https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/manifold/blobstore/BlobstoreThriftHandler.cpp?lines=671%2C700%2C732 There is a single case where we don't check if the read or write size is 0. So let's try this fix. In the process I realized that the Manifold tests were non functional due to a name collision on common.py. Also fix this in all dependent files. Differential Revision: D28231710 fbshipit-source-id: 700ffa6ca0c82c49e7d1eae9e76549ec5ff16332	2021-05-09 22:30:31 -07:00
Matthijs Douze	7559cf5c5b	add ResidualQuantizer Summary: This diff includes: - progressive dimension k-means. - the ResidualQuantizer object - GpuProgressiveDimIndexFactory so that it can be trained on GPU - corresponding tests - reference Python implementation of the same in scripts/matthijs/LCC_encoding Reviewed By: wickedfoo Differential Revision: D27608029 fbshipit-source-id: 9a8cf3310c8439a93641961ca8b042941f0f4249	2021-04-14 13:11:54 -07:00
H. Vetinari	9c58ae00f1	Portable SWIG Vectors (#1742 ) Summary: After initial positive feedback to the idea in https://github.com/facebookresearch/faiss/issues/1741 from mdouze, here are the patches I currently have as a basis for discussion. Matthijs suggests to not bother with the deprecation warnings at all, which is fine for me as well, though I would normally still advocate to provide users with _some_ advance notice before removing parts of an interface. Fixes https://github.com/facebookresearch/faiss/issues/1741 PS. The deprecation warning is only shown once per session (per class) PPS. I have tested in https://github.com/conda-forge/faiss-split-feedstock/pull/32 that the respective classes remain available both through `import faiss` and `from faiss import *`. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1742 Reviewed By: mdouze Differential Revision: D26978886 Pulled By: beauby fbshipit-source-id: b52e2b5b5b0117af7cd95ef5df3128e9914633ad	2021-04-02 07:11:47 -07:00
Check Deng	d6535a3d87	Add NNDescent to faiss (#1654 ) Summary: As discussed in https://github.com/facebookresearch/faiss/issues/685, I'm going to add an NSG index to faiss. This PR, which adds an NNDescent index, is the first step, as I commented [here](https://github.com/facebookresearch/faiss/issues/685#issuecomment-760608431). Changes: 1. Add an `IndexNNDescent` and an `IndexNNDescentFlat` which allow users to construct a KNN graph on a million-scale dataset using CPU and search NN on it. The implementation part is put under `faiss/impl`. 2. Add compilation entries to `CMakeLists.txt` for C++ and `swigfaiss.swig` for Python. `IndexNNDescentFlat` can be called directly by users in C++ and Python. 3. `VisitedTable` struct in `HNSW.h` is moved into `AuxIndexStructures.h`. 4. Add a demo `demo_nndescent.cpp` to demonstrate the effectiveness. TODO 1. Support index factor. 2. Implement `IndexNNDescentPQ` and `IndexNNDescentSQ` 3. More comments in the code. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1654 Test Plan: buck test //faiss/tests/:test_index_accuracy -- TestNNDescent buck test //faiss/tests/:test_build_blocks -- TestNNDescentKNNG Reviewed By: wickedfoo Differential Revision: D26309716 Pulled By: mdouze fbshipit-source-id: 2abade9708d29023f8bccbf77143e8eea14f66c4	2021-02-25 16:48:28 -08:00
Matthijs Douze	6d0bc58db6	Implementation of PQ4 search with SIMD instructions (#1542 ) Summary: IndexPQ and IndexIVFPQ implementations with AVX shuffle instructions. The training and computing of the codes does not change wrt. the original PQ versions but the code layout is "packed" so that it can be used efficiently by the SIMD computation kernels. The main changes are: - new IndexPQFastScan and IndexIVFPQFastScan objects - simdlib.h for an abstraction above the AVX2 intrinsics - BlockInvertedLists for invlists that are 32-byte aligned and where codes are not sequential - pq4_fast_scan.h/.cpp: for packing codes and look-up tables + optimized distance computation kernels - simd_result_handlers.h: SIMD version of result collection in heaps / reservoirs Misc changes: - added contrib.inspect_tools to access fields in C++ objects - moved .h and .cpp code for inverted lists to an invlists/ subdirectory, and made a .h/.cpp for InvertedListsIOHook - added a new inverted lists type with 32-byte aligned codes (for consumption by SIMD) - moved Windows-specific intrinsics to platform_macros.h Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1542 Test Plan: ``` buck test mode/opt -j 4 //faiss/tests/:test_fast_scan_ivf //faiss/tests/:test_fast_scan buck test mode/opt //faiss/manifold/... ``` Reviewed By: wickedfoo Differential Revision: D25175439 Pulled By: mdouze fbshipit-source-id: ad1a40c0df8c10f4b364bdec7172e43d71b56c34	2020-12-03 10:06:38 -08:00
Matthijs Douze	25adab7425	fix 64-bit arrays on the Mac (#1531 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1531 vector_to_array assumes that long is 64 bit. Fix this and test it. Reviewed By: wickedfoo Differential Revision: D25022363 fbshipit-source-id: f51f723d590d71ee5ef39e3f86ef69426df833fa	2020-11-17 09:00:06 -08:00
Matthijs Douze	e1adde0d84	Faster brute force search (#1502 ) Summary: This diff streamlines the code that collects results for brute force distance computations for the L2 / IP and range search / knn search combinations. It introduces a `ResultHandler` template class that abstracts what happens with the computed distances and ids. In addition to the heap result handler and the range search result handler, it introduces a reservoir result handler that improves the search speed for large k (>=100). Benchmark results (https://fb.quip.com/y0g1ACLEqJXx#OCaACA2Gm45) show that on small datasets (10k) search is 10-50% faster (improvements are larger for small k). There is room for improvement in the reservoir implementation, which is currently quite naive, but the diff is already useful in its current form. Experiments on precomputed db vector norms for L2 distance computations were not very conclusive performance-wise, so the implementation is removed from IndexFlatL2. This diff also removes IndexL2BaseShift, which was never used. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1502 Test Plan: ``` buck test //faiss/tests/:test_product_quantizer buck test //faiss/tests/:test_index -- TestIndexFlat ``` Reviewed By: wickedfoo Differential Revision: D24705464 Pulled By: mdouze fbshipit-source-id: 270e10b19f3c89ed7b607ec30549aca0ac5027fe	2020-11-04 22:16:23 -08:00
Matthijs Douze	698a4592e8	fix clustering objective for inner product use cases Summary: When an INNER_PRODUCT index is used for clustering, higher objective is better, so when redoing clusterings the highest objective should be retained (not the lowest). This diff fixes this and adds a test. Reviewed By: wickedfoo Differential Revision: D24701894 fbshipit-source-id: b9ec224cf8f4ffdfd2b8540ce37da43386a27b7a	2020-11-03 09:44:09 -08:00
Matthijs Douze	c97f890651	make sure swig_ptr and rev_swig_ptr work on all primitive types (#1382 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1382 The array types supported for swig_ptr were not complete. This diff fixes that. Reviewed By: wickedfoo Differential Revision: D23411297 fbshipit-source-id: d94249b140aeb8c8179d9da3fcbc97eb034eac91	2020-08-28 23:43:52 -07:00
Matthijs Douze	6d73c2ff69	Fix int64 for python tests in windows (#1381 ) Summary: `long` is 32 bits on windows and so is the default int type for numpy (eg. the one used for `np.arange`). This diff explicitly specifies 64-bit ints for all occurrences where it matters. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1381 Reviewed By: wickedfoo Differential Revision: D23371232 Pulled By: mdouze fbshipit-source-id: 220262cd70ee70379f83de93561a4eae71c94b04	2020-08-27 12:40:55 -07:00
Lucas Hosseini	b539a73e58	Linter auto-fix. Summary: `arc lint faiss/*/` Reviewed By: LowikC Differential Revision: D22891305 fbshipit-source-id: 45bab7294ccccf70898b4967b03683894b6ae4c4	2020-08-16 19:52:27 -07:00
Matthijs Douze	4c3b5ad156	Add missing downcast (#1330 ) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1330 add missing downcast for Index (and test for github sync) See github issue https://github.com/facebookresearch/faiss/issues/1278 Reviewed By: beauby Differential Revision: D23053182 fbshipit-source-id: 1ce4c248342332ce632ecfd5074affa3ddf55b66	2020-08-13 16:37:58 -07:00
Lucas Hosseini	cd38e82f0c	Facebook sync 2020-07-31 (#1308 )	2020-08-03 22:15:02 +02:00
Lucas Hosseini	22b7876ef5	Facebook sync (2020-03-10) (#1136 )	2020-03-10 14:24:07 +01:00
Lucas Hosseini	36ddba9196	Facebook sync (2019-09-10) (#943 ) * Facebook sync (2019-09-10) * Fix depends Makefile target. * Add faiss symlink for new include directives. * Fix missing header. * Fix tests. * Fix Makefile. * Update depend. * Fix include directives spacing.	2019-09-20 18:59:10 +02:00
Lucas Hosseini	3896b12c65	Facebook sync (Jun 2019) (#862 ) Bugfixes: - slow scanning of inverted lists (#836). Features: - add basic support for 6 new metrics in CPU `IndexFlat` and `IndexHNSW` (#848); - add support for `IndexIDMap`/`IndexIDMap2` with binary indexes (#780). Misc: - throw python exception for OOM (#758); - make `DistanceComputer` available for all random access indexes; - gradually moving from `long` to `int64_t` for portability.	2019-06-19 15:59:06 +02:00
Lucas Hosseini	a8118acbc5	Facebook sync (May 2019) + relicense (#838 ) Changelog: - changed license: BSD+Patents -> MIT - propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas - support for searching several inverted lists in parallel (parallel_mode != 0) - better support for PQ codes where nbit != 8 or 16 - IVFSpectralHash implementation: spectral hash codes inside an IVF - 6-bit per component scalar quantizer (4 and 8 bit were already supported) - combinations of inverted lists: HStackInvertedLists and VStackInvertedLists - configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch) - more test and demo code compatible with Python 3 (print with parentheses) - refactored benchmark code: data loading is now in a single file	2019-05-28 16:17:22 +02:00
Lucas Hosseini	afe0fdc161	Facebook sync (Mar 2019) (#756 ) Facebook sync (Mar 2019) - MatrixStats object - option to round coordinates during k-means optimization - alternative option for search in HNSW - moved stats and imbalance_factor of IndexIVF to InvertedLists object - range search for IVFScalarQuantizer - direct uint8 codec in ScalarQuantizer - renamed IndexProxy to IndexReplicas and moved to main Faiss - better support for PQ code assignment with external index - support for IMI2x16 (4B virtual centroids!) - support for k = 2048 search on GPU (instead of 1024) - most CUDA mem alloc failures throw exceptions instead of terminating on an assertion - support for renaming an ondisk invertedlists - interrupt computations with ctrl-C in python	2019-03-29 16:32:28 +01:00
Lucas Hosseini	323dbf3be3	Facebook sync (Dec 2018). (#660 ) * Add GpuIndexBinaryFlat * Add IndexBinaryHNSW	2018-12-19 17:48:35 +01:00
Lucas Hosseini	76bec0b500	Facebook sync (#573 ) Features: - automatic tracking of C++ references in Python - non-intel platforms supported -- some functions optimized for ARM - override nprobe for concurrent searches - support for floating-point quantizers in binary indexes Bug fixes: - no more segfaults in python (I know it's the same as the first feature but it's important!) - fix GpuIndexIVFFlat issues for float32 with 64 / 128 dims - fix sharding of flat indexes on GPU with index_cpu_to_gpu_multiple	2018-08-30 19:38:50 +02:00
Lucas Hosseini	6e40d6689f	Move python tests back together with C++ tests. (#479 )	2018-06-04 12:20:44 +02:00
Lucas Hosseini	cf18101f6d	Refactor makefiles and add configure script (#466 ) * Refactors Makefiles and add configure script. * Give MKL higher priority in configure script. * Clean up Linux example makefile.inc. * Cleanup makefile.inc examples. * Fix python clean Makefile target. * Regen swig wrappers. * Remove useless CUDAFLAGS variable. * Fix python linking flags. * Separate compile and link phase in python makefile. * Add macro to look for swig. * Add CUDA check in configure script. * Cleanup make depend targets. * Cleanup CUDA flags. * Fix linking flags. * Fix python GPU linking. * Remove useless flags from python gpu module linking. * Add check for cuda libs. * Cleanup GPU targets. * Clean up test target. * Add cpu/gpu targets to python makefile. * Clean up tutorial Makefile. * Remove stale OS var from example makefiles. * Clean up cuda example flags.	2018-06-02 08:35:30 +02:00
Ailing	cd884114d0	Make tests compatible with py3 (#348 )	2018-02-24 00:38:45 +01:00
matthijs	9933892ec9	sync with FB version 2017-01-09 - adding HNSW indexing method - simultaneous search and reconstruction for IndexIVFPQ	2018-01-09 06:42:06 -08:00
Soumith Chintala	de8ac33f2e	fix test syntax (#260 )	2017-11-23 10:40:06 +01:00
matthijs	250a3d3f18	sync with FB version 2017-11-22 various bugfixes from github issues kmean with some frozen centroids GPU better tiling for large flat datasets default AVX for vector ops	2017-11-22 05:11:28 -08:00
matthijs	8e3dc6f2b0	changed license	2017-07-30 00:18:45 -07:00
matthijs	f7aedbdfc0	sync with FB version 2017-07-18 - implemented ScalarQuantizer (without IVF) - implemented update for IndexIVFFlat - implemented L2 normalization preproc	2017-07-18 02:51:27 -07:00
matthijs	784e2facd8	Synchronization with FB version 2017-06-21 * moved most FAISS_ASSERT calls to C++ exceptions, and adjusted memory allocation to avoid mem leaks * added an IndexIVFScalarQuantizer type that offers an intermediate compression between IVFFlat and IVFPQ * support removal of indices in IndexIDMap / IndexFlat combination * various fixes in GPU code	2017-06-21 09:01:06 -07:00

34 Commits (6539cc53393fd0d22f37f3864f580b931b374e65)