faiss

mirror of https://github.com/facebookresearch/faiss.git synced 2025-06-03 21:54:02 +08:00

Author	SHA1	Message	Date
Alexandr Guzhva	6a94c67a2f	QT_bf16 for scalar quantizer for bfloat16 (#3444) Summary: mdouze Please let me know if any additional unit tests are needed Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3444 Reviewed By: algoriddle Differential Revision: D57665641 Pulled By: mdouze fbshipit-source-id: 9bec91306a1c31ea4f1f1d726c9d60ac6415fdfc	2024-05-23 02:59:15 -07:00
Xiao Fu	5e452ed52a	Cleaning up more unnecessary print (#3455) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3455 Code quality control by reducing the number of prints Reviewed By: junjieqi Differential Revision: D57502194 fbshipit-source-id: a6cd65ed4cc49590ce73d2978d41b640b5259c17	2024-05-17 16:59:36 -07:00
Matthijs Douze	32f0e8cf92	Generalize ResultHandler, support range search for HNSW and Fast Scan (#3190) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3190 This diff adds more result handlers in order to expose them externally. This enables range search for HNSW and Fast Scan, and nprobe parameter support for FastScan. Reviewed By: pemazare Differential Revision: D52547384 fbshipit-source-id: 271da5ffea6411df3d8e50641abade18bd7b774b	2024-01-11 11:46:30 -08:00
Maria Lomeli	c09992bc8a	Back out "Better NaN handling" (#3006) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3006 Original commit changeset: 99e7786582e9 Original Phabricator Diff: D48031390 Reviewed By: algoriddle Differential Revision: D48353221 fbshipit-source-id: fd326f2a45d20f68507ca39a33a325528651b37d	2023-08-15 09:32:01 -07:00
Matthijs Douze	a3fbf2d61c	Better NaN handling (#2986) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2986 A NaN vector is a vector with at least one NaN (not-a-number) entry. After discussion in the Faiss team we decided that: - training should throw an exception on NaN vectors - added NaN vectors should be ignored (never returned) - searched NaN vectors should return only -1s This diff implements this for a few common index types + adds relevant tests. Reviewed By: algoriddle Differential Revision: D48031390 fbshipit-source-id: 99e7786582e91950e3a53c1d8bcffdd00b6afd24	2023-08-04 06:51:06 -07:00
Matthijs Douze	b9ea339617	support range search from GPU (#2860) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2860 Optimized range search function where the GPU computes by default and falls back on CPU for queries where there are too many results. Parallelize the CPU to GPU cloning, it seems to work. Support range_search_preassigned in Python. Fix long-standing issue with SWIG exposed functions that did not release the GIL (in particular the MapLong2Long). Adds a MapInt64ToInt64 that is more efficient than MapLong2Long. Reviewed By: algoriddle Differential Revision: D45672301 fbshipit-source-id: 2e77397c40083818584dbafa5427149359a2abfd	2023-05-16 00:27:53 -07:00
Alexandr Guzhva	5b172252ef	HNSW speedup + Distance 4 points (#2841) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2841 * Add virtual void DistanceComputer::distances_to_four_indices() * Add the infrastructure * HNSW::search() uses DistanceComputer::distances_to_four_indices() * Add IndexFlatL2::sync_l2norms() and IndexFlatL2::clear_l2norms() that allow to precompute L2 cache for stored vectors and compute L2 distance using dot product * Add downcasting of IndexFlatL2 and IndexFlatIP in swig * Add general-purpose prefetch utilities Reviewed By: mdouze Differential Revision: D45427064 fbshipit-source-id: d23b34fe080dbff951d34cdc1323813bd3b828e0	2023-05-05 16:13:16 -07:00
Denis Yaroshevskiy	45b16d23a1	faiss: use autovectorization for inner product (#2712) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2712 Using autovectorization to get the proper unrolling. Previous version timings are: Before ``` faiss_ip_10000 2.10us 475.62K faiss_n2_10000 4.23us 236.30K ``` After ``` faiss_ip_10000 1.21us 827.16K faiss_n2_10000 640.68ns 1.56M ``` Reviewed By: alexanderguzhva Differential Revision: D43353199 fbshipit-source-id: 8f73a34acd4b0368be6cdb05ba7a99a566c9ed83	2023-02-16 10:24:25 -08:00
Matthijs Douze	240e6dda08	Fix test timeouts (#2618) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2618 The Faiss tests run in dev mode are very slow. The PQ polysemous training is particularly sensitive to this with the default settings. This diff adds a "np" suffix to two PQ factory strings to disable polysemous training. The affected tests were detected as flaky because they occasionally timed out. Reviewed By: alexanderguzhva Differential Revision: D41955699 fbshipit-source-id: b1e0382a0142a3ed28b498c5ea6f5499de2c1b3f	2022-12-12 09:04:43 -08:00
Matthijs Douze	9f13e43486	Building blocks for big batch IVF search Summary: Adds: - a sparse update function to the heaps - bucket sort functions - an IndexRandom index to serve as a dummy coarse quantizer for testing Reviewed By: algoriddle Differential Revision: D41804055 fbshipit-source-id: 9402b31c37c367aa8554271d8c88bc93cc1e2bda	2022-12-08 09:34:16 -08:00
Check Deng	31a8ca163e	IO support for IndexNNDescent (#2493) Summary: This PR added I/O methods for IndexNNDescent. Closes https://github.com/facebookresearch/faiss/issues/2125 Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2493 Test Plan: buck test //faiss/tests:test_index -- TestNNDescent Reviewed By: mlomeli1 Differential Revision: D39883381 Pulled By: mdouze fbshipit-source-id: 6fcb0c4e08e66c56750ae48ee22b0b4a958243ae	2022-09-28 06:16:11 -07:00
Check Deng	a03a1eba8b	Add IndexNSGPQ and IndexNSGSQ (#2218) Summary: This diff added IndexNSGPQ and IndexNSGSQ, including index factory and I/O. And also fixed the ARM CI. Fixed https://github.com/facebookresearch/faiss/issues/2128 Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2218 Reviewed By: beauby Differential Revision: D34276313 Pulled By: mdouze fbshipit-source-id: a5014af8447800ad15bd89b4f87204b4b36866d2	2022-02-18 04:51:15 -08:00
Lucas Hosseini	812e97daf4	Fix deadlock in HNSW. (#2143) Summary: IndexHNSW has a deadlock in the add() method, which is fixed by temporarily releasing the lock on the current element while updating its neighbors' adjacency lists. This bug concerns multi-threaded insertion only, and seems to manifest itself only with certain OpenMP configurations. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2143 Reviewed By: mdouze Differential Revision: D32919041 Pulled By: beauby fbshipit-source-id: e515541c1b22bfcb79d29c0bde1843e63f5175fb	2021-12-07 09:15:44 -08:00
Check Deng	6c99782f7c	Fix unordered results bug in NSG (#2086) Summary: The results returned by `NSG::search` are already sorted. Calling `maxheap_reorder` on them makes the results unordered. Fixed https://github.com/facebookresearch/faiss/issues/2081. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2086 Test Plan: buck test //faiss/tests/:test_index -- test_order Reviewed By: beauby Differential Revision: D32593924 Pulled By: mdouze fbshipit-source-id: 794b94681610657bd2f305f7e3d6cd5d25c6bdba	2021-11-22 11:41:01 -08:00
Matthijs Douze	3eb82e32dc	Range search bug Summary: This diff fixes a serious bug in the range search implementation. During range search in a flat index, (exhaustive_L2sqr_seq and exhaustive_inner_product_seq) when running in multiple threads, the per-thread results are collected into RangeSearchPartialResult structures. When the computation is finished, they are aggregated into a RangeSearchResult. In the previous version of the code, this loop was nested into a second loop that is used to check for KeyboardInterrupts. Thus, at each iteration, the results were overwritten. The fix removes the outer loop. It is most likely useless anyways because the sequential code is called only for a small number of queries, for a larger number the BLAS version is used. Reviewed By: wickedfoo Differential Revision: D28486415 fbshipit-source-id: 89a52b17f6ca1ef68fc5e758f0e5a44d0df9fe38	2021-05-17 23:10:20 -07:00
Matthijs Douze	2d380e992b	Add manifold check for size 0 (#1867) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1867 Merging code for the 1T photodna index seems to fail at https://www.internalfb.com/phabricator/paste/view/P412975011?lines=174 with ``` terminate called after throwing an instance of 'facebook::manifold::blobstore::StorageException' what(): [400] Begin offset and/or length were invalid -- Begin offset must be positive and length must be non-negative. Received: offset = 2642410612, length = 0 Aborted (core dumped) ``` which traces back to https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/manifold/blobstore/BlobstoreThriftHandler.cpp?lines=671%2C700%2C732 There is a single case where we don't check if the read or write size is 0. So let's try this fix. In the process I realized that the Manifold tests were non-functional due to a name collision on common.py. Also fix this in all dependent files. Differential Revision: D28231710 fbshipit-source-id: 700ffa6ca0c82c49e7d1eae9e76549ec5ff16332	2021-05-09 22:30:31 -07:00
Chengqi Deng	c62ab3a696	Use BLAS to compute sdc table (#1809) Summary: This PR used BLAS to compute sdc table in ProductQuantizer. Here is the time of computing sdc tables: ``` nbits=8, d=128 (this commit) M: 2, sdc: 0.0001361370086669922s M: 4, sdc: 8.273124694824219e-05s M: 8, sdc: 7.867813110351562e-05s M: 16, sdc: 0.0001227855682373047s M: 32, sdc: 0.0001697540283203125s M: 64, sdc: 0.0007395744323730469s ``` ``` nbits=8, d=128 (master) M: 2, sdc: 0.0055773258209228516s M: 4, sdc: 0.005366802215576172s M: 8, sdc: 0.0050809383392333984s M: 16, sdc: 0.005639791488647461s M: 32, sdc: 0.006036281585693359s M: 64, sdc: 0.009720802307128906s ``` Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1809 Reviewed By: beauby Differential Revision: D27706249 Pulled By: mdouze fbshipit-source-id: 102ae0c1c157e244e40557656934062f537b74d4	2021-04-16 00:17:51 -07:00
Check Deng	c37c2fa393	Support I/O and clone for NSG (#1766) Summary: This PR added IO and clone support to NSG. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1766 Test Plan: buck test //faiss/tests/:test_index -- TestNSG Reviewed By: beauby Differential Revision: D27189414 Pulled By: mdouze fbshipit-source-id: c35c253d043c711d09a675f4ba5c3317b9423b5b	2021-03-23 09:18:15 -07:00
Check Deng	b35103a138	Add NSG (#1707) Summary: ## Description: This diff implemented Navigating Spreading-out Graph (NSG) which accepts a KNN graph as input. Here is the interface of building an NSG graph: ``` c++ void IndexNSG::build(idx_t n, const float* x, idx_t* knn_graph, int GK); ``` where `GK` is the nb of neighbors per node and `knn_graph[i * GK + j]` is the j-th neighbor of node i. The `add` method is not implemented yet. The unit tests can be found in `tests/test_nsg.cpp`. mdouze beauby I may need some advice on how to design the interface and support Python. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1707 Test Plan: buck test //faiss/tests/:test_index -- TestNSG Reviewed By: beauby Differential Revision: D26748498 Pulled By: mdouze fbshipit-source-id: 3280f705fb1b5f9c8cc5efeba63b904c3b832544	2021-03-10 15:03:00 -08:00
Dikpal Reddy	2b1194a3fa	Ensure that invalid k/nprobe search input parameters to Faiss / Faiss GPU don't crash Summary: Checking for invalid parameters (number of nearest neighbors and number of probes where applicable) in the indices and throwing. Along with unit tests. Reviewed By: wickedfoo Differential Revision: D26582467 fbshipit-source-id: e345635d2f0f44ddcecc3f3314b2b9113359a787	2021-03-03 21:17:28 -08:00
Lucas Hosseini	6d51766607	Fix unused variables in python Reviewed By: mdouze Differential Revision: D26633983 fbshipit-source-id: 32b9f95ed9647716f65b93f2713a8d5bad6abe78	2021-02-24 11:52:18 -08:00
Matthijs Douze	c5975cda72	PQ4 fast scan benchmarks (#1555) Summary: Code + scripts for Faiss benchmarks around the Fast scan codes. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1555 Test Plan: buck test //faiss/tests/:test_refine Reviewed By: wickedfoo Differential Revision: D25546505 Pulled By: mdouze fbshipit-source-id: 902486b7f47e36221a2671d124df8c114f25db58	2020-12-16 01:18:58 -08:00
Matthijs Douze	e1adde0d84	Faster brute force search (#1502) Summary: This diff streamlines the code that collects results for brute force distance computations for the L2 / IP and range search / knn search combinations. It introduces a `ResultHandler` template class that abstracts what happens with the computed distances and ids. In addition to the heap result handler and the range search result handler, it introduces a reservoir result handler that improves the search speed for large k (>=100). Benchmark results (https://fb.quip.com/y0g1ACLEqJXx#OCaACA2Gm45) show that on small datasets (10k) search is 10-50% faster (improvements are larger for small k). There is room for improvement in the reservoir implementation, which is currently quite naive, but the diff is already useful in its current form. Experiments on precomputed db vector norms for L2 distance computations were not conclusive performance-wise, so the implementation is removed from IndexFlatL2. This diff also removes IndexL2BaseShift, which was never used. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1502 Test Plan: ``` buck test //faiss/tests/:test_product_quantizer buck test //faiss/tests/:test_index -- TestIndexFlat ``` Reviewed By: wickedfoo Differential Revision: D24705464 Pulled By: mdouze fbshipit-source-id: 270e10b19f3c89ed7b607ec30549aca0ac5027fe	2020-11-04 22:16:23 -08:00
Jeff Johnson	ef6e53f8ba	Cleanup flag/data propagation for IndexShards and IndexReplicas Summary: This diff fixes https://github.com/facebookresearch/faiss/issues/1412 There were various inconsistencies in how the shard and replica wrappers updated their internal state as the sub-indices were updated. This makes the two container classes work in the same way with similar synchronization functionality. Reviewed By: beauby Differential Revision: D23974186 fbshipit-source-id: c688c0c9124f823e4239aa2ff617b007b4564859	2020-09-29 10:25:46 -07:00
Matthijs Douze	6d73c2ff69	Fix int64 for python tests in windows (#1381) Summary: `long` is 32 bits on Windows and so is the default int type for numpy (e.g. the one used for `np.arange`). This diff explicitly specifies 64-bit ints for all occurrences where it matters. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1381 Reviewed By: wickedfoo Differential Revision: D23371232 Pulled By: mdouze fbshipit-source-id: 220262cd70ee70379f83de93561a4eae71c94b04	2020-08-27 12:40:55 -07:00
Lucas Hosseini	24c4460dd2	Avoid leaking file descriptors in python tests. (#1353) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1353 Test Plan: Imported from OSS Reviewed By: mdouze Differential Revision: D23292456 Pulled By: beauby fbshipit-source-id: 44458eb16d037883ff39827accf5edddb1b1bb89	2020-08-24 06:46:52 -07:00
Lucas Hosseini	a17a631dc3	Sync 20200323. (#1157) * Sync 20200323. * Bump version. * Remove warning filter.	2020-03-24 14:06:48 +01:00
Lucas Hosseini	22b7876ef5	Facebook sync (2020-03-10) (#1136)	2020-03-10 14:24:07 +01:00
Lucas Hosseini	36ddba9196	Facebook sync (2019-09-10) (#943) * Facebook sync (2019-09-10) * Fix depends Makefile target. * Add faiss symlink for new include directives. * Fix missing header. * Fix tests. * Fix Makefile. * Update depend. * Fix include directives spacing.	2019-09-20 18:59:10 +02:00
Lucas Hosseini	a8118acbc5	Facebook sync (May 2019) + relicense (#838) Changelog: - changed license: BSD+Patents -> MIT - propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas - support for searching several inverted lists in parallel (parallel_mode != 0) - better support for PQ codes where nbit != 8 or 16 - IVFSpectralHash implementation: spectral hash codes inside an IVF - 6-bit per component scalar quantizer (4 and 8 bit were already supported) - combinations of inverted lists: HStackInvertedLists and VStackInvertedLists - configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch) - more test and demo code compatible with Python 3 (print with parentheses) - refactored benchmark code: data loading is now in a single file	2019-05-28 16:17:22 +02:00
Lucas Hosseini	afe0fdc161	Facebook sync (Mar 2019) (#756) Facebook sync (Mar 2019) - MatrixStats object - option to round coordinates during k-means optimization - alternative option for search in HNSW - moved stats and imbalance_factor of IndexIVF to InvertedLists object - range search for IVFScalarQuantizer - direct uint8 codec in ScalarQuantizer - renamed IndexProxy to IndexReplicas and moved to main Faiss - better support for PQ code assignment with external index - support for IMI2x16 (4B virtual centroids!) - support for k = 2048 search on GPU (instead of 1024) - most CUDA mem alloc failures throw exceptions instead of terminating on an assertion - support for renaming an ondisk invertedlists - interrupt computations with ctrl-C in python	2019-03-29 16:32:28 +01:00
Lucas Hosseini	f417a53628	Fix CI tests (#687) * Fix test_transfer_invlists.cpp * Fix relative imports. * Fix test_index_accuracy.py. * Use default OSX version. * Allow osx gcc6 build to fail.	2019-01-08 17:52:36 +01:00
matthijs	daf589d9d2	add bench_all_ivf	2018-12-20 05:43:36 -08:00
Lucas Hosseini	323dbf3be3	Facebook sync (Dec 2018). (#660) * Add GpuIndexBinaryFlat * Add IndexBinaryHNSW	2018-12-19 17:48:35 +01:00
Lucas Hosseini	76bec0b500	Facebook sync (#573) Features: - automatic tracking of C++ references in Python - non-intel platforms supported -- some functions optimized for ARM - override nprobe for concurrent searches - support for floating-point quantizers in binary indexes Bug fixes: - no more segfaults in python (I know it's the same as the first feature but it's important!) - fix GpuIndexIVFFlat issues for float32 with 64 / 128 dims - fix sharding of flat indexes on GPU with index_cpu_to_gpu_multiple	2018-08-30 19:38:50 +02:00
Lucas Hosseini	6880286ea0	Facebook sync (#504) * Facebook sync * Update swig wrappers. * Fix comment.	2018-07-06 14:12:11 +02:00
Lucas Hosseini	6e40d6689f	Move python tests back together with C++ tests. (#479)	2018-06-04 12:20:44 +02:00
Lucas Hosseini	cf18101f6d	Refactor makefiles and add configure script (#466) * Refactors Makefiles and add configure script. * Give MKL higher priority in configure script. * Clean up Linux example makefile.inc. * Cleanup makefile.inc examples. * Fix python clean Makefile target. * Regen swig wrappers. * Remove useless CUDAFLAGS variable. * Fix python linking flags. * Separate compile and link phase in python makefile. * Add macro to look for swig. * Add CUDA check in configure script. * Cleanup make depend targets. * Cleanup CUDA flags. * Fix linking flags. * Fix python GPU linking. * Remove useless flags from python gpu module linking. * Add check for cuda libs. * Cleanup GPU targets. * Clean up test target. * Add cpu/gpu targets to python makefile. * Clean up tutorial Makefile. * Remove stale OS var from example makefiles. * Clean up cuda example flags.	2018-06-02 08:35:30 +02:00
Matthijs Douze	0c482e54eb	sync with FB version 2018-02-23 (#347) - support on-disk IVF	2018-02-23 07:49:45 -08:00
matthijs	9933892ec9	sync with FB version 2017-01-09 - adding HNSW indexing method - simultaneous search and reconstruction for IndexIVFPQ	2018-01-09 06:42:06 -08:00
matthijs	250a3d3f18	sync with FB version 2017-11-22 various bugfixes from github issues kmean with some frozen centroids GPU better tiling for large flat datasets default AVX for vector ops	2017-11-22 05:11:28 -08:00
matthijs	a5ef16db89	sync with FB version 2017-08-09	2017-08-09 11:13:51 -07:00
matthijs	8e3dc6f2b0	changed license	2017-07-30 00:18:45 -07:00
matthijs	f7aedbdfc0	sync with FB version 2017-07-18 - implemented ScalarQuantizer (without IVF) - implemented update for IndexIVFFlat - implemented L2 normalization preproc	2017-07-18 02:51:27 -07:00
matthijs	784e2facd8	Synchronization with FB version 2017-06-21 * moved most FAISS_ASSERT calls to C++ exceptions, and adjusted memory allocation to avoid mem leaks * added an IndexIVFScalarQuantizer type that offers an intermediate compression between IVFFlat and IVFPQ * support removal of indices in IndexIDMap / IndexFlat combination * various fixes in GPU code	2017-06-21 09:01:06 -07:00

45 Commits