faiss

mirror of https://github.com/facebookresearch/faiss.git synced 2025-06-03 21:54:02 +08:00

Author	SHA1	Message	Date
Michael Norris	835b3ea1bd	Fix IVF quantizer centroid sharding so IDs are generated (#4197) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4197 Ivan and I discussed 2 problems: 1. We may want to try to offload/shard PQ or SQ table data if there is a big enough win (pending) 2. IDs seem to be random after sharding. This diff solves problem 2. The root cause is that we add to the quantizer without IDs. Instead, we wrap it in IndexIDMap2 (which provides reconstruction, whereas IndexIDMap does not). Laser's quantizers are Flat and HNSW, so we can wrap like this. Reviewed By: ivansopin Differential Revision: D69832788 fbshipit-source-id: 331b6d1cf52666f5dac61e2b52302d46b0a83708	2025-02-24 16:01:08 -08:00
Michael Norris	aff6bfcd80	Add sharding convenience function for IVF indexes (#4150) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4150 Creates a sharding convenience function for IVF indexes. - The __centroids on the quantizer__ are sharded based on the given sharding function (not the data, as data sharding by ids is already implemented by copy_subset_to, https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVF.h#L408). - The output is written to files based on the template filename generator param. - The default sharding function simply assigns the ith vector to shard i mod the total shard count. This would be called by Laser here: https://www.internalfb.com/code/fbsource/[ce1f2e028e79]/fbcode/fblearner/flow/projects/laser/laser_sim_search/knn_trainer.py?lines=295-296. This convenience function will do the file writing and return the created file names. There are a few key required changes in FAISS: 1. Allow `std::vector<std::string>` to be used. Updates swigfaiss.swig and array_conversions.py to accommodate. These have to be numpy dtype of `object` instead of the more correct `unicode`, because unicode dtype is fixed length. I couldn't figure out how to create a numpy array with each of the output file names where they have different dtypes. (Say the file names are like file1, file11, file111. The dtype would need to be U5, U6, U7 respectively, as the dtype for unicode contains the length.) I tried structured arrays: this does not work either, as numpy makes it into a matrix instead: the `file1 file11 file111` example with explicit setting of U5, U6, U7 turns into `[[file1 file1 file1], [file1 file11 file11], [file1 file11 file111]]`, which we do not want. If someone knows the right syntax, please yell at me. 2. Create Python callbacks for sharding and template filename: `PyCallbackFilenameTemplateGenerator` and `PyCallbackShardingFunction`. Users of this function would inherit from the FilenameTemplateGenerator or ShardingFunction in C++ to pass to `shard_ivf_index_centroids`. See the other examples in python_callbacks.cpp. This is required because Python functions cannot be passed through SWIG to C++ (i.e. no std::function or function pointers), so we have to use this approach. This approach allows it to be called from both C++ and Python. test_sharding.py shows the Python calling, test_utils.cpp shows the C++ calling. Reviewed By: asadoughi Differential Revision: D68534991 fbshipit-source-id: b857e20c6cc4249a2ab7792db4c93dd4fb8403fd	2025-02-07 11:39:59 -08:00
Michael Norris	eff0898a13	Enable linting: lint config changes plus arc lint command (#3966) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3966 This actually enables the linting. Manual changes: - tools/arcanist/lint/fbsource-licenselint-config.toml - tools/arcanist/lint/fbsource-lint-engine.toml Automated changes: `arc lint --apply-patches --take LICENSELINT --paths-cmd 'hg files faiss'` Reviewed By: asadoughi Differential Revision: D64484165 fbshipit-source-id: 4f2f6e953c94ef6ebfea8a5ae035ccfbea65ed04	2024-10-22 09:46:48 -07:00
Xiao Fu	5e452ed52a	Cleaning up more unnecessary print (#3455) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3455 Code quality control by reducing the number of prints Reviewed By: junjieqi Differential Revision: D57502194 fbshipit-source-id: a6cd65ed4cc49590ce73d2978d41b640b5259c17	2024-05-17 16:59:36 -07:00
Matthijs Douze	8fc3775472	building blocks for hybrid CPU / GPU search (#2638) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2638 This diff is a more streamlined way of searching IVF indexes with precomputed clusters. This will be used for experiments with hybrid CPU / GPU search. Reviewed By: algoriddle Differential Revision: D41301032 fbshipit-source-id: a1d645fd0f2bf806454dfd04971edc0a6200d20d	2023-01-12 13:34:44 -08:00
Check Deng	55c93f3cde	Handle the situation where nprobe > nlist in IndexBinaryIVF (#1695) Summary: ## Description It is the same as https://github.com/facebookresearch/faiss/pull/1673 but for `IndexBinaryIVF`. Ensure that `nprobe` is no more than `nlist`. ## Changes 1. Replace `nprobe` with `min(nprobe, nlist)` 2. Replace `long` with `idx_t` in `IndexBinaryIVF.cpp` 3. Add a unit test 4. Fix a small bug in https://github.com/facebookresearch/faiss/pull/1673, `index` should be replaced by `gt_index` Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1695 Reviewed By: wickedfoo Differential Revision: D26603278 Pulled By: mdouze fbshipit-source-id: a4fb79bdeb975e9d8ec507177596c36da1195646	2021-02-23 12:20:37 -08:00
Chengqi Deng	b4a0a9c617	Handle the situation where nprobe > nlist in IndexIVF (#1673) Summary: ## Description Fix the bug mentioned in https://github.com/facebookresearch/faiss/issues/1010. When `nprobe` is greater than `nlist` in `IndexIVF`, the program will crash because the index will ask the quantizer to return more centroids than it owns. ## Changes: 1. Set `nprobe` as `nlist` if it is greater than `nlist` during searching. 2. Add one test to detect this bug. 3. Fix typo in `IndexPQ.cpp`. Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1673 Reviewed By: wickedfoo Differential Revision: D26454420 Pulled By: mdouze fbshipit-source-id: d1d0949e30802602e975a94ba873f9db29abd5ab	2021-02-16 09:54:23 -08:00
Matthijs Douze	5ad630635c	expose thread-safe stats (#1438) Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1438 This diff changes Faiss and the `combined_index.py` to propagate thread-safe stats to handler.py Reviewed By: MDSilber Differential Revision: D24082543 fbshipit-source-id: 944e6b7630daeede5eb9501b81557a6fe5afec44	2020-10-03 23:26:36 -07:00
Lucas Hosseini	22b7876ef5	Facebook sync (2020-03-10) (#1136)	2020-03-10 14:24:07 +01:00
Lucas Hosseini	a8118acbc5	Facebook sync (May 2019) + relicense (#838) Changelog: - changed license: BSD+Patents -> MIT - propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas - support for searching several inverted lists in parallel (parallel_mode != 0) - better support for PQ codes where nbit != 8 or 16 - IVFSpectralHash implementation: spectral hash codes inside an IVF - 6-bit per component scalar quantizer (4 and 8 bit were already supported) - combinations of inverted lists: HStackInvertedLists and VStackInvertedLists - configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch) - more test and demo code compatible with Python 3 (print with parentheses) - refactored benchmark code: data loading is now in a single file	2019-05-28 16:17:22 +02:00
Lucas Hosseini	323dbf3be3	Facebook sync (Dec 2018). (#660) * Add GpuIndexBinaryFlat * Add IndexBinaryHNSW	2018-12-19 17:48:35 +01:00

11 Commits
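
Two behaviours described in the commit messages above are simple enough to sketch in plain Python: the default sharding rule from #4150 (the ith centroid goes to shard i mod the shard count) and the `nprobe` clamp from #1673 and #1695 (`min(nprobe, nlist)`). This is an illustration only, not the FAISS implementation; the helper names `default_shard`, `shard_centroid_ids`, and `clamp_nprobe` are hypothetical and do not exist in FAISS.

```python
# Illustrative sketch only; none of these helpers are part of the FAISS API.

def default_shard(i, shard_count):
    """Assumed default rule from the #4150 message: centroid i -> shard i mod shard_count."""
    return i % shard_count


def shard_centroid_ids(n_centroids, shard_count, shard_fn=default_shard):
    """Group centroid positions 0..n_centroids-1 by the shard that shard_fn picks."""
    shards = [[] for _ in range(shard_count)]
    for i in range(n_centroids):
        shards[shard_fn(i, shard_count)].append(i)
    return shards


def clamp_nprobe(nprobe, nlist):
    """Never probe more inverted lists than exist (the fix described in #1673 and #1695)."""
    return min(nprobe, nlist)


if __name__ == "__main__":
    print(shard_centroid_ids(7, 3))
    print(clamp_nprobe(16, 8))
```

Keeping the original positions in each shard mirrors the concern fixed in #4197: once centroids are split up, each shard needs to retain the IDs they had before sharding.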