faiss/benchs/bench_all_ivf
chasingegg adc9d1a0cd Refactor prepare cache code in cmp_with_scann benchmark (#2573)
Summary:
In ```cmp_with_scann.py``` we save .npy files for the base vectors, the query vectors and the ground truth. However, this was previously done only when the lib is faiss, so running the script directly with the scann lib complained that the files do not exist.
The code is therefore refactored to save the .npy files from the beginning, so that nothing goes wrong.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2573

Reviewed By: mdouze

Differential Revision: D42338435

Pulled By: algoriddle

fbshipit-source-id: 9227f95e1ff79f5329f6206a0cb7ca169185fdb3
2023-01-04 02:35:18 -08:00
README.md docs: Improve readability (#2378) 2022-07-08 09:19:07 -07:00
bench_all_ivf.py Building blocks for big batch IVF search 2022-12-08 09:34:16 -08:00
bench_kmeans.py PQ4 fast scan benchmarks (#1555) 2020-12-16 01:18:58 -08:00
cmp_with_scann.py Refactor prepare cache code in cmp_with_scann benchmark (#2573) 2023-01-04 02:35:18 -08:00
datasets_oss.py Building blocks for big batch IVF search 2022-12-08 09:34:16 -08:00
make_groundtruth.py PQ4 fast scan benchmarks (#1555) 2020-12-16 01:18:58 -08:00
parse_bench_all_ivf.py Support for additive quantizer search (#1961) 2021-08-20 01:00:10 -07:00
run_on_cluster_generic.bash PQ4 fast scan benchmarks (#1555) 2020-12-16 01:18:58 -08:00

README.md

Benchmark of IVF variants

This is a benchmark of IVF index variants, looking at the compression vs. speed vs. accuracy tradeoff. The results are reported in a chapter of the Faiss wiki.

The code is organized as:

  • datasets_oss.py: code to access the data files, compute the ground truth and report accuracies

  • bench_all_ivf.py: evaluates one type of inverted file index; a minimal sketch of this evaluation loop is given after this list

  • run_on_cluster_generic.bash: calls bench_all_ivf.py for all tested types of indices. Since the number of experiments is quite large, the script is structured so that the benchmark can be run on a cluster.

  • parse_bench_all_ivf.py: produces the tradeoff plots from all the results.
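
The measurement at the heart of this setup is simple: build one IVF variant from an index_factory string, train it, add the database vectors, then sweep an operating parameter such as nprobe while comparing the returned neighbors to the exact ground truth. The sketch below illustrates that loop on synthetic data; the factory string, array sizes and nprobe values are illustrative placeholders, not the parameters the benchmark actually uses.

```python
import numpy as np
import faiss

d = 64
rs = np.random.RandomState(123)
xt = rs.rand(50_000, d).astype('float32')    # training vectors (placeholder)
xb = rs.rand(200_000, d).astype('float32')   # database vectors (placeholder)
xq = rs.rand(1_000, d).astype('float32')     # query vectors (placeholder)

# Exact ground truth with a brute-force index (at real scale this is what
# make_groundtruth.py precomputes).
gt_index = faiss.IndexFlatL2(d)
gt_index.add(xb)
_, gt = gt_index.search(xq, 1)

# One IVF variant, described by an index_factory string.
index = faiss.index_factory(d, "IVF1024,PQ16")
index.train(xt)
index.add(xb)

# Sweep the speed/accuracy tradeoff by varying nprobe and measuring recall@1.
for nprobe in 1, 4, 16, 64:
    index.nprobe = nprobe
    _, I = index.search(xq, 1)
    recall1 = float((I[:, 0] == gt[:, 0]).mean())
    print(f"nprobe={nprobe:3d} recall@1={recall1:.3f}")
```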

The code depends on Faiss and can use 1 to 8 GPUs to do the k-means clustering for large vocabularies.
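
As a rough illustration of that clustering step, the sketch below uses the faiss.Kmeans wrapper with its gpu option, assuming a GPU-enabled Faiss build; the dimension, vocabulary size and training-set size are placeholders, much smaller than what the benchmark runs.

```python
import numpy as np
import faiss

d = 96          # vector dimension (placeholder)
k = 65_536      # vocabulary size, i.e. number of coarse centroids (placeholder)
xt = np.random.rand(1_000_000, d).astype('float32')  # training vectors (placeholder)

# gpu=True spreads the k-means over all visible GPUs; passing an integer
# instead restricts how many GPUs are used.
km = faiss.Kmeans(d, k, niter=20, verbose=True, gpu=True)
km.train(xt)

centroids = km.centroids        # (k, d) array of cluster centers
```

Passing an integer instead of True to gpu limits the number of devices used, which is how a run can be confined to anywhere between 1 and 8 GPUs.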

The benchmark was run in October 2018 to produce the results reported in the wiki.