Benchmark of IVF variants

This is a benchmark of IVF index variants, looking at the tradeoff between compression, speed and accuracy. The results are reported in a chapter of the Faiss wiki.

The code is organized as follows:

  • datasets.py: code to access the data files, compute the ground truth and report accuracies

  • bench_all_ivf.py: evaluate one type of inverted file (a sketch of this kind of evaluation is shown after this list)

  • run_on_cluster_generic.bash: call bench_all_ivf.py for all tested types of indices. Since the number of experiments is quite large, the script is structured so that the benchmark can be run on a cluster.

  • parse_bench_all_ivf.py: make nice tradeoff plots from all the results.
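
For orientation, here is a minimal sketch of the kind of per-index evaluation bench_all_ivf.py performs, written against the standard Faiss Python API on synthetic data. The real script loads its datasets through datasets.py and sweeps over many index types and search parameters; the dimensions, sizes and nprobe value below are placeholders chosen only for illustration.

```python
import numpy as np
import faiss

d = 64                                      # vector dimension (placeholder)
rs = np.random.RandomState(123)
xt = rs.rand(10000, d).astype('float32')    # training vectors
xb = rs.rand(20000, d).astype('float32')    # database vectors
xq = rs.rand(100, d).astype('float32')      # query vectors

# exact ground truth with a flat index (datasets.py does this for the real data)
gt_index = faiss.IndexFlatL2(d)
gt_index.add(xb)
_, gt = gt_index.search(xq, 1)

# one IVF variant: 256 inverted lists, PQ compression with 16 sub-quantizers of 8 bits
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, 256, 16, 8)
index.train(xt)
index.add(xb)
index.nprobe = 16                           # number of inverted lists visited per query
_, I = index.search(xq, 1)

print("recall@1 = %.3f" % (I[:, 0] == gt[:, 0]).mean())
```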

The code depends on Faiss and can use 1 to 8 GPUs to do the k-means clustering for large vocabularies.
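
As a hedged sketch (not the benchmark's own code), the coarse-quantizer k-means can be pushed to GPUs with the Faiss Python API as below, assuming a GPU-enabled Faiss build; the dataset size and vocabulary size are placeholders.

```python
import numpy as np
import faiss

d = 64                                        # vector dimension (placeholder)
ncentroids = 4096                             # coarse vocabulary size (placeholder)
xt = np.random.rand(100000, d).astype('float32')

clus = faiss.Clustering(d, ncentroids)
clus.niter = 20

# the assignment index does the distance computations during k-means;
# moving it to the available GPUs accelerates the iterations
assign_index = faiss.IndexFlatL2(d)
if faiss.get_num_gpus() > 0:
    assign_index = faiss.index_cpu_to_all_gpus(assign_index)

clus.train(xt, assign_index)
centroids = faiss.vector_float_to_array(clus.centroids).reshape(ncentroids, d)
print(centroids.shape)                        # (4096, 64)
```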

The benchmark was run in October 2018 to produce the results reported in the wiki.