faiss/demos/offline_ivf
Maria Lomeli 0fc8456e1d Offline IVF powered by faiss big batch search (#3202)
Summary:
This PR introduces the offline IVF (OIVF) framework, which contains tooling to run search with IVFPQ indexes (plus OPQ pretransforms) over large batches of queries using [big_batch_search](https://github.com/mlomeli1/faiss/blob/main/contrib/big_batch_search.py) and GPU faiss. See the [README](36226f5fe8/demos/offline_ivf/README.md) for details about using this framework.

This PR includes the following unit tests, which can be run with the unittest library as follows:
````
~/faiss/demos/offline_ivf$ python3 -m unittest tests/test_iterate_input.py -k test_iterate_back
````
In test_offline_ivf:
````
test_consistency_check
test_train_index
test_index_shard_equal_file_sizes
test_index_shard_unequal_file_sizes
test_search
test_evaluate_without_margin
test_evaluate_without_margin_OPQ
````
In test_iterate_input:
````
test_iterate_input_file_larger_than_batch
test_get_vs_iterate
test_iterate_back
````

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3202

Reviewed By: algoriddle

Differential Revision: D52734222

Pulled By: mlomeli1

fbshipit-source-id: 61fd0084277c1b14bdae1189db8ae43340611e16
2024-01-16 05:05:15 -08:00
tests
README.md
__init__.py
config_ssnpp.yaml
create_sharded_ssnpp_files.py
dataset.py
generate_config.py
offline_ivf.py
run.py
utils.py

README.md

Offline IVF

This folder contains the code for the offline IVF algorithm, powered by faiss big batch search.

Create a conda env:

conda create --name oivf python=3.10

conda activate oivf

conda install -c pytorch/label/nightly -c nvidia faiss-gpu=1.7.4

conda install tqdm

conda install pyyaml

conda install -c conda-forge submitit
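
Once the packages are installed, a quick import check can confirm that the environment is usable before starting the run book. The snippet below is a minimal sketch and not part of the framework: `check_environment` is a hypothetical helper, and the module names assume the usual import names for the packages above (`faiss`, `tqdm`, `yaml` for pyyaml, `submitit`).

```python
import importlib.util

# Import names assumed for the conda packages installed above
# (pyyaml is imported as "yaml").
REQUIRED_MODULES = ("faiss", "tqdm", "yaml", "submitit")


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


if __name__ == "__main__":
    # Print one status line per required module.
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it does not verify that the GPU build of faiss can actually see a device.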

Run book

  1. Optionally shard your dataset (see create_sharded_ssnpp_files.py) and create the corresponding YAML config file, config_ssnpp.yaml. You can generate it with generate_config.py by specifying the root directory of your dataset and the files that contain the data shards

python generate_config.py

  2. Run the train_index command

python run.py --command train_index --config config_ssnpp.yaml --xb ssnpp_1B

  3. Run the index_shard command to produce the sharded indexes required for the search step

python run.py --command index_shard --config config_ssnpp.yaml --xb ssnpp_1B

  4. Send jobs to the cluster to run search

python run.py --command search --config config_ssnpp.yaml --xb ssnpp_1B --cluster_run --partition <PARTITION-NAME>

Remarks about the search command: by default, the search step assumes that the query vectors are the database vectors.

a. If the query vectors differ from the database vectors, pass them in the xq argument.

b. A new dataset must be prepared (step 1) before it is passed to the query vectors argument xq.

python run.py --command search --config config_ssnpp.yaml --xb ssnpp_1B --xq <QUERIES_DATASET_NAME>

  5. Run the consistency_check command at any time as a sanity check

python run.py --command consistency_check --config config_ssnpp.yaml --xb ssnpp_1B
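
The run book steps can also be chained from a small driver script. The following is a sketch under stated assumptions and not part of the framework: `build_command` and `run_pipeline` are hypothetical helpers, the flags are only the ones shown in the commands above, and the script assumes it is launched from the directory that contains run.py. It also assumes that the xq and cluster flags apply to the search step only, since that is the only step where the README shows them.

```python
import shlex
import subprocess

# Step order follows the run book above.
STEPS = ("train_index", "index_shard", "search", "consistency_check")


def build_command(command, config, xb, xq=None, partition=None):
    """Build the argv list for one run.py step (hypothetical helper)."""
    argv = ["python", "run.py", "--command", command,
            "--config", config, "--xb", xb]
    if xq is not None:
        argv += ["--xq", xq]
    if partition is not None:
        argv += ["--cluster_run", "--partition", partition]
    return argv


def run_pipeline(config, xb, xq=None, partition=None, dry_run=True):
    """Build every step's command; execute them unless dry_run is set."""
    commands = []
    for step in STEPS:
        if step == "search":
            # Assumption: only search takes the xq and cluster flags.
            argv = build_command(step, config, xb, xq=xq, partition=partition)
        else:
            argv = build_command(step, config, xb)
        commands.append(argv)
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands without executing anything.
    for argv in run_pipeline("config_ssnpp.yaml", "ssnpp_1B"):
        print(shlex.join(argv))
```

Keeping `dry_run=True` as the default makes it possible to inspect the generated commands before any training or cluster job is launched.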