54 Commits

Author SHA1 Message Date
Matthijs Douze
c5975cda72 PQ4 fast scan benchmarks (#1555)
Summary:
Code and scripts for Faiss benchmarks of the fast-scan codes.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1555

Test Plan: buck test //faiss/tests/:test_refine

Reviewed By: wickedfoo

Differential Revision: D25546505

Pulled By: mdouze

fbshipit-source-id: 902486b7f47e36221a2671d124df8c114f25db58
2020-12-16 01:18:58 -08:00
Matthijs Douze
6d0bc58db6 Implementation of PQ4 search with SIMD instructions (#1542)
Summary:
IndexPQ and IndexIVFPQ implementations with AVX shuffle instructions.

Training and code computation are unchanged with respect to the original PQ versions, but the code layout is "packed" so that the SIMD computation kernels can use it efficiently.
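
To illustrate the idea of 4-bit PQ codes combined with look-up tables, here is a minimal sketch in plain Python. It is not the Faiss packed layout and it does not use SIMD: the names `pack_codes`, `unpack_codes` and `lut_distance`, and the value `M = 4`, are hypothetical and chosen only for this example. It shows why 4-bit codes are convenient: each sub-quantizer has only 16 centroids, so each per-query table has 16 entries, which is presumably what makes register-resident shuffle look-ups possible in the real kernels.

```python
# Illustrative sketch only: NOT the Faiss packed layout or its SIMD kernels.
M = 4       # number of sub-quantizers (hypothetical value for the example)
KSUB = 16   # 4 bits per sub-quantizer -> 16 centroids per sub-quantizer


def pack_codes(codes):
    """Pack an even-length list of 4-bit codes, two codes per byte."""
    assert len(codes) % 2 == 0
    assert all(0 <= c < KSUB for c in codes)
    # Low nibble holds the even-indexed code, high nibble the odd-indexed one.
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_codes(packed):
    """Inverse of pack_codes: recover the list of 4-bit codes."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out


def lut_distance(packed, lut):
    """Approximate distance: sum of the table entries selected by each code.

    lut[m][c] is assumed to hold the precomputed distance between the
    m-th sub-vector of the query and centroid c of sub-quantizer m.
    """
    return sum(lut[m][c] for m, c in enumerate(unpack_codes(packed)))


codes = [3, 15, 0, 7]
packed = pack_codes(codes)
# Toy table where entry (m, c) is 100 * m + c, so the sum is easy to check.
lut = [[100 * m + c for c in range(KSUB)] for m in range(M)]
print(len(packed), lut_distance(packed, lut))
```

A SIMD implementation would process many database codes at once instead of looping over one code at a time, which is why the storage layout matters.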

The main changes are:

- new IndexPQFastScan and IndexIVFPQFastScan objects

- simdlib.h for an abstraction above the AVX2 intrinsics

- BlockInvertedLists for invlists that are 32-byte aligned and where codes are not sequential

- pq4_fast_scan.h/.cpp: for packing codes and look-up tables + optimized distance computation kernels

- simd_result_handlers.h: SIMD version of result collection in heaps / reservoirs

Misc changes:

- added contrib.inspect_tools to access fields in C++ objects

- moved .h and .cpp code for inverted lists to an invlists/ subdirectory, and made a .h/.cpp for InvertedListsIOHook

- added a new inverted lists type with 32-byte aligned codes (for consumption by SIMD)

- moved Windows-specific intrinsics to platform_macros.h

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1542

Test Plan:
```
buck test mode/opt  -j 4  //faiss/tests/:test_fast_scan_ivf //faiss/tests/:test_fast_scan
buck test mode/opt  //faiss/manifold/...
```

Reviewed By: wickedfoo

Differential Revision: D25175439

Pulled By: mdouze

fbshipit-source-id: ad1a40c0df8c10f4b364bdec7172e43d71b56c34
2020-12-03 10:06:38 -08:00
Hap-Hugh
9b0029bd7e Fix Bugs in Link&Code (#1510)
Summary:
As described in the issue, I patched these two bugs and the code now works well.

https://github.com/facebookresearch/faiss/issues/1503#issuecomment-722172257

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1510

Reviewed By: wickedfoo

Differential Revision: D24786497

Pulled By: mdouze

fbshipit-source-id: e7fc538ae2c5f20caf4cc9a3e9f369db7bf48a71
2020-11-06 10:28:15 -08:00
Matthijs Douze
e1adde0d84 Faster brute force search (#1502)
Summary:
This diff streamlines the code that collects results for brute force distance computations for the L2 / IP and range search / knn search combinations.

It introduces a `ResultHandler` template class that abstracts what happens with the computed distances and ids. In addition to the heap result handler and the range search result handler, it introduces a reservoir result handler that improves the search speed for large k (>=100).
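
As a rough illustration of the reservoir idea, the following Python sketch collects candidates in a buffer and shrinks it back to the k best whenever it fills up. It is an assumption-laden simplification, not the Faiss `ResultHandler` code: the class name `ReservoirTopK`, the default capacity of `2 * k`, and the use of a full sort (instead of a partition step) are all choices made for this example.

```python
class ReservoirTopK:
    """Keep the k smallest (distance, id) pairs using a bounded buffer.

    Hypothetical sketch: candidates below the current threshold are appended
    to a buffer; when the buffer reaches `capacity`, it is reduced to the k
    best entries and the threshold is tightened.
    """

    def __init__(self, k, capacity=None):
        assert k >= 1
        self.k = k
        self.capacity = capacity if capacity is not None else 2 * k
        assert self.capacity > k
        self.threshold = float("inf")
        self.items = []

    def add(self, dis, idx):
        # Cheap rejection test: most candidates fail it once the
        # threshold has been tightened a few times.
        if dis < self.threshold:
            self.items.append((dis, idx))
            if len(self.items) >= self.capacity:
                self._shrink()

    def _shrink(self):
        # A full sort keeps the sketch short; a partition (selection)
        # step would be enough to find the k smallest entries.
        self.items.sort()
        del self.items[self.k:]
        self.threshold = self.items[-1][0]

    def result(self):
        """Return the k smallest pairs seen so far, sorted by distance."""
        return sorted(self.items)[:self.k]


handler = ReservoirTopK(k=3)
for idx, dis in enumerate([0.9, 0.1, 0.5, 0.7, 0.3, 0.2, 0.8]):
    handler.add(dis, idx)
print(handler.result())
```

Compared with a heap, which does a logarithmic-time update for every accepted candidate, the reservoir appends in constant time and pays for an occasional batch shrink. This trade-off is one plausible reason it would help for large k.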

Benchmark results (https://fb.quip.com/y0g1ACLEqJXx#OCaACA2Gm45) show that on small datasets (10k) search is 10-50% faster (improvements are larger for small k). There is room for improvement in the reservoir implementation, which is currently quite naive, but the diff is already useful in its current form.

Experiments on precomputed db vector norms for L2 distance computations were not very conclusive performance-wise, so the implementation is removed from IndexFlatL2.

This diff also removes IndexL2BaseShift, which was never used.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1502

Test Plan:
```
buck test //faiss/tests/:test_product_quantizer
buck test //faiss/tests/:test_index -- TestIndexFlat
```

Reviewed By: wickedfoo

Differential Revision: D24705464

Pulled By: mdouze

fbshipit-source-id: 270e10b19f3c89ed7b607ec30549aca0ac5027fe
2020-11-04 22:16:23 -08:00
cclauss
efa1e3f64f Use print() function in both Python 2 and Python 3 (#1443)
Summary:
Legacy `print` statements are syntax errors in Python 3, but the `print()` function works as expected in both Python 2 and Python 3.
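
A small sketch of the portable form, assuming a hypothetical helper `format_recall` that is not part of the benchmark scripts. With a single, already formatted string argument, `print(...)` gives the same output under both interpreters. With several comma-separated arguments, Python 2 would print a tuple unless `from __future__ import print_function` is placed at the top of the file.

```python
def format_recall(recall):
    """Build the whole message first so that print() gets one argument."""
    return "recall@1: %.3f" % recall


# Valid on Python 2 and Python 3: parentheses around a single string.
print(format_recall(0.5))
```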

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1443

Reviewed By: LowikC

Differential Revision: D24157415

Pulled By: mdouze

fbshipit-source-id: 4ec637aa26b61272e5337d47b7796a330ce25bad
2020-10-08 00:27:29 -07:00
Lucas Hosseini
ac74f576f7 fbshipit-source-id: 4f3cfa59471d548af93fe118d1b73d45bc648edf 2020-08-04 12:00:38 -07:00
Lucas Hosseini
cd38e82f0c
Facebook sync 2020-07-31 (#1308) 2020-08-03 22:15:02 +02:00
Lucas Hosseini
22b7876ef5
Facebook sync (2020-03-10) (#1136) 2020-03-10 14:24:07 +01:00
Lucas Hosseini
2ba6985f81 Facebook sync 20191129 (#1048)
Looks good!
2019-12-04 07:21:02 +01:00
Lucas Hosseini
36ddba9196
Facebook sync (2019-09-10) (#943)
* Facebook sync (2019-09-10)

* Fix depends Makefile target.

* Add faiss symlink for new include directives.

* Fix missing header.

* Fix tests.

* Fix Makefile.

* Update depend.

* Fix include directives spacing.
2019-09-20 18:59:10 +02:00
Matthijs Douze
f61e6228ca
Update README.md 2019-08-30 13:49:28 +02:00
Matthijs Douze
c364c2b91c
Update search_server.py 2019-08-29 15:36:20 +02:00
Matthijs Douze
983a1f6b8b
Update run_on_cluster.bash 2019-08-29 15:36:07 +02:00
Matthijs Douze
d9a01b2d5c
Update rpc.py 2019-08-29 15:35:53 +02:00
Matthijs Douze
64828d2851
Update merge_to_ondisk.py 2019-08-29 15:35:39 +02:00
Matthijs Douze
0fdd38055b
Update make_trained_index.py 2019-08-29 15:35:21 +02:00
Matthijs Douze
29e0514128
Update make_index_vslice.py 2019-08-29 15:35:07 +02:00
Matthijs Douze
87c83b9d97
Update distributed_query_demo.py 2019-08-29 15:34:50 +02:00
Matthijs Douze
a9a475b003
Update distributed_kmeans.py 2019-08-29 15:34:14 +02:00
Matthijs Douze
10ca6e20d4
Update combined_index.py 2019-08-29 15:33:50 +02:00
Matthijs Douze
8d08912453
Ondisk distributed index implementation (#930)
Adds the code for the distributed on-disk index
2019-08-29 13:44:08 +02:00
Lucas Hosseini
3896b12c65
Facebook sync (Jun 2019) (#862)
Bugfixes:
- slow scanning of inverted lists (#836).

Features:
- add basic support for 6 new metrics in CPU `IndexFlat` and `IndexHNSW` (#848);
- add support for `IndexIDMap`/`IndexIDMap2` with binary indexes (#780).

Misc:
- throw python exception for OOM (#758);
- make `DistanceComputer` available for all random access indexes;
- gradually moving from `long` to `int64_t` for portability.
2019-06-19 15:59:06 +02:00
Lucas Hosseini
a8118acbc5
Facebook sync (May 2019) + relicense (#838)
Changelog:

- changed license: BSD+Patents -> MIT
- propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas
- support for searching several inverted lists in parallel (parallel_mode != 0)
- better support for PQ codes where nbit != 8 or 16
- IVFSpectralHash implementation: spectral hash codes inside an IVF
- 6-bit per component scalar quantizer (4 and 8 bit were already supported)
- combinations of inverted lists: HStackInvertedLists and VStackInvertedLists
- configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch)
- more test and demo code compatible with Python 3 (print with parentheses)
- refactored benchmark code: data loading is now in a single file
2019-05-28 16:17:22 +02:00
Lucas Hosseini
afe0fdc161
Facebook sync (Mar 2019) (#756)
Facebook sync (Mar 2019)

- MatrixStats object
- option to round coordinates during k-means optimization
- alternative option for search in HNSW
- moved stats and imbalance_factor of IndexIVF to InvertedLists object
- range search for IVFScalarQuantizer
- direct uint8 codec in ScalarQuantizer
- renamed IndexProxy to IndexReplicas and moved to main Faiss
- better support for PQ code assignment with external index
- support for IMI2x16 (4B virtual centroids!)
- support for k = 2048 search on GPU (instead of 1024)
- most CUDA mem alloc failures throw exceptions instead of terminating on an assertion
- support for renaming an ondisk invertedlists
- interrupt computations with ctrl-C in python
2019-03-29 16:32:28 +01:00
Matthijs Douze
702ad532db
Update README.md 2019-01-15 19:24:48 +01:00
Matthijs Douze
353b1967c2
Update README.md 2018-12-20 14:52:59 +01:00
Matthijs Douze
6a6bf40b2c
Create README.md 2018-12-20 14:45:46 +01:00
matthijs
daf589d9d2 add bench_all_ivf 2018-12-20 05:43:36 -08:00
Matthijs Douze
87721af129
Update README.md 2018-10-03 14:09:51 +02:00
Lucas Hosseini
76bec0b500
Facebook sync (#573)
Features:

- automatic tracking of C++ references in Python
- non-intel platforms supported -- some functions optimized for ARM
- override nprobe for concurrent searches
- support for floating-point quantizers in binary indexes
Bug fixes:

- no more segfaults in python (I know it's the same as the first feature but it's important!)
- fix GpuIndexIVFFlat issues for float32 with 64 / 128 dims
- fix sharding of flat indexes on GPU with index_cpu_to_gpu_multiple
2018-08-30 19:38:50 +02:00
Lucas Hosseini
6880286ea0
Facebook sync (#504)
* Facebook sync

* Update swig wrappers.

* Fix comment.
2018-07-06 14:12:11 +02:00
Matthijs Douze
e4ef2eff82 make bench work with 1 GPU 2018-06-26 09:05:42 -06:00
Matthijs Douze
ca582907ae
Update README.md 2018-04-27 12:13:38 +02:00
Matthijs Douze
b95027adb4
Update README.md 2018-04-27 11:48:13 +02:00
matthijs
a8199f068c added link&code reference code 2018-04-27 02:43:14 -07:00
Matthijs Douze
0c482e54eb sync with FB version 2018-02-23 (#347)
- support on-disk IVF
2018-02-23 07:49:45 -08:00
matthijs
9933892ec9 sync with FB version 2017-01-09
- adding HNSW indexing method

- simultaneous search and reconstruction for IndexIVFPQ
2018-01-09 06:42:06 -08:00
matthijs
250a3d3f18 sync with FB version 2017-11-22
various bugfixes from github issues
k-means with some frozen centroids
GPU better tiling for large flat datasets
default AVX for vector ops
2017-11-22 05:11:28 -08:00
Matthijs Douze
885399767d
set verbose of GpuMultipleClonerOptions to True 2017-11-13 10:08:52 +01:00
matthijs
8e3dc6f2b0 changed license 2017-07-30 00:18:45 -07:00
matthijs
784e2facd8 Synchronization with FB version 2017-06-21
* moved most FAISS_ASSERT calls to C++ exceptions, and adjusted
  memory allocation to avoid mem leaks

* added an IndexIVFScalarQuantizer type that offers an
  intermediate compression between IVFFlat and IVFPQ

* support removal of indices in IndexIDMap / IndexFlat combination

* various fixes in GPU code
2017-06-21 09:01:06 -07:00
matthijs
c507707098 sync with FB version. Added:
- better selection of training sets for PQ and preprocessing
- GPU parameter object
- IndexIDMap fixed
- fixed redo bug in clustering
2017-03-20 10:48:35 -07:00
mdouze
ab50f0ae4b Update README.md 2017-03-01 11:09:41 +01:00
mdouze
12049daee2 Update README.md 2017-02-28 10:24:01 +01:00
mdouze
da8b35da5f Update README.md 2017-02-27 10:06:19 +01:00
mdouze
caa7787e17 Update README.md 2017-02-25 08:17:11 +01:00
mdouze
990c96343a Update README.md 2017-02-24 22:44:15 +01:00
mdouze
8c03bf4e32 Update README.md 2017-02-24 18:15:31 +01:00
mdouze
455c6e1015 Update README.md 2017-02-24 18:05:15 +01:00
mdouze
4a8a4cd5a5 Update README.md 2017-02-24 17:42:58 +01:00