Commit Graph

21 Commits (1e4586a5a0d7ceb8a538c4878d08b0623a676a08)

Author SHA1 Message Date
Matthijs Douze c0052c1533 IndexFlatCodes: a single parent for all flat codecs (#2132)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2132

This diff adds the class IndexFlatCodes that becomes the parent of all "flat" encodings.
IndexPQ
IndexFlat
IndexAdditiveQuantizer
IndexScalarQuantizer
IndexLSH
Index2Layer

The other changes are:
- for IndexFlat, there is no vector<float> with the data anymore. It is replaced with a `get_xb()` function. This broke quite a few external codes, that this diff also attempts to fix.
- I/O functions needed to be adapted. This is done without changing the I/O format for any index.
- added a small contrib function to get the data from the IndexFlat
- the functionality has been made uniform, for example remove_ids and add are now in the parent class.

Eventually, we may support generic storage for flat indexes, similar to `InvertedLists`, eg to memmap the data, but this will again require a big change.

Reviewed By: wickedfoo

Differential Revision: D32646769

fbshipit-source-id: 04a1659173fd51b130ae45d345176b72183cae40
2021-12-07 01:31:07 -08:00
Matthijs Douze 2d380e992b Add manifold check for size 0 (#1867)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1867

Merging code for the 1T photodna index seems to fail at

https://www.internalfb.com/phabricator/paste/view/P412975011?lines=174

with
```
terminate called after throwing an instance of 'facebook::manifold::blobstore::StorageException'
  what():  [400] Begin offset and/or length were invalid -- Begin offset must be positive and length must be non-negative. Received: offset = 2642410612, length = 0
Aborted (core dumped)
```
traces back to

https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/manifold/blobstore/BlobstoreThriftHandler.cpp?lines=671%2C700%2C732

There is a single case where we don't check if the read or write size is 0. So let's try this fix.

In the process I realized that the Manifold tests were non functional due to a name collision on common.py. Also fix this in all dependent files.

Differential Revision: D28231710

fbshipit-source-id: 700ffa6ca0c82c49e7d1eae9e76549ec5ff16332
2021-05-09 22:30:31 -07:00
Matthijs Douze 28edc56fa8 Search in sharded invlists
Summary:
This diff adds a CombinedIndexSharded1T class to combined_index that uses the 30 shards from the Spark reducer.
The metadata is stored in pickle files on manifold.

Differential Revision: D24018824

fbshipit-source-id: be4ff8b38c3d6e1bb907e02b655d0e419b7a6fea
2020-10-19 10:39:22 -07:00
Matthijs Douze 6d73c2ff69 Fix int64 for python tests in windows (#1381)
Summary:
`long` is 32 bits on windows and so is the default int type for numpy (eg. the one used for `np.arange`).
This diff explicitly specifies 64-bit ints for all occurrences where it matters.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1381

Reviewed By: wickedfoo

Differential Revision: D23371232

Pulled By: mdouze

fbshipit-source-id: 220262cd70ee70379f83de93561a4eae71c94b04
2020-08-27 12:40:55 -07:00
Lucas Hosseini 7c6a446bf5 Avoid building OnDiskInvertedLists on Windows. (#1374)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1374

Test Plan: Imported from OSS

Reviewed By: mdouze

Differential Revision: D23314729

Pulled By: beauby

fbshipit-source-id: 5ad7fa3ed830b17a5be66fb2995dd94e079d8507
2020-08-25 16:58:24 -07:00
Lucas Hosseini 24c4460dd2 Avoid leaking file descriptors in python tests. (#1353)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1353

Test Plan: Imported from OSS

Reviewed By: mdouze

Differential Revision: D23292456

Pulled By: beauby

fbshipit-source-id: 44458eb16d037883ff39827accf5edddb1b1bb89
2020-08-24 06:46:52 -07:00
Lucas Hosseini 22b7876ef5
Facebook sync (2020-03-10) (#1136) 2020-03-10 14:24:07 +01:00
Lucas Hosseini 36ddba9196
Facebook sync (2019-09-10) (#943)
* Facebook sync (2019-09-10)

* Fix depends Makefile target.

* Add faiss symlink for new include directives.

* Fix missing header.

* Fix tests.

* Fix Makefile.

* Update depend.

* Fix include directives spacing.
2019-09-20 18:59:10 +02:00
Lucas Hosseini 3896b12c65
Facebook sync (Jun 2019) (#862)
Bugfixes:
- slow scanning of inverted lists (#836).

Features:
- add basic support for 6 new metrics in CPU `IndexFlat` and `IndexHNSW` (#848);
- add support for `IndexIDMap`/`IndexIDMap2` with binary indexes (#780).

Misc:
- throw python exception for OOM (#758);
- make `DistanceComputer` available for all random access indexes;
- gradually moving from `long` to `int64_t` for portability.
2019-06-19 15:59:06 +02:00
Lucas Hosseini a8118acbc5
Facebook sync (May 2019) + relicense (#838)
Changelog:

- changed license: BSD+Patents -> MIT
- propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas
- support for searching several inverted lists in parallel (parallel_mode != 0)
- better support for PQ codes where nbit != 8 or 16
- IVFSpectralHash implementation: spectral hash codes inside an IVF
- 6-bit per component scalar quantizer (4 and 8 bit were already supported)
- combinations of inverted lists: HStackInvertedLists and VStackInvertedLists
- configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch)
- more test and demo code compatible with Python 3 (print with parentheses)
- refactored benchmark code: data loading is now in a single file
2019-05-28 16:17:22 +02:00
Lucas Hosseini afe0fdc161
Facebook sync (Mar 2019) (#756)
Facebook sync (Mar 2019)

- MatrixStats object
- option to round coordinates during k-means optimization
- alternative option for search in HNSW
- moved stats and imbalance_factor of IndexIVF to InvertedLists object
- range search for IVFScalarQuantizer
- direct unit8 codec in ScalarQuantizer
- renamed IndexProxy to IndexReplicas and moved to main Faiss
- better support for PQ code assignment with external index
- support for IMI2x16 (4B virtual centroids!)
- support for k = 2048 search on GPU (instead of 1024)
- most CUDA mem alloc failures throw exceptions instead of terminating on an assertion
- support for renaming an ondisk invertedlists
- interrupt computations with ctrl-C in python
2019-03-29 16:32:28 +01:00
Lucas Hosseini 323dbf3be3
Facebook sync (Dec 2018). (#660)
* Add GpuIndexBinaryFlat
* Add IndexBinaryHNSW
2018-12-19 17:48:35 +01:00
Lucas Hosseini 76bec0b500
Facebook sync (#573)
Features:

- automatic tracking of C++ references in Python
- non-intel platforms supported -- some functions optimized for ARM
- override nprobe for concurrent searches
- support for floating-point quantizers in binary indexes
Bug fixes:

- no more segfaults in python (I know it's the same as the first feature but it's important!)
- fix GpuIndexIVFFlat issues for float32 with 64 / 128 dims
- fix sharding of flat indexes on GPU with index_cpu_to_gpu_multiple
2018-08-30 19:38:50 +02:00
Lucas Hosseini 6880286ea0
Facebook sync (#504)
* Facebook sync

* Update swig wrappers.

* Fix comment.
2018-07-06 14:12:11 +02:00
Lucas Hosseini 6e40d6689f
Move python tests back together with C++ tests. (#479) 2018-06-04 12:20:44 +02:00
Lucas Hosseini cf18101f6d Refactor makefiles and add configure script (#466)
* Refactors Makefiles and add configure script.

* Give MKL higher priority in configure script.

* Clean up Linux example makefile.inc.

* Cleanup makefile.inc examples.

* Fix python clean Makefile target.

* Regen swig wrappers.

* Remove useless CUDAFLAGS variable.

* Fix python linking flags.

* Separate compile and link phase in python makefile.

* Add macro to look for swig.

* Add CUDA check in configure script.

* Cleanup make depend targets.

* Cleanup CUDA flags.

* Fix linking flags.

* Fix python GPU linking.

* Remove useless flags from python gpu module linking.

* Add check for cuda libs.

* Cleanup GPU targets.

* Clean up test target.

* Add cpu/gpu targets to python makefile.

* Clean up tutorial Makefile.

* Remove stale OS var from example makefiles.

* Clean up cuda example flags.
2018-06-02 08:35:30 +02:00
Ailing cd884114d0 Make tests compatible with py3 (#348) 2018-02-24 00:38:45 +01:00
Matthijs Douze 0c482e54eb sync with FB version 2018-02-23 (#347)
- support on-disk IVF
2018-02-23 07:49:45 -08:00
matthijs 250a3d3f18 sync with FB version 2017-11-22
various bugfixes from github issues
kmean with some frozen centroids
GPU better tiling for large flat datasets
default AVX for vector ops
2017-11-22 05:11:28 -08:00
matthijs 8e3dc6f2b0 changed license 2017-07-30 00:18:45 -07:00
matthijs 12f181ee44 forgotten 2017-07-18 02:55:11 -07:00