547 Commits

Author SHA1 Message Date
Matthijs Douze
a7d62b39b4 Fix GPU nightlies test (#1901)
Summary:
This should fix the GPU nightlies.

The rationale for the cp is that there is a shared file between the CPU and GPU tests.

Ideally, this file should probably be moved to contrib at some point.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1901

Reviewed By: beauby

Differential Revision: D28680898

Pulled By: mdouze

fbshipit-source-id: b9d0e1969103764ecb6f1e047c9ed4bd4a76aaba
2021-05-26 09:41:31 -07:00
Matthijs Douze
8eab15eca3 LUT based search for additive quantizers (#1908)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1908

So far, the search for the best combination of codebooks has been implemented as a beam search.

It is possible to make this faster for a query vector q by precomputing look-up tables in the form of

LUT_m = <q, cent_m>

where cent_m is the set of centroids for quantizer m=0..M-1.

The LUT can then be used as

inner_prod = sum_m LUT_m[c_m]

and

L2_distance = norm_q + norm_db - 2 * inner_prod

This diff implements this computation by:

- adding the LUT precomputation

- storing an exhaustive table of all centroid norms (when using L2)

This is only practical for small additive quantizers, e.g. when a residual vector quantizer is used as a coarse quantizer (ResidualCoarseQuantizer).

This diff is based on the AdditiveQuantizer diff because it applies equally to other quantizers (e.g. the LSQ).
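
The LUT formulas above can be illustrated with a small pure-Python sketch. This is not the Faiss implementation: the codebooks, the query, and every function name below are made up for illustration, and `norm_db` is computed directly here rather than read from a precomputed table of norms.

```python
# Toy additive quantizer: M codebooks, each holding K centroids of dimension d.
# All values and names are hypothetical and only illustrate the formulas.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def decode(codebooks, code):
    """Reconstruct a vector as the sum of one centroid per codebook."""
    d = len(codebooks[0][0])
    out = [0.0] * d
    for m, c in enumerate(code):
        for j in range(d):
            out[j] += codebooks[m][c][j]
    return out

def compute_luts(q, codebooks):
    """LUT_m[k] = <q, cent_m[k]> for every codebook m and centroid k."""
    return [[dot(q, cent) for cent in cb] for cb in codebooks]

def lut_inner_product(luts, code):
    """inner_prod = sum_m LUT_m[c_m]"""
    return sum(luts[m][c] for m, c in enumerate(code))

def lut_l2_distance(q, luts, code, norm_db):
    """L2_distance = norm_q + norm_db - 2 * inner_prod (squared norms)."""
    return dot(q, q) + norm_db - 2.0 * lut_inner_product(luts, code)

codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [-0.5, 0.25]],
]
q = [2.0, -1.0]
code = (1, 0)

x = decode(codebooks, code)        # reconstructed database vector
luts = compute_luts(q, codebooks)  # computed once per query
ip = lut_inner_product(luts, code)
dis = lut_l2_distance(q, luts, code, dot(x, x))
```

The look-up tables depend only on the query, so they are computed once and reused for every code that is scored.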

Reviewed By: sc268

Differential Revision: D28467746

fbshipit-source-id: 82611fe1e4908c290204d4de866338c622ae4148
2021-05-25 01:54:53 -07:00
Sugosh Nagavara Ravindra
0825eaf8d3 Include IndexResidual in clone_index
Summary:
Moving an index from CPU to GPU fails with the error message `RuntimeError: Error in virtual faiss::Index *faiss::Cloner::clone_Index(const faiss::Index *) at faiss/clone_index.cpp:144: clone not supported for this type of Index`
This diff supports cloning IndexResidual and unblocks GPU training.

Reviewed By: sc268, mdouze

Differential Revision: D28614996

fbshipit-source-id: 9b1e5e7c5dd5da6d55f02594b062691565a86f49
2021-05-24 20:05:23 -07:00
Giuseppe Ottaviano
d5a1bf3e9b Expose query_to_code in SQDistanceComputer
Summary: This is necessary to share a `SQDistanceComputer` instance among multiple threads when the codes are not stored in a faiss index. The function is `const` and thread-safe.

Reviewed By: philippv, mdouze

Differential Revision: D28623897

fbshipit-source-id: e527d98231bf690dc01191dcc597ee800b5e57a9
2021-05-24 11:01:01 -07:00
Lucas Hosseini
cd6909004f Add packages for CUDA 11.3. (#1902)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1902

Reviewed By: mdouze

Differential Revision: D28566993

Pulled By: beauby

fbshipit-source-id: f560130c874bad355377b88b4519519af1e5d9f1
2021-05-21 07:47:37 -07:00
Chengqi Deng
c087f87730 Add LocalSearchQuantizer (#1906)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1906

This PR implemented LSQ/LSQ++, a vector quantization technique described in the following two papers:

1. Revisiting additive quantization
2. LSQ++: Lower running time and higher recall in multi-codebook quantization

Here is a benchmark running on SIFT1M for 64 bits encoding:
```
===== lsq:
        mean square error = 17335.390208
        training time: 312.729779958725 s
        encoding time: 244.6277096271515 s
===== pq:
        mean square error = 23743.004672
        training time: 1.1610801219940186 s
        encoding time: 2.636141061782837 s
===== rq:
        mean square error = 20999.737344
        training time: 31.813055515289307 s
        encoding time: 307.51959800720215 s
```

Changes:

1. Add LocalSearchQuantizer object
2. Fix an out of memory bug in ResidualQuantizer
3. Add a benchmark for evaluating quantizers
4. Add tests for LocalSearchQuantizer

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1862

Test Plan:
```
buck test //faiss/tests/:test_lsq

buck run mode/opt //faiss/benchs/:bench_quantizer -- lsq pq rq
```

Reviewed By: beauby

Differential Revision: D28376369

Pulled By: mdouze

fbshipit-source-id: 2a394d38bf75b9de0a1c2cd6faddf7dd362a6fa8
2021-05-21 01:33:55 -07:00
Chengqi Deng
d7f2dff589 Add tests for avx2 building (#1905)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1905

This PR adds some tests to make sure that building with AVX2 works as expected on Linux.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1792

Test Plan: buck test //faiss/tests/:test_fast_scan -- test_PQ4_speed

Reviewed By: beauby

Differential Revision: D27435796

Pulled By: mdouze

fbshipit-source-id: 901a1d0abd9cb45ccef541bd7a570eb2bd8aac5b
2021-05-20 22:16:06 -07:00
Y.Imaizumi
e52f5d81f8 Workaround for vshl/vshr on aarch64 GCC (#1882)
Summary:
related: https://github.com/facebookresearch/faiss/issues/1815,  https://github.com/facebookresearch/faiss/issues/1880

`vshl` / `vshr` of ARM NEON require an immediate (compile-time constant) value as the shift parameter.
However, GCC's implementations of those intrinsics also accept a runtime value.
The current faiss implementation depends on this, so correctly behaving compilers such as Clang can't build faiss for aarch64.
This PR fixes this issue, so faiss with this PR applied can be built with Clang for aarch64 machines such as the M1 Mac.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1882

Reviewed By: beauby

Differential Revision: D28465563

Pulled By: mdouze

fbshipit-source-id: e431dfb3b27c9728072f50b4bf9445a3f4a5ac43
2021-05-20 14:55:37 -07:00
Lucas Hosseini
ef33daae92 Add CUDA compute capability 8.6 for CUDA 11 packages. (#1899)
Summary:
Also remove support for deprecated compute capabilities 3.5 and 5.2 in
CUDA 11.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1899

Reviewed By: mdouze

Differential Revision: D28539826

Pulled By: beauby

fbshipit-source-id: 6e8265f2bfd991ff3d14a6a5f76f9087271f3f75
2021-05-19 12:58:50 -07:00
Lucas Hosseini
1223e68688 Avoid OOM in Linux CPU CI jobs. (#1900)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1900

Reviewed By: mdouze

Differential Revision: D28539987

Pulled By: beauby

fbshipit-source-id: 2e44755e48bd45233578ce0ba75836fc533afe35
2021-05-19 12:36:05 -07:00
Lucas Hosseini
797bc88566 Add missing includes for std::min/std::max. (#1895)
Summary:
Closes https://github.com/facebookresearch/faiss/issues/1876.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1895

Reviewed By: mdouze

Differential Revision: D28511225

Pulled By: beauby

fbshipit-source-id: 6dc6d0662983fdac7eef516f41fea1368195fb3e
2021-05-18 10:42:09 -07:00
Lucas Hosseini
77825c52bd Fix conda recipes. (#1894)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1894

Reviewed By: wickedfoo

Differential Revision: D28510244

Pulled By: beauby

fbshipit-source-id: 32983b7eeab497b8d576caaadd56e13a2134a4ab
2021-05-18 09:10:54 -07:00
vorj
67a8070d7d Suppress -Wpedantic (#1888)
Summary:
Current `faiss` contains some code that triggers compiler warnings when options such as `-Wall -Wextra` are added.
IMHO, code that warns in `.cpp` and `.cu` files doesn't need to be fixed if the project's policy allows warnings.
However, I think it is better to fix the code in `.h` and `.cuh` files, which are also included by `faiss` users.

Currently, `#include`-ing some faiss headers, such as `#include<faiss/IndexHNSW.h>`, produces an error when compiling with `-pedantic-errors`.
This PR fixes this problem.
For the reasons above, this PR fixes the `-Wpedantic` warnings only in `.h` files.

This PR doesn't change `faiss` behavior.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1888

Reviewed By: wickedfoo

Differential Revision: D28506963

Pulled By: beauby

fbshipit-source-id: cbdf0506a95890c9c1b829cb89ee60e69cf94a79
2021-05-18 08:49:50 -07:00
Matthijs Douze
3eb82e32dc Range search bug
Summary:
This diff fixes a serious bug in the range search implementation.

During range search in a flat index, (exhaustive_L2sqr_seq and exhaustive_inner_product_seq) when running in multiple threads, the per-thread results are collected into RangeSearchPartialResult structures.

When the computation is finished, they are aggregated into a RangeSearchResult. In the previous version of the code, this loop was nested into a second loop that is used to check for KeyboardInterrupts. Thus, at each iteration, the results were overwritten.

The fix removes the outer loop. It is most likely useless anyway, because the sequential code is called only for a small number of queries; for a larger number, the BLAS version is used.

Reviewed By: wickedfoo

Differential Revision: D28486415

fbshipit-source-id: 89a52b17f6ca1ef68fc5e758f0e5a44d0df9fe38
2021-05-17 23:10:20 -07:00
Alexander Andreev
a87930111e Classes inherited from VectorTransform for c_api (#1869)
Summary:
Inheriting from LinearTransform doesn't work here.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1869

Reviewed By: beauby

Differential Revision: D28345866

Pulled By: mdouze

fbshipit-source-id: 277dd421213a91c07ed41d7b23002840cc5cfa1f
2021-05-12 07:34:39 -07:00
Y.Imaizumi
d7f9c0ce95 Implement peel loop for fvec_L2sqr, fvec_inner_product, and fvec_norm_L2sqr on aarch64 (#1878)
Summary:
In the current `faiss` implementation for x86, `fvec_L2sqr` , `fvec_inner_product` , and `fvec_norm_L2sqr` are [optimized for any dimensionality](e86bf8cae1/faiss/utils/distances_simd.cpp (L404-L432)).

On the other hand, the functions for aarch64 are optimized [**only** if `d` is a multiple of 4](e86bf8cae1/faiss/utils/distances_simd.cpp (L583-L584)); thus, they are not very fast for vectors with `d % 4 != 0` .
This PR accelerates the above three functions for any input size on aarch64.

Kind regards,

![peel-loop](https://user-images.githubusercontent.com/40021161/117810705-b36e7380-b29a-11eb-9096-91babc27a03d.png)
- Evaluated on an AWS EC2 ARM instance (c6g.4xlarge)
- sift1m127 is the sift1m dataset with trailing elements dropped
    - Therefore, the vector length of sift1m127 is 127, which is not a multiple of 4
    - "optimized" runs 2.45-2.77 times faster than "original" with sift1m127
- Two methods, "original" and "optimized", are expected to achieve the same level of performance for sift1m
    - And actually there is almost no significant difference
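
As an illustration of the general idea only (not the NEON code from this PR), the sketch below computes a squared L2 distance with a blocked main loop over groups of 4 elements and a short scalar loop for the `d % 4` leftover elements. The function name and the `width` parameter are hypothetical.

```python
def l2sqr_peeled(x, y, width=4):
    """Squared L2 distance with a blocked main loop and a remainder loop.

    The main loop mimics a SIMD loop that handles `width` lanes at a time;
    the trailing loop handles the leftover elements one by one.
    """
    d = len(x)
    main_end = d - d % width  # largest multiple of `width` that is <= d

    # Main loop: one accumulator per lane, as a vector register would hold.
    acc = [0.0] * width
    for i in range(0, main_end, width):
        for lane in range(width):
            diff = x[i + lane] - y[i + lane]
            acc[lane] += diff * diff
    total = sum(acc)

    # Remainder loop: the last d % width elements.
    for i in range(main_end, d):
        diff = x[i] - y[i]
        total += diff * diff
    return total

x = [float(i) for i in range(127)]  # 127 is not a multiple of 4
y = [0.0] * 127
result = l2sqr_peeled(x, y)
```

With `d = 127`, the main loop covers the first 124 elements and the remainder loop covers the last 3.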

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1878

Reviewed By: beauby

Differential Revision: D28376329

Pulled By: mdouze

fbshipit-source-id: c68f13b4c426e56681d81efd8a27bd7bead819de
2021-05-12 07:15:56 -07:00
CodemodService FBSourceClangFormatLinterBot
b4c320a671 Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28319469

fbshipit-source-id: 8295597a8ee16b2fef3f7aacdd6c892cb22db988
2021-05-10 03:38:52 -07:00
Matthijs Douze
2d380e992b Add manifold check for size 0 (#1867)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1867

Merging code for the 1T photodna index seems to fail at

https://www.internalfb.com/phabricator/paste/view/P412975011?lines=174

with
```
terminate called after throwing an instance of 'facebook::manifold::blobstore::StorageException'
  what():  [400] Begin offset and/or length were invalid -- Begin offset must be positive and length must be non-negative. Received: offset = 2642410612, length = 0
Aborted (core dumped)
```
traces back to

https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/manifold/blobstore/BlobstoreThriftHandler.cpp?lines=671%2C700%2C732

There is a single case where we don't check if the read or write size is 0. So let's try this fix.

In the process I realized that the Manifold tests were non functional due to a name collision on common.py. Also fix this in all dependent files.

Differential Revision: D28231710

fbshipit-source-id: 700ffa6ca0c82c49e7d1eae9e76549ec5ff16332
2021-05-09 22:30:31 -07:00
Matthijs Douze
441ccebbff Make Residual quantizer more memory efficient (#1865)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1865

This diff encodes the vectors in chunks to make encoding more memory efficient.
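
A minimal sketch of the chunking pattern, assuming a hypothetical `encode_batch` callable that needs temporary memory proportional to its batch size. The names and the toy encoder are illustrative and are not part of the Faiss API.

```python
def encode_in_chunks(vectors, encode_batch, chunk_size):
    """Encode `vectors` by calling `encode_batch` on slices of bounded size.

    Peak temporary memory is then tied to `chunk_size`, not to len(vectors).
    """
    codes = []
    for start in range(0, len(vectors), chunk_size):
        codes.extend(encode_batch(vectors[start:start + chunk_size]))
    return codes

batch_sizes = []

def toy_encode_batch(batch):
    # Hypothetical encoder: records the batch size, returns one code per vector.
    batch_sizes.append(len(batch))
    return [sum(v) for v in batch]

vectors = [[i, i] for i in range(10)]
codes = encode_in_chunks(vectors, toy_encode_batch, chunk_size=4)
```

The output is the same as encoding everything at once; only the size of each call changes.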

Reviewed By: sc268

Differential Revision: D28234424

fbshipit-source-id: c1afd2aaff953d4ecd339800d5951ae1cae4789a
2021-05-07 02:12:27 -07:00
Matthijs Douze
4dc5b27a38 Fix test failure in OSX with OpenMP called from multiple threads (#1849)
Summary:
Need to add an ssh key to the circleci to be able to debug

For my own ref, how to connect to the job:
```
[matthijs@matthijs-mbp /Users/matthijs/Desktop/faiss_github/circleci_keys] ssh -p 54782 38.39.188.110 -i id_ed25519
```

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1849

Reviewed By: wickedfoo

Differential Revision: D28234897

Pulled By: mdouze

fbshipit-source-id: 6827fa45f24b3e4bf586315bd38f18608d07ecf9
2021-05-06 05:18:34 -07:00
Matthijs Douze
061b68b43a Fix performance regression in ResultHandler (#1840)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1840

This diff is related to

https://github.com/facebookresearch/faiss/issues/1762

The ResultHandler introduced for FlatL2 and FlatIP was not multithreaded. This diff attempts to fix that. To be verified if it is indeed faster.

Reviewed By: wickedfoo

Differential Revision: D27939173

fbshipit-source-id: c85f01a97d4249fe0c6bfb04396b68a7a9fe643d
2021-04-30 00:02:27 -07:00
shengjun.li
c3842ae5ff Using DirectMapAdd to fix IVFFLAT parallel adding (#1842)
Summary:
Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

When `direct_map.type` is `DirectMap::Type::Array`, IVFFLAT parallel adding may fail here.
```C++
void DirectMap::add_single_id(idx_t id, idx_t list_no, size_t offset) {
    ...
    if (type == Array) {
        assert(id == array.size());
```

However, DirectMapAdd already handles this case.
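
The toy model below shows why the quoted assertion breaks when ids arrive out of order, as they can during parallel adding. The `reserve` and `set_entry` methods are a hypothetical illustration of one way to remove the ordering requirement (preallocate slots, then write by position); they are an assumption and not a description of how DirectMapAdd is written.

```python
class ArrayDirectMap:
    """Toy id -> (list_no, offset) map backed by a dense array."""

    def __init__(self):
        self.array = []

    def add_single_id(self, id_, list_no, offset):
        # Mirrors the quoted assertion: ids must arrive as 0, 1, 2, ...
        assert id_ == len(self.array)
        self.array.append((list_no, offset))

    def reserve(self, n):
        """Hypothetical helper: preallocate n slots, return the first new id."""
        first = len(self.array)
        self.array.extend([None] * n)
        return first

    def set_entry(self, id_, list_no, offset):
        """Hypothetical helper: fill a preallocated slot in any order."""
        self.array[id_] = (list_no, offset)

# Sequential adding works.
dm = ArrayDirectMap()
dm.add_single_id(0, list_no=3, offset=0)

# Out-of-order adding trips the assertion.
out_of_order_failed = False
try:
    dm.add_single_id(2, list_no=1, offset=0)
except AssertionError:
    out_of_order_failed = True

# Preallocating first makes the write order irrelevant.
first = dm.reserve(2)
dm.set_entry(first + 1, list_no=1, offset=0)
dm.set_entry(first, list_no=3, offset=1)
```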

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1842

Reviewed By: beauby

Differential Revision: D28025735

Pulled By: mdouze

fbshipit-source-id: 74c423eacd226c9f7b1882532dcde517f6956409
2021-04-27 04:50:36 -07:00
Matthijs Douze
8bc7fb6294 Fix CMakeLists for Residual quantizer (#1846)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1846

Forgot to add the IndexResidual to the CMakeLists.txt

Reviewed By: beauby

Differential Revision: D28024663

fbshipit-source-id: 64cfb14f140b6c34d740c63543f88ae5d2980e72
2021-04-27 04:46:57 -07:00
CodemodService FBSourceClangFormatLinterBot
81c1832c29 Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28023983

fbshipit-source-id: 338cef4bbe87e39d1cc200f3ff3d90f03af329d2
2021-04-27 03:55:24 -07:00
Matthijs Douze
bb3c52a057 IndexResidual codec
Summary:
This diff adds the following to bring the residual quantizer support on par with PQ:
- IndexResidual can be built with index factory, serialized and used as a Faiss codec.
- ResidualCoarseQuantizer can be used as a coarse quantizer for inverted files.

The factory string looks like "RQ1x16_6x8" which means a first 16-bit quantizer then 6 8-bit ones. For IVF it's "IVF4096(RQ2x6),Flat".
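
To make the notation concrete, the sketch below expands such a description into a list of per-quantizer bit sizes. It is only an illustration of how the string reads, based on the sentence above; it is not the parser used by the index factory, and the function name is hypothetical.

```python
import re

def parse_rq_description(desc):
    """Expand e.g. 'RQ1x16_6x8' into a list of bits per quantizer step."""
    if not desc.startswith("RQ"):
        raise ValueError("expected a description starting with 'RQ'")
    nbits = []
    for part in desc[2:].split("_"):
        match = re.fullmatch(r"(\d+)x(\d+)", part)
        if match is None:
            raise ValueError("cannot parse %r" % part)
        count, bits = int(match.group(1)), int(match.group(2))
        nbits.extend([bits] * count)
    return nbits

steps = parse_rq_description("RQ1x16_6x8")  # one 16-bit step, then six 8-bit steps
total_bits = sum(steps)
coarse = parse_rq_description("RQ2x6")      # the quantizer inside "IVF4096(RQ2x6),Flat"
```

Under this reading, "RQ1x16_6x8" describes a 64-bit code (16 + 6 * 8).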

Reviewed By: sc268

Differential Revision: D27865612

fbshipit-source-id: f9f11d29e9f89d3b6d4cd22e9a4f9222422d5f26
2021-04-26 20:26:43 -07:00
Patrick Weizhi Xu
1ea134eb83 Remove redundant c_api headers while installing (#1841)
Summary:
This diff is related to
https://github.com/facebookresearch/faiss/issues/1722

File structure with `-DFAISS_ENABLE_GPU=OFF`
```
/usr/local/include/faiss/c_api
├── AutoTune_c.h
├── clone_index_c.h
├── Clustering_c.h
├── error_c.h
├── error_impl.h
├── faiss_c.h
├── impl
│   └── AuxIndexStructures_c.h
├── Index_c.h
├── index_factory_c.h
├── IndexFlat_c.h
├── index_io_c.h
├── IndexIVF_c.h
├── IndexIVFFlat_c.h
├── IndexLSH_c.h
├── IndexPreTransform_c.h
├── IndexScalarQuantizer_c.h
├── IndexShards_c.h
├── macros_impl.h
├── MetaIndexes_c.h
└── VectorTransform_c.h
```

File structure with `-DFAISS_ENABLE_GPU=ON`
```
/usr/local/include/faiss/c_api
├── AutoTune_c.h
├── clone_index_c.h
├── Clustering_c.h
├── error_c.h
├── error_impl.h
├── faiss_c.h
├── gpu
│   ├── DeviceUtils_c.h
│   ├── GpuAutoTune_c.h
│   ├── GpuClonerOptions_c.h
│   ├── GpuIndex_c.h
│   ├── GpuIndicesOptions_c.h
│   ├── GpuResources_c.h
│   ├── macros_impl.h
│   └── StandardGpuResources_c.h
├── impl
│   └── AuxIndexStructures_c.h
├── Index_c.h
├── index_factory_c.h
├── IndexFlat_c.h
├── index_io_c.h
├── IndexIVF_c.h
├── IndexIVFFlat_c.h
├── IndexLSH_c.h
├── IndexPreTransform_c.h
├── IndexScalarQuantizer_c.h
├── IndexShards_c.h
├── macros_impl.h
├── MetaIndexes_c.h
└── VectorTransform_c.h
```

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1841

Reviewed By: mdouze

Differential Revision: D27992822

Pulled By: beauby

fbshipit-source-id: 63fa9a39c77502d138453cb4b04c50652e732196
2021-04-26 02:20:23 -07:00
Lucas Hosseini
31bd194e2e Fix GPU nightly builds. (#1837)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1837

Reviewed By: mdouze

Differential Revision: D27965420

Pulled By: beauby

fbshipit-source-id: 9500253ef00b2fe43c987c6069ceabcbffd26b74
2021-04-23 05:47:53 -07:00
Matthijs Douze
4f12d9c20c Fix unsigned loop indices (#1839)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1839

OpenMP 2 does not support unsigned loop indices, which raises Windows contbuild errors:

https://app.circleci.com/pipelines/github/facebookresearch/faiss/1546/workflows/91eaf2b6-0347-4073-8aaa-a4edaee10158/jobs/5603

Another error on OSX is probably unrelated:

https://app.circleci.com/pipelines/github/facebookresearch/faiss/1546/workflows/91eaf2b6-0347-4073-8aaa-a4edaee10158/jobs/5606

Reviewed By: beauby

Differential Revision: D27938886

fbshipit-source-id: 16b3a86da444d0f1cdeab4652d2d8d9bdd34889b
2021-04-22 07:21:16 -07:00
Alexander Andreev
d640c6fcda Impl IndexPreTransform for c_api (#1816)
Summary:
This PR extends c_api for IndexPreTransform

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1816

Reviewed By: beauby

Differential Revision: D27904597

Pulled By: mdouze

fbshipit-source-id: b54dfffcc97879fdf66f9a8a26e9b7840a2e97f2
2021-04-22 05:27:36 -07:00
Alexander Andreev
b209361f7b Add setters for IndexIVF* indexes (#1827)
Summary:
This PR gives users control over resources

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1827

Reviewed By: beauby

Differential Revision: D27904561

Pulled By: mdouze

fbshipit-source-id: 61352f776971d9f488917e39f8746d43614386d9
2021-04-22 05:22:40 -07:00
Alexander Andreev
f15f639b64 Improve impl IndexRefineFlat for c_api (#1821)
Summary:
This PR extends c_api for IndexRefineFlat

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1821

Reviewed By: beauby

Differential Revision: D27904607

Pulled By: mdouze

fbshipit-source-id: 1a4592ab7d61bf722df1bbaf0aaee4e982a56a74
2021-04-22 05:22:40 -07:00
Alexander Andreev
d9764d8aff Add IndexIVFScalarQuantizer for c_api (#1829)
Summary:
This PR adds an implementation of IndexIVFScalarQuantizer to the c_api interface.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1829

Reviewed By: beauby

Differential Revision: D27904571

Pulled By: mdouze

fbshipit-source-id: 2cdbbd356f7520cea897f69c90486837c569ed19
2021-04-22 05:19:42 -07:00
Alexander Andreev
513f895b7c Add get,set verbose for Index in c_api (#1790)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1790

Reviewed By: beauby

Differential Revision: D27904738

Pulled By: mdouze

fbshipit-source-id: 31e8881996ee558c42206b9bf7ce1f9057595133
2021-04-22 05:16:55 -07:00
Lucas Hosseini
bde7c00271 Update OSX version in CircleCI. (#1833)
Summary:
This should fix the HomeBrew failures we see.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1833

Reviewed By: mdouze

Differential Revision: D27880386

Pulled By: beauby

fbshipit-source-id: 5895dfc79a9c66c88283fd5170d2827f63bdd224
2021-04-20 05:50:52 -07:00
Y.Imaizumi
fe777d8010 Improve IndexPQFastScan and IndexIVFPQFastScan performance for aarch64 devices (#1815)
Summary:
related: https://github.com/facebookresearch/faiss/issues/1812

This PR improves the performance of `IndexPQFastScan` and `IndexIVFPQFastScan` on aarch64 devices, e.g., 60x faster on an AWS Arm instance with the SIFT1M dataset.
The contents of this PR are below:

- Add `simdlib_neon.h`
    - `simdlib_neon.h` has `simdlib` compatible API, and they are implemented with Arm NEON intrinsics.
    - `simdlib.h` includes `simdlib_neon.h` if `__aarch64__` is defined.
- Move `geteven` , `getodd` , `getlow128` , and `gethigh128` from `distances_simd.cpp` to `simdlib_avx2.h` .
- Port `geteven` , `getodd` , `getlow128` , and `gethigh128` for non-AVX2 environments.
    - These functions were implemented with AVX2 intrinsics, which prevented implementing `compute_PQ_dis_tables_dsub2` for non-AVX2 environments.
    - Now `simdlib_avx2.h` , `simdlib_emulated.h` , and `simdlib_neon.h` all have those functions.
- Enable `compute_PQ_dis_tables_dsub2` on aarch64
    - The above change makes `compute_PQ_dis_tables_dsub2` independent of the AVX2-only versions of `geteven` and the other helpers.
    - `compute_PQ_dis_tables_dsub2` implemented with `simdlib_neon.h` is a little faster than the current implementation, so it is enabled.
        - In contrast, `compute_PQ_dis_tables_dsub2` implemented with `simdlib_emulated.h` is slower than the current implementation, so it is not enabled in this PR.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1815

Reviewed By: beauby

Differential Revision: D27760259

Pulled By: mdouze

fbshipit-source-id: 5df6168ac35ae0174bedf04508dbaf19f11fab3f
2021-04-16 00:43:35 -07:00
Y.Imaizumi
b85b4308f2 Make simdlib_emulated.h faster (#1814)
Summary:
related: https://github.com/facebookresearch/faiss/issues/1812

This PR improves the performance of the code in `simdlib_emulated.h` .
`IndexPQFastScan` and `IndexIVFPQFastScan` become faster in non-AVX2 environments, e.g., 4x faster on SIFT1M.
This PR contains the following changes:

- Use a `template` parameter instead of `std::function` for the function argument of `unary_func` and `binary_func`
    - Because `std::function` hinders some optimizations such as function inlining
- Pass vector classes such as `simd16uint16` to functions as `const T&` instead of `T`
    - Vector classes in `simdlib_emulated.h` store their data in an array member, so copying them is not cheap at runtime.
    - Passing by const lvalue reference avoids the copy.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1814

Reviewed By: beauby

Differential Revision: D27760072

Pulled By: mdouze

fbshipit-source-id: cbc5a14658d1960b24ce55a395e71c80998742dc
2021-04-16 00:24:44 -07:00
Chengqi Deng
c62ab3a696 Use BLAS to compute sdc table (#1809)
Summary:
This PR uses BLAS to compute the SDC table in ProductQuantizer.

Here are the times for computing the SDC tables:

```
nbits=8, d=128 (this commit)
M: 2, sdc: 0.0001361370086669922s
M: 4, sdc: 8.273124694824219e-05s
M: 8, sdc: 7.867813110351562e-05s
M: 16, sdc: 0.0001227855682373047s
M: 32, sdc: 0.0001697540283203125s
M: 64, sdc: 0.0007395744323730469s
```

```
nbits=8, d=128 (master)
M: 2,  sdc: 0.0055773258209228516s
M: 4,  sdc: 0.005366802215576172s
M: 8,  sdc: 0.0050809383392333984s
M: 16, sdc: 0.005639791488647461s
M: 32, sdc: 0.006036281585693359s
M: 64, sdc: 0.009720802307128906s
```
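
For context, a standard way to turn pairwise squared distances into a matrix product (the kind of operation BLAS accelerates) is the identity `||a - b||^2 = ||a||^2 + ||b||^2 - 2 <a, b>`. The sketch below applies it to the centroids of a single sub-quantizer in pure Python. Whether the PR uses exactly this formulation is an assumption, and all names are hypothetical.

```python
def matmul_transposed(a, b):
    """Return a @ b.T as nested lists (a stand-in for a BLAS gemm call)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

def pairwise_l2sqr(centroids):
    """K x K table of squared L2 distances between the given centroids."""
    norms = [sum(v * v for v in c) for c in centroids]
    gram = matmul_transposed(centroids, centroids)  # all inner products at once
    k = len(centroids)
    return [[norms[i] + norms[j] - 2.0 * gram[i][j] for j in range(k)]
            for i in range(k)]

centroids = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
table = pairwise_l2sqr(centroids)
```

In a product quantizer, one such table would be built per sub-quantizer, so the full SDC table consists of M of these K x K blocks.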

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1809

Reviewed By: beauby

Differential Revision: D27706249

Pulled By: mdouze

fbshipit-source-id: 102ae0c1c157e244e40557656934062f537b74d4
2021-04-16 00:17:51 -07:00
Alexander Andreev
1ddb517bbd Add IndexScalarQuantizer for c_api (#1802)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1802

Reviewed By: beauby

Differential Revision: D27706399

Pulled By: mdouze

fbshipit-source-id: 61ac99f61e9e44b2fca8e3de45357ee4c0a0b9d7
2021-04-15 23:04:49 -07:00
Chengqi Deng
6f6e90162b Fix typo in bench_index_flat (#1810)
Summary:
This PR fixed the typo in `bench_index_flat.py`.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1810

Reviewed By: beauby

Differential Revision: D27706115

Pulled By: mdouze

fbshipit-source-id: 35515450be8eb45d6a2e98c7372333d98fc0f7b4
2021-04-15 22:58:42 -07:00
Matthijs Douze
7559cf5c5b add ResidualQuantizer
Summary:
This diff includes:
- progressive dimension k-means.
- the ResidualQuantizer object
- GpuProgressiveDimIndexFactory so that it can be trained on GPU
- corresponding tests
- reference Python implementation of the same in scripts/matthijs/LCC_encoding

Reviewed By: wickedfoo

Differential Revision: D27608029

fbshipit-source-id: 9a8cf3310c8439a93641961ca8b042941f0f4249
2021-04-14 13:11:54 -07:00
Jeff Johnson
50534d76cc remove GPU half select implementations (#1817)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1817

There were instantiations of the k-selection templates that operated on float16 data. These are no longer needed as instead Faiss will process all data in float32 (though input data can still be in float16), so removing them to speed compilation time.

Reviewed By: beauby

Differential Revision: D27742889

fbshipit-source-id: a3cf72a10df15f335d18d1e7709ffe269024121d
2021-04-13 12:40:13 -07:00
Jeff Johnson
b544db24a8 Raw all-pairwise distance function on GPU
Summary:
This diff implements brute-force all-pairwise distances between two different sets of vectors using any of the Faiss supported metrics on the GPU (L2, IP, L1, Lp, Linf, etc).

It is implemented using the same C++ interface as `bfKnn`, except when `k == -1`, all pairwise distances will be returned (no k-selection is made). A restriction exists at present where the entire output data must be able to reside on the same GPU which may be lifted at a subsequent point.

This interface is available in python via `faiss.pairwise_distance_gpu(res, xq, xb, D, metric)` with both numpy and pytorch support which will return all of the distances in D.

Also cleaned up CUDA stream usage a little bit in Distance.cu/Distance.cuh in the C++ implementation.

Reviewed By: mdouze

Differential Revision: D27686773

fbshipit-source-id: 8de6a699cda5d7077f0ab583e9ce76e630f0f687
2021-04-13 12:06:04 -07:00
Alexander Andreev
d77169173e Add getters for c_api IndexIVFFlat (#1787)
Summary:
ref https://github.com/facebookresearch/faiss/issues/1756

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1787

Reviewed By: beauby

Differential Revision: D27619587

Pulled By: mdouze

fbshipit-source-id: 0a9bb12f27b48c1b21025957e26c7453ab64a78d
2021-04-08 03:01:53 -07:00
Chengqi Deng
213ab22b71 Parallelize add_with_id of IndexIVFFlat and IndexIVFFlatDedup (#1805)
Summary:
This PR parallelized the `add_with_ids` methods of `IndexIVFFlat` and `IndexIVFFlatDedup`. Related to https://github.com/facebookresearch/faiss/issues/1617.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1805

Reviewed By: wickedfoo

Differential Revision: D27619557

Pulled By: mdouze

fbshipit-source-id: 74e0d74c7c51870534372a7ddf6fa0badba2686c
2021-04-07 08:09:38 -07:00
Lucas Hosseini
267edb120b Increase timeout for conda packages jobs. (#1801)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1801

Reviewed By: mdouze

Differential Revision: D27536458

Pulled By: beauby

fbshipit-source-id: ca2e693a7ac98d543fe1fe2ee2031389244e3c84
2021-04-02 09:43:50 -07:00
H. Vetinari
9c58ae00f1 Portable SWIG Vectors (#1742)
Summary:
After initial positive feedback to the idea in https://github.com/facebookresearch/faiss/issues/1741 from mdouze, here are the patches
I currently have as a basis for discussion.

Matthijs suggests to not bother with the deprecation warnings at all, which is fine for me
as well, though I would normally still advocate to provide users with _some_ advance notice
before removing parts of an interface.

Fixes https://github.com/facebookresearch/faiss/issues/1741

PS. The deprecation warning is only shown once per session (per class)
PPS. I have tested in https://github.com/conda-forge/faiss-split-feedstock/pull/32 that the respective
classes remain available both through `import faiss` and `from faiss import *`.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1742

Reviewed By: mdouze

Differential Revision: D26978886

Pulled By: beauby

fbshipit-source-id: b52e2b5b5b0117af7cd95ef5df3128e9914633ad
2021-04-02 07:11:47 -07:00
Lucas Hosseini
06f1ef86ac Use larger instances for GPU builds. (#1794)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1794

Reviewed By: mdouze

Differential Revision: D27439445

Pulled By: beauby

fbshipit-source-id: 12a936766ccb49a27767ab3a36ffd37fec2e1bfc
2021-04-01 03:35:41 -07:00
Lucas Hosseini
c65f670523 Add separate targets for libfaiss/libfaiss_avx2. (#1772)
Summary:
This should fix the conda builds.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1772

Reviewed By: mdouze

Differential Revision: D27365772

Pulled By: beauby

fbshipit-source-id: 12b9d488d475842030feb1a0452acf26dbe6ac01
2021-03-26 14:28:16 -07:00
Check Deng
c37c2fa393 Support I/O and clone for NSG (#1766)
Summary:
This PR added IO and clone support to NSG.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1766

Test Plan: buck test //faiss/tests/:test_index -- TestNSG

Reviewed By: beauby

Differential Revision: D27189414

Pulled By: mdouze

fbshipit-source-id: c35c253d043c711d09a675f4ba5c3317b9423b5b
2021-03-23 09:18:15 -07:00
Check Deng
885d87f712 Support NSG in the index factory (#1758)
Summary:
## Description
This PR added NSG into the index factory. Here are the supported index strings:
1. `NSG{0}` or `NSG{0},Flat`: Create an IndexNSGFlat with `R = {0}`.
2. `IVF{0}_NSG{1},{2}`: Create an IndexIVF using NSG as a coarse quantizer where `ncentroids = {0}`, `R = {1}` and `{2}` is the second level quantizer.

These two types of indexes may be the most useful ones. Other composite indexes could be supported in the future.
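
The placeholders in the two patterns above can be filled in as follows. The concrete values (`R = 32`, 4096 centroids, a `Flat` second level) and the helper names are made up for illustration.

```python
def nsg_flat_description(r):
    """Pattern 1: 'NSG{0},Flat' with R = r."""
    return "NSG{0},Flat".format(r)

def ivf_nsg_description(ncentroids, r, second_level):
    """Pattern 2: 'IVF{0}_NSG{1},{2}'."""
    return "IVF{0}_NSG{1},{2}".format(ncentroids, r, second_level)

flat_desc = nsg_flat_description(32)
ivf_desc = ivf_nsg_description(4096, 32, "Flat")
```

The resulting strings are what would be passed as the description argument of the index factory, for example `faiss.index_factory(d, flat_desc)` in Python.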

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1758

Test Plan: buck test //faiss/tests/:test_factory

Reviewed By: beauby

Differential Revision: D27189479

Pulled By: mdouze

fbshipit-source-id: b60000f985c490ef2e7bc561b4e209f9f61c3cc8
2021-03-23 07:28:20 -07:00