12 Commits

Author SHA1 Message Date
Alexandr Guzhva
0b74765cca Speedup exhaustive_L2sqr_blas for AVX2, ARM NEON and AVX512 (#2568)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2568

Add a fused kernel for the exhaustive_L2sqr_blas() call that combines the dot-product computation with the search for the nearest centroid. As a result, no temporary dot-product values are written to or read from RAM.
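The idea of fusing the two steps can be sketched as follows. This is a minimal scalar illustration, not the faiss kernel (which works on blocks with BLAS and SIMD); the function name `fused_l2_argmin` and its parameters are hypothetical. It relies on ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>: the ||x||^2 term is constant per query, so it can be dropped for the argmin, and each dot product is consumed immediately instead of being stored in a temporary distance matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical sketch: for each query x_i, find argmin_j ||x_i - y_j||^2.
// ||x_i||^2 does not change the argmin, so only ||y_j||^2 - 2 <x_i, y_j>
// is compared. No nx-by-ny matrix of dot products is ever materialized.
void fused_l2_argmin(const float* x, std::size_t nx,  // queries [nx][d]
                     const float* y, std::size_t ny,  // centroids [ny][d]
                     std::size_t d,
                     const float* y_norms,            // precomputed ||y_j||^2
                     std::size_t* labels) {           // output [nx]
    assert(d > 0 && ny > 0);
    for (std::size_t i = 0; i < nx; ++i) {
        const float* xi = x + i * d;
        float best = std::numeric_limits<float>::max();
        std::size_t best_j = 0;
        for (std::size_t j = 0; j < ny; ++j) {
            const float* yj = y + j * d;
            float dot = 0.0f;
            for (std::size_t k = 0; k < d; ++k) {
                dot += xi[k] * yj[k];
            }
            // The dot product is used right away and then discarded.
            const float dis = y_norms[j] - 2.0f * dot;
            if (dis < best) {
                best = dis;
                best_j = j;
            }
        }
        labels[i] = best_j;
    }
}
```

In k-means training only the nearest-centroid label is needed per point, which is why skipping the intermediate distance matrix saves memory traffic without losing information.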

Speeds up the training of PQx[1] indices for dsub = 1, 2, 4 and 8; the effect is larger for higher values of [1]. The AVX512 version provides additional overloads for dsub = 12 and 16.

The speedup also grows with higher values of pq.cp.max_points_per_centroid (256 by default).

Speeds up IVFPQ training as well.

The AVX512 kernel is not enabled by default, but I've seen it make training twice as fast as the AVX2 version. Feel free to use it by enabling AVX512 manually.

Reviewed By: mdouze

Differential Revision: D41166766

fbshipit-source-id: 443014e2e59396b3a90b9171fec8c8191052bcf4
2022-11-14 17:01:52 -08:00
Gergely Szilvasy
a85aea9aee bumping gtest to 1.12.1 (#2538)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2538

Due to https://github.com/google/googletest/issues/3219, gtest doesn't compile with GCC 11; Ubuntu 22.04 currently ships GCC 11.2.

The fix was https://github.com/google/googletest/pull/3024, so I'm bumping gtest to the latest version.

Reviewed By: alexanderguzhva

Differential Revision: D40512746

fbshipit-source-id: 75f3c3c7f8a117af8430c2f74a7f8d164ca9877b
2022-10-20 03:23:20 -07:00
Alexandr Guzhva
39eab0c0ac Add a missing test file reference to CMakeLists.txt (#2474)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2474

Reviewed By: mdouze

Differential Revision: D39545093

fbshipit-source-id: 93852750649936340a62743b5e7744f5c559e608
2022-09-20 07:14:35 -07:00
wx257osn2@yahoo.co.jp
2952487f0e Fix test_cppcontrib_sa_decode.cpp (#2388)
Summary:
This PR contains the following changes:

- Conform to C++11
    - [`faiss` is written in C++11](https://github.com/facebookresearch/faiss/blob/main/CONTRIBUTING.md#coding-style), but [`faiss/cppcontrib/SaDecodeKernels-avx2-inl.h`](442d9f4a2d/faiss/cppcontrib/SaDecodeKernels-avx2-inl.h) and [the test](442d9f4a2d/tests/test_cppcontrib_sa_decode.cpp) use some C++17 features. This PR rewrites that code so that it no longer depends on C++17.
- Enable AVX2 for `faiss_test`
    - Currently `faiss_test` is compiled without `-mavx2` even with `-DFAISS_OPT_LEVEL=avx2`, so **`tests/test_cppcontrib_sa_decode.cpp` has not been testing `faiss/cppcontrib/SaDecodeKernels-avx2-inl.h` at all**. This PR adds `-mavx2` to `faiss_test` when `-DFAISS_OPT_LEVEL=avx2` is set, so `tests/test_cppcontrib_sa_decode.cpp` now tests `faiss/cppcontrib/SaDecodeKernels-avx2-inl.h` with `-DFAISS_OPT_LEVEL=avx2` and `faiss/cppcontrib/SaDecodeKernels-inl.h` otherwise.
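Why the missing flag hid the AVX2 code path can be illustrated with compile-time dispatch on the `__AVX2__` macro, which GCC and Clang define only when AVX2 code generation is enabled (for example with `-mavx2`). The sketch below is hypothetical: the function name is invented, and it only assumes that the kernel header is selected by such a preprocessor check. If the test target is built without `-mavx2`, the generic branch is chosen regardless of how the library target itself was configured.

```cpp
// Hypothetical illustration of compile-time kernel selection.
// __AVX2__ reflects the flags of the translation unit being compiled,
// so a test file built without -mavx2 never compiles the AVX2 branch.
constexpr bool sa_decode_uses_avx2_kernel() {
#ifdef __AVX2__
    return true;   // would pull in the AVX2 kernels (SaDecodeKernels-avx2-inl.h)
#else
    return false;  // would pull in the generic kernels (SaDecodeKernels-inl.h)
#endif
}
```

Because the choice is made per translation unit, the compile flags of the test executable, not only those of the library, determine which kernel the test exercises.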

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2388

Reviewed By: mdouze

Differential Revision: D38005738

Pulled By: alexanderguzhva

fbshipit-source-id: b9319c585c6849e1c7a4782770f2d7ce8c0d8660
2022-07-20 23:46:31 -07:00
Alexandr Guzhva
3986ebffca fast C++ templates for sa_decode (#2354)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2354

Specialized code that makes Index::sa_decode 2x-3x faster for
* IVF256,PQ[1]x8np
* Residual[1]x8,PQ[2]x8
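The benefit of C++ templates here can be sketched as follows: when the dimension and the number of sub-quantizers are template parameters, loop bounds are compile-time constants that the compiler can unroll and vectorize. This is a hypothetical sketch, not the faiss cppcontrib API; the function name and argument layout are invented, and it assumes the PQ encodes the residual with respect to the IVF coarse centroid (the default for IndexIVFPQ) with 8-bit codes, i.e. 256 centroids per sub-quantizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: decode one IVF + PQ (8-bit codes) vector.
// reconstruction = coarse centroid + concatenated PQ sub-centroids.
template <std::size_t DIM, std::size_t M>
void ivf_pq8_decode(const float* coarse_centroids,  // [n_coarse][DIM]
                    const float* pq_centroids,      // [M][256][DIM / M]
                    std::size_t coarse_id,
                    const std::uint8_t* pq_code,    // M bytes, one per sub-quantizer
                    float* out) {                   // DIM floats
    static_assert(DIM % M == 0, "DIM must be divisible by M");
    constexpr std::size_t DSUB = DIM / M;  // compile-time sub-vector size
    assert(coarse_centroids && pq_centroids && pq_code && out);
    const float* coarse = coarse_centroids + coarse_id * DIM;
    for (std::size_t m = 0; m < M; ++m) {
        // Select centroid pq_code[m] of sub-quantizer m.
        const float* sub = pq_centroids + (m * 256 + pq_code[m]) * DSUB;
        for (std::size_t k = 0; k < DSUB; ++k) {
            out[m * DSUB + k] = coarse[m * DSUB + k] + sub[k];
        }
    }
}
```

A generic decoder would read DIM and M at run time; fixing them per instantiation is the kind of specialization that lets a templated decoder outperform the general path.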

Reviewed By: mdouze

Differential Revision: D37092134

fbshipit-source-id: d848b6cf1aefa826a5ca01e41935aa5d46f5dcc7
2022-06-16 09:20:19 -07:00
Matthijs Douze
b813ba805e Reduce mem usage + improve performance for sequential search implementation
Summary:
Following up on issue https://github.com/facebookresearch/faiss/issues/2054, it seems that the code in that issue crashes Faiss (rather than just leaking memory).

Findings:

- when running in multi-threaded mode, each search in an IndexFlat used as a coarse quantizer consumes some memory
- this memory consumption does not appear in single-thread mode or with few threads
- in gdb, it appears that even when the number of queries is 1, each search spawns max_threads threads (80 on the test machine)

This diff:

- adds a C++ test that checks how much memory is used when repeatedly searching a vector
- adjusts the number of search threads to the number of query vectors. This is especially useful for single-vector queries.
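The thread-count adjustment can be sketched as follows. This is a hypothetical illustration, not the faiss code: `search_thread_count`, `max_available_threads` and `for_each_query` are invented names. It assumes an OpenMP-style parallel loop whose team size is capped at the number of queries, so a single-vector search runs on one thread instead of spawning max_threads workers. Without OpenMP the pragma is ignored and the loop runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#ifdef _OPENMP
#include <omp.h>
#endif

// Never use more threads than there are query vectors.
inline int search_thread_count(std::int64_t n_queries, int max_threads) {
    assert(max_threads >= 1);
    if (n_queries <= 0) return 1;
    return static_cast<int>(std::min<std::int64_t>(n_queries, max_threads));
}

inline int max_available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Runs search_one(i) for each query; the parallel region is sized by the
// number of queries rather than by the global OpenMP thread limit.
template <typename F>
void for_each_query(std::int64_t n_queries, F search_one) {
    const int nt = search_thread_count(n_queries, max_available_threads());
    (void)nt;  // unused when compiled without OpenMP
#pragma omp parallel for num_threads(nt)
    for (std::int64_t i = 0; i < n_queries; ++i) {
        search_one(i);
    }
}
```

With 80 hardware threads, a one-query search then uses a single thread, while a batch of 1000 queries still uses all 80.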

Reviewed By: beauby

Differential Revision: D31142383

fbshipit-source-id: 134ddaf141e7c52a854cea398f5dbf89951a7ff8
2021-10-05 15:54:04 -07:00
Lucas Hosseini
c65f670523 Add separate targets for libfaiss/libfaiss_avx2. (#1772)
Summary:
This should fix the conda builds.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1772

Reviewed By: mdouze

Differential Revision: D27365772

Pulled By: beauby

fbshipit-source-id: 12b9d488d475842030feb1a0452acf26dbe6ac01
2021-03-26 14:28:16 -07:00
Lucas Hosseini
b7b261cad1 Move from TravisCI to CircleCI (#1315)
Summary:
Depends on https://github.com/facebookresearch/faiss/issues/1313.

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1315

Reviewed By: mdouze

Differential Revision: D23148557

Pulled By: beauby

fbshipit-source-id: 0a35f17d22aa04db6bd1c16cfc5ff8eee28f1f74
2020-08-15 04:00:51 -07:00
Lucas Hosseini
a8e4c5e2d5 Move build to CMake (#1313)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1313

Reviewed By: mdouze

Differential Revision: D22948267

Pulled By: beauby

fbshipit-source-id: ec16fa0342f37672d46fb7886ecc55c7996011c4
2020-08-14 15:03:10 -07:00
Lucas Hosseini
ac7005b6ef Remove CMake. (#645) 2018-12-23 18:45:16 +01:00
Adeykin
c056f1d320 mkl support in cmake (#123) 2017-05-31 15:41:42 +02:00
Tianwei Shen
80314d9f07 add initial cmake support (#75)
* add initial cmake support

* update cmake, add cmake instructions to INSTALL

* update findopenmp and INSTALL

* change FindOpenBLAS.cmake to cater for MacPorts

- change cblas.h to openblas_config.h, since MacPorts does not ship
cblas.h with OpenBLAS.

* revise INSTALL for cmake
2017-05-02 11:04:50 +02:00