Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4056
Part 2 of more HNSW unit tests
Added comments indicating some currently unused code.
Reviewed By: junjieqi
Differential Revision: D66782376
fbshipit-source-id: 5a7c210d516424e58054f36f0a74249859e48a8f
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4054
Part 1 of more HNSW unit tests
Reviewed By: junjieqi
Differential Revision: D66690398
fbshipit-source-id: 29b35f1c0a919c168fd4cb770552f41eaed5b6b6
Summary:
distances_simd.cpp had insufficient test coverage. While there may be some codepaths untested due to compiler flags, this should cover most of it.
Also added a couple of comments and fixed a typo.
Reviewed By: pankajsingh88
Differential Revision: D66792601
fbshipit-source-id: 24301ffd383d21703f7579096c6aa9b41ece1509
Summary:
X-link: https://github.com/pytorch/torchrec/pull/2603
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4045
We need to add these to `__init__.py`, as Matthijs mentioned on the GitHub issue https://github.com/facebookresearch/faiss/issues/3993. But we can't do it for non-GPU code, otherwise it will throw an exception and fail many tests that include fbcode/faiss. So we need to check whether FAISS GPU is importable first.
To find the class names such as GpuIndexIVFFlat, I checked everything under faiss/gpu where the constructor accepts an Index. The other Index is always the parameter at index 1 (0-indexed), which is why we pass 1 in the function calls.
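A minimal sketch of the pattern described above, using stand-in classes rather than the real FAISS bindings. The helper names `keep_ref_to_arg` and `wrap_if_present` and the `Fake*` classes are hypothetical; only the idea (keep a Python reference to the constructor argument at position 1, and only when the GPU class is actually available) comes from this summary.

```python
from types import SimpleNamespace


def keep_ref_to_arg(cls, param_no):
    """Wrap cls.__init__ so each instance keeps a reference to one argument."""
    original_init = cls.__init__

    def replacement_init(self, *args):
        original_init(self, *args)
        # Holding the reference keeps the wrapped object alive for as long
        # as this instance exists. `args` excludes `self`.
        self.referenced_objects = [args[param_no]]

    cls.__init__ = replacement_init


def wrap_if_present(namespace, class_name, param_no):
    """Apply the wrapper only if the class exists (e.g. in a GPU build)."""
    cls = getattr(namespace, class_name, None)
    if cls is None:
        return False
    keep_ref_to_arg(cls, param_no)
    return True


# Stand-ins for illustration only; these are not FAISS classes.
class FakeResources:
    pass


class FakeCpuIndex:
    pass


class FakeGpuIndex:
    def __init__(self, resources, index):
        self.resources = resources


gpu_build = SimpleNamespace(FakeGpuIndex=FakeGpuIndex)
cpu_build = SimpleNamespace()  # no GPU classes at all

wrapped_gpu = wrap_if_present(gpu_build, "FakeGpuIndex", 1)
wrapped_cpu = wrap_if_present(cpu_build, "FakeGpuIndex", 1)

cpu_index = FakeCpuIndex()
gpu_index = FakeGpuIndex(FakeResources(), cpu_index)
```

In this sketch the CPU-only namespace is skipped silently instead of raising, which mirrors the "check if importable first" requirement.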
Reviewed By: pankajsingh88
Differential Revision: D66675910
fbshipit-source-id: f170dadb6318c620420689164f9522f9815aa980
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4052
`std::bad_alloc` exception is not consistent across platforms and libraries. Getting rid of it as it's causing test failures.
Reviewed By: junjieqi
Differential Revision: D66713400
fbshipit-source-id: 9d0784eff40b8fa2aaddb64659f272b80005a821
Summary:
Pin dependency version to get a stable CI signal
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4046
Reviewed By: mnorris11
Differential Revision: D66684832
Pulled By: junjieqi
fbshipit-source-id: 9749d328688034514d6f2315d313fbf8045405ee
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4018
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4014
This diff adds support for bfloat16 vector/query data types with the GPU brute-force k-nearest neighbor function (`bfKnn`).
The change is largely just plumbing the new data type through the template hierarchy (so distances can be computed in bfloat16).
Of note, by design, all final distance results are produced in float32 regardless of input data type (float32, float16, bfloat16). This is because the true nearest neighbors in many data sets can often differ by only ~1000 float32 ULPs in terms of distance, which would result in possible false equivalency at lower precision. This seems to be one area where lossy compression/quantization throughout does not work as well (and is also why `CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION` is set in `StandardGpuResources.cpp`). However, given that there is native bf16 x bf16 = fp32 tensor core support on Ampere+ architectures, the matrix multiplication itself should use them.
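To make the false-equivalency point concrete, here is a small standard-library sketch (not code from the diff) that rounds float32 values to bfloat16 precision with round-to-nearest-even. The helper names are hypothetical, and NaN/infinity handling is deliberately omitted. Two distances that are 1000 float32 ULPs apart collapse to the same bfloat16 value, because one bfloat16 ULP spans 2^16 float32 ULPs.

```python
import struct


def f32_to_bits(x):
    """Reinterpret a float32 value as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_to_bf16(x):
    """Round a finite float32 to bfloat16 precision (nearest, ties to even).

    bfloat16 keeps the top 16 bits of the float32 pattern, so the result
    is returned as a float whose low 16 bits are zero.
    """
    bits = f32_to_bits(x)
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = (bits + rounding_bias) & 0xFFFF0000
    return bits_to_f32(rounded)


dist_a = 1.0
# A second distance exactly 1000 float32 ULPs above dist_a.
dist_b = bits_to_f32(f32_to_bits(dist_a) + 1000)

ulp_gap = f32_to_bits(dist_b) - f32_to_bits(dist_a)
distinct_in_f32 = dist_a != dist_b
equal_in_bf16 = round_to_bf16(dist_a) == round_to_bf16(dist_b)
```

Here `distinct_in_f32` and `equal_in_bf16` are both true, so a ranking done on bfloat16 distances could not separate these two neighbors.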
As bfloat16 support is quite lacking on AMD/ROCm (see [here](https://rocm.docs.amd.com/projects/HIPIFY/en/latest/tables/CUDA_Device_API_supported_by_HIP.html), very few bf16 functions implemented), bf16 functionality is completely disabled / not compiled for AMD ROCm.
Reviewed By: mdouze
Differential Revision: D65459723
fbshipit-source-id: 8a6aec843f7e37c205d95f2485442a26c402a3b0
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4034
Preload datasets in manifold so that subsequent stages of training, indexing and search can use those instead of each trainer or indexer downloading data.
Reviewed By: kuarora
Differential Revision: D65926898
fbshipit-source-id: 9341d2676fd2a50027887e821ec95768e829af31
Summary:
- Fix a typo in comment
- Add missing header files to `FAISS_HEADERS` in CMake config
There should be a check for inconsistencies between `FAISS_HEADERS` and the actual files, e.g. a test that compiles against the installed headers and shared library (a build inside the source directory always succeeds, so it cannot catch this).
This is not the first time that headers are missing (https://github.com/facebookresearch/faiss/issues/3218).
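One possible shape for such a check, as a rough sketch only: compare the header names listed in the CMake variable against the header files found on disk. The function names are hypothetical, and the parser assumes a single `set(FAISS_HEADERS ...)` block with no nested parentheses, generator expressions, or later `list(APPEND ...)` calls, so a real check would need to be more careful.

```python
import re


def listed_headers(cmake_text, var="FAISS_HEADERS"):
    """Return the entries of a simple `set(VAR a b c)` block as a set."""
    pattern = r"set\(\s*" + re.escape(var) + r"\s+(.*?)\)"
    match = re.search(pattern, cmake_text, re.S)
    return set(match.group(1).split()) if match else set()


def missing_headers(cmake_text, headers_on_disk):
    """Headers present on disk but absent from the CMake list."""
    return sorted(set(headers_on_disk) - listed_headers(cmake_text))


# Toy inputs for illustration; not the real FAISS file list.
cmake_snippet = """
set(FAISS_HEADERS
  Index.h
  IndexFlat.h
)
"""
on_disk = ["Index.h", "IndexFlat.h", "impl/HNSW.h"]

missing = missing_headers(cmake_snippet, on_disk)
```

In a CI job, `on_disk` could be gathered by walking the source tree for `*.h` files, and a non-empty `missing` list would fail the build.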
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4010
Reviewed By: junjieqi
Differential Revision: D65630647
Pulled By: kuarora
fbshipit-source-id: 2efcfc4bbd0b2d29efa817e1ff9371942c15d30a
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4029
Remove headers flagged by facebook-unused-include-check over fbcode.faiss.
+ format and autodeps
This is a codemod. It was automatically generated and will be landed once it is approved and tests are passing in sandcastle.
You have been added as a reviewer by Sentinel or Butterfly.
Autodiff project: uif
Autodiff partition: fbcode.faiss
Autodiff bookmark: ad.uif.fbcode.faiss
Reviewed By: dtolnay
Differential Revision: D65957849
fbshipit-source-id: f6199250db595defd56f5e7b2828f838702e9a16
Summary:
Remove the dependency on `raft::compiled` and modify GPU implementations to use cuVS backend in place of RAFT.
A deeper insight into the dependency:
FAISS gets the ANN algorithm implementations such as IVF-Flat and IVF-PQ from cuVS. RAFT is meant to be a lightweight C++ header-only template library that cuVS relies on for the more fundamental / low-level utilities. Some examples of these are RAFT's device mdarray and mdspan objects; the RAFT resource object (`raft::resource`) that takes care of the stream ordering of device functions; linear algebra functions such as mapping, reduction, BLAS routines etc. A lot of the cuVS functions take the RAFT mdspan objects as arguments (for example `raft::device_matrix_view`). Therefore FAISS relies on both cuVS and RAFT. FAISS gets RAFT headers through cuVS and uses them to create the function arguments that can be consumed by cuVS. Note that we are not explicitly linking FAISS against `raft::raft` or `raft::compiled`. Only the required headers are included and compiled rather than compiling the whole RAFT shared library. This is the reason we still see mentions of `raft` in FAISS.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3549
Reviewed By: ramilbakhshyiev
Differential Revision: D62041013
Pulled By: asadoughi
fbshipit-source-id: 7230dcc06cf47baf95873adc1dec2adca4a8f82a
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4024
LLVM-15 has a warning `-Wunused-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance.
This diff either (a) removes an unused variable and, possibly, its associated code or (b) qualifies the variable with `[[maybe_unused]]`.
#buildsonlynotests - Builds are sufficient
- If you approve of this diff, please use the "Accept & Ship" button :-)
Differential Revision: D65755277
fbshipit-source-id: 13a2ad06375fd84e5e7afd69488e7fa36b658f20
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4011
Our upcoming compiler upgrade will require us not to have shadowed variables. Such variables have a _high_ bug rate and reduce readability, so we would like to avoid them even if the compiler was not forcing us to do so.
This codemod attempts to fix an instance of a shadowed variable. Please review with care: if it has failed, the result will be a silent bug.
**What's a shadowed variable?**
Shadowed variables are variables in an inner scope with the same name as another variable in an outer scope. Having the same name for both variables might be semantically correct, but it can make the code confusing to read! It can also hide subtle bugs.
This diff fixes such an issue by renaming the variable.
- If you approve of this diff, please use the "Accept & Ship" button :-)
Reviewed By: asadoughi
Differential Revision: D65347906
fbshipit-source-id: 4dbf5d2ee0300379e897954fe367cdc3186b5521
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4012
This should enable us to read units of data where the unit is a split instead of reading X number of rows.
Reviewed By: kuarora, asadoughi
Differential Revision: D65429573
fbshipit-source-id: 27d901fe83840c3b2bd3cca66fbad3721b12a9ec
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4017
Expose an option to write kmeans centroids and assignments to a hive table, which should bring us close to parity with Digraph's Kmeans API. This is needed for cluster balance data quality checks for large-scale centroids.
Reviewed By: kuarora
Differential Revision: D64835789
fbshipit-source-id: 95cbea00bb6b4733c03836049bc379be813bf9e5
Summary:
Demonstrate IndexLSH does not need training or codebook serialization
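As a rough intuition for why no training is involved, here is a pure-Python stand-in for sign-of-random-projection hashing. It does not use FAISS and is not the demonstration added by this PR. The function names and the seeding scheme are assumptions. The point is that the projection depends only on the dimension, the number of bits, and a seed, never on the data, so there is nothing learned that would need to be serialized.

```python
import random


def make_projection(d, nbits, seed=0):
    """Build an nbits x d Gaussian projection from a seed alone (no data)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(nbits)]


def encode(x, projection):
    """One bit per projection row: 1 if the dot product is positive."""
    return [
        1 if sum(p * v for p, v in zip(row, x)) > 0 else 0
        for row in projection
    ]


x = [0.3, -1.2, 0.7, 2.0]
code = encode(x, make_projection(4, 16, seed=0))

# Rebuilding the projection from the same seed reproduces the same code,
# so only (d, nbits, seed) would need to be remembered in this sketch.
code_again = encode(x, make_projection(4, 16, seed=0))

# Negating the input flips the sign of every projection, hence every bit.
code_negated = encode([-v for v in x], make_projection(4, 16, seed=0))
```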
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4009
Reviewed By: junjieqi
Differential Revision: D65274645
Pulled By: asadoughi
fbshipit-source-id: c9af463757edbd07cc07b1cf607b88373fa334c4
Summary:
This adds read_VectorTransform to the C API. This is helpful for independently loading a vector transform rather than as an IndexPreTransform.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3970
Reviewed By: asadoughi
Differential Revision: D65192203
Pulled By: junjieqi
fbshipit-source-id: 949ae875924b9f3558d7a9f43c4f2aa8ae705f02
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4001
It was mentioned in S461104 chat to cover all index types. This adds Binary for telemetry as well as the reverse factory string for binary indexes, which we did not support before.
Unit test covers 4 ways of reading a binary index.
The reverse factory string util is not complete. The remaining binary index types could get added later.
Reviewed By: asadoughi
Differential Revision: D65102643
fbshipit-source-id: 52f1053bda59e427a081369ada80265b67e55bd4
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4003
It was emitting "SQSQ8" for `IndexScalarQuantizer`. The enum names already have `SQ` in the prefix.
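The bug can be pictured with a toy version of the string construction. The lookup table and function names below are hypothetical and are not the actual FAISS implementation; they only show how prepending `SQ` to a name that already starts with `SQ` produces the doubled prefix.

```python
# Hypothetical table: quantizer type -> name that already carries "SQ".
SQ_TYPE_NAMES = {
    "QT_8bit": "SQ8",
    "QT_4bit": "SQ4",
    "QT_fp16": "SQfp16",
}


def reverse_factory_buggy(qtype):
    """Prepends the prefix a second time."""
    return "SQ" + SQ_TYPE_NAMES[qtype]


def reverse_factory_fixed(qtype):
    """Uses the name as is, since it already has the prefix."""
    return SQ_TYPE_NAMES[qtype]


before = reverse_factory_buggy("QT_8bit")
after = reverse_factory_fixed("QT_8bit")
```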
Reviewed By: kuarora
Differential Revision: D65083146
fbshipit-source-id: 564f6d3d8987a2188fb7aa82b5ba772034225550
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3989
Moved `add_sa_codes` and `sa_code_size` from IndexIVF to the Index and IndexBinary base classes to support adding coded vectors with ids using IDMap2,PQ.
For an alternative approach, see previous attempt with merge_ids and merge_codes: D64941798
Reviewed By: mnorris11
Differential Revision: D64972587
fbshipit-source-id: 71622fc35a378d9892569a56442a872f0c9c9e83
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3999
Fixed the bug causing `merge_flat_ondisk` stress run failures.
Running multiple `merge_flat_ondisk` tests simultaneously fails, which causes buck stress-run failures.
https://www.internalfb.com/intern/test/562950025349567/
Root cause: we were updating the input copy of the filename template (which was discarded) instead of the local copy.
Reviewed By: asadoughi
Differential Revision: D65074463
fbshipit-source-id: 9f86deeb56975a3be7a15a8f56d602463cad61af
Summary:
These were not added in the refactoring, so adding them now.
Added unit tests, which should cover them.
We set `orig->own_fields = false` because we wrap the inner index too, so it gets deleted.
Reviewed By: bshethmeta, pankajsingh88
Differential Revision: D64702824
fbshipit-source-id: b2e36374dd0066581c57a55b0ea16323cdc519f0
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3987
Created a new notebook demonstrating how to separate serializing and deserializing the PQ codebook (via faiss.write_index for IndexPQ) independently of the vector codes. For example, in the case where you have a few vector embeddings per user and want to shard the flat index by user you can re-use the same PQ method for all users but store each user's codes independently.
Reviewed By: junjieqi
Differential Revision: D64844978
fbshipit-source-id: ad6434101fbb3ef84999527a577ecb9b503e556c
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3981
prior issues:
1. Nightlies were breaking due to yaml changes. Those are reverted.
2. AIX build broke (external). Fix is to add a conditional in tests/CMakeLists.txt.
3. Build issue https://github.com/facebookresearch/faiss/issues/3944. Could not repro now.
Reviewed By: mdouze
Differential Revision: D64440629
fbshipit-source-id: a86b27c25ada0d07e9d3b4c6e4f00b2e6b637fbe
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3977
These were getting erased in D64484165.
Similar to D64481766, we need to add lint ignore to not erase Nvidia license.
Without the changes in this diff to ignore lint, screenshots show these 4 files erase the license when running the lint command:
Before lint:
{F1941712401}
After lint:
{F1941712573}
Reviewed By: asadoughi
Differential Revision: D64712875
fbshipit-source-id: ada63a8d2f3e4af6c58971f83053b0eb443908d8
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3967
This diff just finds and replaces duplicate license headers.
See the errors for "duplicate-license-header" in D64429711 under "linter-coverage-verification" signal.
Reviewed By: asadoughi
Differential Revision: D64484123
fbshipit-source-id: 906e8baa3a11a3bbee174a03dcc27681f9fd78c2
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3975
Graph-based indices are often quite bulky in terms of storage because the out-degree of the nodes is high (32 or more) and edges are just encoded as 32-bit ints.
This diff makes it possible to replace the graph structure of NSG with a compressed version, but only for search (the full graph needs to be present at addition time).
It is easier to do it for NSG than for HNSW for several reasons:
- NSG's graph is not hierarchical -- HNSW's is, and the edges for the different levels are interleaved
- the NSG graph object is already isolated well (thanks KingLittleQ!)
- NSG cannot be built incrementally so it is easier to convert the graph after all adds are done in one go.
The custom compressed graph is currently only implemented as a test, but could be integrated in the main Faiss as an option to NSG.
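The storage argument can be illustrated with a small bit-packing sketch. This is not the compressed graph from the diff. It is a stand-in that only assumes each neighbor id fits in ceil(log2(n)) bits for a graph with n nodes, and the function names are hypothetical.

```python
def bits_per_id(n_nodes):
    """Smallest bit width that can represent ids 0 .. n_nodes - 1."""
    return max(1, (n_nodes - 1).bit_length())


def pack_neighbors(neighbors, n_nodes):
    """Pack neighbor ids into bytes using a fixed bit width per id."""
    width = bits_per_id(n_nodes)
    packed = 0
    for i, node_id in enumerate(neighbors):
        packed |= node_id << (i * width)
    n_bytes = (len(neighbors) * width + 7) // 8
    return packed.to_bytes(n_bytes, "little")


def unpack_neighbors(buf, n_nodes, count):
    """Inverse of pack_neighbors for a known neighbor count."""
    width = bits_per_id(n_nodes)
    packed = int.from_bytes(buf, "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


n_nodes = 1_000_000
neighbors = [(i * 31_337) % n_nodes for i in range(32)]  # 32 out-edges

buf = pack_neighbors(neighbors, n_nodes)
restored = unpack_neighbors(buf, n_nodes, len(neighbors))

packed_size = len(buf)             # 32 ids * 20 bits = 80 bytes
int32_size = 4 * len(neighbors)    # 32 ids * 4 bytes = 128 bytes
```

With one million nodes each id needs 20 bits, so a 32-neighbor list shrinks from 128 to 80 bytes in this sketch. Decoding costs extra work per visited node, which fits the search-only use described above.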
Reviewed By: algoriddle
Differential Revision: D64646137
fbshipit-source-id: c10c2a485b44561d32941ce1e7a0e3fe512cf0ac
Summary:
- Called the hipify script at CMake configure time, removing the need for the user to run it.
- Now removes any .hip files left over when running the hipify script.
- Cleaned up the hipify script to remove redundancy.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3962
Reviewed By: asadoughi, ramilbakhshyiev
Differential Revision: D64495550
Pulled By: mnorris11
fbshipit-source-id: 5547712a4e46fc18cf62346adb0395d0e5626399