Summary:
This fixes the build, except for MacOS, where there's a problem with cmake + OpenMP. We can fix it separately.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2856
Reviewed By: mlomeli1
Differential Revision: D45704458
Pulled By: algoriddle
fbshipit-source-id: 0c09036ae5fa34ab114b857f407a35603986613a
Summary:
RAFT requires cmake 3.23.1, pulling it from conda-forge. We continue to keep the dependency on conda-forge minimal, hence the ordering of the channels and the pinning of sysroot to a specific version.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2808
Test Plan:
contbuild
Imported from GitHub, without a `Test Plan:` line.
Reviewed By: mlomeli1
Differential Revision: D44746827
Pulled By: algoriddle
fbshipit-source-id: ad576b11b257203bd0cafd57c2c2e7fd8d10ca98
Summary:
1. GPU builds use CircleCI base image, no docker
2. Switched to CUDA 11.4 (used to be 11.3)
3. Merged all build jobs into two parameterized targets: `build_cmake` and `build_conda`.
4. Cleaned up test execution, fixed bug of Python GPU tests not running on PRs
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2803
Reviewed By: mlomeli1
Differential Revision: D44541714
Pulled By: algoriddle
fbshipit-source-id: aa09ae638ecb6ef556d42f27a4bfaddad7355e50
Summary:
# Refactor
- Extract executors and merge all conda builds (that run on diffs) and conda package builds (that run nightly and on release) into two parameterized targets (`build_conda`, `deploy_conda`), except for GPU builds (at least for now)
- Similarly, introduce a `build_cmake` target that can be parameterized with executor, although run it for Linux x86_64 only (for now)
- Keep GPU targets separate (both conda package build and cmake) for now. Introduce "v2" targets that will eventually replace the current GPU build targets (we need to resolve GPU test failures).
- Removed `beauby/faiss-circleci:cpu` docker container, use the miniconda docker for Linux and machine images everywhere else. v2 GPU targets use the latest circleci images (see https://discuss.circleci.com/t/cuda-11-8-gpu-cuda-image-any-plans/47240/3)
# New/changed functionality
- Dropped CUDA 10
- Support for Linux arm64 conda packages
- Workflows have a consistent naming scheme, `OSX arm64 (conda)`, `Linux x86_64 (cmake)` etc.
- No cmake build for Linux or OSX arm64, replaced both with a conda build target only. We can reintroduce arm64 cmake workflows for both if needed (via additional parameterized build_cmake workflows), but it seemed unnecessary to me.
# Next steps
- Make v2 GPU builds work, deprecate v1, get rid of all docker stuff
- Merge GPU builds into cmake/conda build targets
- Possibly further unify package build and conda build targets
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2798
Test Plan: contbuild
Reviewed By: mlomeli1
Differential Revision: D44469783
Pulled By: algoriddle
fbshipit-source-id: 8489942fb7a4e4de1dd2d4466790e550191d15a1
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2791
Removed building for 3.7 (D44373285 - not supported for M1), adding 3.10 with this diff.
The only change that matters is in `conda_build_config.yaml`, the others are about making the configs consistent between CPU and GPU.
Reviewed By: mlomeli1
Differential Revision: D44405573
fbshipit-source-id: ad933e08834593e55a35075c602e5f509a813e73
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2594
Bumping `mkl` to 2021 and installing `mkl-devel` in the build environment to fix the Windows nightly build.
Reviewed By: mlomeli1
Differential Revision: D41534391
fbshipit-source-id: 7c681f530a1efe649cd176135a23ebb0fb44d70f
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2552
The `conda inspect` commands in the `test` section fail without `conda-build` in the `test` environment.
Reviewed By: mlomeli1
Differential Revision: D40793051
fbshipit-source-id: 184418cfa8d0efd6af6b0c806f7bddbeba176732
Summary:
Fixes OSX CI by pinning pytorch version for interop tests. The "real" fix is already landed in pytorch but has not been released yet.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2482
Reviewed By: alexanderguzhva
Differential Revision: D39891113
Pulled By: beauby
fbshipit-source-id: fa79bf9de1c93e056260ea64613e37625edfecc3
Summary:
CentOS 8 being EOL, some modifications to the OS packages repositories
are needed in order to keep the build working.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2495
Reviewed By: alexanderguzhva
Differential Revision: D39857306
Pulled By: beauby
fbshipit-source-id: 175f436132c589a9a22d93727bb58e556a43a8f6
Summary:
- Disable problematic tests on OSX.
- Ensure compiler compatibility with CUDA builds.
- Fix path for Python extension libraries.
- Use CentOS for CUDA packaging.
- Update CUDA versions in CI (10.2 and 11.3).
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/2121
Reviewed By: mdouze
Differential Revision: D32921117
Pulled By: beauby
fbshipit-source-id: 588c18add8084b8228ff5abc651eaa4567919cc6
Summary:
This should fix the GPU nighties.
The rationale for the cp is that there is a shared file between the CPU and GPU tests.
Ideally, this file should probably moved to contrib at some point.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1901
Reviewed By: beauby
Differential Revision: D28680898
Pulled By: mdouze
fbshipit-source-id: b9d0e1969103764ecb6f1e047c9ed4bd4a76aaba
Summary:
IndexPQ and IndexIVFPQ implementations with AVX shuffle instructions.
The training and computing of the codes does not change wrt. the original PQ versions but the code layout is "packed" so that it can be used efficiently by the SIMD computation kernels.
The main changes are:
- new IndexPQFastScan and IndexIVFPQFastScan objects
- simdib.h for an abstraction above the AVX2 intrinsics
- BlockInvertedLists for invlists that are 32-byte aligned and where codes are not sequential
- pq4_fast_scan.h/.cpp: for packing codes and look-up tables + optmized distance comptuation kernels
- simd_result_hander.h: SIMD version of result collection in heaps / reservoirs
Misc changes:
- added contrib.inspect_tools to access fields in C++ objects
- moved .h and .cpp code for inverted lists to an invlists/ subdirectory, and made a .h/.cpp for InvertedListsIOHook
- added a new inverted lists type with 32-byte aligned codes (for consumption by SIMD)
- moved Windows-specific intrinsics to platfrom_macros.h
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1542
Test Plan:
```
buck test mode/opt -j 4 //faiss/tests/:test_fast_scan_ivf //faiss/tests/:test_fast_scan
buck test mode/opt //faiss/manifold/...
```
Reviewed By: wickedfoo
Differential Revision: D25175439
Pulled By: mdouze
fbshipit-source-id: ad1a40c0df8c10f4b364bdec7172e43d71b56c34
Summary: Currently, conda version strings are built from the latest git tag, which starts with the letter `v`. This confuses conda, which orders v1.6.5 before 1.6.3.
Reviewed By: LowikC
Differential Revision: D25151276
fbshipit-source-id: 7abfb547fee3468b26fedb6637a15e725755daf3
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1484
This diff allows for native usage of PyTorch tensors for Faiss indexes on both CPU and GPU. It is currently only implemented in this diff for things that inherit from `faiss.Index`, which covers the non-binary indices, and it patches the same functions on `faiss.Index` that were also covered by `__init__.py` for numpy interoperability.
There must be uniformity among the inputs: if any array input is a Torch tensor, then all array inputs must be Torch tensors. Similarly, if any array input is a numpy ndarray, then all array inputs must be numpy ndarrays.
If `faiss.contrib.torch_utils` is imported, it ensures that `import faiss` has already been performed to patch all of the functions using the base `__init__.py` numpy wrappers, and then patches the following functions again:
```
add
add_with_ids
assign
train
search
remove_ids
reconstruct
reconstruct_n
range_search
update_vectors
search_and_reconstruct
sa_encode
sa_decode
```
to allow usage of PyTorch CPU tensors, and additionally PyTorch GPU tensors if the index being used is on the GPU.
numpy functionality is still available when `faiss.contrib.torch_utils` is imported; we pass through to the original patched numpy function when we detect numpy inputs.
In addition, to allow for better (asynchronous) GPU usage without requiring the CPU to be involved, all of these functions which construct tensors/arrays for output now take optional arguments for storage (numpy or torch.Tensor) to be provided that will contain the output data. `range_search` is the only exception to this, as the size of the output data is indeterminate. The eventual GPU implementation will likely require the user to provide a maximum cap on the output size, and allow that to be passed instead. If the optional pre-allocated output values are presented by the user, they are used; otherwise, new return ndarray / Tensors are constructed as before and used for the return. If this feature were not provided on the GPU, then every execution would be completely serial as we would depend upon the CPU to allocate GPU memory before every operation. Instead, now this can function much like NN graph execution on the GPU, assuming that all of the data requirements are pre-allocated, so the execution will run at the full speed of the GPU and not be stalled sequentially launching kernels.
This diff also exposes the `GpuResources` shared_ptr object owned by a GPU index. This is required for pytorch GPU so that we can perform proper stream ordering in Faiss with respect to the current pytorch stream. So, Faiss indices now perform more or less as any NN operation in Torch does.
Note, however, that a Faiss index has its own setting on current device, and if the pytorch GPU tensor inputs are resident on a different device than what the Faiss index expects, a cross-device copy will be initiated. I may choose to make this an error in the future and require matching device to device.
This diff also found a bug when passing GPU data directly to `train()` for `GpuIndexIVFFlat` and `GpuIndexIVFScalarQuantizer`, as I guess we never tested passing GPU data directly to these functions before. `GpuIndexIVFPQ` was doing the right thing however.
The assign function is now also implemented on the GPU as well, and is now marked `const` to be in line with the `search` function.
Also added better checking of non-contiguous inputs for both Torch tensors and numpy ndarrays.
Updated the `knn_gpu` function with a base implementation always present that allows for usage of numpy arrays, which is overridden when `torch_utils` is imported to allow torch usage. This supports row/column major layout, float32/float16 data and int64/int32 indices for both numpy and torch.
Reviewed By: mdouze
Differential Revision: D24299400
fbshipit-source-id: b4f117b9c120bd1ad83e7702087051ab7b303b29
Summary:
This PR paves the way for nightly builds.
+ Get rid of cmake 3.17 manual install as cmake 3.18 is now available
in conda.
+ Update docker files for conda packages.
+ Specify CUDA architectures via CMake's `CMAKE_CUDA_ARCHITECTURES`.
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1422
Reviewed By: mdouze
Differential Revision: D23870447
Pulled By: beauby
fbshipit-source-id: 40ae7517e83356443a007a43261713e7e3a140d4
Changelog:
- changed license: BSD+Patents -> MIT
- propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas
- support for searching several inverted lists in parallel (parallel_mode != 0)
- better support for PQ codes where nbit != 8 or 16
- IVFSpectralHash implementation: spectral hash codes inside an IVF
- 6-bit per component scalar quantizer (4 and 8 bit were already supported)
- combinations of inverted lists: HStackInvertedLists and VStackInvertedLists
- configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch)
- more test and demo code compatible with Python 3 (print with parentheses)
- refactored benchmark code: data loading is now in a single file
+ Add conda packages metadata (now building Faiss using conda's toolchain);
+ add Dockerfile for building conda packages (for all CUDA versions);
+ add working Dockerfile building faiss on Centos7;
+ simplify GPU build;
+ avoid falling back to CPU-only version (python);
+ simplify TravisCI config;
+ update INSTALL.md;
+ add configure flag for specifying target architectures (--with-cuda-arch);
+ fix Makefile for gpu tests;
+ fix various Makefile issues;
+ remove stale file (gpu/utils/DeviceUtils.cpp).