5 Commits

Author SHA1 Message Date
Jeff Johnson
8d776e6453 PyTorch tensor / Faiss index interoperability (#1484)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1484

This diff allows native use of PyTorch tensors with Faiss indices on both CPU and GPU. It is currently implemented only for classes that inherit from `faiss.Index`, which covers the non-binary indices, and it patches the same functions on `faiss.Index` that `__init__.py` already patches for numpy interoperability.

There must be uniformity among the inputs: if any array input is a Torch tensor, then all array inputs must be Torch tensors. Similarly, if any array input is a numpy ndarray, then all array inputs must be numpy ndarrays.
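
The uniformity rule above can be illustrated with a minimal sketch. `FakeTensor`, `FakeNdarray`, and `input_kind` are hypothetical names used only for illustration (they stand in for `torch.Tensor` and `numpy.ndarray` so the snippet runs without either library); this is not the check that Faiss itself performs.

```python
class FakeTensor(list):
    """Hypothetical stand-in for torch.Tensor."""


class FakeNdarray(list):
    """Hypothetical stand-in for numpy.ndarray."""


def input_kind(*arrays):
    """Return 'torch' or 'numpy'; raise TypeError if the array inputs are mixed."""
    kinds = set()
    for a in arrays:
        if a is None:
            continue  # optional array argument that was not supplied
        if isinstance(a, FakeTensor):
            kinds.add("torch")
        elif isinstance(a, FakeNdarray):
            kinds.add("numpy")
        else:
            raise TypeError(f"unsupported array type: {type(a).__name__}")
    if not kinds:
        raise ValueError("at least one array input is required")
    if len(kinds) > 1:
        raise TypeError("array inputs must be all Torch tensors or all numpy ndarrays")
    return kinds.pop()
```

Under these assumptions, `input_kind(FakeTensor([1.0]), FakeNdarray([2.0]))` raises `TypeError`, while two inputs of the same kind are accepted.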

Importing `faiss.contrib.torch_utils` first ensures that `import faiss` has already run, so that all of the functions are patched with the base `__init__.py` numpy wrappers, and then patches the following functions again:

```
add
add_with_ids
assign
train
search
remove_ids
reconstruct
reconstruct_n
range_search
update_vectors
search_and_reconstruct
sa_encode
sa_decode
```

to allow usage of PyTorch CPU tensors, and additionally PyTorch GPU tensors if the index being used is on the GPU.

numpy functionality is still available when `faiss.contrib.torch_utils` is imported; we pass through to the original patched numpy function when we detect numpy inputs.
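
The re-patching and pass-through behavior can be sketched in plain Python. Everything here is a hypothetical illustration: `ToyIndex`, `FakeTensor`, and `patch_for_tensors` are invented names, and the real wrappers in `faiss.contrib.torch_utils` may be structured differently.

```python
class FakeTensor(list):
    """Hypothetical stand-in for torch.Tensor (not a real PyTorch type)."""


class ToyIndex:
    """Hypothetical index; its add() plays the role of the numpy-wrapped method."""

    def __init__(self):
        self.rows = []
        self.calls = []  # records which code path handled each call

    def add(self, x):
        self.calls.append("numpy")
        self.rows.extend(x)


def patch_for_tensors(cls):
    """Patch cls.add again: tensors take a new path, other inputs pass through."""
    original_add = cls.add  # keep the previously patched (numpy) wrapper

    def add(self, x):
        if isinstance(x, FakeTensor):
            self.calls.append("torch")
            self.rows.extend(x)  # stand-in for the tensor-specific path
        else:
            original_add(self, x)  # pass through to the original wrapper

    cls.add = add


patch_for_tensors(ToyIndex)
index = ToyIndex()
index.add([[0.0, 1.0]])               # handled by the original wrapper
index.add(FakeTensor([[2.0, 3.0]]))   # handled by the tensor path
```

Capturing the original method in a closure is what keeps the earlier behavior reachable after the second patch.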

In addition, to allow better (asynchronous) GPU usage without involving the CPU, all of these functions that construct tensors/arrays for output now take optional arguments for user-provided storage (numpy ndarray or torch.Tensor) that will contain the output data. `range_search` is the only exception, as the size of its output is indeterminate. The eventual GPU implementation will likely require the user to provide a maximum cap on the output size, and allow that to be passed instead. If the user provides the optional pre-allocated outputs, they are used; otherwise, new ndarrays / Tensors are constructed and returned as before.

If this feature were not provided on the GPU, every execution would be completely serial, as we would depend on the CPU to allocate GPU memory before every operation. Instead, this can now function much like NN graph execution on the GPU: assuming that all of the data requirements are pre-allocated, execution will run at the full speed of the GPU and not be stalled sequentially launching kernels.
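
The optional pre-allocated outputs can be sketched as follows. This is a toy 1-D search over Python lists, not the Faiss implementation; the function name and the `D` / `I` parameter names are assumptions chosen for illustration.

```python
def search(database, queries, k, D=None, I=None):
    """Toy 1-D nearest-neighbor search that fills caller-provided outputs if given."""
    nq = len(queries)
    if D is None:
        D = [[0.0] * k for _ in range(nq)]  # allocate only when not supplied
    if I is None:
        I = [[-1] * k for _ in range(nq)]
    if len(D) != nq or len(I) != nq:
        raise ValueError("output buffers must have one row per query")
    for qi, q in enumerate(queries):
        ranked = sorted((abs(x - q), j) for j, x in enumerate(database))[:k]
        for r in range(k):
            if r < len(ranked):
                D[qi][r], I[qi][r] = ranked[r]
            else:
                # fewer than k database entries: mark the slot as empty
                D[qi][r], I[qi][r] = float("inf"), -1
    return D, I


database = [0.0, 10.0, 4.0]
D_buf = [[0.0, 0.0]]
I_buf = [[-1, -1]]
D_out, I_out = search(database, [3.0], 2, D=D_buf, I=I_buf)  # reuses the buffers
D_new, I_new = search(database, [3.0], 2)                    # allocates new outputs
```

When buffers are passed in, the returned objects are the same ones the caller supplied, so no allocation happens inside the call.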

This diff also exposes the `GpuResources` shared_ptr object owned by a GPU index. This is required for PyTorch GPU tensors so that we can perform proper stream ordering in Faiss with respect to the current PyTorch stream. As a result, Faiss indices now behave more or less like any NN operation in Torch.

Note, however, that a Faiss index has its own current-device setting, and if the PyTorch GPU tensor inputs are resident on a different device than the one the Faiss index expects, a cross-device copy will be initiated. I may choose to make this an error in the future and require the devices to match.

This work also uncovered a bug when passing GPU data directly to `train()` for `GpuIndexIVFFlat` and `GpuIndexIVFScalarQuantizer`; it seems we never tested passing GPU data directly to these functions before. `GpuIndexIVFPQ`, however, was doing the right thing.

The `assign` function is now also implemented on the GPU, and is marked `const` to be in line with the `search` function.

Also added better checking of non-contiguous inputs for both Torch tensors and numpy ndarrays.

Updated the `knn_gpu` function with a base implementation always present that allows for usage of numpy arrays, which is overridden when `torch_utils` is imported to allow torch usage. This supports row/column major layout, float32/float16 data and int64/int32 indices for both numpy and torch.

Reviewed By: mdouze

Differential Revision: D24299400

fbshipit-source-id: b4f117b9c120bd1ad83e7702087051ab7b303b29
2020-10-23 22:24:22 -07:00
Lucas Hosseini
70eaa9b1a3 Add missing copyright headers. (#1460)
Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1460

Reviewed By: wickedfoo

Differential Revision: D24278804

Pulled By: beauby

fbshipit-source-id: 5ea96ceb63be76a34f1eb4da03972159342cd5b6
2020-10-13 11:15:59 -07:00
Matthijs Douze
8b05434a50 Remove useless function
Summary:
Removed an unused function that caused compile errors in some configurations.
Added a contrib function (exhaustive_search.knn) that computes the k nearest neighbors without constructing an index.
Renamed the equivalent GPU function to exhaustive_search.knn_gpu (mentioning numpy in the name makes little sense, as all functions take numpy arguments by default).
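
As a rough illustration of what computing nearest neighbors without an index means, here is a minimal pure-Python brute-force sketch using squared L2 distance. The argument order and return layout are assumptions for this example and are not taken from the contrib function's actual signature.

```python
import heapq


def knn(xq, xb, k):
    """Exhaustive k-nearest-neighbor search under squared L2 distance."""
    D, I = [], []
    for q in xq:
        # distance from this query to every database vector, paired with its id
        dists = [
            (sum((a - b) ** 2 for a, b in zip(q, x)), j)
            for j, x in enumerate(xb)
        ]
        best = heapq.nsmallest(k, dists)  # k smallest, in ascending order
        D.append([d for d, _ in best])
        I.append([j for _, j in best])
    return D, I


xb = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]  # database vectors
xq = [[1.0, 1.0]]                          # query vectors
D, I = knn(xq, xb, 2)
```

Every query is compared against every database vector, so the cost grows with the product of the two set sizes; this is the trade-off for skipping index construction.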

Reviewed By: beauby

Differential Revision: D24215427

fbshipit-source-id: 6d8e1eafa7c57593304b7b76f83b3015e4d2a2bb
2020-10-09 07:57:04 -07:00
Jeff Johnson
0412d761e5 GPU brute-force kNN can take int32 indices (#1445)
Summary:
Pull Request resolved: https://github.com/facebookresearch/faiss/pull/1445

As requested in https://github.com/facebookresearch/faiss/issues/1304, `bfKnn` can now produce int32 indices for output.

The native kernels for brute-force kNN only operate on int32 indices in any case, so requesting int32 output is faster.

Also added a SWIG definition for float16 numpy arrays. As there is no native half type, the reverse definition is undefined, so this is only really used for taking float16 data (e.g., from numpy) as input in Python.

Added a `knn_numpy_gpu` wrapper as well that handles calling the `bfKnn` GPU implementation using CPU numpy arrays. This handles transposition and f32/f16/i32 data types as needed.

Reviewed By: mdouze

Differential Revision: D24152296

fbshipit-source-id: caa7daea23438cf26aa248e380f0dab2b6b907fd
2020-10-08 17:50:42 -07:00
Lucas Hosseini
cd38e82f0c Facebook sync 2020-07-31 (#1308)
2020-08-03 22:15:02 +02:00