faiss/tests/test_mem_leak.cpp
Matthijs Douze b813ba805e Reduce mem usage + improve performance for sequential search implementation
Summary:
Following up on issue https://github.com/facebookresearch/faiss/issues/2054 it seems that this code crashes Faiss (instead of just leaking memory).

Findings:

- when running in multi-threaded (MT) mode, each search in an IndexFlat used as coarse quantizer consumes some memory
- this memory consumption does not appear in single-thread mode or with few threads
- in gdb it appears that even when the number of queries is 1, each search spawns max_threads threads (80 on the test machine)

This diff:

- adds a C++ test that checks how much mem is used when repeatedly searching a vector
- adjusts the number of search threads to the number of query vectors. This is especially useful for single-vector queries.
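
The thread-count rule from the second item can be sketched as below. This is a minimal illustration, not the code from the diff: `search_thread_count` is a hypothetical helper name, and the cap is passed in as a plain integer rather than queried from the threading runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a Faiss function): use at most one thread per
// query vector, and never more than the runtime's thread limit.
inline int search_thread_count(std::size_t n_queries, int max_threads) {
    assert(max_threads >= 1);
    if (n_queries == 0) {
        return 1; // nothing to search; one thread is enough
    }
    const std::size_t cap = static_cast<std::size_t>(max_threads);
    return static_cast<int>(std::min(n_queries, cap));
}

// Assumed usage inside a per-query search loop built with OpenMP:
//   int nt = search_thread_count(n, omp_get_max_threads());
//   #pragma omp parallel for num_threads(nt)
//   for (...) { /* search one query */ }
```

With a limit of 80 threads, as on the test machine, a single-vector query would then run on 1 thread instead of 80, and a batch of 4 on 4 threads.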

Reviewed By: beauby

Differential Revision: D31142383

fbshipit-source-id: 134ddaf141e7c52a854cea398f5dbf89951a7ff8
2021-10-05 15:54:04 -07:00


/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

#include <cstdio>
#include <vector>

#include <faiss/IndexFlat.h>
#include <faiss/IndexIVFFlat.h>
#include <faiss/utils/random.h>
#include <faiss/utils/utils.h>
#include <gtest/gtest.h>

using namespace faiss;

TEST(MEM_LEAK, ivfflat) {
    size_t num_tfidf_faiss_cells = 20;
    size_t max_tfidf_features = 500;

    IndexFlatIP quantizer(max_tfidf_features);
    IndexIVFFlat tfidf_faiss_index(
            &quantizer, max_tfidf_features, num_tfidf_faiss_cells);

    // Train and populate the index with 5000 random vectors.
    std::vector<float> dense_matrix(5000 * max_tfidf_features);
    float_rand(dense_matrix.data(), dense_matrix.size(), 123);
    tfidf_faiss_index.train(5000, dense_matrix.data());
    tfidf_faiss_index.add(5000, dense_matrix.data());

    int N1 = 1000;  // number of distinct query vectors
    int N2 = 10000; // number of searches per batch size
    std::vector<float> ent_substr_tfidfs_list(N1 * max_tfidf_features);
    float_rand(
            ent_substr_tfidfs_list.data(), ent_substr_tfidfs_list.size(), 1234);

    for (int bs : {1, 4, 16}) {
        size_t m0 = get_mem_usage_kb();
        double t0 = getmillisecs();

        for (int i = 0; i < N2; i++) {
            std::vector<Index::idx_t> I(10 * bs);
            std::vector<float> D(10 * bs);
            tfidf_faiss_index.search(
                    bs,
                    ent_substr_tfidfs_list.data() +
                            (i % (N1 - bs + 1)) * max_tfidf_features,
                    10,
                    D.data(),
                    I.data());
            if (i % 100 == 0) {
                size_t m = get_mem_usage_kb();
                // Subtract as doubles: usage may drop below the baseline,
                // and an unsigned difference would wrap around.
                printf("[%.2f s] BS %d %d: %zu kB %.2f bytes/it\r",
                       (getmillisecs() - t0) / 1000,
                       bs,
                       i,
                       m,
                       (double(m) - double(m0)) * 1024.0 / (i + 1));
                fflush(stdout);
            }
        }
        printf("\n");
        // Allow at most 50 bytes of growth per search and per query vector.
        double growth_kb = double(get_mem_usage_kb()) - double(m0);
        EXPECT_GE(50 * bs, growth_kb * 1024.0 / N2);
    }
}
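
The pass criterion of the test above is simple arithmetic: memory growth in kB, converted to bytes and divided by the number of searches, must not exceed 50 bytes times the batch size. The sketch below isolates that computation so it can be checked by hand. Both function names are hypothetical and not part of Faiss; the subtraction is done in `double` on the assumption that memory usage can also shrink between the two measurements.

```cpp
#include <cstddef>

// Hypothetical helper: average memory growth per search iteration, in bytes.
// Inputs are process memory readings in kB (as get_mem_usage_kb() reports).
inline double bytes_per_iteration(
        std::size_t kb_before,
        std::size_t kb_after,
        int iterations) {
    // Convert before subtracting so a decrease yields a negative value
    // instead of an unsigned wrap-around.
    const double delta_kb =
            static_cast<double>(kb_after) - static_cast<double>(kb_before);
    return delta_kb * 1024.0 / iterations;
}

// Hypothetical helper: the budget used by the test, 50 bytes per iteration
// and per query vector in the batch.
inline bool within_leak_budget(double bytes_per_it, int batch_size) {
    return bytes_per_it <= 50.0 * batch_size;
}
```

For example, a growth of 100 kB over 1024 searches is 100 bytes per search, which exceeds the budget for a batch size of 1 (50 bytes) but not for a batch size of 4 (200 bytes).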