<div class="line"><a name="l00005"></a><span class="lineno"> 5</span> <span class="comment"> * This source code is licensed under the BSD+Patents license found in the</span></div>
<div class="line"><a name="l00006"></a><span class="lineno"> 6</span> <span class="comment"> * LICENSE file in the root directory of this source tree.</span></div>
<div class="line"><a name="l00009"></a><span class="lineno"> 9</span> <span class="comment">// Copyright 2004-present Facebook. All Rights Reserved</span></div>
<div class="line"><a name="l00039"></a><span class="lineno"> 39</span> <span class="comment">// size of the database we plan to index</span></div>
<div class="line"><a name="l00040"></a><span class="lineno"> 40</span> <span class="keywordtype">size_t</span> nb = 1000 * 1000;</div>
<div class="line"><a name="l00041"></a><span class="lineno"> 41</span> <span class="keywordtype">size_t</span> add_bs = 10000; <span class="comment">// size of the blocks to add</span></div>
<div class="line"><a name="l00043"></a><span class="lineno"> 43</span> <span class="comment">// make a set of nt training vectors in the unit cube</span></div>
<div class="line"><a name="l00044"></a><span class="lineno"> 44</span> <span class="comment">// (could be the database)</span></div>
<div class="line"><a name="l00045"></a><span class="lineno"> 45</span> <span class="keywordtype">size_t</span> nt = 100 * 1000;</div>
<div class="line"><a name="l00048"></a><span class="lineno"> 48</span> <span class="comment">// Define the core quantizer</span></div>
<div class="line"><a name="l00049"></a><span class="lineno"> 49</span> <span class="comment">// We choose a multiple inverted index for faster training with less data</span></div>
<div class="line"><a name="l00050"></a><span class="lineno"> 50</span> <span class="comment">// and because it usually offers the best accuracy/speed trade-offs</span></div>
<div class="line"><a name="l00052"></a><span class="lineno"> 52</span> <span class="comment">// We here assume that the lifespan of this coarse quantizer will cover the</span></div>
<div class="line"><a name="l00053"></a><span class="lineno"> 53</span> <span class="comment">// lifespan of the inverted-file index IndexIVFPQ below</span></div>
<div class="line"><a name="l00054"></a><span class="lineno"> 54</span> <span class="comment">// With dynamic allocation, one may give the responsibility to free the</span></div>
<div class="line"><a name="l00055"></a><span class="lineno"> 55</span> <span class="comment">// quantizer to the inverted-file index (with attribute do_delete_quantizer)</span></div>
<div class="line"><a name="l00057"></a><span class="lineno"> 57</span> <span class="comment">// Note: a regular clustering algorithm would be defined as:</span></div>
<div class="line"><a name="l00060"></a><span class="lineno"> 60</span> <span class="comment">// Use nhash=2 subquantizers to define the product coarse quantizer</span></div>
<div class="line"><a name="l00061"></a><span class="lineno"> 61</span> <span class="comment">// Number of bits: we will have 2^nbits_subq centroids per subquantizer</span></div>
<div class="line"><a name="l00062"></a><span class="lineno"> 62</span> <span class="comment">// meaning (2^12)^nhash distinct inverted lists</span></div>
<div class="line"><a name="l00064"></a><span class="lineno"> 64</span> <span class="comment">// The parameter bytes_per_code is determined by the memory</span></div>
<div class="line"><a name="l00065"></a><span class="lineno"> 65</span> <span class="comment">// constraint, the dataset will use nb * (bytes_per_code + 8)</span></div>
<div class="line"><a name="l00068"></a><span class="lineno"> 68</span> <span class="comment">// The parameter nbits_subq is determined by the size of the dataset to index.</span></div>
<div class="line"><a name="l00080"></a><span class="lineno"> 80</span> <span class="comment">// the coarse quantizer should not be deallocated before the index</span></div>
<div class="line"><a name="l00081"></a><span class="lineno"> 81</span> <span class="comment">// 4 = nb of bytes per code (d must be a multiple of this)</span></div>
<div class="line"><a name="l00082"></a><span class="lineno"> 82</span> <span class="comment">// 8 = nb of bits per sub-code (almost always 8)</span></div>
<div class="line"><a name="l00083"></a><span class="lineno"> 83</span> <a class="code" href="namespacefaiss.html#afd12191c638da74760ff397cf319752c">faiss::MetricType</a> metric = faiss::METRIC_L2; <span class="comment">// can be METRIC_INNER_PRODUCT</span></div>
<div class="line"><a name="l00084"></a><span class="lineno"> 84</span> <a class="code" href="structfaiss_1_1IndexIVFPQ.html">faiss::IndexIVFPQ</a> index (&amp;coarse_quantizer, d, ncentroids, bytes_per_code, 8);</div>
<div class="line"><a name="l00087"></a><span class="lineno"> 87</span> <span class="comment">// define the number of probes. 2048 is for high-dim, overkill in practice</span></div>
<div class="line"><a name="l00088"></a><span class="lineno"> 88</span> <span class="comment">// Use 4-1024 depending on the speed/accuracy trade-off that you want</span></div>
<div class="line"><a name="l00094"></a><span class="lineno"> 94</span> <span class="comment">// The distribution of the training vectors should be the same</span></div>
<div class="line"><a name="l00095"></a><span class="lineno"> 95</span> <span class="comment">// as the database vectors. It could be a sub-sample of the</span></div>
<div class="line"><a name="l00096"></a><span class="lineno"> 96</span> <span class="comment">// database vectors, if sampling is not biased. Here we just</span></div>
<div class="line"><a name="l00097"></a><span class="lineno"> 97</span> <span class="comment">// randomly generate the vectors.</span></div>
<div class="line"><a name="l00121"></a><span class="lineno"> 121</span>  { <span class="comment">// populating the database</span></div>
<div class="line"><a name="l00122"></a><span class="lineno"> 122</span>  printf (<span class="stringliteral">"[%.3f s] Building a dataset of %ld vectors to index\n"</span>,</div>
<div class="line"><a name="l00134"></a><span class="lineno"> 134</span>  printf (<span class="stringliteral">"[%.3f s] Adding the vectors to the index\n"</span>, elapsed() - t0);</div>
<div class="line"><a name="l00136"></a><span class="lineno"> 136</span> <span class="keywordflow">for</span> (<span class="keywordtype">size_t</span> begin = 0; begin &lt; nb; begin += add_bs) {</div>
<div class="line"><a name="l00137"></a><span class="lineno"> 137</span> <span class="keywordtype">size_t</span> end = std::min (begin + add_bs, nb);</div>
<div class="line"><a name="l00143"></a><span class="lineno"> 143</span> <span class="comment">// remember a few elements from the database as queries</span></div>
<div class="line"><a name="l00156"></a><span class="lineno"> 156</span> <span class="comment">// A few notes on the internal format of the index:</span></div>
<div class="line"><a name="l00158"></a><span class="lineno"> 158</span> <span class="comment">// - the posting lists for PQ codes are index.codes, which is a</span></div>
<div class="line"><a name="l00160"></a><span class="lineno"> 160</span> <span class="comment">// if n is the length of posting list #i, codes[i] has length bytes_per_code * n</span></div>
<div class="line"><a name="l00162"></a><span class="lineno"> 162</span> <span class="comment">// - the corresponding ids are stored in index.ids</span></div>
<div class="line"><a name="l00164"></a><span class="lineno"> 164</span> <span class="comment">// - given a vector float *x, finding which k centroids are</span></div>
<div class="line"><a name="l00165"></a><span class="lineno"> 165</span> <span class="comment">// closest to it (i.e. to find the nearest neighbors) can be done with</span></div>
<div class="line"><a name="l00177"></a><span class="lineno"> 177</span> <span class="stringliteral">"of %ld vectors in the index\n"</span>,</div>
<div class="ttc" id="namespacefaiss_html_afd12191c638da74760ff397cf319752c"><div class="ttname"><a href="namespacefaiss.html#afd12191c638da74760ff397cf319752c">faiss::MetricType</a></div><div class="ttdeci">MetricType</div><div class="ttdoc">Some algorithms support both an inner product version and an L2 search version. </div><div class="ttdef"><b>Definition:</b> <a href="Index_8h_source.html#l00043">Index.h:43</a></div></div>