<divclass="line"><aname="l00005"></a><spanclass="lineno"> 5</span> <spanclass="comment"> * This source code is licensed under the CC-by-NC license found in the</span></div>
<divclass="line"><aname="l00006"></a><spanclass="lineno"> 6</span> <spanclass="comment"> * LICENSE file in the root directory of this source tree.</span></div>
<divclass="line"><aname="l00009"></a><spanclass="lineno"> 9</span> <spanclass="comment">// Copyright 2004-present Facebook. All Rights Reserved.</span></div>
<divclass="line"><aname="l00023"></a><spanclass="lineno"> 23</span> <spanclass="comment">// -perform bitonic merges on pairs of sorted lists, held in</span></div>
<divclass="line"><aname="l00024"></a><spanclass="lineno"> 24</span> <spanclass="comment">// registers. Each list contains N * kWarpSize (multiple of 32)</span></div>
<divclass="line"><aname="l00025"></a><spanclass="lineno"> 25</span> <spanclass="comment">// elements for some N.</span></div>
<divclass="line"><aname="l00026"></a><spanclass="lineno"> 26</span> <spanclass="comment">// The bitonic merge is implemented for arbitrary sizes;</span></div>
<divclass="line"><aname="l00027"></a><spanclass="lineno"> 27</span> <spanclass="comment">// sorted list A of size N1 * kWarpSize registers</span></div>
<divclass="line"><aname="l00028"></a><spanclass="lineno"> 28</span> <spanclass="comment">// sorted list B of size N2 * kWarpSize registers =></span></div>
<divclass="line"><aname="l00029"></a><spanclass="lineno"> 29</span> <spanclass="comment">// sorted list C if size (N1 + N2) * kWarpSize registers. N1 and N2</span></div>
<divclass="line"><aname="l00030"></a><spanclass="lineno"> 30</span> <spanclass="comment">// are >= 1 and don't have to be powers of 2.</span></div>
<divclass="line"><aname="l00032"></a><spanclass="lineno"> 32</span> <spanclass="comment">// -perform bitonic sorts on a set of N * kWarpSize key/value pairs</span></div>
<divclass="line"><aname="l00033"></a><spanclass="lineno"> 33</span> <spanclass="comment">// held in registers, by using the above bitonic merge as a</span></div>
<divclass="line"><aname="l00035"></a><spanclass="lineno"> 35</span> <spanclass="comment">// N can be an arbitrary N >= 1; i.e., the bitonic sort here supports</span></div>
<divclass="line"><aname="l00036"></a><spanclass="lineno"> 36</span> <spanclass="comment">// odd sizes and doesn't require the input to be a power of 2.</span></div>
<divclass="line"><aname="l00038"></a><spanclass="lineno"> 38</span> <spanclass="comment">// The sort or merge network is completely statically instantiated via</span></div>
<divclass="line"><aname="l00039"></a><spanclass="lineno"> 39</span> <spanclass="comment">// template specialization / expansion and constexpr, and it uses warp</span></div>
<divclass="line"><aname="l00040"></a><spanclass="lineno"> 40</span> <spanclass="comment">// shuffles to exchange values between warp lanes.</span></div>
<divclass="line"><aname="l00044"></a><spanclass="lineno"> 44</span> <spanclass="comment">// For a sorting network of keys only, we only need one</span></div>
<divclass="line"><aname="l00045"></a><spanclass="lineno"> 45</span> <spanclass="comment">// comparison (a < b). However, what we really need to know is</span></div>
<divclass="line"><aname="l00046"></a><spanclass="lineno"> 46</span> <spanclass="comment">// if one lane chooses to exchange a value, then the</span></div>
<divclass="line"><aname="l00047"></a><spanclass="lineno"> 47</span> <spanclass="comment">// corresponding lane should also do the exchange.</span></div>
<divclass="line"><aname="l00048"></a><spanclass="lineno"> 48</span> <spanclass="comment">// Thus, if one just uses the negation !(x < y) in the higher</span></div>
<divclass="line"><aname="l00049"></a><spanclass="lineno"> 49</span> <spanclass="comment">// lane, this will also include the case where (x == y). Thus, one</span></div>
<divclass="line"><aname="l00050"></a><spanclass="lineno"> 50</span> <spanclass="comment">// lane in fact performs an exchange and the other doesn't, but</span></div>
<divclass="line"><aname="l00051"></a><spanclass="lineno"> 51</span> <spanclass="comment">// because the only value being exchanged is equivalent, nothing has</span></div>
<divclass="line"><aname="l00053"></a><spanclass="lineno"> 53</span> <spanclass="comment">// So, you can get away with just one comparison and its negation.</span></div>
<divclass="line"><aname="l00055"></a><spanclass="lineno"> 55</span> <spanclass="comment">// If we're sorting keys and values, where equivalent keys can</span></div>
<divclass="line"><aname="l00056"></a><spanclass="lineno"> 56</span> <spanclass="comment">// exist, then this is a problem, since we want to treat (x, v1)</span></div>
<divclass="line"><aname="l00057"></a><spanclass="lineno"> 57</span> <spanclass="comment">// as not equivalent to (x, v2).</span></div>
<divclass="line"><aname="l00059"></a><spanclass="lineno"> 59</span> <spanclass="comment">// To remedy this, you can either compare with a lexicographic</span></div>
<divclass="line"><aname="l00061"></a><spanclass="lineno"> 61</span> <spanclass="comment">// we're predicating all of the choices results in 3 comparisons</span></div>
<divclass="line"><aname="l00062"></a><spanclass="lineno"> 62</span> <spanclass="comment">// being executed, or we can invert the selection so that there is no</span></div>
<divclass="line"><aname="l00063"></a><spanclass="lineno"> 63</span> <spanclass="comment">// middle choice of equality; the other lane will likewise</span></div>
<divclass="line"><aname="l00064"></a><spanclass="lineno"> 64</span> <spanclass="comment">// check that (b.k > a.k) (the higher lane has the values</span></div>
<divclass="line"><aname="l00065"></a><spanclass="lineno"> 65</span> <spanclass="comment">// swapped). Then, the first lane swaps if and only if the</span></div>
<divclass="line"><aname="l00066"></a><spanclass="lineno"> 66</span> <spanclass="comment">// second lane swaps; if both lanes have equivalent keys, no</span></div>
<divclass="line"><aname="l00067"></a><spanclass="lineno"> 67</span> <spanclass="comment">// swap will be performed. This results in only two comparisons</span></div>
<divclass="line"><aname="l00068"></a><spanclass="lineno"> 68</span> <spanclass="comment">// being executed.</span></div>
<divclass="line"><aname="l00070"></a><spanclass="lineno"> 70</span> <spanclass="comment">// If you don't consider values as well, then this does not produce a</span></div>
<divclass="line"><aname="l00071"></a><spanclass="lineno"> 71</span> <spanclass="comment">// consistent ordering among (k, v) pairs with equivalent keys but</span></div>
<divclass="line"><aname="l00072"></a><spanclass="lineno"> 72</span> <spanclass="comment">// different values; for us, we don't really care about ordering or</span></div>
<divclass="line"><aname="l00075"></a><spanclass="lineno"> 75</span> <spanclass="comment">// I have tried both re-arranging the order in the higher lane to get</span></div>
<divclass="line"><aname="l00076"></a><spanclass="lineno"> 76</span> <spanclass="comment">// away with one comparison or adding the value to the check; both</span></div>
<divclass="line"><aname="l00077"></a><spanclass="lineno"> 77</span> <spanclass="comment">// result in greater register consumption or lower speed than just</span></div>
<divclass="line"><aname="l00078"></a><spanclass="lineno"> 78</span> <spanclass="comment">// perfoming both < and > comparisons with the variables, so I just</span></div>
<divclass="line"><aname="l00079"></a><spanclass="lineno"> 79</span> <spanclass="comment">// stick with this.</span></div>
<divclass="line"><aname="l00081"></a><spanclass="lineno"> 81</span> <spanclass="comment">// This function merges kWarpSize / 2L lists in parallel using warp</span></div>
<divclass="line"><aname="l00083"></a><spanclass="lineno"> 83</span> <spanclass="comment">// It works on at most size-16 lists, as we need 32 threads for this</span></div>
<divclass="line"><aname="l00086"></a><spanclass="lineno"> 86</span> <spanclass="comment">// If IsBitonic is false, the first stage is reversed, so we don't</span></div>
<divclass="line"><aname="l00087"></a><spanclass="lineno"> 87</span> <spanclass="comment">// need to sort directionally. It's still technically a bitonic sort.</span></div>
<divclass="line"><aname="l00091"></a><spanclass="lineno"> 91</span>  static_assert(utils::isPowerOf2(L), <spanclass="stringliteral">"L must be a power-of-2"</span>);</div>
<divclass="line"><aname="l00092"></a><spanclass="lineno"> 92</span>  static_assert(L <= kWarpSize / 2, <spanclass="stringliteral">"merge list size must be <= 16"</span>);</div>
<divclass="line"><aname="l00097"></a><spanclass="lineno"> 97</span> <spanclass="comment">// Reverse the first comparison stage.</span></div>
<divclass="line"><aname="l00098"></a><spanclass="lineno"> 98</span> <spanclass="comment">// For example, merging a list of size 8 has the exchanges:</span></div>
<divclass="line"><aname="l00103"></a><spanclass="lineno"> 103</span> <spanclass="comment">// Whether we are the lesser thread in the exchange</span></div>
<divclass="line"><aname="l00104"></a><spanclass="lineno"> 104</span> <spanclass="keywordtype">bool</span> small = !(laneId & L);</div>
<divclass="line"><aname="l00107"></a><spanclass="lineno"> 107</span> <spanclass="comment">// See the comment above how performing both of these</span></div>
<divclass="line"><aname="l00108"></a><spanclass="lineno"> 108</span> <spanclass="comment">// comparisons in the warp seems to win out over the</span></div>
<divclass="line"><aname="l00109"></a><spanclass="lineno"> 109</span> <spanclass="comment">// alternatives in practice</span></div>
<divclass="line"><aname="l00110"></a><spanclass="lineno"> 110</span> <spanclass="keywordtype">bool</span> s = small ? Comp::gt(k, otherK) : Comp::lt(k, otherK);</div>
<divclass="line"><aname="l00126"></a><spanclass="lineno"> 126</span> <spanclass="comment">// Whether we are the lesser thread in the exchange</span></div>
<divclass="line"><aname="l00127"></a><spanclass="lineno"> 127</span> <spanclass="keywordtype">bool</span> small = !(laneId & stride);</div>
<divclass="line"><aname="l00142"></a><spanclass="lineno"> 142</span> <spanclass="comment">// Template for performing a bitonic merge of an arbitrary set of</span></div>
<divclass="line"><aname="l00163"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01Low_00_01true_01_4.html"> 163</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, Low, true> {</div>
<divclass="line"><aname="l00164"></a><spanclass="lineno"> 164</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00165"></a><spanclass="lineno"> 165</span>  static_assert(utils::isPowerOf2(N), <spanclass="stringliteral">"must be power of 2"</span>);</div>
<divclass="line"><aname="l00166"></a><spanclass="lineno"> 166</span>  static_assert(N > 1, <spanclass="stringliteral">"must be N > 1"</span>);</div>
<divclass="line"><aname="l00169"></a><spanclass="lineno"> 169</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00170"></a><spanclass="lineno"> 170</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00171"></a><spanclass="lineno"> 171</span>  V& va = v[i];</div>
<divclass="line"><aname="l00186"></a><spanclass="lineno"> 186</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00194"></a><spanclass="lineno"> 194</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00205"></a><spanclass="lineno"> 205</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00206"></a><spanclass="lineno"> 206</span>  newK[i] = k[i + N / 2];</div>
<divclass="line"><aname="l00207"></a><spanclass="lineno"> 207</span>  newV[i] = v[i + N / 2];</div>
<divclass="line"><aname="l00213"></a><spanclass="lineno"> 213</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00214"></a><spanclass="lineno"> 214</span>  k[i + N / 2] = newK[i];</div>
<divclass="line"><aname="l00215"></a><spanclass="lineno"> 215</span>  v[i + N / 2] = newV[i];</div>
<divclass="line"><aname="l00227"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01true_00_01false_01_4.html"> 227</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, true, false> {</div>
<divclass="line"><aname="l00228"></a><spanclass="lineno"> 228</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00229"></a><spanclass="lineno"> 229</span>  static_assert(!utils::isPowerOf2(N), <spanclass="stringliteral">"must be non-power-of-2"</span>);</div>
<divclass="line"><aname="l00230"></a><spanclass="lineno"> 230</span>  static_assert(N >= 3, <spanclass="stringliteral">"must be N >= 3"</span>);</div>
<divclass="line"><aname="l00235"></a><spanclass="lineno"> 235</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N - kNextHighestPowerOf2 / 2; ++i) {</div>
<divclass="line"><aname="l00236"></a><spanclass="lineno"> 236</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00237"></a><spanclass="lineno"> 237</span>  V& va = v[i];</div>
<divclass="line"><aname="l00261"></a><spanclass="lineno"> 261</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00286"></a><spanclass="lineno"> 286</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00303"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01false_00_01false_01_4.html"> 303</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, false, false> {</div>
<divclass="line"><aname="l00304"></a><spanclass="lineno"> 304</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00305"></a><spanclass="lineno"> 305</span>  static_assert(!utils::isPowerOf2(N), <spanclass="stringliteral">"must be non-power-of-2"</span>);</div>
<divclass="line"><aname="l00306"></a><spanclass="lineno"> 306</span>  static_assert(N >= 3, <spanclass="stringliteral">"must be N >= 3"</span>);</div>
<divclass="line"><aname="l00311"></a><spanclass="lineno"> 311</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N - kNextHighestPowerOf2 / 2; ++i) {</div>
<divclass="line"><aname="l00312"></a><spanclass="lineno"> 312</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00313"></a><spanclass="lineno"> 313</span>  V& va = v[i];</div>
<divclass="line"><aname="l00337"></a><spanclass="lineno"> 337</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00362"></a><spanclass="lineno"> 362</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00377"></a><spanclass="lineno"> 377</span> <spanclass="comment">/// Merges two sets of registers across the warp of any size;</span></div>
<divclass="line"><aname="l00378"></a><spanclass="lineno"> 378</span> <spanclass="comment">/// i.e., merges a sorted k/v list of size kWarpSize * N1 with a</span></div>
<divclass="line"><aname="l00379"></a><spanclass="lineno"> 379</span> <spanclass="comment">/// sorted k/v list of size kWarpSize * N2, where N1 and N2 are any</span></div>
<divclass="line"><aname="l00380"></a><spanclass="lineno"> 380</span> <spanclass="comment">/// value >= 1</span></div>
<divclass="line"><aname="l00412"></a><spanclass="lineno"> 412</span> <spanclass="comment">// ka is always first in the list, so we needn't use our lane</span></div>
<divclass="line"><aname="l00413"></a><spanclass="lineno"> 413</span> <spanclass="comment">// in this comparison</span></div>
<divclass="line"><aname="l00418"></a><spanclass="lineno"> 418</span> <spanclass="comment">// kb is always second in the list, so we needn't use our lane</span></div>
<divclass="line"><aname="l00419"></a><spanclass="lineno"> 419</span> <spanclass="comment">// in this comparison</span></div>
<divclass="line"><aname="l00426"></a><spanclass="lineno"> 426</span> <spanclass="comment">// We don't care about updating elements in the second list</span></div>
<divclass="line"><aname="l00433"></a><spanclass="lineno"> 433</span> <spanclass="comment">// Only if we care about N2 do we need to bother merging it fully</span></div>
<divclass="line"><aname="l00439"></a><spanclass="lineno"> 439</span> <spanclass="comment">// Recursive template that uses the above bitonic merge to perform a</span></div>
<divclass="line"><aname="l00443"></a><spanclass="lineno"> 443</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> sort(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00444"></a><spanclass="lineno"> 444</span>  static_assert(N > 1, <spanclass="stringliteral">"did not hit specialized case"</span>);</div>
<divclass="line"><aname="l00492"></a><spanclass="lineno"> 492</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> sort(K k[1], V v[1]) {</div>
<divclass="line"><aname="l00493"></a><spanclass="lineno"> 493</span> <spanclass="comment">// Update this code if this changes</span></div>
<divclass="line"><aname="l00494"></a><spanclass="lineno"> 494</span> <spanclass="comment">// should go from 1 -> kWarpSize in multiples of 2</span></div>
<divclass="line"><aname="l00505"></a><spanclass="lineno"> 505</span> <spanclass="comment">/// Sort a list of kWarpSize * N elements in registers, where N is an</span></div>