<divclass="line"><aname="l00005"></a><spanclass="lineno"> 5</span> <spanclass="comment"> * This source code is licensed under the BSD+Patents license found in the</span></div>
<divclass="line"><aname="l00006"></a><spanclass="lineno"> 6</span> <spanclass="comment"> * LICENSE file in the root directory of this source tree.</span></div>
<divclass="line"><aname="l00022"></a><spanclass="lineno"> 22</span> <spanclass="comment">// -perform bitonic merges on pairs of sorted lists, held in</span></div>
<divclass="line"><aname="l00023"></a><spanclass="lineno"> 23</span> <spanclass="comment">// registers. Each list contains N * kWarpSize (multiple of 32)</span></div>
<divclass="line"><aname="l00024"></a><spanclass="lineno"> 24</span> <spanclass="comment">// elements for some N.</span></div>
<divclass="line"><aname="l00025"></a><spanclass="lineno"> 25</span> <spanclass="comment">// The bitonic merge is implemented for arbitrary sizes;</span></div>
<divclass="line"><aname="l00026"></a><spanclass="lineno"> 26</span> <spanclass="comment">// sorted list A of size N1 * kWarpSize registers</span></div>
<divclass="line"><aname="l00027"></a><spanclass="lineno"> 27</span> <spanclass="comment">// sorted list B of size N2 * kWarpSize registers =></span></div>
<divclass="line"><aname="l00028"></a><spanclass="lineno"> 28</span> <spanclass="comment">// sorted list C if size (N1 + N2) * kWarpSize registers. N1 and N2</span></div>
<divclass="line"><aname="l00029"></a><spanclass="lineno"> 29</span> <spanclass="comment">// are >= 1 and don't have to be powers of 2.</span></div>
<divclass="line"><aname="l00031"></a><spanclass="lineno"> 31</span> <spanclass="comment">// -perform bitonic sorts on a set of N * kWarpSize key/value pairs</span></div>
<divclass="line"><aname="l00032"></a><spanclass="lineno"> 32</span> <spanclass="comment">// held in registers, by using the above bitonic merge as a</span></div>
<divclass="line"><aname="l00034"></a><spanclass="lineno"> 34</span> <spanclass="comment">// N can be an arbitrary N >= 1; i.e., the bitonic sort here supports</span></div>
<divclass="line"><aname="l00035"></a><spanclass="lineno"> 35</span> <spanclass="comment">// odd sizes and doesn't require the input to be a power of 2.</span></div>
<divclass="line"><aname="l00037"></a><spanclass="lineno"> 37</span> <spanclass="comment">// The sort or merge network is completely statically instantiated via</span></div>
<divclass="line"><aname="l00038"></a><spanclass="lineno"> 38</span> <spanclass="comment">// template specialization / expansion and constexpr, and it uses warp</span></div>
<divclass="line"><aname="l00039"></a><spanclass="lineno"> 39</span> <spanclass="comment">// shuffles to exchange values between warp lanes.</span></div>
<divclass="line"><aname="l00043"></a><spanclass="lineno"> 43</span> <spanclass="comment">// For a sorting network of keys only, we only need one</span></div>
<divclass="line"><aname="l00044"></a><spanclass="lineno"> 44</span> <spanclass="comment">// comparison (a < b). However, what we really need to know is</span></div>
<divclass="line"><aname="l00045"></a><spanclass="lineno"> 45</span> <spanclass="comment">// if one lane chooses to exchange a value, then the</span></div>
<divclass="line"><aname="l00046"></a><spanclass="lineno"> 46</span> <spanclass="comment">// corresponding lane should also do the exchange.</span></div>
<divclass="line"><aname="l00047"></a><spanclass="lineno"> 47</span> <spanclass="comment">// Thus, if one just uses the negation !(x < y) in the higher</span></div>
<divclass="line"><aname="l00048"></a><spanclass="lineno"> 48</span> <spanclass="comment">// lane, this will also include the case where (x == y). Thus, one</span></div>
<divclass="line"><aname="l00049"></a><spanclass="lineno"> 49</span> <spanclass="comment">// lane in fact performs an exchange and the other doesn't, but</span></div>
<divclass="line"><aname="l00050"></a><spanclass="lineno"> 50</span> <spanclass="comment">// because the only value being exchanged is equivalent, nothing has</span></div>
<divclass="line"><aname="l00052"></a><spanclass="lineno"> 52</span> <spanclass="comment">// So, you can get away with just one comparison and its negation.</span></div>
<divclass="line"><aname="l00054"></a><spanclass="lineno"> 54</span> <spanclass="comment">// If we're sorting keys and values, where equivalent keys can</span></div>
<divclass="line"><aname="l00055"></a><spanclass="lineno"> 55</span> <spanclass="comment">// exist, then this is a problem, since we want to treat (x, v1)</span></div>
<divclass="line"><aname="l00056"></a><spanclass="lineno"> 56</span> <spanclass="comment">// as not equivalent to (x, v2).</span></div>
<divclass="line"><aname="l00058"></a><spanclass="lineno"> 58</span> <spanclass="comment">// To remedy this, you can either compare with a lexicographic</span></div>
<divclass="line"><aname="l00060"></a><spanclass="lineno"> 60</span> <spanclass="comment">// we're predicating all of the choices results in 3 comparisons</span></div>
<divclass="line"><aname="l00061"></a><spanclass="lineno"> 61</span> <spanclass="comment">// being executed, or we can invert the selection so that there is no</span></div>
<divclass="line"><aname="l00062"></a><spanclass="lineno"> 62</span> <spanclass="comment">// middle choice of equality; the other lane will likewise</span></div>
<divclass="line"><aname="l00063"></a><spanclass="lineno"> 63</span> <spanclass="comment">// check that (b.k > a.k) (the higher lane has the values</span></div>
<divclass="line"><aname="l00064"></a><spanclass="lineno"> 64</span> <spanclass="comment">// swapped). Then, the first lane swaps if and only if the</span></div>
<divclass="line"><aname="l00065"></a><spanclass="lineno"> 65</span> <spanclass="comment">// second lane swaps; if both lanes have equivalent keys, no</span></div>
<divclass="line"><aname="l00066"></a><spanclass="lineno"> 66</span> <spanclass="comment">// swap will be performed. This results in only two comparisons</span></div>
<divclass="line"><aname="l00067"></a><spanclass="lineno"> 67</span> <spanclass="comment">// being executed.</span></div>
<divclass="line"><aname="l00069"></a><spanclass="lineno"> 69</span> <spanclass="comment">// If you don't consider values as well, then this does not produce a</span></div>
<divclass="line"><aname="l00070"></a><spanclass="lineno"> 70</span> <spanclass="comment">// consistent ordering among (k, v) pairs with equivalent keys but</span></div>
<divclass="line"><aname="l00071"></a><spanclass="lineno"> 71</span> <spanclass="comment">// different values; for us, we don't really care about ordering or</span></div>
<divclass="line"><aname="l00074"></a><spanclass="lineno"> 74</span> <spanclass="comment">// I have tried both re-arranging the order in the higher lane to get</span></div>
<divclass="line"><aname="l00075"></a><spanclass="lineno"> 75</span> <spanclass="comment">// away with one comparison or adding the value to the check; both</span></div>
<divclass="line"><aname="l00076"></a><spanclass="lineno"> 76</span> <spanclass="comment">// result in greater register consumption or lower speed than just</span></div>
<divclass="line"><aname="l00077"></a><spanclass="lineno"> 77</span> <spanclass="comment">// perfoming both < and > comparisons with the variables, so I just</span></div>
<divclass="line"><aname="l00078"></a><spanclass="lineno"> 78</span> <spanclass="comment">// stick with this.</span></div>
<divclass="line"><aname="l00080"></a><spanclass="lineno"> 80</span> <spanclass="comment">// This function merges kWarpSize / 2L lists in parallel using warp</span></div>
<divclass="line"><aname="l00082"></a><spanclass="lineno"> 82</span> <spanclass="comment">// It works on at most size-16 lists, as we need 32 threads for this</span></div>
<divclass="line"><aname="l00085"></a><spanclass="lineno"> 85</span> <spanclass="comment">// If IsBitonic is false, the first stage is reversed, so we don't</span></div>
<divclass="line"><aname="l00086"></a><spanclass="lineno"> 86</span> <spanclass="comment">// need to sort directionally. It's still technically a bitonic sort.</span></div>
<divclass="line"><aname="l00090"></a><spanclass="lineno"> 90</span>  static_assert(utils::isPowerOf2(L), <spanclass="stringliteral">"L must be a power-of-2"</span>);</div>
<divclass="line"><aname="l00091"></a><spanclass="lineno"> 91</span>  static_assert(L <= kWarpSize / 2, <spanclass="stringliteral">"merge list size must be <= 16"</span>);</div>
<divclass="line"><aname="l00096"></a><spanclass="lineno"> 96</span> <spanclass="comment">// Reverse the first comparison stage.</span></div>
<divclass="line"><aname="l00097"></a><spanclass="lineno"> 97</span> <spanclass="comment">// For example, merging a list of size 8 has the exchanges:</span></div>
<divclass="line"><aname="l00102"></a><spanclass="lineno"> 102</span> <spanclass="comment">// Whether we are the lesser thread in the exchange</span></div>
<divclass="line"><aname="l00103"></a><spanclass="lineno"> 103</span> <spanclass="keywordtype">bool</span> small = !(laneId & L);</div>
<divclass="line"><aname="l00106"></a><spanclass="lineno"> 106</span> <spanclass="comment">// See the comment above how performing both of these</span></div>
<divclass="line"><aname="l00107"></a><spanclass="lineno"> 107</span> <spanclass="comment">// comparisons in the warp seems to win out over the</span></div>
<divclass="line"><aname="l00108"></a><spanclass="lineno"> 108</span> <spanclass="comment">// alternatives in practice</span></div>
<divclass="line"><aname="l00109"></a><spanclass="lineno"> 109</span> <spanclass="keywordtype">bool</span> s = small ? Comp::gt(k, otherK) : Comp::lt(k, otherK);</div>
<divclass="line"><aname="l00125"></a><spanclass="lineno"> 125</span> <spanclass="comment">// Whether we are the lesser thread in the exchange</span></div>
<divclass="line"><aname="l00126"></a><spanclass="lineno"> 126</span> <spanclass="keywordtype">bool</span> small = !(laneId & stride);</div>
<divclass="line"><aname="l00141"></a><spanclass="lineno"> 141</span> <spanclass="comment">// Template for performing a bitonic merge of an arbitrary set of</span></div>
<divclass="line"><aname="l00162"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01Low_00_01true_01_4.html"> 162</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, Low, true> {</div>
<divclass="line"><aname="l00163"></a><spanclass="lineno"> 163</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00164"></a><spanclass="lineno"> 164</span>  static_assert(utils::isPowerOf2(N), <spanclass="stringliteral">"must be power of 2"</span>);</div>
<divclass="line"><aname="l00165"></a><spanclass="lineno"> 165</span>  static_assert(N > 1, <spanclass="stringliteral">"must be N > 1"</span>);</div>
<divclass="line"><aname="l00168"></a><spanclass="lineno"> 168</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00169"></a><spanclass="lineno"> 169</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00170"></a><spanclass="lineno"> 170</span>  V& va = v[i];</div>
<divclass="line"><aname="l00185"></a><spanclass="lineno"> 185</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00193"></a><spanclass="lineno"> 193</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00204"></a><spanclass="lineno"> 204</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00205"></a><spanclass="lineno"> 205</span>  newK[i] = k[i + N / 2];</div>
<divclass="line"><aname="l00206"></a><spanclass="lineno"> 206</span>  newV[i] = v[i + N / 2];</div>
<divclass="line"><aname="l00212"></a><spanclass="lineno"> 212</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00213"></a><spanclass="lineno"> 213</span>  k[i + N / 2] = newK[i];</div>
<divclass="line"><aname="l00214"></a><spanclass="lineno"> 214</span>  v[i + N / 2] = newV[i];</div>
<divclass="line"><aname="l00226"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01true_00_01false_01_4.html"> 226</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, true, false> {</div>
<divclass="line"><aname="l00227"></a><spanclass="lineno"> 227</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00228"></a><spanclass="lineno"> 228</span>  static_assert(!utils::isPowerOf2(N), <spanclass="stringliteral">"must be non-power-of-2"</span>);</div>
<divclass="line"><aname="l00229"></a><spanclass="lineno"> 229</span>  static_assert(N >= 3, <spanclass="stringliteral">"must be N >= 3"</span>);</div>
<divclass="line"><aname="l00234"></a><spanclass="lineno"> 234</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N - kNextHighestPowerOf2 / 2; ++i) {</div>
<divclass="line"><aname="l00235"></a><spanclass="lineno"> 235</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00236"></a><spanclass="lineno"> 236</span>  V& va = v[i];</div>
<divclass="line"><aname="l00260"></a><spanclass="lineno"> 260</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00285"></a><spanclass="lineno"> 285</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00302"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01false_00_01false_01_4.html"> 302</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, false, false> {</div>
<divclass="line"><aname="l00303"></a><spanclass="lineno"> 303</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00304"></a><spanclass="lineno"> 304</span>  static_assert(!utils::isPowerOf2(N), <spanclass="stringliteral">"must be non-power-of-2"</span>);</div>
<divclass="line"><aname="l00305"></a><spanclass="lineno"> 305</span>  static_assert(N >= 3, <spanclass="stringliteral">"must be N >= 3"</span>);</div>
<divclass="line"><aname="l00310"></a><spanclass="lineno"> 310</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N - kNextHighestPowerOf2 / 2; ++i) {</div>
<divclass="line"><aname="l00311"></a><spanclass="lineno"> 311</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00312"></a><spanclass="lineno"> 312</span>  V& va = v[i];</div>
<divclass="line"><aname="l00336"></a><spanclass="lineno"> 336</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00361"></a><spanclass="lineno"> 361</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00376"></a><spanclass="lineno"> 376</span> <spanclass="comment">/// Merges two sets of registers across the warp of any size;</span></div>
<divclass="line"><aname="l00377"></a><spanclass="lineno"> 377</span> <spanclass="comment">/// i.e., merges a sorted k/v list of size kWarpSize * N1 with a</span></div>
<divclass="line"><aname="l00378"></a><spanclass="lineno"> 378</span> <spanclass="comment">/// sorted k/v list of size kWarpSize * N2, where N1 and N2 are any</span></div>
<divclass="line"><aname="l00379"></a><spanclass="lineno"> 379</span> <spanclass="comment">/// value >= 1</span></div>
<divclass="line"><aname="l00411"></a><spanclass="lineno"> 411</span> <spanclass="comment">// ka is always first in the list, so we needn't use our lane</span></div>
<divclass="line"><aname="l00412"></a><spanclass="lineno"> 412</span> <spanclass="comment">// in this comparison</span></div>
<divclass="line"><aname="l00417"></a><spanclass="lineno"> 417</span> <spanclass="comment">// kb is always second in the list, so we needn't use our lane</span></div>
<divclass="line"><aname="l00418"></a><spanclass="lineno"> 418</span> <spanclass="comment">// in this comparison</span></div>
<divclass="line"><aname="l00425"></a><spanclass="lineno"> 425</span> <spanclass="comment">// We don't care about updating elements in the second list</span></div>
<divclass="line"><aname="l00432"></a><spanclass="lineno"> 432</span> <spanclass="comment">// Only if we care about N2 do we need to bother merging it fully</span></div>
<divclass="line"><aname="l00438"></a><spanclass="lineno"> 438</span> <spanclass="comment">// Recursive template that uses the above bitonic merge to perform a</span></div>
<divclass="line"><aname="l00442"></a><spanclass="lineno"> 442</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> sort(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00443"></a><spanclass="lineno"> 443</span>  static_assert(N > 1, <spanclass="stringliteral">"did not hit specialized case"</span>);</div>
<divclass="line"><aname="l00491"></a><spanclass="lineno"> 491</span> <spanclass="keyword">static</span><spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> sort(K k[1], V v[1]) {</div>
<divclass="line"><aname="l00492"></a><spanclass="lineno"> 492</span> <spanclass="comment">// Update this code if this changes</span></div>
<divclass="line"><aname="l00493"></a><spanclass="lineno"> 493</span> <spanclass="comment">// should go from 1 -> kWarpSize in multiples of 2</span></div>
<divclass="line"><aname="l00504"></a><spanclass="lineno"> 504</span> <spanclass="comment">/// Sort a list of kWarpSize * N elements in registers, where N is an</span></div>