<divclass="line"><aname="l00002"></a><spanclass="lineno"> 2</span> <spanclass="comment"> * Copyright (c) Facebook, Inc. and its affiliates.</span></div>
<divclass="line"><aname="l00004"></a><spanclass="lineno"> 4</span> <spanclass="comment"> * This source code is licensed under the MIT license found in the</span></div>
<divclass="line"><aname="l00005"></a><spanclass="lineno"> 5</span> <spanclass="comment"> * LICENSE file in the root directory of this source tree.</span></div>
<divclass="line"><aname="l00021"></a><spanclass="lineno"> 21</span> <spanclass="comment">// -perform bitonic merges on pairs of sorted lists, held in</span></div>
<divclass="line"><aname="l00022"></a><spanclass="lineno"> 22</span> <spanclass="comment">// registers. Each list contains N * kWarpSize (multiple of 32)</span></div>
<divclass="line"><aname="l00023"></a><spanclass="lineno"> 23</span> <spanclass="comment">// elements for some N.</span></div>
<divclass="line"><aname="l00024"></a><spanclass="lineno"> 24</span> <spanclass="comment">// The bitonic merge is implemented for arbitrary sizes;</span></div>
<divclass="line"><aname="l00025"></a><spanclass="lineno"> 25</span> <spanclass="comment">// sorted list A of size N1 * kWarpSize registers</span></div>
<divclass="line"><aname="l00026"></a><spanclass="lineno"> 26</span> <spanclass="comment">// sorted list B of size N2 * kWarpSize registers =></span></div>
<divclass="line"><aname="l00027"></a><spanclass="lineno"> 27</span> <spanclass="comment">// sorted list C of size (N1 + N2) * kWarpSize registers. N1 and N2</span></div>
<divclass="line"><aname="l00028"></a><spanclass="lineno"> 28</span> <spanclass="comment">// are >= 1 and don't have to be powers of 2.</span></div>
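A host-side C++ sketch can make the arbitrary-size merge concrete. It is not the register/warp code in this file. It works on a plain `std::vector` and assumes the usual trick of reversing list A, so that A followed by B is a descending run and then an ascending run. For a range of length n, it compares element i with element i + m, where m is the largest power of 2 below n, and then recurses on both parts. This behaves like a power-of-2 bitonic merge on the range padded with +infinity, so neither size has to be a power of 2. The names `largestPow2Below`, `bitonicMergeAnyN` and `mergeSortedAnySize` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Largest power of two strictly less than n (requires n >= 2).
inline int largestPow2Below(int n) {
  int m = 1;
  while (m * 2 < n) m *= 2;
  return m;
}

// Ascending bitonic merge of v[lo, lo + n) for any n >= 1. At the top level
// the range must be a descending run followed by an ascending run; that acts
// like a bitonic sequence padded with +infinity up to the next power of two.
void bitonicMergeAnyN(std::vector<int>& v, int lo, int n) {
  if (n < 2) return;
  const int m = largestPow2Below(n);
  // Only pairs whose partner lies inside the range are compared; the others
  // would be compared against padding and never move.
  for (int i = lo; i < lo + n - m; ++i) {
    if (v[i] > v[i + m]) std::swap(v[i], v[i + m]);
  }
  bitonicMergeAnyN(v, lo, m);
  bitonicMergeAnyN(v, lo + m, n - m);
}

// Merge two ascending lists of arbitrary sizes (hypothetical helper).
std::vector<int> mergeSortedAnySize(std::vector<int> a, const std::vector<int>& b) {
  std::reverse(a.begin(), a.end());       // A is now descending
  a.insert(a.end(), b.begin(), b.end());  // descending run, then ascending run
  bitonicMergeAnyN(a, 0, static_cast<int>(a.size()));
  return a;
}
```

For example, merging {1, 4, 9} with {2, 3, 5, 7, 8} gives {1, 2, 3, 4, 5, 7, 8, 9}, with sizes 3 and 5.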
<divclass="line"><aname="l00030"></a><spanclass="lineno"> 30</span> <spanclass="comment">// -perform bitonic sorts on a set of N * kWarpSize key/value pairs</span></div>
<divclass="line"><aname="l00031"></a><spanclass="lineno"> 31</span> <spanclass="comment">// held in registers, by using the above bitonic merge as a</span></div>
<divclass="line"><aname="l00033"></a><spanclass="lineno"> 33</span> <spanclass="comment">// N can be an arbitrary N >= 1; i.e., the bitonic sort here supports</span></div>
<divclass="line"><aname="l00034"></a><spanclass="lineno"> 34</span> <spanclass="comment">// odd sizes and doesn't require the input to be a power of 2.</span></div>
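The arbitrary-N sort can be sketched on the host with plain arrays: sort the first floor(n/2) elements in the opposite direction, sort the rest in the requested direction, and then apply an arbitrary-length bitonic merge (compare i with i + m for the largest power of 2 m below n, then recurse). This is an illustration under those assumptions rather than the register-level code in this file, and all helper names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Largest power of two strictly less than n (requires n >= 2).
inline int largestPow2Below(int n) {
  int m = 1;
  while (m * 2 < n) m *= 2;
  return m;
}

// Order v[i] and v[j] ascending, or descending if ascending == false.
inline void compareExchange(std::vector<int>& v, int i, int j, bool ascending) {
  if ((v[i] > v[j]) == ascending) std::swap(v[i], v[j]);
}

// Merge a range holding a run in the opposite direction followed by a run in
// the requested direction (a bitonic shape), for any length n.
void bitonicMergeAnyN(std::vector<int>& v, int lo, int n, bool ascending) {
  if (n < 2) return;
  const int m = largestPow2Below(n);
  for (int i = lo; i < lo + n - m; ++i) compareExchange(v, i, i + m, ascending);
  bitonicMergeAnyN(v, lo, m, ascending);
  bitonicMergeAnyN(v, lo + m, n - m, ascending);
}

// Sort v[lo, lo + n) for any n >= 1, using the merge as a building block.
void bitonicSortAnyN(std::vector<int>& v, int lo, int n, bool ascending) {
  if (n < 2) return;
  const int half = n / 2;
  bitonicSortAnyN(v, lo, half, !ascending);  // opposite direction first
  bitonicSortAnyN(v, lo + half, n - half, ascending);
  bitonicMergeAnyN(v, lo, n, ascending);
}
```

Sorting {3, 1, 2} ascending, for instance, works directly on 3 elements with no padding to 4.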
<divclass="line"><aname="l00036"></a><spanclass="lineno"> 36</span> <spanclass="comment">// The sort or merge network is completely statically instantiated via</span></div>
<divclass="line"><aname="l00037"></a><spanclass="lineno"> 37</span> <spanclass="comment">// template specialization / expansion and constexpr, and it uses warp</span></div>
<divclass="line"><aname="l00038"></a><spanclass="lineno"> 38</span> <spanclass="comment">// shuffles to exchange values between warp lanes.</span></div>
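To illustrate what a network "statically instantiated via template expansion and constexpr" looks like, here is a host-only C++17 sketch of a power-of-2 bitonic sort over `std::array`. The direction, offset and length are template parameters, so every loop bound and index is a compile-time constant, and `if constexpr` ends the recursion. It does not model registers or warp shuffles, and `isPow2`, `staticMerge` and `staticSort` are hypothetical names (this file's own check is `utils::isPowerOf2`).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

constexpr bool isPow2(std::size_t n) { return n > 0 && (n & (n - 1)) == 0; }

// Merge the bitonic range a[Lo, Lo + N) in direction Asc. The whole
// recursion is expanded at compile time.
template <bool Asc, std::size_t Lo, std::size_t N, std::size_t Size>
void staticMerge(std::array<int, Size>& a) {
  static_assert(isPow2(N), "N must be a power of 2");
  if constexpr (N > 1) {
    constexpr std::size_t H = N / 2;
    for (std::size_t i = Lo; i < Lo + H; ++i) {
      const bool outOfOrder = Asc ? (a[i] > a[i + H]) : (a[i] < a[i + H]);
      if (outOfOrder) std::swap(a[i], a[i + H]);
    }
    staticMerge<Asc, Lo, H>(a);
    staticMerge<Asc, Lo + H, H>(a);
  }
}

// Sort a[Lo, Lo + N): first half in the opposite direction, second half in
// direction Asc, then one merge. N == 1 ends the recursion.
template <bool Asc, std::size_t Lo, std::size_t N, std::size_t Size>
void staticSort(std::array<int, Size>& a) {
  static_assert(isPow2(N) && Lo + N <= Size, "range must fit in the array");
  if constexpr (N > 1) {
    staticSort<!Asc, Lo, N / 2>(a);
    staticSort<Asc, Lo + N / 2, N / 2>(a);
    staticMerge<Asc, Lo, N>(a);
  }
}
```

A call such as `staticSort<true, 0, 8>(a)` instantiates one function per (direction, offset, length) combination, which is the same kind of expansion the GPU code relies on to keep arrays in registers.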
<divclass="line"><aname="l00042"></a><spanclass="lineno"> 42</span> <spanclass="comment">// For a sorting network of keys only, we only need one</span></div>
<divclass="line"><aname="l00043"></a><spanclass="lineno"> 43</span> <spanclass="comment">// comparison (a < b). However, what we really need to guarantee is</span></div>
<divclass="line"><aname="l00044"></a><spanclass="lineno"> 44</span> <spanclass="comment">// that if one lane chooses to exchange a value, then the</span></div>
<divclass="line"><aname="l00045"></a><spanclass="lineno"> 45</span> <spanclass="comment">// corresponding lane should also do the exchange.</span></div>
<divclass="line"><aname="l00046"></a><spanclass="lineno"> 46</span> <spanclass="comment">// If one just uses the negation !(x < y) in the higher</span></div>
<divclass="line"><aname="l00047"></a><spanclass="lineno"> 47</span> <spanclass="comment">// lane, this will also include the case where (x == y), so one</span></div>
<divclass="line"><aname="l00048"></a><spanclass="lineno"> 48</span> <spanclass="comment">// lane in fact performs an exchange and the other doesn't, but</span></div>
<divclass="line"><aname="l00049"></a><spanclass="lineno"> 49</span> <spanclass="comment">// because the only value being exchanged is equivalent, nothing has</span></div>
<divclass="line"><aname="l00051"></a><spanclass="lineno"> 51</span> <spanclass="comment">// So, you can get away with just one comparison and its negation.</span></div>
<divclass="line"><aname="l00053"></a><spanclass="lineno"> 53</span> <spanclass="comment">// If we're sorting keys and values, where equivalent keys can</span></div>
<divclass="line"><aname="l00054"></a><spanclass="lineno"> 54</span> <spanclass="comment">// exist, then this is a problem, since we want to treat (x, v1)</span></div>
<divclass="line"><aname="l00055"></a><spanclass="lineno"> 55</span> <spanclass="comment">// as not equivalent to (x, v2).</span></div>
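The failure is easy to reproduce with a hypothetical host model (not code from this file). Two "lanes" each see the other's pair, as a shuffle would deliver it, and they should end in descending order (larger key in the lower lane). Both evaluate own.k < other.k, and the higher lane uses the negation. With distinct keys this is correct; with equal keys the higher lane swaps while the lower lane does not, so the keys are still right but one value is lost and the other duplicated.

```cpp
#include <cassert>

struct KV {
  int k;
  int v;
};

// One compare-exchange using a single comparison and its negation.
// `lower` and `higher` stand for the register of each lane.
inline void exchangeOneComparison(KV& lower, KV& higher) {
  const KV lowerSaw = higher;  // what the lower lane receives via shuffle
  const KV higherSaw = lower;  // what the higher lane receives via shuffle
  const bool lowerSwaps = lower.k < lowerSaw.k;         // take the larger key
  const bool higherSwaps = !(higher.k < higherSaw.k);   // also true on ties
  if (lowerSwaps) lower = lowerSaw;
  if (higherSwaps) higher = higherSaw;
}
```

With keys only, the tie case is harmless because both lanes end up holding the same key; with key/value pairs it is not.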
<divclass="line"><aname="l00057"></a><spanclass="lineno"> 57</span> <spanclass="comment">// To remedy this, you can either compare with a lexicographic</span></div>
<divclass="line"><aname="l00059"></a><spanclass="lineno"> 59</span> <spanclass="comment">// we're predicating all of the choices results in 3 comparisons</span></div>
<divclass="line"><aname="l00060"></a><spanclass="lineno"> 60</span> <spanclass="comment">// being executed, or we can invert the selection so that there is no</span></div>
<divclass="line"><aname="l00061"></a><spanclass="lineno"> 61</span> <spanclass="comment">// middle choice of equality; the other lane will likewise</span></div>
<divclass="line"><aname="l00062"></a><spanclass="lineno"> 62</span> <spanclass="comment">// check that (b.k > a.k) (the higher lane has the values</span></div>
<divclass="line"><aname="l00063"></a><spanclass="lineno"> 63</span> <spanclass="comment">// swapped). Then, the first lane swaps if and only if the</span></div>
<divclass="line"><aname="l00064"></a><spanclass="lineno"> 64</span> <spanclass="comment">// second lane swaps; if both lanes have equivalent keys, no</span></div>
<divclass="line"><aname="l00065"></a><spanclass="lineno"> 65</span> <spanclass="comment">// swap will be performed. This results in only two comparisons</span></div>
<divclass="line"><aname="l00066"></a><spanclass="lineno"> 66</span> <spanclass="comment">// being executed.</span></div>
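A matching hypothetical host model of the two-comparison rule follows the `small ? Comp::gt(k, otherK) : Comp::lt(k, otherK)` expression used in this file, with plain `int` comparisons in place of `Comp`, ascending order, and the lower lane playing the `small` role. Both lanes now test the same strict condition from opposite sides, so they always agree, and equal keys are left untouched.

```cpp
#include <cassert>

struct KV {
  int k;
  int v;
};

// Two-comparison exchange between the lower ("small") and the higher lane.
// Returns true when both lanes made the same swap decision.
inline bool exchangeTwoComparisons(KV& lower, KV& higher) {
  const KV lowerSaw = higher;  // otherK/otherV as seen by the lower lane
  const KV higherSaw = lower;  // otherK/otherV as seen by the higher lane
  const bool lowerSwaps = lower.k > lowerSaw.k;     // small: gt(k, otherK)
  const bool higherSwaps = higher.k < higherSaw.k;  // other: lt(k, otherK)
  if (lowerSwaps) lower = lowerSaw;
  if (higherSwaps) higher = higherSaw;
  return lowerSwaps == higherSwaps;
}
```

Both predicates are strict, so neither lane swaps on a tie and every (k, v) pair survives exactly once.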
<divclass="line"><aname="l00068"></a><spanclass="lineno"> 68</span> <spanclass="comment">// If you don't consider values as well, then this does not produce a</span></div>
<divclass="line"><aname="l00069"></a><spanclass="lineno"> 69</span> <spanclass="comment">// consistent ordering among (k, v) pairs with equivalent keys but</span></div>
<divclass="line"><aname="l00070"></a><spanclass="lineno"> 70</span> <spanclass="comment">// different values; for us, we don't really care about ordering or</span></div>
<divclass="line"><aname="l00073"></a><spanclass="lineno"> 73</span> <spanclass="comment">// I have tried both re-arranging the order in the higher lane to get</span></div>
<divclass="line"><aname="l00074"></a><spanclass="lineno"> 74</span> <spanclass="comment">// away with one comparison and adding the value to the check; both</span></div>
<divclass="line"><aname="l00075"></a><spanclass="lineno"> 75</span> <spanclass="comment">// result in greater register consumption or lower speed than just</span></div>
<divclass="line"><aname="l00076"></a><spanclass="lineno"> 76</span> <spanclass="comment">// performing both < and > comparisons with the variables, so I just</span></div>
<divclass="line"><aname="l00077"></a><spanclass="lineno"> 77</span> <spanclass="comment">// stick with this.</span></div>
<divclass="line"><aname="l00079"></a><spanclass="lineno"> 79</span> <spanclass="comment">// This function merges kWarpSize / (2 * L) lists in parallel using warp</span></div>
<divclass="line"><aname="l00081"></a><spanclass="lineno"> 81</span> <spanclass="comment">// It works on at most size-16 lists, as we need 32 threads for this</span></div>
<divclass="line"><aname="l00084"></a><spanclass="lineno"> 84</span> <spanclass="comment">// If IsBitonic is false, the first stage is reversed, so we don't</span></div>
<divclass="line"><aname="l00085"></a><spanclass="lineno"> 85</span> <spanclass="comment">// need to sort directionally. It's still technically a bitonic sort.</span></div>
<divclass="line"><aname="l00089"></a><spanclass="lineno"> 89</span>  static_assert(utils::isPowerOf2(L), <spanclass="stringliteral">"L must be a power-of-2"</span>);</div>
<divclass="line"><aname="l00090"></a><spanclass="lineno"> 90</span>  static_assert(L <= kWarpSize / 2, <spanclass="stringliteral">"merge list size must be <= 16"</span>);</div>
<divclass="line"><aname="l00095"></a><spanclass="lineno"> 95</span> <spanclass="comment">// Reverse the first comparison stage.</span></div>
<divclass="line"><aname="l00096"></a><spanclass="lineno"> 96</span> <spanclass="comment">// For example, merging a list of size 8 has the exchanges:</span></div>
<divclass="line"><aname="l00101"></a><spanclass="lineno"> 101</span> <spanclass="comment">// Whether we are the lesser thread in the exchange</span></div>
<divclass="line"><aname="l00102"></a><spanclass="lineno"> 102</span> <spanclass="keywordtype">bool</span> small = !(laneId & L);</div>
<divclass="line"><aname="l00105"></a><spanclass="lineno"> 105</span> <spanclass="comment">// See the comment above on how performing both of these</span></div>
<divclass="line"><aname="l00106"></a><spanclass="lineno"> 106</span> <spanclass="comment">// comparisons in the warp seems to win out over the</span></div>
<divclass="line"><aname="l00107"></a><spanclass="lineno"> 107</span> <spanclass="comment">// alternatives in practice.</span></div>
<divclass="line"><aname="l00108"></a><spanclass="lineno"> 108</span> <spanclass="keywordtype">bool</span> s = small ? Comp::gt(k, otherK) : Comp::lt(k, otherK);</div>
<divclass="line"><aname="l00124"></a><spanclass="lineno"> 124</span> <spanclass="comment">// Whether we are the lesser thread in the exchange</span></div>
<divclass="line"><aname="l00125"></a><spanclass="lineno"> 125</span> <spanclass="keywordtype">bool</span> small = !(laneId & stride);</div>
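The exchange pattern of this function can be listed per lane with a small host-side C++ sketch. It assumes that the reversed first stage pairs each lane with its mirror inside a 2L-wide group (mask 2L - 1, so lane 0 pairs with lane 15 when L = 8) and that the remaining stages use strides L/2, ..., 1, or L, ..., 1 when the input is already bitonic. `partnerLanes` and `isSmallLane` are hypothetical names; `isSmallLane` mirrors the `small = !(laneId & stride)` expression above.

```cpp
#include <cassert>
#include <vector>

// Lanes that `laneId` exchanges with, stage by stage, when merging sorted
// lists of size L (L a power of 2, L <= 16) within a 32-lane warp.
std::vector<int> partnerLanes(int laneId, int L, bool isBitonic) {
  std::vector<int> partners;
  if (!isBitonic) {
    // Reversed first stage: mirror lane inside the 2L-wide group.
    partners.push_back(laneId ^ (2 * L - 1));
  }
  // Ordinary half-cleaner stages with halving strides.
  for (int stride = isBitonic ? L : L / 2; stride > 0; stride /= 2) {
    partners.push_back(laneId ^ stride);
  }
  return partners;
}

// The lane whose bit at the exchange distance is 0 is the "lesser" lane.
inline bool isSmallLane(int laneId, int stride) { return !(laneId & stride); }
```

In the mirrored stage exactly one lane of each pair has bit L clear, which is why `small = !(laneId & L)` still identifies the lesser lane there.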
<divclass="line"><aname="l00140"></a><spanclass="lineno"> 140</span> <spanclass="comment">// Template for performing a bitonic merge of an arbitrary set of</span></div>
<divclass="line"><aname="l00161"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01Low_00_01true_01_4.html"> 161</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, Low, true> {</div>
<divclass="line"><aname="l00162"></a><spanclass="lineno"> 162</span> <spanclass="keyword">static</span> <spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00163"></a><spanclass="lineno"> 163</span>  static_assert(utils::isPowerOf2(N), <spanclass="stringliteral">"must be power of 2"</span>);</div>
<divclass="line"><aname="l00164"></a><spanclass="lineno"> 164</span>  static_assert(N > 1, <spanclass="stringliteral">"must be N > 1"</span>);</div>
<divclass="line"><aname="l00167"></a><spanclass="lineno"> 167</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00168"></a><spanclass="lineno"> 168</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00169"></a><spanclass="lineno"> 169</span>  V& va = v[i];</div>
<divclass="line"><aname="l00184"></a><spanclass="lineno"> 184</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00192"></a><spanclass="lineno"> 192</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00203"></a><spanclass="lineno"> 203</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00204"></a><spanclass="lineno"> 204</span>  newK[i] = k[i + N / 2];</div>
<divclass="line"><aname="l00205"></a><spanclass="lineno"> 205</span>  newV[i] = v[i + N / 2];</div>
<divclass="line"><aname="l00211"></a><spanclass="lineno"> 211</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N / 2; ++i) {</div>
<divclass="line"><aname="l00212"></a><spanclass="lineno"> 212</span>  k[i + N / 2] = newK[i];</div>
<divclass="line"><aname="l00213"></a><spanclass="lineno"> 213</span>  v[i + N / 2] = newV[i];</div>
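A hypothetical host-side analogue of the power-of-2 register merge shows the structure with keys only. Register i is compared with register i + N/2 (no shuffle is needed, since both live in the same lane), the lower half is merged in place, and the upper half is copied into a fixed-size temporary, merged and copied back, mirroring the `newK`/`newV` copies above. In this sketch N == 1 simply ends the recursion; the GPU code would hand that case to a warp-level merge instead.

```cpp
#include <cassert>
#include <utility>

// k holds a bitonic sequence of N keys (N a power of 2); result is ascending.
template <int N>
void mergeRegisters(int k[N]) {
  static_assert(N >= 1 && (N & (N - 1)) == 0, "N must be a power of 2");
  if constexpr (N > 1) {
    // Compare register i with register i + N / 2.
    for (int i = 0; i < N / 2; ++i) {
      if (k[i] > k[i + N / 2]) std::swap(k[i], k[i + N / 2]);
    }
    // Lower half: recurse on the first N / 2 registers in place.
    mergeRegisters<N / 2>(k);
    // Upper half: copy out, recurse on a fixed-size array, copy back.
    int newK[N / 2];
    for (int i = 0; i < N / 2; ++i) newK[i] = k[i + N / 2];
    mergeRegisters<N / 2>(newK);
    for (int i = 0; i < N / 2; ++i) k[i + N / 2] = newK[i];
  }
}
```

For example, the bitonic input {1, 3, 5, 7, 8, 6, 4, 2} becomes {1, 2, 3, 4, 5, 6, 7, 8}.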
<divclass="line"><aname="l00225"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01true_00_01false_01_4.html"> 225</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, true, false> {</div>
<divclass="line"><aname="l00226"></a><spanclass="lineno"> 226</span> <spanclass="keyword">static</span> <spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00227"></a><spanclass="lineno"> 227</span>  static_assert(!utils::isPowerOf2(N), <spanclass="stringliteral">"must be non-power-of-2"</span>);</div>
<divclass="line"><aname="l00228"></a><spanclass="lineno"> 228</span>  static_assert(N >= 3, <spanclass="stringliteral">"must be N >= 3"</span>);</div>
<divclass="line"><aname="l00233"></a><spanclass="lineno"> 233</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N - kNextHighestPowerOf2 / 2; ++i) {</div>
<divclass="line"><aname="l00234"></a><spanclass="lineno"> 234</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00235"></a><spanclass="lineno"> 235</span>  V& va = v[i];</div>
<divclass="line"><aname="l00259"></a><spanclass="lineno"> 259</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00284"></a><spanclass="lineno"> 284</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
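The loop bound `N - kNextHighestPowerOf2 / 2` in the non-power-of-2 case can be checked on the host. This sketch assumes register i is paired with register i + P/2, where P is the next power of 2 at or above N; registers whose partner would fall into the imaginary padding past N are simply not compared. `nextHighestPowerOf2` and `firstStepPairs` are hypothetical helpers, not this file's definitions.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Smallest power of 2 that is >= n.
constexpr int nextHighestPowerOf2(int n) {
  int p = 1;
  while (p < n) p *= 2;
  return p;
}

// Register pairs touched by the first compare-exchange step for N = n.
std::vector<std::pair<int, int>> firstStepPairs(int n) {
  const int half = nextHighestPowerOf2(n) / 2;
  std::vector<std::pair<int, int>> pairs;
  for (int i = 0; i < n - half; ++i) pairs.emplace_back(i, i + half);
  return pairs;
}
```

For N = 6, P = 8, so only (0, 4) and (1, 5) are compared; registers 2 and 3 would pair with slots 6 and 7, which do not exist.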
<divclass="line"><aname="l00301"></a><spanclass="lineno"><aclass="line"href="structfaiss_1_1gpu_1_1BitonicMergeStep_3_01K_00_01V_00_01N_00_01Dir_00_01Comp_00_01false_00_01false_01_4.html"> 301</a></span> <spanclass="keyword">struct </span><aclass="code"href="structfaiss_1_1gpu_1_1BitonicMergeStep.html">BitonicMergeStep</a><K, V, N, Dir, Comp, false, false> {</div>
<divclass="line"><aname="l00302"></a><spanclass="lineno"> 302</span> <spanclass="keyword">static</span> <spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> merge(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00303"></a><spanclass="lineno"> 303</span>  static_assert(!utils::isPowerOf2(N), <spanclass="stringliteral">"must be non-power-of-2"</span>);</div>
<divclass="line"><aname="l00304"></a><spanclass="lineno"> 304</span>  static_assert(N >= 3, <spanclass="stringliteral">"must be N >= 3"</span>);</div>
<divclass="line"><aname="l00309"></a><spanclass="lineno"> 309</span> <spanclass="preprocessor"></span><spanclass="keywordflow">for</span> (<spanclass="keywordtype">int</span> i = 0; i < N - kNextHighestPowerOf2 / 2; ++i) {</div>
<divclass="line"><aname="l00310"></a><spanclass="lineno"> 310</span>  K& ka = k[i];</div>
<divclass="line"><aname="l00311"></a><spanclass="lineno"> 311</span>  V& va = v[i];</div>
<divclass="line"><aname="l00335"></a><spanclass="lineno"> 335</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00360"></a><spanclass="lineno"> 360</span> <spanclass="comment">// FIXME: compiler doesn't like this expression? compiler bug?</span></div>
<divclass="line"><aname="l00375"></a><spanclass="lineno"> 375</span> <spanclass="comment">/// Merges two sets of registers across the warp of any size;</span></div>
<divclass="line"><aname="l00376"></a><spanclass="lineno"> 376</span> <spanclass="comment">/// i.e., merges a sorted k/v list of size kWarpSize * N1 with a</span></div>
<divclass="line"><aname="l00377"></a><spanclass="lineno"> 377</span> <spanclass="comment">/// sorted k/v list of size kWarpSize * N2, where N1 and N2 are any</span></div>
<divclass="line"><aname="l00378"></a><spanclass="lineno"> 378</span> <spanclass="comment">/// value >= 1</span></div>
<divclass="line"><aname="l00410"></a><spanclass="lineno"> 410</span> <spanclass="comment">// ka is always first in the list, so we needn't use our lane</span></div>
<divclass="line"><aname="l00411"></a><spanclass="lineno"> 411</span> <spanclass="comment">// in this comparison</span></div>
<divclass="line"><aname="l00416"></a><spanclass="lineno"> 416</span> <spanclass="comment">// kb is always second in the list, so we needn't use our lane</span></div>
<divclass="line"><aname="l00417"></a><spanclass="lineno"> 417</span> <spanclass="comment">// in this comparison</span></div>
<divclass="line"><aname="l00424"></a><spanclass="lineno"> 424</span> <spanclass="comment">// We don't care about updating elements in the second list</span></div>
<divclass="line"><aname="l00431"></a><spanclass="lineno"> 431</span> <spanclass="comment">// Only if we care about N2 do we need to bother merging it fully</span></div>
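The cross-list step described above can be modeled on flat vectors. This hypothetical sketch assumes the i-th element from the end of the first list (ka) is paired with the i-th element of the second list (kb) for the first min(n1, n2) positions, keeping the smaller key in the first list. Afterwards every element of the first list is <= every element of the second, so each list can be finished on its own. `std::sort` stands in for the per-list bitonic merge steps only to keep the sketch short, and when `fullMerge` is false the second list is left unsorted.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Merge ascending lists a (size n1) and b (size n2) of any sizes so that a
// ends up holding the n1 smallest keys in order.
void mergeAnySizes(std::vector<int>& a, std::vector<int>& b, bool fullMerge) {
  const int n1 = static_cast<int>(a.size());
  const int n2 = static_cast<int>(b.size());
  const int smallest = std::min(n1, n2);
  for (int i = 0; i < smallest; ++i) {
    int& ka = a[n1 - 1 - i];  // walks the first list from its end
    int& kb = b[i];           // walks the second list from its start
    if (ka > kb) std::swap(ka, kb);
  }
  // Each list is now bitonic; finish it (stand-in for the merge steps).
  std::sort(a.begin(), a.end());
  if (fullMerge) {
    std::sort(b.begin(), b.end());
  }
}
```

With `fullMerge == false` this matches the situation where only the first N1 * kWarpSize results matter, for example when keeping the smallest elements.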
<divclass="line"><aname="l00437"></a><spanclass="lineno"> 437</span> <spanclass="comment">// Recursive template that uses the above bitonic merge to perform a</span></div>
<divclass="line"><aname="l00441"></a><spanclass="lineno"> 441</span> <spanclass="keyword">static</span> <spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> sort(K k[N], V v[N]) {</div>
<divclass="line"><aname="l00442"></a><spanclass="lineno"> 442</span>  static_assert(N > 1, <spanclass="stringliteral">"did not hit specialized case"</span>);</div>
<divclass="line"><aname="l00490"></a><spanclass="lineno"> 490</span> <spanclass="keyword">static</span> <spanclass="keyword">inline</span> __device__ <spanclass="keywordtype">void</span> sort(K k[1], V v[1]) {</div>
<divclass="line"><aname="l00491"></a><spanclass="lineno"> 491</span> <spanclass="comment">// Update this code if this changes</span></div>
<divclass="line"><aname="l00492"></a><spanclass="lineno"> 492</span> <spanclass="comment">// should go from 1 -> kWarpSize in powers of 2</span></div>
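The base case sorts a single register across the warp by merging sorted runs of size 1, 2, 4, 8 and 16. The host-side C++ sketch below simulates this on 32 lanes. It assumes each merge uses a mirrored first stage (mask 2L - 1) followed by halving strides, together with the two-comparison exchange rule described above. `kLanes`, `Warp`, `exchangeStage`, `warpMergeSim` and `warpSortSim` are hypothetical, and the shuffle is modeled by reading a snapshot of all lanes taken before each stage.

```cpp
#include <array>
#include <cassert>

constexpr int kLanes = 32;  // hypothetical stand-in for kWarpSize

struct Warp {
  std::array<int, kLanes> k;  // one key register per lane
  std::array<int, kLanes> v;  // one value register per lane
};

// One exchange stage: each lane reads lane ^ mask (like a shuffle); the lane
// with the 0 bit at `bit` keeps the smaller key, using strict comparisons.
void exchangeStage(Warp& w, int mask, int bit) {
  const Warp before = w;  // all lanes exchange simultaneously
  for (int lane = 0; lane < kLanes; ++lane) {
    const int other = lane ^ mask;
    const bool small = !(lane & bit);
    const bool take = small ? before.k[lane] > before.k[other]
                            : before.k[lane] < before.k[other];
    if (take) {
      w.k[lane] = before.k[other];
      w.v[lane] = before.v[other];
    }
  }
}

// Merge sorted runs of length L into sorted runs of length 2L.
void warpMergeSim(Warp& w, int L) {
  exchangeStage(w, 2 * L - 1, L);  // mirrored first stage
  for (int stride = L / 2; stride > 0; stride /= 2) {
    exchangeStage(w, stride, stride);
  }
}

// Sort one key/value per lane by merging runs of size 1, 2, 4, 8, 16.
void warpSortSim(Warp& w) {
  for (int L = 1; L < kLanes; L *= 2) {
    warpMergeSim(w, L);
  }
}
```

Because both lanes of every pair make the same strict decision, duplicate keys never cause a value to be lost or copied.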
<divclass="line"><aname="l00503"></a><spanclass="lineno"> 503</span> <spanclass="comment">/// Sort a list of kWarpSize * N elements in registers, where N is an</span></div>