<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
< html xmlns = "http://www.w3.org/1999/xhtml" >
< head >
< meta http-equiv = "Content-Type" content = "text/xhtml;charset=UTF-8" / >
< meta http-equiv = "X-UA-Compatible" content = "IE=9" / >
< meta name = "generator" content = "Doxygen 1.8.5" / >
< title > Faiss: /data/users/matthijs/github_faiss/faiss/gpu/impl/IVFFlatScan.cu Source File< / title >
< link href = "tabs.css" rel = "stylesheet" type = "text/css" / >
< script type = "text/javascript" src = "jquery.js" > < / script >
< script type = "text/javascript" src = "dynsections.js" > < / script >
< link href = "search/search.css" rel = "stylesheet" type = "text/css" / >
< script type = "text/javascript" src = "search/search.js" > < / script >
< script type = "text/javascript" >
$(document).ready(function() { searchBox.OnSelectItem(0); });
< / script >
< link href = "doxygen.css" rel = "stylesheet" type = "text/css" / >
< / head >
< body >
< div id = "top" > <!-- do not remove this div, it is closed by doxygen! -->
< div id = "titlearea" >
< table cellspacing = "0" cellpadding = "0" >
< tbody >
< tr style = "height: 56px;" >
< td style = "padding-left: 0.5em;" >
< div id = "projectname" > Faiss
< / div >
< / td >
< / tr >
< / tbody >
< / table >
< / div >
<!-- end header part -->
<!-- Generated by Doxygen 1.8.5 -->
< script type = "text/javascript" >
var searchBox = new SearchBox("searchBox", "search",false,'Search');
< / script >
< div id = "navrow1" class = "tabs" >
< ul class = "tablist" >
< li > < a href = "index.html" > < span > Main  Page< / span > < / a > < / li >
< li > < a href = "namespaces.html" > < span > Namespaces< / span > < / a > < / li >
< li > < a href = "annotated.html" > < span > Classes< / span > < / a > < / li >
< li class = "current" > < a href = "files.html" > < span > Files< / span > < / a > < / li >
< li >
< div id = "MSearchBox" class = "MSearchBoxInactive" >
< span class = "left" >
< img id = "MSearchSelect" src = "search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
< input type = "text" id = "MSearchField" value = "Search" accesskey = "S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
< / span > < span class = "right" >
< a id = "MSearchClose" href = "javascript:searchBox.CloseResultsWindow()" > < img id = "MSearchCloseImg" border = "0" src = "search/close.png" alt = "" / > < / a >
< / span >
< / div >
< / li >
< / ul >
< / div >
< div id = "navrow2" class = "tabs2" >
< ul class = "tablist" >
< li > < a href = "files.html" > < span > File  List< / span > < / a > < / li >
< / ul >
< / div >
<!-- window showing the filter options -->
< div id = "MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
< a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(0)" > < span class = "SelectionMark" >   < / span > All< / a > < a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(1)" > < span class = "SelectionMark" >   < / span > Classes< / a > < a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(2)" > < span class = "SelectionMark" >   < / span > Namespaces< / a > < a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(3)" > < span class = "SelectionMark" >   < / span > Functions< / a > < a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(4)" > < span class = "SelectionMark" >   < / span > Variables< / a > < a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(5)" > < span class = "SelectionMark" >   < / span > Typedefs< / a > < a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(6)" > < span class = "SelectionMark" >   < / span > Enumerations< / a > < a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(7)" > < span class = "SelectionMark" >   < / span > Enumerator< / a > < a class = "SelectItem" href = "javascript:void(0)" onclick = "searchBox.OnSelectItem(8)" > < span class = "SelectionMark" >   < / span > Friends< / a > < / div >
<!-- iframe showing the search results (closed by default) -->
< div id = "MSearchResultsWindow" >
< iframe src = "javascript:void(0)" frameborder = "0"
name="MSearchResults" id="MSearchResults">
< / iframe >
< / div >
< / div > <!-- top -->
< div class = "header" >
< div class = "headertitle" >
< div class = "title" > IVFFlatScan.cu< / div > < / div >
< / div > <!-- header -->
< div class = "contents" >
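<div class="textblock"><p>Every <code>scan</code> specialization in this file follows the same two-phase pattern: each thread accumulates a partial distance over its share of the dimensions, then the block (or warp) sums those partials into one distance per vector. The host-side C++ sketch below emulates that arithmetic for the fallback <code>IVFFlatScan&lt;-1, L2, T&gt;</code> path, assuming <code>T = float</code> and row-major vector storage. The names <code>emulateFallbackScan</code>, <code>l2Term</code>, <code>ipTerm</code> and <code>firstDimForLane</code> are hypothetical helpers for illustration, not Faiss APIs; the last one only restates the lane-to-dimension mapping of the 64-d float32 specialization.</p>

```cpp
// Host-side sketch (hypothetical helpers, not Faiss APIs) of the arithmetic in
// the fallback IVFFlatScan<-1, L2, T>::scan path, assuming T = float.
#include <cassert>
#include <cstddef>
#include <vector>

// Per-dimension distance terms, mirroring l2Distance / ipDistance on scalars.
inline float l2Term(float q, float v) {
  float diff = q - v;
  return diff * diff;
}
inline float ipTerm(float q, float v) { return q * v; }

// Emulates one thread block of `blockDim` threads scanning `numVecs`
// row-major vectors of length `dim`; returns one distance per vector.
std::vector<float> emulateFallbackScan(const std::vector<float>& query,
                                       const std::vector<float>& vecs,
                                       int numVecs, int dim, int blockDim,
                                       bool l2) {
  assert(static_cast<int>(query.size()) >= dim);
  assert(vecs.size() >= static_cast<std::size_t>(numVecs) * dim);
  assert(blockDim > 0);

  std::vector<float> distanceOut(numVecs);
  std::vector<float> partial(blockDim);  // one accumulator per simulated thread

  for (int vec = 0; vec < numVecs; ++vec) {
    // Phase 1: thread `tid` accumulates dims tid, tid + blockDim, ...
    for (int tid = 0; tid < blockDim; ++tid) {
      float dist = 0.0f;
      for (int d = tid; d < dim; d += blockDim) {
        float v = vecs[static_cast<std::size_t>(vec) * dim + d];
        dist += l2 ? l2Term(query[d], v) : ipTerm(query[d], v);
      }
      partial[tid] = dist;
    }
    // Phase 2: combine the partial sums (blockReduceAllSum's role on the GPU).
    float total = 0.0f;
    for (float p : partial) {
      total += p;
    }
    distanceOut[vec] = total;  // on the GPU, only thread 0 stores the result
  }
  return distanceOut;
}

// 64-d float32 specialization: lane `laneId` reads the float2 starting at
// query[laneId * 2], so 32 lanes cover dims 0..63 exactly once.
inline int firstDimForLane(int laneId) { return laneId * 2; }
```

<p>The sketch accumulates in <code>float</code>, matching the file's choice to do all list-scanning math in float32 even when the stored data is <code>half</code> (here the inputs are assumed to be float already). On the device, the second phase is carried out cooperatively by <code>blockReduceAllSum</code> using the shared-memory buffer <code>smem</code>, so the serial loop reproduces only the arithmetic, not the parallel schedule; because the summation order differs, GPU results may differ from this sketch in the last bits.</p></div>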
< div class = "fragment" > < div class = "line" > < a name = "l00001" > < / a > < span class = "lineno" > 1< / span >   < span class = "comment" > /**< / span > < / div >
< div class = "line" > < a name = "l00002" > < / a > < span class = "lineno" > 2< / span >   < span class = "comment" > * Copyright (c) 2015-present, Facebook, Inc.< / span > < / div >
< div class = "line" > < a name = "l00003" > < / a > < span class = "lineno" > 3< / span >   < span class = "comment" > * All rights reserved.< / span > < / div >
< div class = "line" > < a name = "l00004" > < / a > < span class = "lineno" > 4< / span >   < span class = "comment" > *< / span > < / div >
< div class = "line" > < a name = "l00005" > < / a > < span class = "lineno" > 5< / span >   < span class = "comment" > * This source code is licensed under the BSD+Patents license found in the< / span > < / div >
< div class = "line" > < a name = "l00006" > < / a > < span class = "lineno" > 6< / span >   < span class = "comment" > * LICENSE file in the root directory of this source tree.< / span > < / div >
< div class = "line" > < a name = "l00007" > < / a > < span class = "lineno" > 7< / span >   < span class = "comment" > */< / span > < / div >
< div class = "line" > < a name = "l00008" > < / a > < span class = "lineno" > 8< / span >   < / div >
< div class = "line" > < a name = "l00009" > < / a > < span class = "lineno" > 9< / span >   < span class = "comment" > // Copyright 2004-present Facebook. All Rights Reserved.< / span > < / div >
< div class = "line" > < a name = "l00010" > < / a > < span class = "lineno" > 10< / span >   < / div >
< div class = "line" > < a name = "l00011" > < / a > < span class = "lineno" > 11< / span >   < span class = "preprocessor" > #include " IVFFlatScan.cuh" < / span > < / div >
< div class = "line" > < a name = "l00012" > < / a > < span class = "lineno" > 12< / span >   < span class = "preprocessor" > #include " ../GpuResources.h" < / span > < / div >
< div class = "line" > < a name = "l00013" > < / a > < span class = "lineno" > 13< / span >   < span class = "preprocessor" > #include " IVFUtils.cuh" < / span > < / div >
< div class = "line" > < a name = "l00014" > < / a > < span class = "lineno" > 14< / span >   < span class = "preprocessor" > #include " ../utils/ConversionOperators.cuh" < / span > < / div >
< div class = "line" > < a name = "l00015" > < / a > < span class = "lineno" > 15< / span >   < span class = "preprocessor" > #include " ../utils/DeviceDefs.cuh" < / span > < / div >
< div class = "line" > < a name = "l00016" > < / a > < span class = "lineno" > 16< / span >   < span class = "preprocessor" > #include " ../utils/DeviceUtils.h" < / span > < / div >
< div class = "line" > < a name = "l00017" > < / a > < span class = "lineno" > 17< / span >   < span class = "preprocessor" > #include " ../utils/DeviceTensor.cuh" < / span > < / div >
< div class = "line" > < a name = "l00018" > < / a > < span class = "lineno" > 18< / span >   < span class = "preprocessor" > #include " ../utils/Float16.cuh" < / span > < / div >
< div class = "line" > < a name = "l00019" > < / a > < span class = "lineno" > 19< / span >   < span class = "preprocessor" > #include " ../utils/MathOperators.cuh" < / span > < / div >
< div class = "line" > < a name = "l00020" > < / a > < span class = "lineno" > 20< / span >   < span class = "preprocessor" > #include " ../utils/LoadStoreOperators.cuh" < / span > < / div >
< div class = "line" > < a name = "l00021" > < / a > < span class = "lineno" > 21< / span >   < span class = "preprocessor" > #include " ../utils/PtxUtils.cuh" < / span > < / div >
< div class = "line" > < a name = "l00022" > < / a > < span class = "lineno" > 22< / span >   < span class = "preprocessor" > #include " ../utils/Reductions.cuh" < / span > < / div >
< div class = "line" > < a name = "l00023" > < / a > < span class = "lineno" > 23< / span >   < span class = "preprocessor" > #include " ../utils/StaticUtils.h" < / span > < / div >
< div class = "line" > < a name = "l00024" > < / a > < span class = "lineno" > 24< / span >   < span class = "preprocessor" > #include < thrust/host_vector.h> < / span > < / div >
< div class = "line" > < a name = "l00025" > < / a > < span class = "lineno" > 25< / span >   < / div >
< div class = "line" > < a name = "l00026" > < / a > < span class = "lineno" > 26< / span >   < span class = "keyword" > namespace < / span > faiss { < span class = "keyword" > namespace < / span > gpu {< / div >
< div class = "line" > < a name = "l00027" > < / a > < span class = "lineno" > 27< / span >   < / div >
< div class = "line" > < a name = "l00028" > < / a > < span class = "lineno" > 28< / span >   < span class = "keyword" > template< / span > < < span class = "keyword" > typename< / span > T> < / div >
< div class = "line" > < a name = "l00029" > < / a > < span class = "lineno" > 29< / span >   < span class = "keyword" > inline< / span > __device__ < span class = "keyword" > typename< / span > Math< T> ::ScalarType l2Distance(T a, T b) {< / div >
< div class = "line" > < a name = "l00030" > < / a > < span class = "lineno" > 30< / span >   a = Math< T> ::sub(a, b);< / div >
< div class = "line" > < a name = "l00031" > < / a > < span class = "lineno" > 31< / span >   a = Math< T> ::mul(a, a);< / div >
< div class = "line" > < a name = "l00032" > < / a > < span class = "lineno" > 32< / span >   < span class = "keywordflow" > return< / span > < a class = "code" href = "structfaiss_1_1gpu_1_1Math.html#a4b17f0b5d014f300e76dde5b24af8014" > Math< T> ::reduceAdd< / a > (a);< / div >
< div class = "line" > < a name = "l00033" > < / a > < span class = "lineno" > 33< / span >   }< / div >
< div class = "line" > < a name = "l00034" > < / a > < span class = "lineno" > 34< / span >   < / div >
< div class = "line" > < a name = "l00035" > < / a > < span class = "lineno" > 35< / span >   < span class = "keyword" > template< / span > < < span class = "keyword" > typename< / span > T> < / div >
< div class = "line" > < a name = "l00036" > < / a > < span class = "lineno" > 36< / span >   < span class = "keyword" > inline< / span > __device__ < span class = "keyword" > typename< / span > Math< T> ::ScalarType ipDistance(T a, T b) {< / div >
< div class = "line" > < a name = "l00037" > < / a > < span class = "lineno" > 37< / span >   < span class = "keywordflow" > return< / span > < a class = "code" href = "structfaiss_1_1gpu_1_1Math.html#a4b17f0b5d014f300e76dde5b24af8014" > Math< T> ::reduceAdd< / a > (Math< T> ::mul(a, b));< / div >
< div class = "line" > < a name = "l00038" > < / a > < span class = "lineno" > 38< / span >   }< / div >
< div class = "line" > < a name = "l00039" > < / a > < span class = "lineno" > 39< / span >   < / div >
< div class = "line" > < a name = "l00040" > < / a > < span class = "lineno" > 40< / span >   < span class = "comment" > // For list scanning, even if the input data is `half`, we perform all< / span > < / div >
< div class = "line" > < a name = "l00041" > < / a > < span class = "lineno" > 41< / span >   < span class = "comment" > // math in float32, because the code is memory b/w bound, and the< / span > < / div >
< div class = "line" > < a name = "l00042" > < / a > < span class = "lineno" > 42< / span >   < span class = "comment" > // added precision for accumulation is useful< / span > < / div >
< div class = "line" > < a name = "l00043" > < / a > < span class = "lineno" > 43< / span >   < span class = "comment" > < / span > < / div >
< div class = "line" > < a name = "l00044" > < / a > < span class = "lineno" > 44< / span >   < span class = "comment" > /// The class that we use to provide scan specializations< / span > < / div >
< div class = "line" > < a name = "l00045" > < / a > < span class = "lineno" > 45< / span >   < span class = "comment" > < / span > < span class = "keyword" > template< / span > < < span class = "keywordtype" > int< / span > Dims, < span class = "keywordtype" > bool< / span > L2, < span class = "keyword" > typename< / span > T> < / div >
< div class = "line" > < a name = "l00046" > < / a > < span class = "lineno" > < a class = "line" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > 46< / a > < / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > {< / div >
< div class = "line" > < a name = "l00047" > < / a > < span class = "lineno" > 47< / span >   };< / div >
< div class = "line" > < a name = "l00048" > < / a > < span class = "lineno" > 48< / span >   < / div >
< div class = "line" > < a name = "l00049" > < / a > < span class = "lineno" > 49< / span >   < span class = "comment" > // Fallback implementation: works for any dimension size< / span > < / div >
< div class = "line" > < a name = "l00050" > < / a > < span class = "lineno" > 50< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > bool< / span > L2, < span class = "keyword" > typename< / span > T> < / div >
< div class = "line" > < a name = "l00051" > < / a > < span class = "lineno" > < a class = "line" href = "structfaiss_1_1gpu_1_1IVFFlatScan_3-1_00_01L2_00_01T_01_4.html" > 51< / a > < / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > < -1, L2, T> {< / div >
< div class = "line" > < a name = "l00052" > < / a > < span class = "lineno" > 52< / span >   < span class = "keyword" > static< / span > __device__ < span class = "keywordtype" > void< / span > scan(< span class = "keywordtype" > float< / span > * query,< / div >
< div class = "line" > < a name = "l00053" > < / a > < span class = "lineno" > 53< / span >   < span class = "keywordtype" > void< / span > * vecData,< / div >
< div class = "line" > < a name = "l00054" > < / a > < span class = "lineno" > 54< / span >   < span class = "keywordtype" > int< / span > numVecs,< / div >
< div class = "line" > < a name = "l00055" > < / a > < span class = "lineno" > 55< / span >   < span class = "keywordtype" > int< / span > dim,< / div >
< div class = "line" > < a name = "l00056" > < / a > < span class = "lineno" > 56< / span >   < span class = "keywordtype" > float< / span > * distanceOut) {< / div >
< div class = "line" > < a name = "l00057" > < / a > < span class = "lineno" > 57< / span >   < span class = "keyword" > extern< / span > __shared__ < span class = "keywordtype" > float< / span > smem[];< / div >
< div class = "line" > < a name = "l00058" > < / a > < span class = "lineno" > 58< / span >   T* vecs = (T*) vecData;< / div >
< div class = "line" > < a name = "l00059" > < / a > < span class = "lineno" > 59< / span >   < / div >
< div class = "line" > < a name = "l00060" > < / a > < span class = "lineno" > 60< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > vec = 0; vec < numVecs; ++vec) {< / div >
< div class = "line" > < a name = "l00061" > < / a > < span class = "lineno" > 61< / span >   < span class = "comment" > // Reduce in dist< / span > < / div >
< div class = "line" > < a name = "l00062" > < / a > < span class = "lineno" > 62< / span >   < span class = "keywordtype" > float< / span > dist = 0.0f;< / div >
< div class = "line" > < a name = "l00063" > < / a > < span class = "lineno" > 63< / span >   < / div >
< div class = "line" > < a name = "l00064" > < / a > < span class = "lineno" > 64< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > d = threadIdx.x; d < dim; d += blockDim.x) {< / div >
< div class = "line" > < a name = "l00065" > < / a > < span class = "lineno" > 65< / span >   < span class = "keywordtype" > float< / span > vecVal = < a class = "code" href = "structfaiss_1_1gpu_1_1ConvertTo.html" > ConvertTo< float> ::to< / a > (vecs[vec * dim + d]);< / div >
< div class = "line" > < a name = "l00066" > < / a > < span class = "lineno" > 66< / span >   < span class = "keywordtype" > float< / span > queryVal = query[d];< / div >
< div class = "line" > < a name = "l00067" > < / a > < span class = "lineno" > 67< / span >   < span class = "keywordtype" > float< / span > curDist;< / div >
< div class = "line" > < a name = "l00068" > < / a > < span class = "lineno" > 68< / span >   < / div >
< div class = "line" > < a name = "l00069" > < / a > < span class = "lineno" > 69< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00070" > < / a > < span class = "lineno" > 70< / span >   curDist = l2Distance(queryVal, vecVal);< / div >
< div class = "line" > < a name = "l00071" > < / a > < span class = "lineno" > 71< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00072" > < / a > < span class = "lineno" > 72< / span >   curDist = ipDistance(queryVal, vecVal);< / div >
< div class = "line" > < a name = "l00073" > < / a > < span class = "lineno" > 73< / span >   }< / div >
< div class = "line" > < a name = "l00074" > < / a > < span class = "lineno" > 74< / span >   < / div >
< div class = "line" > < a name = "l00075" > < / a > < span class = "lineno" > 75< / span >   dist += curDist;< / div >
< div class = "line" > < a name = "l00076" > < / a > < span class = "lineno" > 76< / span >   }< / div >
< div class = "line" > < a name = "l00077" > < / a > < span class = "lineno" > 77< / span >   < / div >
< div class = "line" > < a name = "l00078" > < / a > < span class = "lineno" > 78< / span >   < span class = "comment" > // Reduce distance within block< / span > < / div >
< div class = "line" > < a name = "l00079" > < / a > < span class = "lineno" > 79< / span >   dist = blockReduceAllSum< float, false, true> (dist, smem);< / div >
< div class = "line" > < a name = "l00080" > < / a > < span class = "lineno" > 80< / span >   < / div >
< div class = "line" > < a name = "l00081" > < / a > < span class = "lineno" > 81< / span >   < span class = "keywordflow" > if< / span > (threadIdx.x == 0) {< / div >
< div class = "line" > < a name = "l00082" > < / a > < span class = "lineno" > 82< / span >   distanceOut[vec] = dist;< / div >
< div class = "line" > < a name = "l00083" > < / a > < span class = "lineno" > 83< / span >   }< / div >
< div class = "line" > < a name = "l00084" > < / a > < span class = "lineno" > 84< / span >   }< / div >
< div class = "line" > < a name = "l00085" > < / a > < span class = "lineno" > 85< / span >   }< / div >
< div class = "line" > < a name = "l00086" > < / a > < span class = "lineno" > 86< / span >   };< / div >
< div class = "line" > < a name = "l00087" > < / a > < span class = "lineno" > 87< / span >   < / div >
< div class = "line" > < a name = "l00088" > < / a > < span class = "lineno" > 88< / span >   < span class = "comment" > // Implementation for # dims == blockDim.x (one thread per dimension)< / span > < / div >
< div class = "line" > < a name = "l00089" > < / a > < span class = "lineno" > 89< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > bool< / span > L2, < span class = "keyword" > typename< / span > T> < / div >
< div class = "line" > < a name = "l00090" > < / a > < span class = "lineno" > < a class = "line" href = "structfaiss_1_1gpu_1_1IVFFlatScan_3_010_00_01L2_00_01T_01_4.html" > 90< / a > < / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > < 0, L2, T> {< / div >
< div class = "line" > < a name = "l00091" > < / a > < span class = "lineno" > 91< / span >   < span class = "keyword" > static< / span > __device__ < span class = "keywordtype" > void< / span > scan(< span class = "keywordtype" > float< / span > * query,< / div >
< div class = "line" > < a name = "l00092" > < / a > < span class = "lineno" > 92< / span >   < span class = "keywordtype" > void< / span > * vecData,< / div >
< div class = "line" > < a name = "l00093" > < / a > < span class = "lineno" > 93< / span >   < span class = "keywordtype" > int< / span > numVecs,< / div >
< div class = "line" > < a name = "l00094" > < / a > < span class = "lineno" > 94< / span >   < span class = "keywordtype" > int< / span > dim,< / div >
< div class = "line" > < a name = "l00095" > < / a > < span class = "lineno" > 95< / span >   < span class = "keywordtype" > float< / span > * distanceOut) {< / div >
< div class = "line" > < a name = "l00096" > < / a > < span class = "lineno" > 96< / span >   < span class = "keyword" > extern< / span > __shared__ < span class = "keywordtype" > float< / span > smem[];< / div >
< div class = "line" > < a name = "l00097" > < / a > < span class = "lineno" > 97< / span >   T* vecs = (T*) vecData;< / div >
< div class = "line" > < a name = "l00098" > < / a > < span class = "lineno" > 98< / span >   < / div >
< div class = "line" > < a name = "l00099" > < / a > < span class = "lineno" > 99< / span >   < span class = "keywordtype" > float< / span > queryVal = query[threadIdx.x];< / div >
< div class = "line" > < a name = "l00100" > < / a > < span class = "lineno" > 100< / span >   < / div >
< div class = "line" > < a name = "l00101" > < / a > < span class = "lineno" > 101< / span >   constexpr < span class = "keywordtype" > int< / span > kUnroll = 4;< / div >
< div class = "line" > < a name = "l00102" > < / a > < span class = "lineno" > 102< / span >   < span class = "keywordtype" > int< / span > limit = utils::roundDown(numVecs, kUnroll);< / div >
< div class = "line" > < a name = "l00103" > < / a > < span class = "lineno" > 103< / span >   < / div >
< div class = "line" > < a name = "l00104" > < / a > < span class = "lineno" > 104< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = 0; i < limit; i += kUnroll) {< / div >
< div class = "line" > < a name = "l00105" > < / a > < span class = "lineno" > 105< / span >   < span class = "keywordtype" > float< / span > vecVal[kUnroll];< / div >
< div class = "line" > < a name = "l00106" > < / a > < span class = "lineno" > 106< / span >   < / div >
< div class = "line" > < a name = "l00107" > < / a > < span class = "lineno" > 107< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00108" > < / a > < span class = "lineno" > 108< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00109" > < / a > < span class = "lineno" > 109< / span >   vecVal[j] = < a class = "code" href = "structfaiss_1_1gpu_1_1ConvertTo.html" > ConvertTo< float> ::to< / a > (vecs[(i + j) * dim + threadIdx.x]);< / div >
< div class = "line" > < a name = "l00110" > < / a > < span class = "lineno" > 110< / span >   }< / div >
< div class = "line" > < a name = "l00111" > < / a > < span class = "lineno" > 111< / span >   < / div >
< div class = "line" > < a name = "l00112" > < / a > < span class = "lineno" > 112< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00113" > < / a > < span class = "lineno" > 113< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00114" > < / a > < span class = "lineno" > 114< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00115" > < / a > < span class = "lineno" > 115< / span >   vecVal[j] = l2Distance(queryVal, vecVal[j]);< / div >
< div class = "line" > < a name = "l00116" > < / a > < span class = "lineno" > 116< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00117" > < / a > < span class = "lineno" > 117< / span >   vecVal[j] = ipDistance(queryVal, vecVal[j]);< / div >
< div class = "line" > < a name = "l00118" > < / a > < span class = "lineno" > 118< / span >   }< / div >
< div class = "line" > < a name = "l00119" > < / a > < span class = "lineno" > 119< / span >   }< / div >
< div class = "line" > < a name = "l00120" > < / a > < span class = "lineno" > 120< / span >   < / div >
< div class = "line" > < a name = "l00121" > < / a > < span class = "lineno" > 121< / span >   blockReduceAllSum< kUnroll, float, false, true> (vecVal, smem);< / div >
< div class = "line" > < a name = "l00122" > < / a > < span class = "lineno" > 122< / span >   < / div >
< div class = "line" > < a name = "l00123" > < / a > < span class = "lineno" > 123< / span >   < span class = "keywordflow" > if< / span > (threadIdx.x == 0) {< / div >
< div class = "line" > < a name = "l00124" > < / a > < span class = "lineno" > 124< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00125" > < / a > < span class = "lineno" > 125< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00126" > < / a > < span class = "lineno" > 126< / span >   distanceOut[i + j] = vecVal[j];< / div >
< div class = "line" > < a name = "l00127" > < / a > < span class = "lineno" > 127< / span >   }< / div >
< div class = "line" > < a name = "l00128" > < / a > < span class = "lineno" > 128< / span >   }< / div >
< div class = "line" > < a name = "l00129" > < / a > < span class = "lineno" > 129< / span >   }< / div >
< div class = "line" > < a name = "l00130" > < / a > < span class = "lineno" > 130< / span >   < / div >
< div class = "line" > < a name = "l00131" > < / a > < span class = "lineno" > 131< / span >   < span class = "comment" > // Handle remainder< / span > < / div >
< div class = "line" > < a name = "l00132" > < / a > < span class = "lineno" > 132< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = limit; i < numVecs; ++i) {< / div >
< div class = "line" > < a name = "l00133" > < / a > < span class = "lineno" > 133< / span >   < span class = "keywordtype" > float< / span > vecVal = < a class = "code" href = "structfaiss_1_1gpu_1_1ConvertTo.html" > ConvertTo< float> ::to< / a > (vecs[i * dim + threadIdx.x]);< / div >
< div class = "line" > < a name = "l00134" > < / a > < span class = "lineno" > 134< / span >   < / div >
< div class = "line" > < a name = "l00135" > < / a > < span class = "lineno" > 135< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00136" > < / a > < span class = "lineno" > 136< / span >   vecVal = l2Distance(queryVal, vecVal);< / div >
< div class = "line" > < a name = "l00137" > < / a > < span class = "lineno" > 137< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00138" > < / a > < span class = "lineno" > 138< / span >   vecVal = ipDistance(queryVal, vecVal);< / div >
< div class = "line" > < a name = "l00139" > < / a > < span class = "lineno" > 139< / span >   }< / div >
< div class = "line" > < a name = "l00140" > < / a > < span class = "lineno" > 140< / span >   < / div >
< div class = "line" > < a name = "l00141" > < / a > < span class = "lineno" > 141< / span >   vecVal = blockReduceAllSum< float, false, true> (vecVal, smem);< / div >
< div class = "line" > < a name = "l00142" > < / a > < span class = "lineno" > 142< / span >   < / div >
< div class = "line" > < a name = "l00143" > < / a > < span class = "lineno" > 143< / span >   < span class = "keywordflow" > if< / span > (threadIdx.x == 0) {< / div >
< div class = "line" > < a name = "l00144" > < / a > < span class = "lineno" > 144< / span >   distanceOut[i] = vecVal;< / div >
< div class = "line" > < a name = "l00145" > < / a > < span class = "lineno" > 145< / span >   }< / div >
< div class = "line" > < a name = "l00146" > < / a > < span class = "lineno" > 146< / span >   }< / div >
< div class = "line" > < a name = "l00147" > < / a > < span class = "lineno" > 147< / span >   }< / div >
< div class = "line" > < a name = "l00148" > < / a > < span class = "lineno" > 148< / span >   };< / div >
< div class = "line" > < a name = "l00149" > < / a > < span class = "lineno" > 149< / span >   < / div >
< div class = "line" > < a name = "l00150" > < / a > < span class = "lineno" > 150< / span >   < span class = "comment" > // 64-d float32 implementation< / span > < / div >
< div class = "line" > < a name = "l00151" > < / a > < span class = "lineno" > 151< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > bool< / span > L2> < / div >
< div class = "line" > < a name = "l00152" > < / a > < span class = "lineno" > < a class = "line" href = "structfaiss_1_1gpu_1_1IVFFlatScan_3_0164_00_01L2_00_01float_01_4.html" > 152< / a > < / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > < 64, L2, float> {< / div >
< div class = "line" > < a name = "l00153" > < / a > < span class = "lineno" > 153< / span >   < span class = "keyword" > static< / span > constexpr < span class = "keywordtype" > int< / span > kDims = 64;< / div >
< div class = "line" > < a name = "l00154" > < / a > < span class = "lineno" > 154< / span >   < / div >
< div class = "line" > < a name = "l00155" > < / a > < span class = "lineno" > 155< / span >   < span class = "keyword" > static< / span > __device__ < span class = "keywordtype" > void< / span > scan(< span class = "keywordtype" > float< / span > * query,< / div >
< div class = "line" > < a name = "l00156" > < / a > < span class = "lineno" > 156< / span >   < span class = "keywordtype" > void< / span > * vecData,< / div >
< div class = "line" > < a name = "l00157" > < / a > < span class = "lineno" > 157< / span >   < span class = "keywordtype" > int< / span > numVecs,< / div >
< div class = "line" > < a name = "l00158" > < / a > < span class = "lineno" > 158< / span >   < span class = "keywordtype" > int< / span > dim,< / div >
< div class = "line" > < a name = "l00159" > < / a > < span class = "lineno" > 159< / span >   < span class = "keywordtype" > float< / span > * distanceOut) {< / div >
< div class = "line" > < a name = "l00160" > < / a > < span class = "lineno" > 160< / span >   < span class = "comment" > // Each warp reduces a single 64-d vector; each lane loads a float2< / span > < / div >
< div class = "line" > < a name = "l00161" > < / a > < span class = "lineno" > 161< / span >   < span class = "keywordtype" > float< / span > * vecs = (< span class = "keywordtype" > float< / span > *) vecData;< / div >
< div class = "line" > < a name = "l00162" > < / a > < span class = "lineno" > 162< / span >   < / div >
< div class = "line" > < a name = "l00163" > < / a > < span class = "lineno" > 163< / span >   < span class = "keywordtype" > int< / span > laneId = getLaneId();< / div >
< div class = "line" > < a name = "l00164" > < / a > < span class = "lineno" > 164< / span >   < span class = "keywordtype" > int< / span > warpId = threadIdx.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00165" > < / a > < span class = "lineno" > 165< / span >   < span class = "keywordtype" > int< / span > numWarps = blockDim.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00166" > < / a > < span class = "lineno" > 166< / span >   < / div >
< div class = "line" > < a name = "l00167" > < / a > < span class = "lineno" > 167< / span >   float2 queryVal = *(float2*) & query[laneId * 2];< / div >
< div class = "line" > < a name = "l00168" > < / a > < span class = "lineno" > 168< / span >   < / div >
< div class = "line" > < a name = "l00169" > < / a > < span class = "lineno" > 169< / span >   constexpr < span class = "keywordtype" > int< / span > kUnroll = 4;< / div >
< div class = "line" > < a name = "l00170" > < / a > < span class = "lineno" > 170< / span >   float2 vecVal[kUnroll];< / div >
< div class = "line" > < a name = "l00171" > < / a > < span class = "lineno" > 171< / span >   < / div >
< div class = "line" > < a name = "l00172" > < / a > < span class = "lineno" > 172< / span >   < span class = "keywordtype" > int< / span > limit = utils::roundDown(numVecs, kUnroll * numWarps);< / div >
< div class = "line" > < a name = "l00173" > < / a > < span class = "lineno" > 173< / span >   < / div >
< div class = "line" > < a name = "l00174" > < / a > < span class = "lineno" > 174< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = warpId; i < limit; i += kUnroll * numWarps) {< / div >
< div class = "line" > < a name = "l00175" > < / a > < span class = "lineno" > 175< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00176" > < / a > < span class = "lineno" > 176< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00177" > < / a > < span class = "lineno" > 177< / span >   < span class = "comment" > // Vector we are loading from is i + j * numWarps< / span > < / div >
< div class = "line" > < a name = "l00178" > < / a > < span class = "lineno" > 178< / span >   < span class = "comment" > // Dim we are loading from is laneId * 2< / span > < / div >
< div class = "line" > < a name = "l00179" > < / a > < span class = "lineno" > 179< / span >   vecVal[j] = *(float2*) & vecs[(i + j * numWarps) * kDims + laneId * 2];< / div >
< div class = "line" > < a name = "l00180" > < / a > < span class = "lineno" > 180< / span >   }< / div >
< div class = "line" > < a name = "l00181" > < / a > < span class = "lineno" > 181< / span >   < / div >
< div class = "line" > < a name = "l00182" > < / a > < span class = "lineno" > 182< / span >   < span class = "keywordtype" > float< / span > dist[kUnroll];< / div >
< div class = "line" > < a name = "l00183" > < / a > < span class = "lineno" > 183< / span >   < / div >
< div class = "line" > < a name = "l00184" > < / a > < span class = "lineno" > 184< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00185" > < / a > < span class = "lineno" > 185< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00186" > < / a > < span class = "lineno" > 186< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00187" > < / a > < span class = "lineno" > 187< / span >   dist[j] = l2Distance(queryVal, vecVal[j]);< / div >
< div class = "line" > < a name = "l00188" > < / a > < span class = "lineno" > 188< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00189" > < / a > < span class = "lineno" > 189< / span >   dist[j] = ipDistance(queryVal, vecVal[j]);< / div >
< div class = "line" > < a name = "l00190" > < / a > < span class = "lineno" > 190< / span >   }< / div >
< div class = "line" > < a name = "l00191" > < / a > < span class = "lineno" > 191< / span >   }< / div >
< div class = "line" > < a name = "l00192" > < / a > < span class = "lineno" > 192< / span >   < / div >
< div class = "line" > < a name = "l00193" > < / a > < span class = "lineno" > 193< / span >   < span class = "comment" > // Reduce within the warp< / span > < / div >
< div class = "line" > < a name = "l00194" > < / a > < span class = "lineno" > 194< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00195" > < / a > < span class = "lineno" > 195< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00196" > < / a > < span class = "lineno" > 196< / span >   dist[j] = warpReduceAllSum(dist[j]);< / div >
< div class = "line" > < a name = "l00197" > < / a > < span class = "lineno" > 197< / span >   }< / div >
< div class = "line" > < a name = "l00198" > < / a > < span class = "lineno" > 198< / span >   < / div >
< div class = "line" > < a name = "l00199" > < / a > < span class = "lineno" > 199< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00200" > < / a > < span class = "lineno" > 200< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00201" > < / a > < span class = "lineno" > 201< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00202" > < / a > < span class = "lineno" > 202< / span >   distanceOut[i + j * numWarps] = dist[j];< / div >
< div class = "line" > < a name = "l00203" > < / a > < span class = "lineno" > 203< / span >   }< / div >
< div class = "line" > < a name = "l00204" > < / a > < span class = "lineno" > 204< / span >   }< / div >
< div class = "line" > < a name = "l00205" > < / a > < span class = "lineno" > 205< / span >   }< / div >
< div class = "line" > < a name = "l00206" > < / a > < span class = "lineno" > 206< / span >   < / div >
< div class = "line" > < a name = "l00207" > < / a > < span class = "lineno" > 207< / span >   < span class = "comment" > // Handle remainder< / span > < / div >
< div class = "line" > < a name = "l00208" > < / a > < span class = "lineno" > 208< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = limit + warpId; i < numVecs; i += numWarps) {< / div >
< div class = "line" > < a name = "l00209" > < / a > < span class = "lineno" > 209< / span >   vecVal[0] = *(float2*) & vecs[i * kDims + laneId * 2];< / div >
< div class = "line" > < a name = "l00210" > < / a > < span class = "lineno" > 210< / span >   < span class = "keywordtype" > float< / span > dist;< / div >
< div class = "line" > < a name = "l00211" > < / a > < span class = "lineno" > 211< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00212" > < / a > < span class = "lineno" > 212< / span >   dist = l2Distance(queryVal, vecVal[0]);< / div >
< div class = "line" > < a name = "l00213" > < / a > < span class = "lineno" > 213< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00214" > < / a > < span class = "lineno" > 214< / span >   dist = ipDistance(queryVal, vecVal[0]);< / div >
< div class = "line" > < a name = "l00215" > < / a > < span class = "lineno" > 215< / span >   }< / div >
< div class = "line" > < a name = "l00216" > < / a > < span class = "lineno" > 216< / span >   < / div >
< div class = "line" > < a name = "l00217" > < / a > < span class = "lineno" > 217< / span >   dist = warpReduceAllSum(dist);< / div >
< div class = "line" > < a name = "l00218" > < / a > < span class = "lineno" > 218< / span >   < / div >
< div class = "line" > < a name = "l00219" > < / a > < span class = "lineno" > 219< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00220" > < / a > < span class = "lineno" > 220< / span >   distanceOut[i] = dist;< / div >
< div class = "line" > < a name = "l00221" > < / a > < span class = "lineno" > 221< / span >   }< / div >
< div class = "line" > < a name = "l00222" > < / a > < span class = "lineno" > 222< / span >   }< / div >
< div class = "line" > < a name = "l00223" > < / a > < span class = "lineno" > 223< / span >   }< / div >
< div class = "line" > < a name = "l00224" > < / a > < span class = "lineno" > 224< / span >   };< / div >
< div class = "line" > < a name = "l00225" > < / a > < span class = "lineno" > 225< / span >   < / div >
< div class = "line" > < a name = "l00226" > < / a > < span class = "lineno" > 226< / span >   < span class = "preprocessor" > #ifdef FAISS_USE_FLOAT16< / span > < / div >
< div class = "line" > < a name = "l00227" > < / a > < span class = "lineno" > 227< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00228" > < / a > < span class = "lineno" > 228< / span >   < span class = "comment" > // 64-d float16 implementation< / span > < / div >
< div class = "line" > < a name = "l00229" > < / a > < span class = "lineno" > 229< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > bool< / span > L2> < / div >
< div class = "line" > < a name = "l00230" > < / a > < span class = "lineno" > 230< / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > < 64, L2, half> {< / div >
< div class = "line" > < a name = "l00231" > < / a > < span class = "lineno" > 231< / span >   < span class = "keyword" > static< / span > constexpr < span class = "keywordtype" > int< / span > kDims = 64;< / div >
< div class = "line" > < a name = "l00232" > < / a > < span class = "lineno" > 232< / span >   < / div >
< div class = "line" > < a name = "l00233" > < / a > < span class = "lineno" > 233< / span >   < span class = "keyword" > static< / span > __device__ < span class = "keywordtype" > void< / span > scan(< span class = "keywordtype" > float< / span > * query,< / div >
< div class = "line" > < a name = "l00234" > < / a > < span class = "lineno" > 234< / span >   < span class = "keywordtype" > void< / span > * vecData,< / div >
< div class = "line" > < a name = "l00235" > < / a > < span class = "lineno" > 235< / span >   < span class = "keywordtype" > int< / span > numVecs,< / div >
< div class = "line" > < a name = "l00236" > < / a > < span class = "lineno" > 236< / span >   < span class = "keywordtype" > int< / span > dim,< / div >
< div class = "line" > < a name = "l00237" > < / a > < span class = "lineno" > 237< / span >   < span class = "keywordtype" > float< / span > * distanceOut) {< / div >
< div class = "line" > < a name = "l00238" > < / a > < span class = "lineno" > 238< / span >   < span class = "comment" > // Each warp reduces a single 64-d vector; each lane loads a half2< / span > < / div >
< div class = "line" > < a name = "l00239" > < / a > < span class = "lineno" > 239< / span >   half* vecs = (half*) vecData;< / div >
< div class = "line" > < a name = "l00240" > < / a > < span class = "lineno" > 240< / span >   < / div >
< div class = "line" > < a name = "l00241" > < / a > < span class = "lineno" > 241< / span >   < span class = "keywordtype" > int< / span > laneId = getLaneId();< / div >
< div class = "line" > < a name = "l00242" > < / a > < span class = "lineno" > 242< / span >   < span class = "keywordtype" > int< / span > warpId = threadIdx.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00243" > < / a > < span class = "lineno" > 243< / span >   < span class = "keywordtype" > int< / span > numWarps = blockDim.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00244" > < / a > < span class = "lineno" > 244< / span >   < / div >
< div class = "line" > < a name = "l00245" > < / a > < span class = "lineno" > 245< / span >   float2 queryVal = *(float2*) & query[laneId * 2];< / div >
< div class = "line" > < a name = "l00246" > < / a > < span class = "lineno" > 246< / span >   < / div >
< div class = "line" > < a name = "l00247" > < / a > < span class = "lineno" > 247< / span >   constexpr < span class = "keywordtype" > int< / span > kUnroll = 4;< / div >
< div class = "line" > < a name = "l00248" > < / a > < span class = "lineno" > 248< / span >   < / div >
< div class = "line" > < a name = "l00249" > < / a > < span class = "lineno" > 249< / span >   half2 vecVal[kUnroll];< / div >
< div class = "line" > < a name = "l00250" > < / a > < span class = "lineno" > 250< / span >   < / div >
< div class = "line" > < a name = "l00251" > < / a > < span class = "lineno" > 251< / span >   < span class = "keywordtype" > int< / span > limit = utils::roundDown(numVecs, kUnroll * numWarps);< / div >
< div class = "line" > < a name = "l00252" > < / a > < span class = "lineno" > 252< / span >   < / div >
< div class = "line" > < a name = "l00253" > < / a > < span class = "lineno" > 253< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = warpId; i < limit; i += kUnroll * numWarps) {< / div >
< div class = "line" > < a name = "l00254" > < / a > < span class = "lineno" > 254< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00255" > < / a > < span class = "lineno" > 255< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00256" > < / a > < span class = "lineno" > 256< / span >   < span class = "comment" > // Vector we are loading from is i + j * numWarps< / span > < / div >
< div class = "line" > < a name = "l00257" > < / a > < span class = "lineno" > 257< / span >   < span class = "comment" > // Dim we are loading from is laneId * 2< / span > < / div >
< div class = "line" > < a name = "l00258" > < / a > < span class = "lineno" > 258< / span >   vecVal[j] = *(half2*) & vecs[(i + j * numWarps) * kDims + laneId * 2];< / div >
< div class = "line" > < a name = "l00259" > < / a > < span class = "lineno" > 259< / span >   }< / div >
< div class = "line" > < a name = "l00260" > < / a > < span class = "lineno" > 260< / span >   < / div >
< div class = "line" > < a name = "l00261" > < / a > < span class = "lineno" > 261< / span >   < span class = "keywordtype" > float< / span > dist[kUnroll];< / div >
< div class = "line" > < a name = "l00262" > < / a > < span class = "lineno" > 262< / span >   < / div >
< div class = "line" > < a name = "l00263" > < / a > < span class = "lineno" > 263< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00264" > < / a > < span class = "lineno" > 264< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00265" > < / a > < span class = "lineno" > 265< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00266" > < / a > < span class = "lineno" > 266< / span >   dist[j] = l2Distance(queryVal, __half22float2(vecVal[j]));< / div >
< div class = "line" > < a name = "l00267" > < / a > < span class = "lineno" > 267< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00268" > < / a > < span class = "lineno" > 268< / span >   dist[j] = ipDistance(queryVal, __half22float2(vecVal[j]));< / div >
< div class = "line" > < a name = "l00269" > < / a > < span class = "lineno" > 269< / span >   }< / div >
< div class = "line" > < a name = "l00270" > < / a > < span class = "lineno" > 270< / span >   }< / div >
< div class = "line" > < a name = "l00271" > < / a > < span class = "lineno" > 271< / span >   < / div >
< div class = "line" > < a name = "l00272" > < / a > < span class = "lineno" > 272< / span >   < span class = "comment" > // Reduce within the warp< / span > < / div >
< div class = "line" > < a name = "l00273" > < / a > < span class = "lineno" > 273< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00274" > < / a > < span class = "lineno" > 274< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00275" > < / a > < span class = "lineno" > 275< / span >   dist[j] = warpReduceAllSum(dist[j]);< / div >
< div class = "line" > < a name = "l00276" > < / a > < span class = "lineno" > 276< / span >   }< / div >
< div class = "line" > < a name = "l00277" > < / a > < span class = "lineno" > 277< / span >   < / div >
< div class = "line" > < a name = "l00278" > < / a > < span class = "lineno" > 278< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00279" > < / a > < span class = "lineno" > 279< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00280" > < / a > < span class = "lineno" > 280< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00281" > < / a > < span class = "lineno" > 281< / span >   distanceOut[i + j * numWarps] = dist[j];< / div >
< div class = "line" > < a name = "l00282" > < / a > < span class = "lineno" > 282< / span >   }< / div >
< div class = "line" > < a name = "l00283" > < / a > < span class = "lineno" > 283< / span >   }< / div >
< div class = "line" > < a name = "l00284" > < / a > < span class = "lineno" > 284< / span >   }< / div >
< div class = "line" > < a name = "l00285" > < / a > < span class = "lineno" > 285< / span >   < / div >
< div class = "line" > < a name = "l00286" > < / a > < span class = "lineno" > 286< / span >   < span class = "comment" > // Handle remainder< / span > < / div >
< div class = "line" > < a name = "l00287" > < / a > < span class = "lineno" > 287< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = limit + warpId; i < numVecs; i += numWarps) {< / div >
< div class = "line" > < a name = "l00288" > < / a > < span class = "lineno" > 288< / span >   vecVal[0] = *(half2*) & vecs[i * kDims + laneId * 2];< / div >
< div class = "line" > < a name = "l00289" > < / a > < span class = "lineno" > 289< / span >   < / div >
< div class = "line" > < a name = "l00290" > < / a > < span class = "lineno" > 290< / span >   < span class = "keywordtype" > float< / span > dist;< / div >
< div class = "line" > < a name = "l00291" > < / a > < span class = "lineno" > 291< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00292" > < / a > < span class = "lineno" > 292< / span >   dist = l2Distance(queryVal, __half22float2(vecVal[0]));< / div >
< div class = "line" > < a name = "l00293" > < / a > < span class = "lineno" > 293< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00294" > < / a > < span class = "lineno" > 294< / span >   dist = ipDistance(queryVal, __half22float2(vecVal[0]));< / div >
< div class = "line" > < a name = "l00295" > < / a > < span class = "lineno" > 295< / span >   }< / div >
< div class = "line" > < a name = "l00296" > < / a > < span class = "lineno" > 296< / span >   < / div >
< div class = "line" > < a name = "l00297" > < / a > < span class = "lineno" > 297< / span >   dist = warpReduceAllSum(dist);< / div >
< div class = "line" > < a name = "l00298" > < / a > < span class = "lineno" > 298< / span >   < / div >
< div class = "line" > < a name = "l00299" > < / a > < span class = "lineno" > 299< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00300" > < / a > < span class = "lineno" > 300< / span >   distanceOut[i] = dist;< / div >
< div class = "line" > < a name = "l00301" > < / a > < span class = "lineno" > 301< / span >   }< / div >
< div class = "line" > < a name = "l00302" > < / a > < span class = "lineno" > 302< / span >   }< / div >
< div class = "line" > < a name = "l00303" > < / a > < span class = "lineno" > 303< / span >   }< / div >
< div class = "line" > < a name = "l00304" > < / a > < span class = "lineno" > 304< / span >   };< / div >
< div class = "line" > < a name = "l00305" > < / a > < span class = "lineno" > 305< / span >   < / div >
< div class = "line" > < a name = "l00306" > < / a > < span class = "lineno" > 306< / span >   < span class = "preprocessor" > #endif< / span > < / div >
< div class = "line" > < a name = "l00307" > < / a > < span class = "lineno" > 307< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00308" > < / a > < span class = "lineno" > 308< / span >   < span class = "comment" > // 128-d float32 implementation< / span > < / div >
< div class = "line" > < a name = "l00309" > < / a > < span class = "lineno" > 309< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > bool< / span > L2> < / div >
< div class = "line" > < a name = "l00310" > < / a > < span class = "lineno" > < a class = "line" href = "structfaiss_1_1gpu_1_1IVFFlatScan_3_01128_00_01L2_00_01float_01_4.html" > 310< / a > < / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > < 128, L2, float> {< / div >
< div class = "line" > < a name = "l00311" > < / a > < span class = "lineno" > 311< / span >   < span class = "keyword" > static< / span > constexpr < span class = "keywordtype" > int< / span > kDims = 128;< / div >
< div class = "line" > < a name = "l00312" > < / a > < span class = "lineno" > 312< / span >   < / div >
< div class = "line" > < a name = "l00313" > < / a > < span class = "lineno" > 313< / span >   < span class = "keyword" > static< / span > __device__ < span class = "keywordtype" > void< / span > scan(< span class = "keywordtype" > float< / span > * query,< / div >
< div class = "line" > < a name = "l00314" > < / a > < span class = "lineno" > 314< / span >   < span class = "keywordtype" > void< / span > * vecData,< / div >
< div class = "line" > < a name = "l00315" > < / a > < span class = "lineno" > 315< / span >   < span class = "keywordtype" > int< / span > numVecs,< / div >
< div class = "line" > < a name = "l00316" > < / a > < span class = "lineno" > 316< / span >   < span class = "keywordtype" > int< / span > dim,< / div >
< div class = "line" > < a name = "l00317" > < / a > < span class = "lineno" > 317< / span >   < span class = "keywordtype" > float< / span > * distanceOut) {< / div >
< div class = "line" > < a name = "l00318" > < / a > < span class = "lineno" > 318< / span >   < span class = "comment" > // Each warp reduces a single 128-d vector; each lane loads a float4< / span > < / div >
< div class = "line" > < a name = "l00319" > < / a > < span class = "lineno" > 319< / span >   < span class = "keywordtype" > float< / span > * vecs = (< span class = "keywordtype" > float< / span > *) vecData;< / div >
< div class = "line" > < a name = "l00320" > < / a > < span class = "lineno" > 320< / span >   < / div >
< div class = "line" > < a name = "l00321" > < / a > < span class = "lineno" > 321< / span >   < span class = "keywordtype" > int< / span > laneId = getLaneId();< / div >
< div class = "line" > < a name = "l00322" > < / a > < span class = "lineno" > 322< / span >   < span class = "keywordtype" > int< / span > warpId = threadIdx.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00323" > < / a > < span class = "lineno" > 323< / span >   < span class = "keywordtype" > int< / span > numWarps = blockDim.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00324" > < / a > < span class = "lineno" > 324< / span >   < / div >
< div class = "line" > < a name = "l00325" > < / a > < span class = "lineno" > 325< / span >   float4 queryVal = *(float4*) & query[laneId * 4];< / div >
< div class = "line" > < a name = "l00326" > < / a > < span class = "lineno" > 326< / span >   < / div >
< div class = "line" > < a name = "l00327" > < / a > < span class = "lineno" > 327< / span >   constexpr < span class = "keywordtype" > int< / span > kUnroll = 4;< / div >
< div class = "line" > < a name = "l00328" > < / a > < span class = "lineno" > 328< / span >   float4 vecVal[kUnroll];< / div >
< div class = "line" > < a name = "l00329" > < / a > < span class = "lineno" > 329< / span >   < / div >
< div class = "line" > < a name = "l00330" > < / a > < span class = "lineno" > 330< / span >   < span class = "keywordtype" > int< / span > limit = utils::roundDown(numVecs, kUnroll * numWarps);< / div >
< div class = "line" > < a name = "l00331" > < / a > < span class = "lineno" > 331< / span >   < / div >
< div class = "line" > < a name = "l00332" > < / a > < span class = "lineno" > 332< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = warpId; i < limit; i += kUnroll * numWarps) {< / div >
< div class = "line" > < a name = "l00333" > < / a > < span class = "lineno" > 333< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00334" > < / a > < span class = "lineno" > 334< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00335" > < / a > < span class = "lineno" > 335< / span >   < span class = "comment" > // Vector we are loading from is i + j * numWarps< / span > < / div >
< div class = "line" > < a name = "l00336" > < / a > < span class = "lineno" > 336< / span >   < span class = "comment" > // Dim we are loading from is laneId * 4< / span > < / div >
< div class = "line" > < a name = "l00337" > < / a > < span class = "lineno" > 337< / span >   vecVal[j] = *(float4*) & vecs[(i + j * numWarps) * kDims + laneId * 4];< / div >
< div class = "line" > < a name = "l00338" > < / a > < span class = "lineno" > 338< / span >   }< / div >
< div class = "line" > < a name = "l00339" > < / a > < span class = "lineno" > 339< / span >   < / div >
< div class = "line" > < a name = "l00340" > < / a > < span class = "lineno" > 340< / span >   < span class = "keywordtype" > float< / span > dist[kUnroll];< / div >
< div class = "line" > < a name = "l00341" > < / a > < span class = "lineno" > 341< / span >   < / div >
< div class = "line" > < a name = "l00342" > < / a > < span class = "lineno" > 342< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00343" > < / a > < span class = "lineno" > 343< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00344" > < / a > < span class = "lineno" > 344< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00345" > < / a > < span class = "lineno" > 345< / span >   dist[j] = l2Distance(queryVal, vecVal[j]);< / div >
< div class = "line" > < a name = "l00346" > < / a > < span class = "lineno" > 346< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00347" > < / a > < span class = "lineno" > 347< / span >   dist[j] = ipDistance(queryVal, vecVal[j]);< / div >
< div class = "line" > < a name = "l00348" > < / a > < span class = "lineno" > 348< / span >   }< / div >
< div class = "line" > < a name = "l00349" > < / a > < span class = "lineno" > 349< / span >   }< / div >
< div class = "line" > < a name = "l00350" > < / a > < span class = "lineno" > 350< / span >   < / div >
< div class = "line" > < a name = "l00351" > < / a > < span class = "lineno" > 351< / span >   < span class = "comment" > // Reduce within the warp< / span > < / div >
< div class = "line" > < a name = "l00352" > < / a > < span class = "lineno" > 352< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00353" > < / a > < span class = "lineno" > 353< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00354" > < / a > < span class = "lineno" > 354< / span >   dist[j] = warpReduceAllSum(dist[j]);< / div >
< div class = "line" > < a name = "l00355" > < / a > < span class = "lineno" > 355< / span >   }< / div >
< div class = "line" > < a name = "l00356" > < / a > < span class = "lineno" > 356< / span >   < / div >
< div class = "line" > < a name = "l00357" > < / a > < span class = "lineno" > 357< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00358" > < / a > < span class = "lineno" > 358< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00359" > < / a > < span class = "lineno" > 359< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00360" > < / a > < span class = "lineno" > 360< / span >   distanceOut[i + j * numWarps] = dist[j];< / div >
< div class = "line" > < a name = "l00361" > < / a > < span class = "lineno" > 361< / span >   }< / div >
< div class = "line" > < a name = "l00362" > < / a > < span class = "lineno" > 362< / span >   }< / div >
< div class = "line" > < a name = "l00363" > < / a > < span class = "lineno" > 363< / span >   }< / div >
< div class = "line" > < a name = "l00364" > < / a > < span class = "lineno" > 364< / span >   < / div >
< div class = "line" > < a name = "l00365" > < / a > < span class = "lineno" > 365< / span >   < span class = "comment" > // Handle remainder< / span > < / div >
< div class = "line" > < a name = "l00366" > < / a > < span class = "lineno" > 366< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = limit + warpId; i < numVecs; i += numWarps) {< / div >
< div class = "line" > < a name = "l00367" > < / a > < span class = "lineno" > 367< / span >   vecVal[0] = *(float4*) & vecs[i * kDims + laneId * 4];< / div >
< div class = "line" > < a name = "l00368" > < / a > < span class = "lineno" > 368< / span >   < span class = "keywordtype" > float< / span > dist;< / div >
< div class = "line" > < a name = "l00369" > < / a > < span class = "lineno" > 369< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00370" > < / a > < span class = "lineno" > 370< / span >   dist = l2Distance(queryVal, vecVal[0]);< / div >
< div class = "line" > < a name = "l00371" > < / a > < span class = "lineno" > 371< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00372" > < / a > < span class = "lineno" > 372< / span >   dist = ipDistance(queryVal, vecVal[0]);< / div >
< div class = "line" > < a name = "l00373" > < / a > < span class = "lineno" > 373< / span >   }< / div >
< div class = "line" > < a name = "l00374" > < / a > < span class = "lineno" > 374< / span >   < / div >
< div class = "line" > < a name = "l00375" > < / a > < span class = "lineno" > 375< / span >   dist = warpReduceAllSum(dist);< / div >
< div class = "line" > < a name = "l00376" > < / a > < span class = "lineno" > 376< / span >   < / div >
< div class = "line" > < a name = "l00377" > < / a > < span class = "lineno" > 377< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00378" > < / a > < span class = "lineno" > 378< / span >   distanceOut[i] = dist;< / div >
< div class = "line" > < a name = "l00379" > < / a > < span class = "lineno" > 379< / span >   }< / div >
< div class = "line" > < a name = "l00380" > < / a > < span class = "lineno" > 380< / span >   }< / div >
< div class = "line" > < a name = "l00381" > < / a > < span class = "lineno" > 381< / span >   }< / div >
< div class = "line" > < a name = "l00382" > < / a > < span class = "lineno" > 382< / span >   };< / div >
< div class = "line" > < a name = "l00383" > < / a > < span class = "lineno" > 383< / span >   < / div >
< div class = "line" > < a name = "l00384" > < / a > < span class = "lineno" > 384< / span >   < span class = "preprocessor" > #ifdef FAISS_USE_FLOAT16< / span > < / div >
< div class = "line" > < a name = "l00385" > < / a > < span class = "lineno" > 385< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00386" > < / a > < span class = "lineno" > 386< / span >   < span class = "comment" > // float16 implementation< / span > < / div >
< div class = "line" > < a name = "l00387" > < / a > < span class = "lineno" > 387< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > bool< / span > L2> < / div >
< div class = "line" > < a name = "l00388" > < / a > < span class = "lineno" > 388< / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > < 128, L2, half> {< / div >
< div class = "line" > < a name = "l00389" > < / a > < span class = "lineno" > 389< / span >   < span class = "keyword" > static< / span > constexpr < span class = "keywordtype" > int< / span > kDims = 128;< / div >
< div class = "line" > < a name = "l00390" > < / a > < span class = "lineno" > 390< / span >   < / div >
< div class = "line" > < a name = "l00391" > < / a > < span class = "lineno" > 391< / span >   < span class = "keyword" > static< / span > __device__ < span class = "keywordtype" > void< / span > scan(< span class = "keywordtype" > float< / span > * query,< / div >
< div class = "line" > < a name = "l00392" > < / a > < span class = "lineno" > 392< / span >   < span class = "keywordtype" > void< / span > * vecData,< / div >
< div class = "line" > < a name = "l00393" > < / a > < span class = "lineno" > 393< / span >   < span class = "keywordtype" > int< / span > numVecs,< / div >
< div class = "line" > < a name = "l00394" > < / a > < span class = "lineno" > 394< / span >   < span class = "keywordtype" > int< / span > dim,< / div >
< div class = "line" > < a name = "l00395" > < / a > < span class = "lineno" > 395< / span >   < span class = "keywordtype" > float< / span > * distanceOut) {< / div >
< div class = "line" > < a name = "l00396" > < / a > < span class = "lineno" > 396< / span >   < span class = "comment" > // Each warp reduces a single 128-d vector; each lane loads a Half4< / span > < / div >
< div class = "line" > < a name = "l00397" > < / a > < span class = "lineno" > 397< / span >   half* vecs = (half*) vecData;< / div >
< div class = "line" > < a name = "l00398" > < / a > < span class = "lineno" > 398< / span >   < / div >
< div class = "line" > < a name = "l00399" > < / a > < span class = "lineno" > 399< / span >   < span class = "keywordtype" > int< / span > laneId = getLaneId();< / div >
< div class = "line" > < a name = "l00400" > < / a > < span class = "lineno" > 400< / span >   < span class = "keywordtype" > int< / span > warpId = threadIdx.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00401" > < / a > < span class = "lineno" > 401< / span >   < span class = "keywordtype" > int< / span > numWarps = blockDim.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00402" > < / a > < span class = "lineno" > 402< / span >   < / div >
< div class = "line" > < a name = "l00403" > < / a > < span class = "lineno" > 403< / span >   float4 queryVal = *(float4*) & query[laneId * 4];< / div >
< div class = "line" > < a name = "l00404" > < / a > < span class = "lineno" > 404< / span >   < / div >
< div class = "line" > < a name = "l00405" > < / a > < span class = "lineno" > 405< / span >   constexpr < span class = "keywordtype" > int< / span > kUnroll = 4;< / div >
< div class = "line" > < a name = "l00406" > < / a > < span class = "lineno" > 406< / span >   < / div >
< div class = "line" > < a name = "l00407" > < / a > < span class = "lineno" > 407< / span >   Half4 vecVal[kUnroll];< / div >
< div class = "line" > < a name = "l00408" > < / a > < span class = "lineno" > 408< / span >   < / div >
< div class = "line" > < a name = "l00409" > < / a > < span class = "lineno" > 409< / span >   < span class = "keywordtype" > int< / span > limit = utils::roundDown(numVecs, kUnroll * numWarps);< / div >
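< div class = "line" >   < span class = "comment" > // Worked example (illustrative numbers, not from the source; assumes< / span > < / div >
< div class = "line" >   < span class = "comment" > // utils::roundDown(a, b) rounds a down to a multiple of b): with< / span > < / div >
< div class = "line" >   < span class = "comment" > // numVecs = 1000, numWarps = 4 and kUnroll = 4, limit = roundDown(1000, 16)< / span > < / div >
< div class = "line" >   < span class = "comment" > // = 992. In the unrolled loop below, warp 1 starts at i = 1 and handles< / span > < / div >
< div class = "line" >   < span class = "comment" > // vectors 1, 5, 9, 13, then i = 17 (17, 21, 25, 29), up to i = 977< / span > < / div >
< div class = "line" >   < span class = "comment" > // (977, 981, 985, 989), so warps 0-3 together cover vectors 0..991.< / span > < / div >
< div class = "line" >   < span class = "comment" > // The remainder loop then assigns 992..999 with stride numWarps:< / span > < / div >
< div class = "line" >   < span class = "comment" > // warp 0 gets 992 and 996, warp 3 gets 995 and 999.< / span > < / div >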
< div class = "line" > < a name = "l00410" > < / a > < span class = "lineno" > 410< / span >   < / div >
< div class = "line" > < a name = "l00411" > < / a > < span class = "lineno" > 411< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = warpId; i < limit; i += kUnroll * numWarps) {< / div >
< div class = "line" > < a name = "l00412" > < / a > < span class = "lineno" > 412< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00413" > < / a > < span class = "lineno" > 413< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00414" > < / a > < span class = "lineno" > 414< / span >   < span class = "comment" > // Vector we are loading from is i + j * numWarps< / span > < / div >
< div class = "line" > < a name = "l00415" > < / a > < span class = "lineno" > 415< / span >   < span class = "comment" > // Dim we are loading from is laneId * 4< / span > < / div >
< div class = "line" > < a name = "l00416" > < / a > < span class = "lineno" > 416< / span >   vecVal[j] =< / div >
< div class = "line" > < a name = "l00417" > < / a > < span class = "lineno" > 417< / span >   < a class = "code" href = "structfaiss_1_1gpu_1_1LoadStore.html" > LoadStore< Half4> ::load< / a > (< / div >
< div class = "line" > < a name = "l00418" > < / a > < span class = "lineno" > 418< / span >   & vecs[(i + j * numWarps) * kDims + laneId * 4]);< / div >
< div class = "line" > < a name = "l00419" > < / a > < span class = "lineno" > 419< / span >   }< / div >
< div class = "line" > < a name = "l00420" > < / a > < span class = "lineno" > 420< / span >   < / div >
< div class = "line" > < a name = "l00421" > < / a > < span class = "lineno" > 421< / span >   < span class = "keywordtype" > float< / span > dist[kUnroll];< / div >
< div class = "line" > < a name = "l00422" > < / a > < span class = "lineno" > 422< / span >   < / div >
< div class = "line" > < a name = "l00423" > < / a > < span class = "lineno" > 423< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00424" > < / a > < span class = "lineno" > 424< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00425" > < / a > < span class = "lineno" > 425< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00426" > < / a > < span class = "lineno" > 426< / span >   dist[j] = l2Distance(queryVal, half4ToFloat4(vecVal[j]));< / div >
< div class = "line" > < a name = "l00427" > < / a > < span class = "lineno" > 427< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00428" > < / a > < span class = "lineno" > 428< / span >   dist[j] = ipDistance(queryVal, half4ToFloat4(vecVal[j]));< / div >
< div class = "line" > < a name = "l00429" > < / a > < span class = "lineno" > 429< / span >   }< / div >
< div class = "line" > < a name = "l00430" > < / a > < span class = "lineno" > 430< / span >   }< / div >
< div class = "line" > < a name = "l00431" > < / a > < span class = "lineno" > 431< / span >   < / div >
< div class = "line" > < a name = "l00432" > < / a > < span class = "lineno" > 432< / span >   < span class = "comment" > // Reduce within the warp< / span > < / div >
< div class = "line" > < a name = "l00433" > < / a > < span class = "lineno" > 433< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00434" > < / a > < span class = "lineno" > 434< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00435" > < / a > < span class = "lineno" > 435< / span >   dist[j] = warpReduceAllSum(dist[j]);< / div >
< div class = "line" > < a name = "l00436" > < / a > < span class = "lineno" > 436< / span >   }< / div >
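< div class = "line" >   < span class = "comment" > // Sketch of what this reduction yields, assuming l2Distance returns the< / span > < / div >
< div class = "line" >   < span class = "comment" > // sum of squared differences over one lane's 4 dims and warpReduceAllSum< / span > < / div >
< div class = "line" >   < span class = "comment" > // leaves the warp-wide total in every lane: for L2, dist[j] becomes the< / span > < / div >
< div class = "line" >   < span class = "comment" > // sum over all 128 dims d of (query[d] - vec[d])^2, i.e. the squared L2< / span > < / div >
< div class = "line" >   < span class = "comment" > // distance with no square root; for inner product it becomes the full< / span > < / div >
< div class = "line" >   < span class = "comment" > // dot product. Only lane 0 then stores the result.< / span > < / div >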
< div class = "line" > < a name = "l00437" > < / a > < span class = "lineno" > 437< / span >   < / div >
< div class = "line" > < a name = "l00438" > < / a > < span class = "lineno" > 438< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00439" > < / a > < span class = "lineno" > 439< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00440" > < / a > < span class = "lineno" > 440< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00441" > < / a > < span class = "lineno" > 441< / span >   distanceOut[i + j * numWarps] = dist[j];< / div >
< div class = "line" > < a name = "l00442" > < / a > < span class = "lineno" > 442< / span >   }< / div >
< div class = "line" > < a name = "l00443" > < / a > < span class = "lineno" > 443< / span >   }< / div >
< div class = "line" > < a name = "l00444" > < / a > < span class = "lineno" > 444< / span >   }< / div >
< div class = "line" > < a name = "l00445" > < / a > < span class = "lineno" > 445< / span >   < / div >
< div class = "line" > < a name = "l00446" > < / a > < span class = "lineno" > 446< / span >   < span class = "comment" > // Handle remainder< / span > < / div >
< div class = "line" > < a name = "l00447" > < / a > < span class = "lineno" > 447< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = limit + warpId; i < numVecs; i += numWarps) {< / div >
< div class = "line" > < a name = "l00448" > < / a > < span class = "lineno" > 448< / span >   vecVal[0] = LoadStore< Half4> ::load(& vecs[i * kDims + laneId * 4]);< / div >
< div class = "line" > < a name = "l00449" > < / a > < span class = "lineno" > 449< / span >   < / div >
< div class = "line" > < a name = "l00450" > < / a > < span class = "lineno" > 450< / span >   < span class = "keywordtype" > float< / span > dist;< / div >
< div class = "line" > < a name = "l00451" > < / a > < span class = "lineno" > 451< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00452" > < / a > < span class = "lineno" > 452< / span >   dist = l2Distance(queryVal, half4ToFloat4(vecVal[0]));< / div >
< div class = "line" > < a name = "l00453" > < / a > < span class = "lineno" > 453< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00454" > < / a > < span class = "lineno" > 454< / span >   dist = ipDistance(queryVal, half4ToFloat4(vecVal[0]));< / div >
< div class = "line" > < a name = "l00455" > < / a > < span class = "lineno" > 455< / span >   }< / div >
< div class = "line" > < a name = "l00456" > < / a > < span class = "lineno" > 456< / span >   < / div >
< div class = "line" > < a name = "l00457" > < / a > < span class = "lineno" > 457< / span >   dist = warpReduceAllSum(dist);< / div >
< div class = "line" > < a name = "l00458" > < / a > < span class = "lineno" > 458< / span >   < / div >
< div class = "line" > < a name = "l00459" > < / a > < span class = "lineno" > 459< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00460" > < / a > < span class = "lineno" > 460< / span >   distanceOut[i] = dist;< / div >
< div class = "line" > < a name = "l00461" > < / a > < span class = "lineno" > 461< / span >   }< / div >
< div class = "line" > < a name = "l00462" > < / a > < span class = "lineno" > 462< / span >   }< / div >
< div class = "line" > < a name = "l00463" > < / a > < span class = "lineno" > 463< / span >   }< / div >
< div class = "line" > < a name = "l00464" > < / a > < span class = "lineno" > 464< / span >   };< / div >
< div class = "line" > < a name = "l00465" > < / a > < span class = "lineno" > 465< / span >   < / div >
< div class = "line" > < a name = "l00466" > < / a > < span class = "lineno" > 466< / span >   < span class = "preprocessor" > #endif< / span > < / div >
< div class = "line" > < a name = "l00467" > < / a > < span class = "lineno" > 467< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00468" > < / a > < span class = "lineno" > 468< / span >   < span class = "comment" > // 256-d float32 implementation< / span > < / div >
< div class = "line" > < a name = "l00469" > < / a > < span class = "lineno" > 469< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > bool< / span > L2> < / div >
< div class = "line" > < a name = "l00470" > < / a > < span class = "lineno" > < a class = "line" href = "structfaiss_1_1gpu_1_1IVFFlatScan_3_01256_00_01L2_00_01float_01_4.html" > 470< / a > < / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > < 256, L2, float> {< / div >
< div class = "line" > < a name = "l00471" > < / a > < span class = "lineno" > 471< / span >   < span class = "keyword" > static< / span > constexpr < span class = "keywordtype" > int< / span > kDims = 256;< / div >
< div class = "line" > < a name = "l00472" > < / a > < span class = "lineno" > 472< / span >   < / div >
< div class = "line" > < a name = "l00473" > < / a > < span class = "lineno" > 473< / span >   < span class = "keyword" > static< / span > __device__ < span class = "keywordtype" > void< / span > scan(< span class = "keywordtype" > float< / span > * query,< / div >
< div class = "line" > < a name = "l00474" > < / a > < span class = "lineno" > 474< / span >   < span class = "keywordtype" > void< / span > * vecData,< / div >
< div class = "line" > < a name = "l00475" > < / a > < span class = "lineno" > 475< / span >   < span class = "keywordtype" > int< / span > numVecs,< / div >
< div class = "line" > < a name = "l00476" > < / a > < span class = "lineno" > 476< / span >   < span class = "keywordtype" > int< / span > dim,< / div >
< div class = "line" > < a name = "l00477" > < / a > < span class = "lineno" > 477< / span >   < span class = "keywordtype" > float< / span > * distanceOut) {< / div >
< div class = "line" > < a name = "l00478" > < / a > < span class = "lineno" > 478< / span >   < span class = "comment" > // A specialization here to load per-warp seems to be worse, since< / span > < / div >
< div class = "line" > < a name = "l00479" > < / a > < span class = "lineno" > 479< / span >   < span class = "comment" > // we're already running at near memory b/w peak< / span > < / div >
< div class = "line" > < a name = "l00480" > < / a > < span class = "lineno" > 480< / span >   < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< 0, L2, float> ::scan< / a > (query,< / div >
< div class = "line" > < a name = "l00481" > < / a > < span class = "lineno" > 481< / span >   vecData,< / div >
< div class = "line" > < a name = "l00482" > < / a > < span class = "lineno" > 482< / span >   numVecs,< / div >
< div class = "line" > < a name = "l00483" > < / a > < span class = "lineno" > 483< / span >   dim,< / div >
< div class = "line" > < a name = "l00484" > < / a > < span class = "lineno" > 484< / span >   distanceOut);< / div >
< div class = "line" > < a name = "l00485" > < / a > < span class = "lineno" > 485< / span >   }< / div >
< div class = "line" > < a name = "l00486" > < / a > < span class = "lineno" > 486< / span >   };< / div >
< div class = "line" > < a name = "l00487" > < / a > < span class = "lineno" > 487< / span >   < / div >
< div class = "line" > < a name = "l00488" > < / a > < span class = "lineno" > 488< / span >   < span class = "preprocessor" > #ifdef FAISS_USE_FLOAT16< / span > < / div >
< div class = "line" > < a name = "l00489" > < / a > < span class = "lineno" > 489< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00490" > < / a > < span class = "lineno" > 490< / span >   < span class = "comment" > // float16 implementation< / span > < / div >
< div class = "line" > < a name = "l00491" > < / a > < span class = "lineno" > 491< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > bool< / span > L2> < / div >
< div class = "line" > < a name = "l00492" > < / a > < span class = "lineno" > 492< / span >   < span class = "keyword" > struct < / span > < a class = "code" href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > IVFFlatScan< / a > < 256, L2, half> {< / div >
< div class = "line" > < a name = "l00493" > < / a > < span class = "lineno" > 493< / span >   < span class = "keyword" > static< / span > constexpr < span class = "keywordtype" > int< / span > kDims = 256;< / div >
< div class = "line" > < a name = "l00494" > < / a > < span class = "lineno" > 494< / span >   < / div >
< div class = "line" > < a name = "l00495" > < / a > < span class = "lineno" > 495< / span >   < span class = "keyword" > static< / span > __device__ < span class = "keywordtype" > void< / span > scan(< span class = "keywordtype" > float< / span > * query,< / div >
< div class = "line" > < a name = "l00496" > < / a > < span class = "lineno" > 496< / span >   < span class = "keywordtype" > void< / span > * vecData,< / div >
< div class = "line" > < a name = "l00497" > < / a > < span class = "lineno" > 497< / span >   < span class = "keywordtype" > int< / span > numVecs,< / div >
< div class = "line" > < a name = "l00498" > < / a > < span class = "lineno" > 498< / span >   < span class = "keywordtype" > int< / span > dim,< / div >
< div class = "line" > < a name = "l00499" > < / a > < span class = "lineno" > 499< / span >   < span class = "keywordtype" > float< / span > * distanceOut) {< / div >
< div class = "line" > < a name = "l00500" > < / a > < span class = "lineno" > 500< / span >   < span class = "comment" > // Each warp reduces a single 256-d vector; each lane loads a Half8< / span > < / div >
< div class = "line" > < a name = "l00501" > < / a > < span class = "lineno" > 501< / span >   half* vecs = (half*) vecData;< / div >
< div class = "line" > < a name = "l00502" > < / a > < span class = "lineno" > 502< / span >   < / div >
< div class = "line" > < a name = "l00503" > < / a > < span class = "lineno" > 503< / span >   < span class = "keywordtype" > int< / span > laneId = getLaneId();< / div >
< div class = "line" > < a name = "l00504" > < / a > < span class = "lineno" > 504< / span >   < span class = "keywordtype" > int< / span > warpId = threadIdx.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00505" > < / a > < span class = "lineno" > 505< / span >   < span class = "keywordtype" > int< / span > numWarps = blockDim.x / kWarpSize;< / div >
< div class = "line" > < a name = "l00506" > < / a > < span class = "lineno" > 506< / span >   < / div >
< div class = "line" > < a name = "l00507" > < / a > < span class = "lineno" > 507< / span >   < span class = "comment" > // This is not a contiguous load, but we only have to load these two< / span > < / div >
< div class = "line" > < a name = "l00508" > < / a > < span class = "lineno" > 508< / span >   < span class = "comment" > // values, so that we can load by Half8 below< / span > < / div >
< div class = "line" > < a name = "l00509" > < / a > < span class = "lineno" > 509< / span >   float4 queryValA = *(float4*) & query[laneId * 8];< / div >
< div class = "line" > < a name = "l00510" > < / a > < span class = "lineno" > 510< / span >   float4 queryValB = *(float4*) & query[laneId * 8 + 4];< / div >
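< div class = "line" >   < span class = "comment" > // Illustration, assuming kWarpSize == 32: lane l holds query dims< / span > < / div >
< div class = "line" >   < span class = "comment" > // [8l, 8l + 4) in queryValA and [8l + 4, 8l + 8) in queryValB, so the< / span > < / div >
< div class = "line" >   < span class = "comment" > // 32 lanes together cover all 32 * 8 = 256 dims. For lane 3 that is< / span > < / div >
< div class = "line" >   < span class = "comment" > // dims 24..27 and 28..31, matching the Half8 loaded at laneId * 8 below,< / span > < / div >
< div class = "line" >   < span class = "comment" > // whose .a and .b halves are assumed to hold those same two groups of 4.< / span > < / div >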
< div class = "line" > < a name = "l00511" > < / a > < span class = "lineno" > 511< / span >   < / div >
< div class = "line" > < a name = "l00512" > < / a > < span class = "lineno" > 512< / span >   constexpr < span class = "keywordtype" > int< / span > kUnroll = 4;< / div >
< div class = "line" > < a name = "l00513" > < / a > < span class = "lineno" > 513< / span >   < / div >
< div class = "line" > < a name = "l00514" > < / a > < span class = "lineno" > 514< / span >   Half8 vecVal[kUnroll];< / div >
< div class = "line" > < a name = "l00515" > < / a > < span class = "lineno" > 515< / span >   < / div >
< div class = "line" > < a name = "l00516" > < / a > < span class = "lineno" > 516< / span >   < span class = "keywordtype" > int< / span > limit = utils::roundDown(numVecs, kUnroll * numWarps);< / div >
< div class = "line" > < a name = "l00517" > < / a > < span class = "lineno" > 517< / span >   < / div >
< div class = "line" > < a name = "l00518" > < / a > < span class = "lineno" > 518< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = warpId; i < limit; i += kUnroll * numWarps) {< / div >
< div class = "line" > < a name = "l00519" > < / a > < span class = "lineno" > 519< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00520" > < / a > < span class = "lineno" > 520< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00521" > < / a > < span class = "lineno" > 521< / span >   < span class = "comment" > // Vector we are loading from is i + j * numWarps< / span > < / div >
< div class = "line" > < a name = "l00522" > < / a > < span class = "lineno" > 522< / span >   < span class = "comment" > // Dim we are loading from is laneId * 8< / span > < / div >
< div class = "line" > < a name = "l00523" > < / a > < span class = "lineno" > 523< / span >   vecVal[j] =< / div >
< div class = "line" > < a name = "l00524" > < / a > < span class = "lineno" > 524< / span >   < a class = "code" href = "structfaiss_1_1gpu_1_1LoadStore.html" > LoadStore< Half8> ::load< / a > (< / div >
< div class = "line" > < a name = "l00525" > < / a > < span class = "lineno" > 525< / span >   & vecs[(i + j * numWarps) * kDims + laneId * 8]);< / div >
< div class = "line" > < a name = "l00526" > < / a > < span class = "lineno" > 526< / span >   }< / div >
< div class = "line" > < a name = "l00527" > < / a > < span class = "lineno" > 527< / span >   < / div >
< div class = "line" > < a name = "l00528" > < / a > < span class = "lineno" > 528< / span >   < span class = "keywordtype" > float< / span > dist[kUnroll];< / div >
< div class = "line" > < a name = "l00529" > < / a > < span class = "lineno" > 529< / span >   < / div >
< div class = "line" > < a name = "l00530" > < / a > < span class = "lineno" > 530< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00531" > < / a > < span class = "lineno" > 531< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00532" > < / a > < span class = "lineno" > 532< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00533" > < / a > < span class = "lineno" > 533< / span >   dist[j] = l2Distance(queryValA, half4ToFloat4(vecVal[j].a));< / div >
< div class = "line" > < a name = "l00534" > < / a > < span class = "lineno" > 534< / span >   dist[j] += l2Distance(queryValB, half4ToFloat4(vecVal[j].b));< / div >
< div class = "line" > < a name = "l00535" > < / a > < span class = "lineno" > 535< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00536" > < / a > < span class = "lineno" > 536< / span >   dist[j] = ipDistance(queryValA, half4ToFloat4(vecVal[j].a));< / div >
< div class = "line" > < a name = "l00537" > < / a > < span class = "lineno" > 537< / span >   dist[j] += ipDistance(queryValB, half4ToFloat4(vecVal[j].b));< / div >
< div class = "line" > < a name = "l00538" > < / a > < span class = "lineno" > 538< / span >   }< / div >
< div class = "line" > < a name = "l00539" > < / a > < span class = "lineno" > 539< / span >   }< / div >
< div class = "line" > < a name = "l00540" > < / a > < span class = "lineno" > 540< / span >   < / div >
< div class = "line" > < a name = "l00541" > < / a > < span class = "lineno" > 541< / span >   < span class = "comment" > // Reduce within the warp< / span > < / div >
< div class = "line" > < a name = "l00542" > < / a > < span class = "lineno" > 542< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00543" > < / a > < span class = "lineno" > 543< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00544" > < / a > < span class = "lineno" > 544< / span >   dist[j] = warpReduceAllSum(dist[j]);< / div >
< div class = "line" > < a name = "l00545" > < / a > < span class = "lineno" > 545< / span >   }< / div >
< div class = "line" > < a name = "l00546" > < / a > < span class = "lineno" > 546< / span >   < / div >
< div class = "line" > < a name = "l00547" > < / a > < span class = "lineno" > 547< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00548" > < / a > < span class = "lineno" > 548< / span >   < span class = "preprocessor" > #pragma unroll< / span > < / div >
< div class = "line" > < a name = "l00549" > < / a > < span class = "lineno" > 549< / span >   < span class = "preprocessor" > < / span > < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > j = 0; j < kUnroll; ++j) {< / div >
< div class = "line" > < a name = "l00550" > < / a > < span class = "lineno" > 550< / span >   distanceOut[i + j * numWarps] = dist[j];< / div >
< div class = "line" > < a name = "l00551" > < / a > < span class = "lineno" > 551< / span >   }< / div >
< div class = "line" > < a name = "l00552" > < / a > < span class = "lineno" > 552< / span >   }< / div >
< div class = "line" > < a name = "l00553" > < / a > < span class = "lineno" > 553< / span >   }< / div >
< div class = "line" > < a name = "l00554" > < / a > < span class = "lineno" > 554< / span >   < / div >
< div class = "line" > < a name = "l00555" > < / a > < span class = "lineno" > 555< / span >   < span class = "comment" > // Handle remainder< / span > < / div >
< div class = "line" > < a name = "l00556" > < / a > < span class = "lineno" > 556< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > i = limit + warpId; i < numVecs; i += numWarps) {< / div >
< div class = "line" > < a name = "l00557" > < / a > < span class = "lineno" > 557< / span >   vecVal[0] = LoadStore< Half8> ::load(& vecs[i * kDims + laneId * 8]);< / div >
< div class = "line" > < a name = "l00558" > < / a > < span class = "lineno" > 558< / span >   < / div >
< div class = "line" > < a name = "l00559" > < / a > < span class = "lineno" > 559< / span >   < span class = "keywordtype" > float< / span > dist;< / div >
< div class = "line" > < a name = "l00560" > < / a > < span class = "lineno" > 560< / span >   < span class = "keywordflow" > if< / span > (L2) {< / div >
< div class = "line" > < a name = "l00561" > < / a > < span class = "lineno" > 561< / span >   dist = l2Distance(queryValA, half4ToFloat4(vecVal[0].a));< / div >
< div class = "line" > < a name = "l00562" > < / a > < span class = "lineno" > 562< / span >   dist += l2Distance(queryValB, half4ToFloat4(vecVal[0].b));< / div >
< div class = "line" > < a name = "l00563" > < / a > < span class = "lineno" > 563< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00564" > < / a > < span class = "lineno" > 564< / span >   dist = ipDistance(queryValA, half4ToFloat4(vecVal[0].a));< / div >
< div class = "line" > < a name = "l00565" > < / a > < span class = "lineno" > 565< / span >   dist += ipDistance(queryValB, half4ToFloat4(vecVal[0].b));< / div >
< div class = "line" > < a name = "l00566" > < / a > < span class = "lineno" > 566< / span >   }< / div >
< div class = "line" > < a name = "l00567" > < / a > < span class = "lineno" > 567< / span >   < / div >
< div class = "line" > < a name = "l00568" > < / a > < span class = "lineno" > 568< / span >   dist = warpReduceAllSum(dist);< / div >
< div class = "line" > < a name = "l00569" > < / a > < span class = "lineno" > 569< / span >   < / div >
< div class = "line" > < a name = "l00570" > < / a > < span class = "lineno" > 570< / span >   < span class = "keywordflow" > if< / span > (laneId == 0) {< / div >
< div class = "line" > < a name = "l00571" > < / a > < span class = "lineno" > 571< / span >   distanceOut[i] = dist;< / div >
< div class = "line" > < a name = "l00572" > < / a > < span class = "lineno" > 572< / span >   }< / div >
< div class = "line" > < a name = "l00573" > < / a > < span class = "lineno" > 573< / span >   }< / div >
< div class = "line" > < a name = "l00574" > < / a > < span class = "lineno" > 574< / span >   }< / div >
< div class = "line" > < a name = "l00575" > < / a > < span class = "lineno" > 575< / span >   };< / div >
< div class = "line" > < a name = "l00576" > < / a > < span class = "lineno" > 576< / span >   < / div >
< div class = "line" > < a name = "l00577" > < / a > < span class = "lineno" > 577< / span >   < span class = "preprocessor" > #endif< / span > < / div >
< div class = "line" > < a name = "l00578" > < / a > < span class = "lineno" > 578< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00579" > < / a > < span class = "lineno" > 579< / span >   < span class = "keyword" > template< / span > < < span class = "keywordtype" > int< / span > Dims, < span class = "keywordtype" > bool< / span > L2, < span class = "keyword" > typename< / span > T> < / div >
< div class = "line" > < a name = "l00580" > < / a > < span class = "lineno" > 580< / span >   __global__ < span class = "keywordtype" > void< / span > < / div >
< div class = "line" > < a name = "l00581" > < / a > < span class = "lineno" > 581< / span >   ivfFlatScan(Tensor< float, 2, true> queries,< / div >
< div class = "line" > < a name = "l00582" > < / a > < span class = "lineno" > 582< / span >   Tensor< int, 2, true> listIds,< / div >
< div class = "line" > < a name = "l00583" > < / a > < span class = "lineno" > 583< / span >   < span class = "keywordtype" > void< / span > ** allListData,< / div >
< div class = "line" > < a name = "l00584" > < / a > < span class = "lineno" > 584< / span >   < span class = "keywordtype" > int< / span > * listLengths,< / div >
< div class = "line" > < a name = "l00585" > < / a > < span class = "lineno" > 585< / span >   Tensor< int, 2, true> prefixSumOffsets,< / div >
< div class = "line" > < a name = "l00586" > < / a > < span class = "lineno" > 586< / span >   Tensor< float, 1, true> distance) {< / div >
< div class = "line" > < a name = "l00587" > < / a > < span class = "lineno" > 587< / span >   < span class = "keyword" > auto< / span > queryId = blockIdx.y;< / div >
< div class = "line" > < a name = "l00588" > < / a > < span class = "lineno" > 588< / span >   < span class = "keyword" > auto< / span > probeId = blockIdx.x;< / div >
< div class = "line" > < a name = "l00589" > < / a > < span class = "lineno" > 589< / span >   < / div >
< div class = "line" > < a name = "l00590" > < / a > < span class = "lineno" > 590< / span >   < span class = "comment" > // This is where we start writing out data< / span > < / div >
< div class = "line" > < a name = "l00591" > < / a > < span class = "lineno" > 591< / span >   < span class = "comment" > // We ensure that before the array (at offset -1), there is a 0 value< / span > < / div >
< div class = "line" > < a name = "l00592" > < / a > < span class = "lineno" > 592< / span >   < span class = "keywordtype" > int< / span > outBase = *(prefixSumOffsets[queryId][probeId].data() - 1);< / div >
< div class = "line" > < a name = "l00593" > < / a > < span class = "lineno" > 593< / span >   < / div >
< div class = "line" > < a name = "l00594" > < / a > < span class = "lineno" > 594< / span >   < span class = "keyword" > auto< / span > listId = listIds[queryId][probeId];< / div >
< div class = "line" > < a name = "l00595" > < / a > < span class = "lineno" > 595< / span >   < span class = "comment" > // Safety guard in case NaNs in input cause no list ID to be generated< / span > < / div >
< div class = "line" > < a name = "l00596" > < / a > < span class = "lineno" > 596< / span >   < span class = "keywordflow" > if< / span > (listId == -1) {< / div >
< div class = "line" > < a name = "l00597" > < / a > < span class = "lineno" > 597< / span >   < span class = "keywordflow" > return< / span > ;< / div >
< div class = "line" > < a name = "l00598" > < / a > < span class = "lineno" > 598< / span >   }< / div >
< div class = "line" > < a name = "l00599" > < / a > < span class = "lineno" > 599< / span >   < / div >
< div class = "line" > < a name = "l00600" > < / a > < span class = "lineno" > 600< / span >   < span class = "keyword" > auto< / span > query = queries[queryId].data();< / div >
< div class = "line" > < a name = "l00601" > < / a > < span class = "lineno" > 601< / span >   < span class = "keyword" > auto< / span > vecs = allListData[listId];< / div >
< div class = "line" > < a name = "l00602" > < / a > < span class = "lineno" > 602< / span >   < span class = "keyword" > auto< / span > numVecs = listLengths[listId];< / div >
< div class = "line" > < a name = "l00603" > < / a > < span class = "lineno" > 603< / span >   < span class = "keyword" > auto< / span > dim = queries.getSize(1);< / div >
< div class = "line" > < a name = "l00604" > < / a > < span class = "lineno" > 604< / span >   < span class = "keyword" > auto< / span > distanceOut = distance[outBase].data();< / div >
< div class = "line" > < a name = "l00605" > < / a > < span class = "lineno" > 605< / span >   < / div >
< div class = "line" > < a name = "l00606" > < / a > < span class = "lineno" > 606< / span >   IVFFlatScan< Dims, L2, T> ::scan(query, vecs, numVecs, dim, distanceOut);< / div >
< div class = "line" > < a name = "l00607" > < / a > < span class = "lineno" > 607< / span >   }< / div >
< div class = "line" > < a name = "l00608" > < / a > < span class = "lineno" > 608< / span >   < / div >
< div class = "line" > < a name = "l00609" > < / a > < span class = "lineno" > 609< / span >   < span class = "keywordtype" > void< / span > < / div >
< div class = "line" > < a name = "l00610" > < / a > < span class = "lineno" > 610< / span >   runIVFFlatScanTile(Tensor< float, 2, true> & queries,< / div >
< div class = "line" > < a name = "l00611" > < / a > < span class = "lineno" > 611< / span >   Tensor< int, 2, true> & listIds,< / div >
< div class = "line" > < a name = "l00612" > < / a > < span class = "lineno" > 612< / span >   thrust::device_vector< void*> & listData,< / div >
< div class = "line" > < a name = "l00613" > < / a > < span class = "lineno" > 613< / span >   thrust::device_vector< void*> & listIndices,< / div >
< div class = "line" > < a name = "l00614" > < / a > < span class = "lineno" > 614< / span >   IndicesOptions indicesOptions,< / div >
< div class = "line" > < a name = "l00615" > < / a > < span class = "lineno" > 615< / span >   thrust::device_vector< int> & listLengths,< / div >
< div class = "line" > < a name = "l00616" > < / a > < span class = "lineno" > 616< / span >   Tensor< char, 1, true> & thrustMem,< / div >
< div class = "line" > < a name = "l00617" > < / a > < span class = "lineno" > 617< / span >   Tensor< int, 2, true> & prefixSumOffsets,< / div >
< div class = "line" > < a name = "l00618" > < / a > < span class = "lineno" > 618< / span >   Tensor< float, 1, true> & allDistances,< / div >
< div class = "line" > < a name = "l00619" > < / a > < span class = "lineno" > 619< / span >   Tensor< float, 3, true> & heapDistances,< / div >
< div class = "line" > < a name = "l00620" > < / a > < span class = "lineno" > 620< / span >   Tensor< int, 3, true> & heapIndices,< / div >
< div class = "line" > < a name = "l00621" > < / a > < span class = "lineno" > 621< / span >   < span class = "keywordtype" > int< / span > k,< / div >
< div class = "line" > < a name = "l00622" > < / a > < span class = "lineno" > 622< / span >   < span class = "keywordtype" > bool< / span > l2Distance,< / div >
< div class = "line" > < a name = "l00623" > < / a > < span class = "lineno" > 623< / span >   < span class = "keywordtype" > bool< / span > useFloat16,< / div >
< div class = "line" > < a name = "l00624" > < / a > < span class = "lineno" > 624< / span >   Tensor< float, 2, true> & outDistances,< / div >
< div class = "line" > < a name = "l00625" > < / a > < span class = "lineno" > 625< / span >   Tensor< long, 2, true> & outIndices,< / div >
< div class = "line" > < a name = "l00626" > < / a > < span class = "lineno" > 626< / span >   cudaStream_t stream) {< / div >
< div class = "line" > < a name = "l00627" > < / a > < span class = "lineno" > 627< / span >   < span class = "comment" > // Calculate offset lengths, so we know where to write out< / span > < / div >
< div class = "line" > < a name = "l00628" > < / a > < span class = "lineno" > 628< / span >   < span class = "comment" > // intermediate results< / span > < / div >
< div class = "line" > < a name = "l00629" > < / a > < span class = "lineno" > 629< / span >   runCalcListOffsets(listIds, listLengths, prefixSumOffsets, thrustMem, stream);< / div >
< div class = "line" > < a name = "l00630" > < / a > < span class = "lineno" > 630< / span >   < / div >
< div class = "line" > < a name = "l00631" > < / a > < span class = "lineno" > 631< / span >   < span class = "comment" > // Calculate distances for vectors within our chunk of lists< / span > < / div >
< div class = "line" > < a name = "l00632" > < / a > < span class = "lineno" > 632< / span >   constexpr < span class = "keywordtype" > int< / span > kMaxThreadsIVF = 512;< / div >
< div class = "line" > < a name = "l00633" > < / a > < span class = "lineno" > 633< / span >   < / div >
< div class = "line" > < a name = "l00634" > < / a > < span class = "lineno" > 634< / span >   < span class = "comment" > // FIXME: if `half` and # dims is multiple of 2, halve the< / span > < / div >
< div class = "line" > < a name = "l00635" > < / a > < span class = "lineno" > 635< / span >   < span class = "comment" > // threadblock size< / span > < / div >
< div class = "line" > < a name = "l00636" > < / a > < span class = "lineno" > 636< / span >   < / div >
< div class = "line" > < a name = "l00637" > < / a > < span class = "lineno" > 637< / span >   < span class = "keywordtype" > int< / span > dim = queries.getSize(1);< / div >
< div class = "line" > < a name = "l00638" > < / a > < span class = "lineno" > 638< / span >   < span class = "keywordtype" > int< / span > numThreads = std::min(dim, kMaxThreadsIVF);< / div >
< div class = "line" > < a name = "l00639" > < / a > < span class = "lineno" > 639< / span >   < / div >
< div class = "line" > < a name = "l00640" > < / a > < span class = "lineno" > 640< / span >   < span class = "keyword" > auto< / span > grid = dim3(listIds.getSize(1),< / div >
< div class = "line" > < a name = "l00641" > < / a > < span class = "lineno" > 641< / span >   listIds.getSize(0));< / div >
< div class = "line" > < a name = "l00642" > < / a > < span class = "lineno" > 642< / span >   < span class = "keyword" > auto< / span > block = dim3(numThreads);< / div >
< div class = "line" > < a name = "l00643" > < / a > < span class = "lineno" > 643< / span >   < span class = "comment" > // All exact dim kernels are unrolled by 4, hence the `4`< / span > < / div >
< div class = "line" > < a name = "l00644" > < / a > < span class = "lineno" > 644< / span >   < span class = "keyword" > auto< / span > smem = < span class = "keyword" > sizeof< / span > (float) * utils::divUp(numThreads, kWarpSize) * 4;< / div >
< div class = "line" > < a name = "l00645" > < / a > < span class = "lineno" > 645< / span >   < / div >
< div class = "line" > < a name = "l00646" > < / a > < span class = "lineno" > 646< / span >   < span class = "preprocessor" > #define RUN_IVF_FLAT(DIMS, L2, T) \< / span > < / div >
< div class = "line" > < a name = "l00647" > < / a > < span class = "lineno" > 647< / span >   < span class = "preprocessor" > do { \< / span > < / div >
< div class = "line" > < a name = "l00648" > < / a > < span class = "lineno" > 648< / span >   < span class = "preprocessor" > ivfFlatScan< DIMS, L2, T> \< / span > < / div >
< div class = "line" > < a name = "l00649" > < / a > < span class = "lineno" > 649< / span >   < span class = "preprocessor" > < < < grid, block, smem, stream> > > ( \< / span > < / div >
< div class = "line" > < a name = "l00650" > < / a > < span class = "lineno" > 650< / span >   < span class = "preprocessor" > queries, \< / span > < / div >
< div class = "line" > < a name = "l00651" > < / a > < span class = "lineno" > 651< / span >   < span class = "preprocessor" > listIds, \< / span > < / div >
< div class = "line" > < a name = "l00652" > < / a > < span class = "lineno" > 652< / span >   < span class = "preprocessor" > listData.data().get(), \< / span > < / div >
< div class = "line" > < a name = "l00653" > < / a > < span class = "lineno" > 653< / span >   < span class = "preprocessor" > listLengths.data().get(), \< / span > < / div >
< div class = "line" > < a name = "l00654" > < / a > < span class = "lineno" > 654< / span >   < span class = "preprocessor" > prefixSumOffsets, \< / span > < / div >
< div class = "line" > < a name = "l00655" > < / a > < span class = "lineno" > 655< / span >   < span class = "preprocessor" > allDistances); \< / span > < / div >
< div class = "line" > < a name = "l00656" > < / a > < span class = "lineno" > 656< / span >   < span class = "preprocessor" > } while (0)< / span > < / div >
< div class = "line" > < a name = "l00657" > < / a > < span class = "lineno" > 657< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00658" > < / a > < span class = "lineno" > 658< / span >   < span class = "preprocessor" > #ifdef FAISS_USE_FLOAT16< / span > < / div >
< div class = "line" > < a name = "l00659" > < / a > < span class = "lineno" > 659< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00660" > < / a > < span class = "lineno" > 660< / span >   < span class = "preprocessor" > #define HANDLE_DIM_CASE(DIMS) \< / span > < / div >
< div class = "line" > < a name = "l00661" > < / a > < span class = "lineno" > 661< / span >   < span class = "preprocessor" > do { \< / span > < / div >
< div class = "line" > < a name = "l00662" > < / a > < span class = "lineno" > 662< / span >   < span class = "preprocessor" > if (l2Distance) { \< / span > < / div >
< div class = "line" > < a name = "l00663" > < / a > < span class = "lineno" > 663< / span >   < span class = "preprocessor" > if (useFloat16) { \< / span > < / div >
< div class = "line" > < a name = "l00664" > < / a > < span class = "lineno" > 664< / span >   < span class = "preprocessor" > RUN_IVF_FLAT(DIMS, true, half); \< / span > < / div >
< div class = "line" > < a name = "l00665" > < / a > < span class = "lineno" > 665< / span >   < span class = "preprocessor" > } else { \< / span > < / div >
< div class = "line" > < a name = "l00666" > < / a > < span class = "lineno" > 666< / span >   < span class = "preprocessor" > RUN_IVF_FLAT(DIMS, true, float); \< / span > < / div >
< div class = "line" > < a name = "l00667" > < / a > < span class = "lineno" > 667< / span >   < span class = "preprocessor" > } \< / span > < / div >
< div class = "line" > < a name = "l00668" > < / a > < span class = "lineno" > 668< / span >   < span class = "preprocessor" > } else { \< / span > < / div >
< div class = "line" > < a name = "l00669" > < / a > < span class = "lineno" > 669< / span >   < span class = "preprocessor" > if (useFloat16) { \< / span > < / div >
< div class = "line" > < a name = "l00670" > < / a > < span class = "lineno" > 670< / span >   < span class = "preprocessor" > RUN_IVF_FLAT(DIMS, false, half); \< / span > < / div >
< div class = "line" > < a name = "l00671" > < / a > < span class = "lineno" > 671< / span >   < span class = "preprocessor" > } else { \< / span > < / div >
< div class = "line" > < a name = "l00672" > < / a > < span class = "lineno" > 672< / span >   < span class = "preprocessor" > RUN_IVF_FLAT(DIMS, false, float); \< / span > < / div >
< div class = "line" > < a name = "l00673" > < / a > < span class = "lineno" > 673< / span >   < span class = "preprocessor" > } \< / span > < / div >
< div class = "line" > < a name = "l00674" > < / a > < span class = "lineno" > 674< / span >   < span class = "preprocessor" > } \< / span > < / div >
< div class = "line" > < a name = "l00675" > < / a > < span class = "lineno" > 675< / span >   < span class = "preprocessor" > } while (0)< / span > < / div >
< div class = "line" > < a name = "l00676" > < / a > < span class = "lineno" > 676< / span >   < span class = "preprocessor" > < / span > < span class = "preprocessor" > #else< / span > < / div >
< div class = "line" > < a name = "l00677" > < / a > < span class = "lineno" > 677< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00678" > < / a > < span class = "lineno" > 678< / span >   < span class = "preprocessor" > #define HANDLE_DIM_CASE(DIMS) \< / span > < / div >
< div class = "line" > < a name = "l00679" > < / a > < span class = "lineno" > 679< / span >   < span class = "preprocessor" > do { \< / span > < / div >
< div class = "line" > < a name = "l00680" > < / a > < span class = "lineno" > 680< / span >   < span class = "preprocessor" > if (l2Distance) { \< / span > < / div >
< div class = "line" > < a name = "l00681" > < / a > < span class = "lineno" > 681< / span >   < span class = "preprocessor" > if (useFloat16) { \< / span > < / div >
< div class = "line" > < a name = "l00682" > < / a > < span class = "lineno" > 682< / span >   < span class = "preprocessor" > FAISS_ASSERT(false); \< / span > < / div >
< div class = "line" > < a name = "l00683" > < / a > < span class = "lineno" > 683< / span >   < span class = "preprocessor" > } else { \< / span > < / div >
< div class = "line" > < a name = "l00684" > < / a > < span class = "lineno" > 684< / span >   < span class = "preprocessor" > RUN_IVF_FLAT(DIMS, true, float); \< / span > < / div >
< div class = "line" > < a name = "l00685" > < / a > < span class = "lineno" > 685< / span >   < span class = "preprocessor" > } \< / span > < / div >
< div class = "line" > < a name = "l00686" > < / a > < span class = "lineno" > 686< / span >   < span class = "preprocessor" > } else { \< / span > < / div >
< div class = "line" > < a name = "l00687" > < / a > < span class = "lineno" > 687< / span >   < span class = "preprocessor" > if (useFloat16) { \< / span > < / div >
< div class = "line" > < a name = "l00688" > < / a > < span class = "lineno" > 688< / span >   < span class = "preprocessor" > FAISS_ASSERT(false); \< / span > < / div >
< div class = "line" > < a name = "l00689" > < / a > < span class = "lineno" > 689< / span >   < span class = "preprocessor" > } else { \< / span > < / div >
< div class = "line" > < a name = "l00690" > < / a > < span class = "lineno" > 690< / span >   < span class = "preprocessor" > RUN_IVF_FLAT(DIMS, false, float); \< / span > < / div >
< div class = "line" > < a name = "l00691" > < / a > < span class = "lineno" > 691< / span >   < span class = "preprocessor" > } \< / span > < / div >
< div class = "line" > < a name = "l00692" > < / a > < span class = "lineno" > 692< / span >   < span class = "preprocessor" > } \< / span > < / div >
< div class = "line" > < a name = "l00693" > < / a > < span class = "lineno" > 693< / span >   < span class = "preprocessor" > } while (0)< / span > < / div >
< div class = "line" > < a name = "l00694" > < / a > < span class = "lineno" > 694< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00695" > < / a > < span class = "lineno" > 695< / span >   < span class = "preprocessor" > #endif // FAISS_USE_FLOAT16< / span > < / div >
< div class = "line" > < a name = "l00696" > < / a > < span class = "lineno" > 696< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00697" > < / a > < span class = "lineno" > 697< / span >   < span class = "keywordflow" > if< / span > (dim == 64) {< / div >
< div class = "line" > < a name = "l00698" > < / a > < span class = "lineno" > 698< / span >   HANDLE_DIM_CASE(64);< / div >
< div class = "line" > < a name = "l00699" > < / a > < span class = "lineno" > 699< / span >   } < span class = "keywordflow" > else< / span > < span class = "keywordflow" > if< / span > (dim == 128) {< / div >
< div class = "line" > < a name = "l00700" > < / a > < span class = "lineno" > 700< / span >   HANDLE_DIM_CASE(128);< / div >
< div class = "line" > < a name = "l00701" > < / a > < span class = "lineno" > 701< / span >   } < span class = "keywordflow" > else< / span > < span class = "keywordflow" > if< / span > (dim == 256) {< / div >
< div class = "line" > < a name = "l00702" > < / a > < span class = "lineno" > 702< / span >   HANDLE_DIM_CASE(256);< / div >
< div class = "line" > < a name = "l00703" > < / a > < span class = "lineno" > 703< / span >   } < span class = "keywordflow" > else< / span > < span class = "keywordflow" > if< / span > (dim < = kMaxThreadsIVF) {< / div >
< div class = "line" > < a name = "l00704" > < / a > < span class = "lineno" > 704< / span >   HANDLE_DIM_CASE(0);< / div >
< div class = "line" > < a name = "l00705" > < / a > < span class = "lineno" > 705< / span >   } < span class = "keywordflow" > else< / span > {< / div >
< div class = "line" > < a name = "l00706" > < / a > < span class = "lineno" > 706< / span >   HANDLE_DIM_CASE(-1);< / div >
< div class = "line" > < a name = "l00707" > < / a > < span class = "lineno" > 707< / span >   }< / div >
< div class = "line" > < a name = "l00708" > < / a > < span class = "lineno" > 708< / span >   < / div >
< div class = "line" > < a name = "l00709" > < / a > < span class = "lineno" > 709< / span >   CUDA_TEST_ERROR();< / div >
< div class = "line" > < a name = "l00710" > < / a > < span class = "lineno" > 710< / span >   < / div >
< div class = "line" > < a name = "l00711" > < / a > < span class = "lineno" > 711< / span >   < span class = "preprocessor" > #undef HANDLE_DIM_CASE< / span > < / div >
< div class = "line" > < a name = "l00712" > < / a > < span class = "lineno" > 712< / span >   < span class = "preprocessor" > < / span > < span class = "preprocessor" > #undef RUN_IVF_FLAT< / span > < / div >
< div class = "line" > < a name = "l00713" > < / a > < span class = "lineno" > 713< / span >   < span class = "preprocessor" > < / span > < / div >
< div class = "line" > < a name = "l00714" > < / a > < span class = "lineno" > 714< / span >   < span class = "comment" > // k-select the output in chunks, to increase parallelism< / span > < / div >
< div class = "line" > < a name = "l00715" > < / a > < span class = "lineno" > 715< / span >   runPass1SelectLists(prefixSumOffsets,< / div >
< div class = "line" > < a name = "l00716" > < / a > < span class = "lineno" > 716< / span >   allDistances,< / div >
< div class = "line" > < a name = "l00717" > < / a > < span class = "lineno" > 717< / span >   listIds.getSize(1),< / div >
< div class = "line" > < a name = "l00718" > < / a > < span class = "lineno" > 718< / span >   k,< / div >
< div class = "line" > < a name = "l00719" > < / a > < span class = "lineno" > 719< / span >   !l2Distance, < span class = "comment" > // L2 distance chooses smallest< / span > < / div >
< div class = "line" > < a name = "l00720" > < / a > < span class = "lineno" > 720< / span >   heapDistances,< / div >
< div class = "line" > < a name = "l00721" > < / a > < span class = "lineno" > 721< / span >   heapIndices,< / div >
< div class = "line" > < a name = "l00722" > < / a > < span class = "lineno" > 722< / span >   stream);< / div >
< div class = "line" > < a name = "l00723" > < / a > < span class = "lineno" > 723< / span >   < / div >
< div class = "line" > < a name = "l00724" > < / a > < span class = "lineno" > 724< / span >   < span class = "comment" > // k-select final output< / span > < / div >
< div class = "line" > < a name = "l00725" > < / a > < span class = "lineno" > 725< / span >   < span class = "keyword" > auto< / span > flatHeapDistances = heapDistances.downcastInner< 2> ();< / div >
< div class = "line" > < a name = "l00726" > < / a > < span class = "lineno" > 726< / span >   < span class = "keyword" > auto< / span > flatHeapIndices = heapIndices.downcastInner< 2> ();< / div >
< div class = "line" > < a name = "l00727" > < / a > < span class = "lineno" > 727< / span >   < / div >
< div class = "line" > < a name = "l00728" > < / a > < span class = "lineno" > 728< / span >   runPass2SelectLists(flatHeapDistances,< / div >
< div class = "line" > < a name = "l00729" > < / a > < span class = "lineno" > 729< / span >   flatHeapIndices,< / div >
< div class = "line" > < a name = "l00730" > < / a > < span class = "lineno" > 730< / span >   listIndices,< / div >
< div class = "line" > < a name = "l00731" > < / a > < span class = "lineno" > 731< / span >   indicesOptions,< / div >
< div class = "line" > < a name = "l00732" > < / a > < span class = "lineno" > 732< / span >   prefixSumOffsets,< / div >
< div class = "line" > < a name = "l00733" > < / a > < span class = "lineno" > 733< / span >   listIds,< / div >
< div class = "line" > < a name = "l00734" > < / a > < span class = "lineno" > 734< / span >   k,< / div >
< div class = "line" > < a name = "l00735" > < / a > < span class = "lineno" > 735< / span >   !l2Distance, < span class = "comment" > // L2 distance chooses smallest< / span > < / div >
< div class = "line" > < a name = "l00736" > < / a > < span class = "lineno" > 736< / span >   outDistances,< / div >
< div class = "line" > < a name = "l00737" > < / a > < span class = "lineno" > 737< / span >   outIndices,< / div >
< div class = "line" > < a name = "l00738" > < / a > < span class = "lineno" > 738< / span >   stream);< / div >
< div class = "line" > < a name = "l00739" > < / a > < span class = "lineno" > 739< / span >   }< / div >
< div class = "line" > < a name = "l00740" > < / a > < span class = "lineno" > 740< / span >   < / div >
< div class = "line" > < a name = "l00741" > < / a > < span class = "lineno" > 741< / span >   < span class = "keywordtype" > void< / span > < / div >
< div class = "line" > < a name = "l00742" > < / a > < span class = "lineno" > 742< / span >   runIVFFlatScan(Tensor< float, 2, true> & queries,< / div >
< div class = "line" > < a name = "l00743" > < / a > < span class = "lineno" > 743< / span >   Tensor< int, 2, true> & listIds,< / div >
< div class = "line" > < a name = "l00744" > < / a > < span class = "lineno" > 744< / span >   thrust::device_vector< void*> & listData,< / div >
< div class = "line" > < a name = "l00745" > < / a > < span class = "lineno" > 745< / span >   thrust::device_vector< void*> & listIndices,< / div >
< div class = "line" > < a name = "l00746" > < / a > < span class = "lineno" > 746< / span >   IndicesOptions indicesOptions,< / div >
< div class = "line" > < a name = "l00747" > < / a > < span class = "lineno" > 747< / span >   thrust::device_vector< int> & listLengths,< / div >
< div class = "line" > < a name = "l00748" > < / a > < span class = "lineno" > 748< / span >   < span class = "keywordtype" > int< / span > maxListLength,< / div >
< div class = "line" > < a name = "l00749" > < / a > < span class = "lineno" > 749< / span >   < span class = "keywordtype" > int< / span > k,< / div >
< div class = "line" > < a name = "l00750" > < / a > < span class = "lineno" > 750< / span >   < span class = "keywordtype" > bool< / span > l2Distance,< / div >
< div class = "line" > < a name = "l00751" > < / a > < span class = "lineno" > 751< / span >   < span class = "keywordtype" > bool< / span > useFloat16,< / div >
< div class = "line" > < a name = "l00752" > < / a > < span class = "lineno" > 752< / span >   < span class = "comment" > // output< / span > < / div >
< div class = "line" > < a name = "l00753" > < / a > < span class = "lineno" > 753< / span >   Tensor< float, 2, true> & outDistances,< / div >
< div class = "line" > < a name = "l00754" > < / a > < span class = "lineno" > 754< / span >   < span class = "comment" > // output< / span > < / div >
< div class = "line" > < a name = "l00755" > < / a > < span class = "lineno" > 755< / span >   Tensor< long, 2, true> & outIndices,< / div >
< div class = "line" > < a name = "l00756" > < / a > < span class = "lineno" > 756< / span >   GpuResources* res) {< / div >
< div class = "line" > < a name = "l00757" > < / a > < span class = "lineno" > 757< / span >   constexpr < span class = "keywordtype" > int< / span > kMinQueryTileSize = 8;< / div >
< div class = "line" > < a name = "l00758" > < / a > < span class = "lineno" > 758< / span >   constexpr < span class = "keywordtype" > int< / span > kMaxQueryTileSize = 128;< / div >
< div class = "line" > < a name = "l00759" > < / a > < span class = "lineno" > 759< / span >   constexpr < span class = "keywordtype" > int< / span > kThrustMemSize = 16384;< / div >
< div class = "line" > < a name = "l00760" > < / a > < span class = "lineno" > 760< / span >   < / div >
< div class = "line" > < a name = "l00761" > < / a > < span class = "lineno" > 761< / span >   < span class = "keywordtype" > int< / span > nprobe = listIds.getSize(1);< / div >
< div class = "line" > < a name = "l00762" > < / a > < span class = "lineno" > 762< / span >   < / div >
< div class = "line" > < a name = "l00763" > < / a > < span class = "lineno" > 763< / span >   < span class = "keyword" > auto< / span > & mem = res-> getMemoryManagerCurrentDevice();< / div >
< div class = "line" > < a name = "l00764" > < / a > < span class = "lineno" > 764< / span >   < span class = "keyword" > auto< / span > stream = res-> getDefaultStreamCurrentDevice();< / div >
< div class = "line" > < a name = "l00765" > < / a > < span class = "lineno" > 765< / span >   < / div >
< div class = "line" > < a name = "l00766" > < / a > < span class = "lineno" > 766< / span >   < span class = "comment" > // Make a reservation for Thrust to do its dirty work (global memory< / span > < / div >
< div class = "line" > < a name = "l00767" > < / a > < span class = "lineno" > 767< / span >   < span class = "comment" > // cross-block reduction space); hopefully this is large enough.< / span > < / div >
< div class = "line" > < a name = "l00768" > < / a > < span class = "lineno" > 768< / span >   DeviceTensor< char, 1, true> thrustMem1(< / div >
< div class = "line" > < a name = "l00769" > < / a > < span class = "lineno" > 769< / span >   mem, {kThrustMemSize}, stream);< / div >
< div class = "line" > < a name = "l00770" > < / a > < span class = "lineno" > 770< / span >   DeviceTensor< char, 1, true> thrustMem2(< / div >
< div class = "line" > < a name = "l00771" > < / a > < span class = "lineno" > 771< / span >   mem, {kThrustMemSize}, stream);< / div >
< div class = "line" > < a name = "l00772" > < / a > < span class = "lineno" > 772< / span >   DeviceTensor< char, 1, true> * thrustMem[2] =< / div >
< div class = "line" > < a name = "l00773" > < / a > < span class = "lineno" > 773< / span >   {& thrustMem1, & thrustMem2};< / div >
< div class = "line" > < a name = "l00774" > < / a > < span class = "lineno" > 774< / span >   < / div >
< div class = "line" > < a name = "l00775" > < / a > < span class = "lineno" > 775< / span >   < span class = "comment" > // How much temporary storage is available?< / span > < / div >
< div class = "line" > < a name = "l00776" > < / a > < span class = "lineno" > 776< / span >   < span class = "comment" > // If possible, we'd like to fit within the space available.< / span > < / div >
< div class = "line" > < a name = "l00777" > < / a > < span class = "lineno" > 777< / span >   < span class = "keywordtype" > size_t< / span > sizeAvailable = mem.getSizeAvailable();< / div >
< div class = "line" > < a name = "l00778" > < / a > < span class = "lineno" > 778< / span >   < / div >
< div class = "line" > < a name = "l00779" > < / a > < span class = "lineno" > 779< / span >   < span class = "comment" > // We run two passes of heap selection< / span > < / div >
< div class = "line" > < a name = "l00780" > < / a > < span class = "lineno" > 780< / span >   < span class = "comment" > // This is the size of the first-level heap passes< / span > < / div >
< div class = "line" > < a name = "l00781" > < / a > < span class = "lineno" > 781< / span >   constexpr < span class = "keywordtype" > int< / span > kNProbeSplit = 8;< / div >
< div class = "line" > < a name = "l00782" > < / a > < span class = "lineno" > 782< / span >   < span class = "keywordtype" > int< / span > pass2Chunks = std::min(nprobe, kNProbeSplit);< / div >
< div class = "line" > < a name = "l00783" > < / a > < span class = "lineno" > 783< / span >   < / div >
< div class = "line" > < a name = "l00784" > < / a > < span class = "lineno" > 784< / span >   < span class = "keywordtype" > size_t< / span > sizeForFirstSelectPass =< / div >
< div class = "line" > < a name = "l00785" > < / a > < span class = "lineno" > 785< / span >   pass2Chunks * k * (< span class = "keyword" > sizeof< / span > (float) + < span class = "keyword" > sizeof< / span > (< span class = "keywordtype" > int< / span > ));< / div >
< div class = "line" > < a name = "l00786" > < / a > < span class = "lineno" > 786< / span >   < / div >
< div class = "line" > < a name = "l00787" > < / a > < span class = "lineno" > 787< / span >   < span class = "comment" > // How much temporary storage we need per query< / span > < / div >
< div class = "line" > < a name = "l00788" > < / a > < span class = "lineno" > 788< / span >   < span class = "keywordtype" > size_t< / span > sizePerQuery =< / div >
< div class = "line" > < a name = "l00789" > < / a > < span class = "lineno" > 789< / span >   2 * < span class = "comment" > // # streams< / span > < / div >
< div class = "line" > < a name = "l00790" > < / a > < span class = "lineno" > 790< / span >   ((nprobe * < span class = "keyword" > sizeof< / span > (int) + < span class = "keyword" > sizeof< / span > (< span class = "keywordtype" > int< / span > )) + < span class = "comment" > // prefixSumOffsets< / span > < / div >
< div class = "line" > < a name = "l00791" > < / a > < span class = "lineno" > 791< / span >   nprobe * maxListLength * < span class = "keyword" > sizeof< / span > (< span class = "keywordtype" > float< / span > ) + < span class = "comment" > // allDistances< / span > < / div >
< div class = "line" > < a name = "l00792" > < / a > < span class = "lineno" > 792< / span >   sizeForFirstSelectPass);< / div >
< div class = "line" > < a name = "l00793" > < / a > < span class = "lineno" > 793< / span >   < / div >
< div class = "line" > < a name = "l00794" > < / a > < span class = "lineno" > 794< / span >   < span class = "keywordtype" > int< / span > queryTileSize = (int) (sizeAvailable / sizePerQuery);< / div >
< div class = "line" > < a name = "l00795" > < / a > < span class = "lineno" > 795< / span >   < / div >
< div class = "line" > < a name = "l00796" > < / a > < span class = "lineno" > 796< / span >   < span class = "keywordflow" > if< / span > (queryTileSize < kMinQueryTileSize) {< / div >
< div class = "line" > < a name = "l00797" > < / a > < span class = "lineno" > 797< / span >   queryTileSize = kMinQueryTileSize;< / div >
< div class = "line" > < a name = "l00798" > < / a > < span class = "lineno" > 798< / span >   } < span class = "keywordflow" > else< / span > < span class = "keywordflow" > if< / span > (queryTileSize > kMaxQueryTileSize) {< / div >
< div class = "line" > < a name = "l00799" > < / a > < span class = "lineno" > 799< / span >   queryTileSize = kMaxQueryTileSize;< / div >
< div class = "line" > < a name = "l00800" > < / a > < span class = "lineno" > 800< / span >   }< / div >
< div class = "line" > < a name = "l00801" > < / a > < span class = "lineno" > 801< / span >   < / div >
< div class = "line" > < a name = "l00802" > < / a > < span class = "lineno" > 802< / span >   < span class = "comment" > // FIXME: we should adjust queryTileSize to deal with this, since< / span > < / div >
< div class = "line" > < a name = "l00803" > < / a > < span class = "lineno" > 803< / span >   < span class = "comment" > // indexing is in int32< / span > < / div >
< div class = "line" > < a name = "l00804" > < / a > < span class = "lineno" > 804< / span >   FAISS_ASSERT(queryTileSize * nprobe * maxListLength < < / div >
< div class = "line" > < a name = "l00805" > < / a > < span class = "lineno" > 805< / span >   std::numeric_limits< int> ::max());< / div >
< div class = "line" > < a name = "l00806" > < / a > < span class = "lineno" > 806< / span >   < / div >
< div class = "line" > < a name = "l00807" > < / a > < span class = "lineno" > 807< / span >   < span class = "comment" > // Temporary memory buffers< / span > < / div >
< div class = "line" > < a name = "l00808" > < / a > < span class = "lineno" > 808< / span >   < span class = "comment" > // Reserve one element before the start; it is set to 0 so that< / span > < / div >
< div class = "line" > < a name = "l00809" > < / a > < span class = "lineno" > 809< / span >   < span class = "comment" > // the boundary condition is handled without branches< / span > < / div >
< div class = "line" > < a name = "l00810" > < / a > < span class = "lineno" > 810< / span >   DeviceTensor< int, 1, true> prefixSumOffsetSpace1(< / div >
< div class = "line" > < a name = "l00811" > < / a > < span class = "lineno" > 811< / span >   mem, {queryTileSize * nprobe + 1}, stream);< / div >
< div class = "line" > < a name = "l00812" > < / a > < span class = "lineno" > 812< / span >   DeviceTensor< int, 1, true> prefixSumOffsetSpace2(< / div >
< div class = "line" > < a name = "l00813" > < / a > < span class = "lineno" > 813< / span >   mem, {queryTileSize * nprobe + 1}, stream);< / div >
< div class = "line" > < a name = "l00814" > < / a > < span class = "lineno" > 814< / span >   < / div >
< div class = "line" > < a name = "l00815" > < / a > < span class = "lineno" > 815< / span >   DeviceTensor< int, 2, true> prefixSumOffsets1(< / div >
< div class = "line" > < a name = "l00816" > < / a > < span class = "lineno" > 816< / span >   prefixSumOffsetSpace1[1].data(),< / div >
< div class = "line" > < a name = "l00817" > < / a > < span class = "lineno" > 817< / span >   {queryTileSize, nprobe});< / div >
< div class = "line" > < a name = "l00818" > < / a > < span class = "lineno" > 818< / span >   DeviceTensor< int, 2, true> prefixSumOffsets2(< / div >
< div class = "line" > < a name = "l00819" > < / a > < span class = "lineno" > 819< / span >   prefixSumOffsetSpace2[1].data(),< / div >
< div class = "line" > < a name = "l00820" > < / a > < span class = "lineno" > 820< / span >   {queryTileSize, nprobe});< / div >
< div class = "line" > < a name = "l00821" > < / a > < span class = "lineno" > 821< / span >   DeviceTensor< int, 2, true> * prefixSumOffsets[2] =< / div >
< div class = "line" > < a name = "l00822" > < / a > < span class = "lineno" > 822< / span >   {& prefixSumOffsets1, & prefixSumOffsets2};< / div >
< div class = "line" > < a name = "l00823" > < / a > < span class = "lineno" > 823< / span >   < / div >
< div class = "line" > < a name = "l00824" > < / a > < span class = "lineno" > 824< / span >   < span class = "comment" > // Make sure the element before prefixSumOffsets is 0, since we< / span > < / div >
< div class = "line" > < a name = "l00825" > < / a > < span class = "lineno" > 825< / span >   < span class = "comment" > // depend upon simple, boundary-less indexing to get proper results< / span > < / div >
< div class = "line" > < a name = "l00826" > < / a > < span class = "lineno" > 826< / span >   CUDA_VERIFY(cudaMemsetAsync(prefixSumOffsetSpace1.data(),< / div >
< div class = "line" > < a name = "l00827" > < / a > < span class = "lineno" > 827< / span >   0,< / div >
< div class = "line" > < a name = "l00828" > < / a > < span class = "lineno" > 828< / span >   < span class = "keyword" > sizeof< / span > (int),< / div >
< div class = "line" > < a name = "l00829" > < / a > < span class = "lineno" > 829< / span >   stream));< / div >
< div class = "line" > < a name = "l00830" > < / a > < span class = "lineno" > 830< / span >   CUDA_VERIFY(cudaMemsetAsync(prefixSumOffsetSpace2.data(),< / div >
< div class = "line" > < a name = "l00831" > < / a > < span class = "lineno" > 831< / span >   0,< / div >
< div class = "line" > < a name = "l00832" > < / a > < span class = "lineno" > 832< / span >   < span class = "keyword" > sizeof< / span > (int),< / div >
< div class = "line" > < a name = "l00833" > < / a > < span class = "lineno" > 833< / span >   stream));< / div >
< div class = "line" > < a name = "l00834" > < / a > < span class = "lineno" > 834< / span >   < / div >
< div class = "line" > < a name = "l00835" > < / a > < span class = "lineno" > 835< / span >   DeviceTensor< float, 1, true> allDistances1(< / div >
< div class = "line" > < a name = "l00836" > < / a > < span class = "lineno" > 836< / span >   mem, {queryTileSize * nprobe * maxListLength}, stream);< / div >
< div class = "line" > < a name = "l00837" > < / a > < span class = "lineno" > 837< / span >   DeviceTensor< float, 1, true> allDistances2(< / div >
< div class = "line" > < a name = "l00838" > < / a > < span class = "lineno" > 838< / span >   mem, {queryTileSize * nprobe * maxListLength}, stream);< / div >
< div class = "line" > < a name = "l00839" > < / a > < span class = "lineno" > 839< / span >   DeviceTensor< float, 1, true> * allDistances[2] =< / div >
< div class = "line" > < a name = "l00840" > < / a > < span class = "lineno" > 840< / span >   {& allDistances1, & allDistances2};< / div >
< div class = "line" > < a name = "l00841" > < / a > < span class = "lineno" > 841< / span >   < / div >
< div class = "line" > < a name = "l00842" > < / a > < span class = "lineno" > 842< / span >   DeviceTensor< float, 3, true> heapDistances1(< / div >
< div class = "line" > < a name = "l00843" > < / a > < span class = "lineno" > 843< / span >   mem, {queryTileSize, pass2Chunks, k}, stream);< / div >
< div class = "line" > < a name = "l00844" > < / a > < span class = "lineno" > 844< / span >   DeviceTensor< float, 3, true> heapDistances2(< / div >
< div class = "line" > < a name = "l00845" > < / a > < span class = "lineno" > 845< / span >   mem, {queryTileSize, pass2Chunks, k}, stream);< / div >
< div class = "line" > < a name = "l00846" > < / a > < span class = "lineno" > 846< / span >   DeviceTensor< float, 3, true> * heapDistances[2] =< / div >
< div class = "line" > < a name = "l00847" > < / a > < span class = "lineno" > 847< / span >   {& heapDistances1, & heapDistances2};< / div >
< div class = "line" > < a name = "l00848" > < / a > < span class = "lineno" > 848< / span >   < / div >
< div class = "line" > < a name = "l00849" > < / a > < span class = "lineno" > 849< / span >   DeviceTensor< int, 3, true> heapIndices1(< / div >
< div class = "line" > < a name = "l00850" > < / a > < span class = "lineno" > 850< / span >   mem, {queryTileSize, pass2Chunks, k}, stream);< / div >
< div class = "line" > < a name = "l00851" > < / a > < span class = "lineno" > 851< / span >   DeviceTensor< int, 3, true> heapIndices2(< / div >
< div class = "line" > < a name = "l00852" > < / a > < span class = "lineno" > 852< / span >   mem, {queryTileSize, pass2Chunks, k}, stream);< / div >
< div class = "line" > < a name = "l00853" > < / a > < span class = "lineno" > 853< / span >   DeviceTensor< int, 3, true> * heapIndices[2] =< / div >
< div class = "line" > < a name = "l00854" > < / a > < span class = "lineno" > 854< / span >   {& heapIndices1, & heapIndices2};< / div >
< div class = "line" > < a name = "l00855" > < / a > < span class = "lineno" > 855< / span >   < / div >
< div class = "line" > < a name = "l00856" > < / a > < span class = "lineno" > 856< / span >   < span class = "keyword" > auto< / span > streams = res-> getAlternateStreamsCurrentDevice();< / div >
< div class = "line" > < a name = "l00857" > < / a > < span class = "lineno" > 857< / span >   streamWait(streams, {stream});< / div >
< div class = "line" > < a name = "l00858" > < / a > < span class = "lineno" > 858< / span >   < / div >
< div class = "line" > < a name = "l00859" > < / a > < span class = "lineno" > 859< / span >   < span class = "keywordtype" > int< / span > curStream = 0;< / div >
< div class = "line" > < a name = "l00860" > < / a > < span class = "lineno" > 860< / span >   < / div >
< div class = "line" > < a name = "l00861" > < / a > < span class = "lineno" > 861< / span >   < span class = "keywordflow" > for< / span > (< span class = "keywordtype" > int< / span > query = 0; query < queries.getSize(0); query += queryTileSize) {< / div >
< div class = "line" > < a name = "l00862" > < / a > < span class = "lineno" > 862< / span >   < span class = "keywordtype" > int< / span > numQueriesInTile =< / div >
< div class = "line" > < a name = "l00863" > < / a > < span class = "lineno" > 863< / span >   std::min(queryTileSize, queries.getSize(0) - query);< / div >
< div class = "line" > < a name = "l00864" > < / a > < span class = "lineno" > 864< / span >   < / div >
< div class = "line" > < a name = "l00865" > < / a > < span class = "lineno" > 865< / span >   < span class = "keyword" > auto< / span > prefixSumOffsetsView =< / div >
< div class = "line" > < a name = "l00866" > < / a > < span class = "lineno" > 866< / span >   prefixSumOffsets[curStream]-> narrowOutermost(0, numQueriesInTile);< / div >
< div class = "line" > < a name = "l00867" > < / a > < span class = "lineno" > 867< / span >   < / div >
< div class = "line" > < a name = "l00868" > < / a > < span class = "lineno" > 868< / span >   < span class = "keyword" > auto< / span > listIdsView =< / div >
< div class = "line" > < a name = "l00869" > < / a > < span class = "lineno" > 869< / span >   listIds.narrowOutermost(query, numQueriesInTile);< / div >
< div class = "line" > < a name = "l00870" > < / a > < span class = "lineno" > 870< / span >   < span class = "keyword" > auto< / span > queryView =< / div >
< div class = "line" > < a name = "l00871" > < / a > < span class = "lineno" > 871< / span >   queries.narrowOutermost(query, numQueriesInTile);< / div >
< div class = "line" > < a name = "l00872" > < / a > < span class = "lineno" > 872< / span >   < / div >
< div class = "line" > < a name = "l00873" > < / a > < span class = "lineno" > 873< / span >   < span class = "keyword" > auto< / span > heapDistancesView =< / div >
< div class = "line" > < a name = "l00874" > < / a > < span class = "lineno" > 874< / span >   heapDistances[curStream]-> narrowOutermost(0, numQueriesInTile);< / div >
< div class = "line" > < a name = "l00875" > < / a > < span class = "lineno" > 875< / span >   < span class = "keyword" > auto< / span > heapIndicesView =< / div >
< div class = "line" > < a name = "l00876" > < / a > < span class = "lineno" > 876< / span >   heapIndices[curStream]-> narrowOutermost(0, numQueriesInTile);< / div >
< div class = "line" > < a name = "l00877" > < / a > < span class = "lineno" > 877< / span >   < / div >
< div class = "line" > < a name = "l00878" > < / a > < span class = "lineno" > 878< / span >   < span class = "keyword" > auto< / span > outDistanceView =< / div >
< div class = "line" > < a name = "l00879" > < / a > < span class = "lineno" > 879< / span >   outDistances.narrowOutermost(query, numQueriesInTile);< / div >
< div class = "line" > < a name = "l00880" > < / a > < span class = "lineno" > 880< / span >   < span class = "keyword" > auto< / span > outIndicesView =< / div >
< div class = "line" > < a name = "l00881" > < / a > < span class = "lineno" > 881< / span >   outIndices.narrowOutermost(query, numQueriesInTile);< / div >
< div class = "line" > < a name = "l00882" > < / a > < span class = "lineno" > 882< / span >   < / div >
< div class = "line" > < a name = "l00883" > < / a > < span class = "lineno" > 883< / span >   runIVFFlatScanTile(queryView,< / div >
< div class = "line" > < a name = "l00884" > < / a > < span class = "lineno" > 884< / span >   listIdsView,< / div >
< div class = "line" > < a name = "l00885" > < / a > < span class = "lineno" > 885< / span >   listData,< / div >
< div class = "line" > < a name = "l00886" > < / a > < span class = "lineno" > 886< / span >   listIndices,< / div >
< div class = "line" > < a name = "l00887" > < / a > < span class = "lineno" > 887< / span >   indicesOptions,< / div >
< div class = "line" > < a name = "l00888" > < / a > < span class = "lineno" > 888< / span >   listLengths,< / div >
< div class = "line" > < a name = "l00889" > < / a > < span class = "lineno" > 889< / span >   *thrustMem[curStream],< / div >
< div class = "line" > < a name = "l00890" > < / a > < span class = "lineno" > 890< / span >   prefixSumOffsetsView,< / div >
< div class = "line" > < a name = "l00891" > < / a > < span class = "lineno" > 891< / span >   *allDistances[curStream],< / div >
< div class = "line" > < a name = "l00892" > < / a > < span class = "lineno" > 892< / span >   heapDistancesView,< / div >
< div class = "line" > < a name = "l00893" > < / a > < span class = "lineno" > 893< / span >   heapIndicesView,< / div >
< div class = "line" > < a name = "l00894" > < / a > < span class = "lineno" > 894< / span >   k,< / div >
< div class = "line" > < a name = "l00895" > < / a > < span class = "lineno" > 895< / span >   l2Distance,< / div >
< div class = "line" > < a name = "l00896" > < / a > < span class = "lineno" > 896< / span >   useFloat16,< / div >
< div class = "line" > < a name = "l00897" > < / a > < span class = "lineno" > 897< / span >   outDistanceView,< / div >
< div class = "line" > < a name = "l00898" > < / a > < span class = "lineno" > 898< / span >   outIndicesView,< / div >
< div class = "line" > < a name = "l00899" > < / a > < span class = "lineno" > 899< / span >   streams[curStream]);< / div >
< div class = "line" > < a name = "l00900" > < / a > < span class = "lineno" > 900< / span >   < / div >
< div class = "line" > < a name = "l00901" > < / a > < span class = "lineno" > 901< / span >   curStream = (curStream + 1) % 2;< / div >
< div class = "line" > < a name = "l00902" > < / a > < span class = "lineno" > 902< / span >   }< / div >
< div class = "line" > < a name = "l00903" > < / a > < span class = "lineno" > 903< / span >   < / div >
< div class = "line" > < a name = "l00904" > < / a > < span class = "lineno" > 904< / span >   streamWait({stream}, streams);< / div >
< div class = "line" > < a name = "l00905" > < / a > < span class = "lineno" > 905< / span >   }< / div >
< div class = "line" > < a name = "l00906" > < / a > < span class = "lineno" > 906< / span >   < / div >
< div class = "line" > < a name = "l00907" > < / a > < span class = "lineno" > 907< / span >   } } < span class = "comment" > // namespace< / span > < / div >
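<p>The tile-sizing arithmetic in listing lines 777-805 can be checked on the host. The following is a minimal C++ sketch under stated assumptions: kMinQueryTileSize and kMaxQueryTileSize are defined outside this excerpt, so the values 8 and 128 below are hypothetical, as are the helper names sizePerQuery and chooseQueryTileSize. Unlike the listing, it clamps before narrowing to int and evaluates the int32 guard from the FIXME in 64-bit arithmetic, because an int product that has already overflowed (undefined behavior for signed types) cannot be reliably detected afterwards.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical clamp bounds: the real constants are not shown in this excerpt.
constexpr int kMinQueryTileSizeAssumed = 8;
constexpr int kMaxQueryTileSizeAssumed = 128;
constexpr int kNProbeSplit = 8;  // as in the listing

// Mirrors sizePerQuery from the listing: two streams, each holding
// nprobe + 1 prefix-sum offsets, all distances and first-pass heap outputs.
std::size_t sizePerQuery(int nprobe, int maxListLength, int k) {
  int pass2Chunks = std::min(nprobe, kNProbeSplit);
  std::size_t sizeForFirstSelectPass =
      std::size_t(pass2Chunks) * k * (sizeof(float) + sizeof(int));
  return 2 * ((std::size_t(nprobe) * sizeof(int) + sizeof(int)) +
              std::size_t(nprobe) * maxListLength * sizeof(float) +
              sizeForFirstSelectPass);
}

// Returns the clamped tile size, or -1 where the listing would assert
// because tile * nprobe * maxListLength does not fit int32 indexing.
int chooseQueryTileSize(std::size_t sizeAvailable, int nprobe,
                        int maxListLength, int k) {
  std::size_t fit = sizeAvailable / sizePerQuery(nprobe, maxListLength, k);
  // Clamp in size_t first so the narrowing cast cannot overflow.
  int tile = static_cast<int>(
      std::min<std::size_t>(fit, kMaxQueryTileSizeAssumed));
  tile = std::max(tile, kMinQueryTileSizeAssumed);
  // 64-bit product: the guard itself cannot wrap around.
  std::int64_t elems = std::int64_t(tile) * nprobe * maxListLength;
  if (elems >= std::numeric_limits<int>::max()) return -1;
  return tile;
}
```

<p>For example, with nprobe = 32, maxListLength = 1000 and k = 10, one query needs 2 * (132 + 128000 + 640) = 257544 bytes, so a budget of 50 times that yields a tile of 50 queries.</p>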
< div class = "ttc" id = "structfaiss_1_1gpu_1_1LoadStore_html" > < div class = "ttname" > < a href = "structfaiss_1_1gpu_1_1LoadStore.html" > faiss::gpu::LoadStore< / a > < / div > < div class = "ttdef" > < b > Definition:< / b > < a href = "LoadStoreOperators_8cuh_source.html#l00030" > LoadStoreOperators.cuh:30< / a > < / div > < / div >
< div class = "ttc" id = "structfaiss_1_1gpu_1_1Math_html_a4b17f0b5d014f300e76dde5b24af8014" > < div class = "ttname" > < a href = "structfaiss_1_1gpu_1_1Math.html#a4b17f0b5d014f300e76dde5b24af8014" > faiss::gpu::Math::reduceAdd< / a > < / div > < div class = "ttdeci" > static __device__ T reduceAdd(T v)< / div > < div class = "ttdoc" > For a vector type, this is a horizontal add, returning sum(v_i) < / div > < div class = "ttdef" > < b > Definition:< / b > < a href = "MathOperators_8cuh_source.html#l00044" > MathOperators.cuh:44< / a > < / div > < / div >
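<p>Listing lines 779-782 and the heapDistances/heapIndices buffers of shape {queryTileSize, pass2Chunks, k} describe a two-pass selection: the first pass keeps k candidates per chunk of probed lists, and the second pass merges the pass2Chunks * k survivors into the final k. The C++ sketch below illustrates why this is exact, under assumptions: the probed lists are split into contiguous, near-equal groups (the kernel's actual partitioning is not part of this excerpt), std::partial_sort on the host stands in for the GPU k-selection, and smaller distances are better (the L2 case). Candidate, topK and twoPassTopK are hypothetical names.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// (distance, vector id); std::pair orders by distance, then by id.
using Candidate = std::pair<float, int>;

// Keeps the k smallest candidates in ascending order.
std::vector<Candidate> topK(std::vector<Candidate> c, int k) {
  int n = std::min(k, static_cast<int>(c.size()));
  std::partial_sort(c.begin(), c.begin() + n, c.end());
  c.resize(n);
  return c;
}

// Pass 1: top-k within each chunk of probed lists (at most maxChunks chunks).
// Pass 2: top-k over the concatenated per-chunk survivors.
std::vector<Candidate> twoPassTopK(
    const std::vector<std::vector<Candidate>>& lists, int k, int maxChunks) {
  int nprobe = static_cast<int>(lists.size());
  int numChunks = std::min(nprobe, maxChunks);
  std::vector<Candidate> survivors;
  for (int c = 0; c < numChunks; ++c) {
    // Assumed contiguous, near-equal split of the probed lists.
    int first = c * nprobe / numChunks;
    int last = (c + 1) * nprobe / numChunks;
    std::vector<Candidate> chunk;
    for (int l = first; l < last; ++l)
      chunk.insert(chunk.end(), lists[l].begin(), lists[l].end());
    std::vector<Candidate> best = topK(std::move(chunk), k);
    survivors.insert(survivors.end(), best.begin(), best.end());
  }
  return topK(std::move(survivors), k);
}
```

<p>The result matches a single global selection because every member of the global top k is necessarily within the top k of its own chunk; the split only bounds how much data each first-pass selection has to scan.</p>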
< div class = "ttc" id = "structfaiss_1_1gpu_1_1ConvertTo_html" > < div class = "ttname" > < a href = "structfaiss_1_1gpu_1_1ConvertTo.html" > faiss::gpu::ConvertTo< / a > < / div > < div class = "ttdef" > < b > Definition:< / b > < a href = "ConversionOperators_8cuh_source.html#l00023" > ConversionOperators.cuh:23< / a > < / div > < / div >
< div class = "ttc" id = "structfaiss_1_1gpu_1_1IVFFlatScan_html" > < div class = "ttname" > < a href = "structfaiss_1_1gpu_1_1IVFFlatScan.html" > faiss::gpu::IVFFlatScan< / a > < / div > < div class = "ttdoc" > The class that we use to provide scan specializations. < / div > < div class = "ttdef" > < b > Definition:< / b > < a href = "IVFFlatScan_8cu_source.html#l00046" > IVFFlatScan.cu:46< / a > < / div > < / div >
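<p>Listing lines 807-833 allocate queryTileSize * nprobe + 1 offsets, point prefixSumOffsets one element past the start, and zero that leading element. The C++ sketch below shows the resulting branch-free indexing, assuming the offsets hold an inclusive scan of the probed list lengths (the scan itself is not part of this excerpt): segment i spans [offsets[i - 1], offsets[i]), and for i == 0 the read of offsets[-1] lands on the zero sentinel. buildOffsetSpace and segmentBounds are hypothetical helpers.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum of lengths stored after a zero sentinel, mirroring
// prefixSumOffsetSpace (whole buffer) and prefixSumOffsets (buffer + 1).
std::vector<int> buildOffsetSpace(const std::vector<int>& lengths) {
  std::vector<int> space(lengths.size() + 1, 0);  // space[0] stays 0
  for (std::size_t i = 0; i < lengths.size(); ++i)
    space[i + 1] = space[i] + lengths[i];
  return space;
}

// offsets must point one element past the sentinel; no i == 0 special case.
void segmentBounds(const int* offsets, int i, int* first, int* last) {
  *first = offsets[i - 1];  // for i == 0 this reads the zero sentinel
  *last = offsets[i];
}
```

<p>On the GPU this matters because every thread can compute its segment start with the same load, avoiding a divergent branch for the first segment.</p>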
< / div > <!-- fragment --> < / div > <!-- contents -->
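<p>The loop in listing lines 856-904 processes queries in tiles of queryTileSize, alternating between two alternate streams, each owning one set of buffers (thrustMem, prefixSumOffsets, allDistances, heapDistances, heapIndices). Consecutive tiles can overlap because they run on different streams, and tile t + 2 may reuse tile t's buffers because work enqueued on the same CUDA stream executes in issue order. The streamWait calls before and after the loop order the alternate streams against the caller's stream. The host-only C++ model below reproduces just the tile schedule; TileLaunch and planTiles are hypothetical names and no CUDA calls are made.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One iteration of the tiling loop.
struct TileLaunch {
  int firstQuery;  // offset passed to narrowOutermost(query, ...)
  int numQueries;  // numQueriesInTile (the last tile may be partial)
  int bufferSet;   // curStream: which stream and buffer set is used
};

// Reproduces the loop bounds and the round-robin buffer assignment.
// queryTileSize must be positive (the listing clamps it to a minimum).
std::vector<TileLaunch> planTiles(int numQueries, int queryTileSize) {
  std::vector<TileLaunch> plan;
  int curStream = 0;
  for (int query = 0; query < numQueries; query += queryTileSize) {
    int numQueriesInTile = std::min(queryTileSize, numQueries - query);
    plan.push_back({query, numQueriesInTile, curStream});
    curStream = (curStream + 1) % 2;
  }
  return plan;
}
```

<p>For 10 queries and a tile size of 4, the schedule is three tiles of 4, 4 and 2 queries on buffer sets 0, 1 and 0.</p>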
<!-- start footer part -->
< hr class = "footer" / > < address class = "footer" > < small >
Generated by   < a href = "http://www.doxygen.org/index.html" >
< img class = "footer" src = "doxygen.png" alt = "doxygen" / >
< / a > 1.8.5
< / small > < / address >
< / body >
< / html >