SWDEV-536360 - fix bullet points in reduce sync operations section not being displayed on different lines in the browser (#1346)
Committed by GitHub
Parent: dde482d224
Commit: bfbc48bb0e
@@ -998,8 +998,9 @@ Arithmetic reduces:
 T __reduce_max_sync (unsigned long long mask, T var);
 
 ``T`` can be:
-- On Nvidia platform: ``int`` or ``unsigned int``
-- On AMD platform: ``int`` or ``unsigned int``; if the user defines the macro ``HIP_ENABLE_EXTRA_WARP_SYNC_TYPES``, then: ``unsigned long long``, ``long long``, ``half``/``single``/``double`` precision floating
+* On Nvidia platform: ``int`` or ``unsigned int``
+
+* On AMD platform: ``int`` or ``unsigned int``; if the user defines the macro ``HIP_ENABLE_EXTRA_WARP_SYNC_TYPES``, then: ``unsigned long long``, ``long long``, ``half``/``single``/``double`` precision floating
 point types are also be supported.
 
 Returns the aggregated result of the arithmetic operation, where each of the participating threads
@@ -1017,8 +1018,9 @@ Logical reduces:
 T __reduce_xor_sync (unsigned long long mask, T var);
 
 ``T`` can be:
-- On Nvidia platform: ``unsigned int``
-- On AMD platform: ``unsigned int``, and if the user defines the macro ``HIP_ENABLE_EXTRA_WARP_SYNC_TYPES``, then ``int``, ``unsigned long long`` or ``long long`` are also supported
+* On Nvidia platform: ``unsigned int``
+
+* On AMD platform: ``unsigned int``, and if the user defines the macro ``HIP_ENABLE_EXTRA_WARP_SYNC_TYPES``, then ``int``, ``unsigned long long`` or ``long long`` are also supported
 
 Returns the result of the aggregated logical AND/OR/XOR operation where each of the participating threads
 (i.e. the ones mentioned on the mask) contribute ``var``.
@@ -1032,7 +1034,7 @@ Informational note: On the AMD platform, **masks that start from lane zero and h
 exhibit better performance** than masks with "holes" (example of mask with no holes: 0xFF and with holes: 0xFB;
 the reduction with 0xFF is faster).
 
-These functiones do not provide a memory barrier on any platform.
+These functions do not provide a memory barrier on any platform.
 
 Warp matrix functions
 --------------------------------------------------------------------------------
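Reviewer note: the semantics described in the touched documentation can be sketched with a small host-side model. This is only an illustration, not the HIP implementation. The function names `model_reduce_max`, `model_reduce_xor` and `mask_is_contiguous_from_lane0` are hypothetical, and the model assumes that bit `i` of `mask` selects lane `i` and that each participating lane contributes one `var`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side reference model only (hypothetical helpers, not part of HIP).
// Assumption: lane i participates when bit i of the mask is set.
bool lane_active(std::uint64_t mask, unsigned lane) {
    return ((mask >> lane) & 1ull) != 0;
}

// Models the result of __reduce_max_sync for T = int:
// the maximum of `var` over the participating lanes.
int model_reduce_max(std::uint64_t mask, const std::vector<int>& vars) {
    assert(mask != 0 && vars.size() <= 64);
    bool seen = false;
    int result = 0;
    for (unsigned lane = 0; lane < vars.size(); ++lane) {
        if (!lane_active(mask, lane)) continue;
        if (!seen || vars[lane] > result) result = vars[lane];
        seen = true;
    }
    return result;
}

// Models the result of __reduce_xor_sync for T = unsigned int:
// the bitwise XOR of `var` over the participating lanes.
unsigned model_reduce_xor(std::uint64_t mask, const std::vector<unsigned>& vars) {
    assert(vars.size() <= 64);
    unsigned result = 0;
    for (unsigned lane = 0; lane < vars.size(); ++lane) {
        if (lane_active(mask, lane)) result ^= vars[lane];
    }
    return result;
}

// True when the set bits start at lane zero and have no "holes",
// i.e. the mask has the form 2^k - 1 (0xFF qualifies, 0xFB does not).
bool mask_is_contiguous_from_lane0(std::uint64_t mask) {
    return mask != 0 && (mask & (mask + 1)) == 0;
}
```

A check like `mask_is_contiguous_from_lane0` could be used in host code to flag masks that fall outside the faster case mentioned in the informational note for the AMD platform.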