SWDEV-536360 - fix bullet points in reduce sync operations section not being displayed on different lines in the browser (#1346)

Gerardo Hernandez
2025-10-14 22:02:34 +01:00
committed by GitHub
parent dde482d224
commit bfbc48bb0e
@@ -998,8 +998,9 @@ Arithmetic reduces:
 T __reduce_max_sync (unsigned long long mask, T var);
 ``T`` can be:
-- On Nvidia platform: ``int`` or ``unsigned int``
-- On AMD platform: ``int`` or ``unsigned int``; if the user defines the macro ``HIP_ENABLE_EXTRA_WARP_SYNC_TYPES``, then: ``unsigned long long``, ``long long``, ``half``/``single``/``double`` precision floating
+* On Nvidia platform: ``int`` or ``unsigned int``
+* On AMD platform: ``int`` or ``unsigned int``; if the user defines the macro ``HIP_ENABLE_EXTRA_WARP_SYNC_TYPES``, then: ``unsigned long long``, ``long long``, ``half``/``single``/``double`` precision floating
 point types are also be supported.
 Returns the aggregated result of the arithmetic operation, where each of the participating threads
@@ -1017,8 +1018,9 @@ Logical reduces:
 T __reduce_xor_sync (unsigned long long mask, T var);
 ``T`` can be:
-- On Nvidia platform: ``unsigned int``
-- On AMD platform: ``unsigned int``, and if the user defines the macro ``HIP_ENABLE_EXTRA_WARP_SYNC_TYPES``, then ``int``, ``unsigned long long`` or ``long long`` are also supported
+* On Nvidia platform: ``unsigned int``
+* On AMD platform: ``unsigned int``, and if the user defines the macro ``HIP_ENABLE_EXTRA_WARP_SYNC_TYPES``, then ``int``, ``unsigned long long`` or ``long long`` are also supported
 Returns the result of the aggregated logical AND/OR/XOR operation where each of the participating threads
 (i.e. the ones mentioned on the mask) contribute ``var``.
@@ -1032,7 +1034,7 @@ Informational note: On the AMD platform, **masks that start from lane zero and h
 exhibit better performance** than masks with "holes" (example of mask with no holes: 0xFF and with holes: 0xFB;
 the reduction with 0xFF is faster).
-These functiones do not provide a memory barrier on any platform.
+These functions do not provide a memory barrier on any platform.
 Warp matrix functions
 --------------------------------------------------------------------------------
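
For readers of this diff, the semantics described in the touched section (each thread named in the mask contributes ``var``, and the aggregated value is returned) can be sketched on the host. This is only an illustrative model, not HIP code and not the library's implementation: `kLanes`, `emulate_reduce_max`, `emulate_reduce_xor` and `is_contiguous_from_lane_zero` are hypothetical names introduced here, and the 64-lane width is an assumption chosen to match the `unsigned long long` mask.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <limits>

// Hypothetical host-side model of one warp/wavefront; 64 lanes is an assumption.
constexpr int kLanes = 64;

// Models the value a max reduce would return: only lanes whose bit is set
// in `mask` contribute their entry of `vars`.
int emulate_reduce_max(std::uint64_t mask, const std::array<int, kLanes>& vars) {
    int result = std::numeric_limits<int>::min();
    for (int lane = 0; lane < kLanes; ++lane) {
        if (mask & (std::uint64_t{1} << lane)) {
            result = std::max(result, vars[lane]);
        }
    }
    return result;
}

// Models a logical XOR reduce over the participating lanes.
unsigned emulate_reduce_xor(std::uint64_t mask, const std::array<unsigned, kLanes>& vars) {
    unsigned result = 0u;
    for (int lane = 0; lane < kLanes; ++lane) {
        if (mask & (std::uint64_t{1} << lane)) {
            result ^= vars[lane];
        }
    }
    return result;
}

// True when the mask starts at lane zero and has no "holes" (for example 0xFF),
// false for a mask such as 0xFB where lane 2 is missing.
bool is_contiguous_from_lane_zero(std::uint64_t mask) {
    return mask != 0 && (mask & (mask + 1)) == 0;
}
```

With lane 2 holding the largest value, the model returns that value for mask 0xFF but not for 0xFB, because lane 2 does not participate in the second case. The contiguity check only classifies masks; it says nothing about how much faster the hole-free case is on real hardware.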