Fix monitor hang in CTS integer_ops.
Improve notify().
This does not affect notifyAll() or HIP in direct
dispatch mode.
Change-Id: I95a458358e1cab9c76aefde117db09cdbd1fd3af
[ROCm/clr commit: 78f92901d8]
Add a new CMake option, AMD_COMPUTE_WIN, to build HIP on Windows
from the public GitHub repository. AMD_COMPUTE_WIN should point to a
special repo with the PAL static libs.
[ROCm/clr commit: a3effa16f1]
Do not use __ockl_activelane_u32() to calculate the index of the lane within the mask: it does not work with divergent masks that have other bits set before the associated lane.
[ROCm/clr commit: 1a8d766836]
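A minimal host-side sketch of the idea behind the lane-index fix above, assuming the intended index is the rank of the lane among the set bits of the mask. The function name laneRankInMask is hypothetical and this is plain C++, not the device code from the commit:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model (not the CLR device code): rank of `lane`
// among the set bits of `mask`. Only bits of `mask` below `lane` are
// counted, so the result depends on the mask itself and not on which
// lanes happen to be executing.
inline unsigned laneRankInMask(std::uint64_t mask, unsigned lane) {
  assert(lane < 64 && "lane id must fit a 64-wide wavefront");
  assert(((mask >> lane) & 1u) && "lane must be a member of the mask");
  // Keep only the mask bits strictly below `lane`.
  const std::uint64_t below = mask & ((std::uint64_t{1} << lane) - 1);
  unsigned rank = 0;
  for (std::uint64_t bits = below; bits != 0; bits &= bits - 1) {
    ++rank;  // each iteration clears the lowest set bit
  }
  return rank;
}
```

With mask 0xB (lanes 0, 1 and 3), lane 3 has two mask bits below it, so its index within the mask is 2.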
The compiler currently serializes the workgroup_processor_mode COMGR metadata boolean field as "0"/"1" instead of "false"/"true". Consider "1" a truthy value during parsing.
[ROCm/clr commit: d020598a0f]
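The parsing rule from the workgroup_processor_mode commit above can be sketched as follows. The helper name parseMetadataBool is hypothetical, and treating every other string as false is an assumption, not something the commit states:

```cpp
#include <string>

// Hypothetical helper (not the CLR parser): accept both serializations of
// a metadata boolean. "true" and "1" are truthy; any other value,
// including "false" and "0", is treated as false (an assumption).
inline bool parseMetadataBool(const std::string& value) {
  return value == "true" || value == "1";
}
```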
Also add a flag, HIP_FORCE_SPIRV_CODEOBJECT, as an override that forces
the use of SPIRV.
* use cache for already compiled code objects
* address review comments and use the two spirv isa names
[ROCm/clr commit: 07e57a1f0d]
* SWDEV-517078 - Maintain the trap handler ABI version in CLR
The trap handler ABI version is communicated to the debugger using
the r_version field in the r_debug structure. This structure is
an external dependency, which makes it complicated to keep the trap
handler source (in CLR) and the ABI version number (external dependency)
in sync.
This patch proposes to patch the trap handler ABI version number in
_amdgpu_r_debug before communicating it to the debugger.
We can't directly include sc's executable.hpp file in CLR as it relies
on conflicting definitions of ELF-related types, so instead we need to
rely on a priori knowledge of the r_debug structure. Fortunately, this
structure is part of a stable ABI, so its layout is guaranteed to be
kept stable.
Update the 2nd level trap handler to follow updates from the
ROCr-runtime. The trap handlers are stripped of the parts dedicated to
architectures unsupported by CLR.
Bump the r_debug.r_version to track the ABI changes in the trap handler.
[ROCm/clr commit: 7b72c1b786]
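A rough sketch of patching the version field without including the header that defines r_debug, as described in the trap handler commit above. It assumes only that the structure starts with an int r_version field, as in the standard r_debug layout. RDebugPrefix and patchRDebugVersion are hypothetical names, and the actual version value is not taken from the commit:

```cpp
#include <cstddef>
#include <cstring>

// Assumed mirror of the first field of the stable r_debug layout.
// Only r_version is modeled; the remaining fields are left untouched.
struct RDebugPrefix {
  int r_version;
};

// Hypothetical helper: overwrite r_version in a raw r_debug image.
// memcpy avoids aliasing problems when the buffer has another type.
inline void patchRDebugVersion(void* rDebug, int version) {
  static_assert(offsetof(RDebugPrefix, r_version) == 0,
                "r_version is expected to be the first field");
  std::memcpy(rDebug, &version, sizeof(version));
}
```

Because the layout is part of a stable ABI, relying on the offset of the first field is less fragile than it would be for an internal structure.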
Fix PyTorch 2.5 issues by defining the reduce sync operations for type __half in amd_hip_fp16.h instead of
amd_warp_sync_functions.h, which is problematic when __half is not included before that header.
Only define types not supported by CUDA if HIP_ENABLE_EXTRA_WARP_SYNC_TYPES is defined, to avoid portability issues.
[ROCm/clr commit: 66496258b4]
Explicitly nulling the pointer causes us to report the error below
instead of keeping a dangling pointer around that will most likely lead
to a subsequent segfault.
[ROCm/clr commit: 199b0f1086]
This now has host conversions too, which come directly from Christopher's
work on fcbx.
Signed-off-by: Christopher M. Riedl
* add const to func parameter
* do not depend on builtins, use gfx950 detection
[ROCm/clr commit: 628777b73d]
Currently, we check whether there is enough system RAM even when we do not allocate on the host device, which is incorrect.
We should not check this size on Windows because PAL checks memory allocation itself. See SWDEV-467263.
Co-authored-by: Jimbo Xie <jiabaxie@amd.com>
[ROCm/clr commit: 0d6e554d92]
* SWDEV-505795 - Return the same ptr from hipIpcOpenMemHandle if it is called multiple times
* Move initialization outside of if statement
[ROCm/clr commit: e91cb4f320]
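The behavior described in the hipIpcOpenMemHandle commit above (repeated opens of one handle return the same pointer) can be sketched with a small cache. IpcOpenCache, the string key and the mapFn callback are all hypothetical and stand in for the runtime's real handle type and mapping code:

```cpp
#include <map>
#include <string>

// Hypothetical cache (not the CLR implementation): remembers the pointer
// returned by the first open of each handle key.
class IpcOpenCache {
 public:
  // `mapFn` stands in for whatever actually maps the handle. It runs only
  // the first time a key is seen; later calls return the cached pointer.
  template <typename MapFn>
  void* open(const std::string& handleKey, MapFn mapFn) {
    auto it = opened_.find(handleKey);
    if (it != opened_.end()) {
      return it->second;
    }
    void* ptr = mapFn();
    opened_.emplace(handleKey, ptr);
    return ptr;
  }

 private:
  std::map<std::string, void*> opened_;
};
```

The sketch is not thread-safe and does not model closing a handle.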
Fix a rocFFT data validation issue when dynamic queues are on.
ReleaseHwQueue() can be called only when there are no commands in the HostQueue.
The check of that condition needs to be protected by a lock.
[ROCm/clr commit: 18d191fd1d]
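A minimal model of the locking rule from the ReleaseHwQueue() commit above: the "no commands pending" check and the release happen under the same lock, so a concurrent submission cannot slip in between them. HostQueueModel and its members are hypothetical and are not the CLR classes:

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a host queue that owns a releasable hardware
// queue. All state is guarded by one mutex.
class HostQueueModel {
 public:
  void submit() {
    std::lock_guard<std::mutex> guard(lock_);
    ++pending_;
    hwAttached_ = true;  // submitting (re)acquires a hardware queue
  }

  void complete() {
    std::lock_guard<std::mutex> guard(lock_);
    if (pending_ > 0) {
      --pending_;
    }
  }

  // Returns true only if the hardware queue was released. The emptiness
  // check and the release are done while holding the same lock.
  bool tryReleaseHwQueue() {
    std::lock_guard<std::mutex> guard(lock_);
    if (pending_ != 0 || !hwAttached_) {
      return false;
    }
    hwAttached_ = false;
    return true;
  }

 private:
  std::mutex lock_;
  std::size_t pending_ = 0;
  bool hwAttached_ = false;
};
```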
Add VmHeapArray class to reduce the pressure on VA reservation, since
multiple memory pools can be active at the same time.
[ROCm/clr commit: e974f7fde1]