Also add the HIP_FORCE_SPIRV_CODEOBJECT flag, which acts as an override to
force the use of SPIRV.
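A minimal sketch of how such an override could be evaluated. It assumes the flag is read as an environment variable; the helper names parseBoolFlag and useSpirvCodeObject are hypothetical, and CLR's own flag machinery may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: interprets a raw environment value as a boolean.
// Unset, empty, or "0" means "off"; any other value means "on".
bool parseBoolFlag(const char* value) {
  return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Hypothetical selection logic: SPIRV is used when it would be chosen by
// default, or when HIP_FORCE_SPIRV_CODEOBJECT forces it.
bool useSpirvCodeObject(bool defaultChoice) {
  return defaultChoice ||
         parseBoolFlag(std::getenv("HIP_FORCE_SPIRV_CODEOBJECT"));
}
```

With this sketch, running with HIP_FORCE_SPIRV_CODEOBJECT=1 would select SPIRV regardless of the default choice.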
* use cache for already compiled code objects
* address review comments and use the two spirv isa names
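The caching idea in the list above can be sketched as follows. This is an illustration only: the class name CodeObjectCache, the (ISA name, source) key, and the compile callback are assumptions, not the data structures used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using CodeObject = std::vector<char>;

// Hypothetical cache keyed by (ISA name, source). The compile callback runs
// only on a cache miss; later lookups reuse the stored binary.
class CodeObjectCache {
 public:
  const CodeObject& getOrCompile(const std::string& isa,
                                 const std::string& source,
                                 const std::function<CodeObject()>& compile) {
    auto key = std::make_pair(isa, source);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      it = cache_.emplace(std::move(key), compile()).first;
    }
    return it->second;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  std::map<std::pair<std::string, std::string>, CodeObject> cache_;
};
```

Keying on the ISA name as well as the source keeps binaries built for different targets from being mixed up.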
* SWDEV-517078 - Maintain the trap handler ABI version in CLR
The trap handler ABI version is communicated to the debugger using
the r_version field in the r_debug structure. This structure is
an external dependency, which makes it complicated to keep the trap
handler source (in CLR) and the ABI version number (external dependency)
in sync.
This patch proposes to patch the trap handler ABI version number in
_amdgpu_r_debug before communicating it to the debugger.
We can't directly include sc's executable.hpp file in CLR as it relies
on conflicting definitions of ELF related types, so instead we need to
rely on a priori knowledge of the r_debug structure. Fortunately, this
structure is part of a stable ABI, so its layout is guaranteed to be
kept stable.
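A rough sketch of patching the version field without including any ELF headers. It relies only on r_version being the leading int member of r_debug. The helper name patchRDebugVersion, the stand-in struct FakeRDebug, and the version values used with it are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the real r_debug, used for illustration only. The sketch
// depends on nothing except r_version being the first member, an int.
struct FakeRDebug {
  int r_version;
  void* r_map;
};

// Hypothetical helper: overwrite the leading r_version field before the
// structure is communicated to the debugger. memcpy avoids depending on
// the full type definition of the structure.
void patchRDebugVersion(void* rdebug, int abiVersion) {
  assert(rdebug != nullptr);
  std::memcpy(rdebug, &abiVersion, sizeof(abiVersion));
}
```

Only the first sizeof(int) bytes are written, so the rest of the structure is left untouched.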
Update the 2nd level trap handler to follow updates from the
ROCr-runtime. The trap handlers are stripped of the parts dedicated to
architectures unsupported by CLR.
Bump the r_debug.r_version to track the ABI changes in the trap handler.
Fix PyTorch 2.5 issues by defining the reduce sync operations for type __half in amd_hip_fp16.h rather than in
amd_warp_sync_functions.h, which is problematic when __half is not included before that header.
Only define types not supported by CUDA if HIP_ENABLE_EXTRA_WARP_SYNC_TYPES is defined, to avoid portability issues.
Explicitly nulling the pointer causes us to report the error below
instead of keeping a dangling pointer around that will most likely lead
to a subsequent segfault.
This now has host conversions too, which come directly from Christopher's
work on fcbx.
Signed-off-by: Christopher M. Riedl
* add const to func parameter
* do not depend on builtins, use gfx950 detection
Currently, we check whether there is enough system RAM even when we do not allocate on the host device, which is incorrect.
We should also skip this size check on Windows, because PAL already checks the memory allocation. See SWDEV-467263.
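The intended condition could be sketched like this. The function names and parameters are hypothetical and only illustrate the logic described above, not the actual CLR code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical predicate: the system RAM check applies only when the
// allocation lands on the host and the platform is not Windows (where PAL
// is described as doing its own check).
bool needsSystemRamCheck(bool allocatesOnHost, bool isWindows) {
  return allocatesOnHost && !isWindows;
}

// Hypothetical validation: reject the request only when the check applies
// and the requested size exceeds the available system RAM.
bool allocationAllowed(std::size_t size, std::size_t freeSystemRam,
                       bool allocatesOnHost, bool isWindows) {
  if (!needsSystemRamCheck(allocatesOnHost, isWindows)) {
    return true;
  }
  return size <= freeSystemRam;
}
```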
Co-authored-by: Jimbo Xie <jiabaxie@amd.com>
Fix a rocFFT data validation issue when dynamic queues are on.
ReleaseHwQueue() can be called only when there are no commands in the HostQueue.
The check of that condition needs to be protected by a lock.
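A simplified sketch of the locking pattern described above. The class HostQueueSketch and its members are hypothetical stand-ins. The point is that the "no pending commands" test and the release happen under the same lock, so a concurrent submit cannot slip in between them.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a host queue that owns a hardware queue.
class HostQueueSketch {
 public:
  // Submitting a command (re)acquires the hardware queue in this sketch.
  void submit() {
    std::lock_guard<std::mutex> guard(lock_);
    ++pending_;
    hasHwQueue_ = true;
  }

  // Marks one previously submitted command as finished.
  void complete() {
    std::lock_guard<std::mutex> guard(lock_);
    assert(pending_ > 0);
    --pending_;
  }

  // Checks the condition and releases under the same lock.
  // Returns true if the hardware queue was released.
  bool tryReleaseHwQueue() {
    std::lock_guard<std::mutex> guard(lock_);
    if (pending_ != 0 || !hasHwQueue_) {
      return false;
    }
    hasHwQueue_ = false;
    return true;
  }

 private:
  std::mutex lock_;
  int pending_ = 0;
  bool hasHwQueue_ = false;
};
```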
Also removes asserts in cooperative groups shfl functions since
__hip_bfloat16 shfl is present now
Change-Id: I57578b6e68dccc10c2ddcd194e9cc18bc7732ce1
Needs further debugging, but the change can be tested for now.
Need to verify whether this fixes all of the issues below:
SWDEV-512754, SWDEV-511675, SWDEV-511055, SWDEV-504085, SWDEV-499503
Also verify the original issues:
SWDEV-471863, SWDEV-490991
Change-Id: Ic845f851de1b98e8ed9aa0f07afddec3858119e9
- For D2H cases, avoid passing dependent signals to SDMA, because the signals
take a while to resolve on the SDMA engine
Change-Id: I569635228af977847f201c82ca897002f8f2f4a8
This reverts commit 57df1b348f.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: I86452adfed14655f72d90440a486089743cc6587
This reverts commit c07468e53c.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: Ifba0c8a248bc40deaa9c59b7f2901531300e5ea4
This reverts commit 9faaf20aae.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: I04af8603053338f08c396e78ff8a6715e641ca19