- When a command can consist of two packets (for example, the device heap
initializer) and the main kernel packet has no signal, tracking was
broken because the HW event of the command was set to the signal of the
first packet.
- If no completion signal is attached to the second packet, clear the HW
event for the command.
* SWDEV-518831 - fix streams' sync issue in multithreaded use
1. Fix the sync issue between the null stream and non-null streams in
multithreaded use.
2. Remove assert(GetSubmissionBatch() == nullptr), as it
is invalid with multiple threads.
3. Update getActiveQueues() to handle the
terminated state.
Fix monitor hang in CTS integer_ops.
Improve notify().
This does not affect notifyAll() or HIP in direct
dispatch mode.
Change-Id: I95a458358e1cab9c76aefde117db09cdbd1fd3af
Add a new CMake option, AMD_COMPUTE_WIN, to build HIP on Windows
from the public GitHub repository. AMD_COMPUTE_WIN should point to a
special repo with the PAL static libs.
Do not use __ockl_activelane_u32() to calculate the index of the lane within the mask, as that does not work with divergent masks that have other bits set before the associated lane.
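The difference can be illustrated with a small host-side sketch. This is not the CLR device code: the helper names are hypothetical, and rankAmongActive() only models the assumption that __ockl_activelane_u32() counts the currently executing lanes below the calling lane. Counting the mask bits below the lane is shown as one plausible way to get the index within the mask.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (number of set bits).
inline unsigned popcount64(std::uint64_t v) {
  unsigned n = 0;
  while (v != 0) {
    v &= v - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Bit mask covering all lanes strictly below `lane`.
inline std::uint64_t lanesBelow(unsigned lane) {
  assert(lane < 64);
  return (std::uint64_t{1} << lane) - 1;
}

// Hypothetical model: rank of `lane` among the lanes that are executing now.
inline unsigned rankAmongActive(std::uint64_t exec, unsigned lane) {
  return popcount64(exec & lanesBelow(lane));
}

// Rank of `lane` among the lanes named in the caller-supplied mask.
inline unsigned rankWithinMask(std::uint64_t mask, unsigned lane) {
  return popcount64(mask & lanesBelow(lane));
}
```

With mask 0b1111 and only lanes 2 and 3 executing, lane 2 ranks 0 among the active lanes but 2 within the mask, so the two results disagree as soon as the mask names lanes that are not currently executing.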
The compiler currently serializes the workgroup_processor_mode COMGR metadata boolean field as "0"/"1" instead of "false"/"true". Consider "1" a truthy value during parsing.
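A minimal sketch of the tolerant parsing, assuming the field value arrives as a C string. parseMetadataBool() is a hypothetical helper for illustration, not the actual CLR metadata parser.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: treat both "true" and "1" as truthy, anything else as false.
inline bool parseMetadataBool(const char* value) {
  assert(value != nullptr);
  return std::strcmp(value, "true") == 0 || std::strcmp(value, "1") == 0;
}
```

Both spellings then give the same result, so code objects produced with either serialization are handled the same way.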
Also add the flag HIP_FORCE_SPIRV_CODEOBJECT as an override that forces
the use of SPIRV.
* use cache for already compiled code objects
* address review comments and use the two spirv isa names
* SWDEV-517078 - Maintain the trap handler ABI version in CLR
The trap handler ABI version is communicated to the debugger using
the r_version field in the r_debug structure. This structure is
an external dependency, which makes it complicated to keep the trap
handler source (in CLR) and the ABI version number (external dependency)
in sync.
This patch proposes to patch the trap handler ABI version number in
_amdgpu_r_debug before communicating it to the debugger.
We can't directly include sc's executable.hpp file in CLR as it relies
on conflicting definitions of ELF-related types, so instead we need to
rely on a priori knowledge of the r_debug structure. Fortunately, this
structure is part of a stable ABI, so its layout is guaranteed to be
kept stable.
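A minimal sketch of the idea, assuming r_version is an int and the first member of the structure (as in glibc's struct r_debug). The function name is hypothetical, and the version value is passed in by the caller because the real number is defined alongside the trap handler.

```cpp
#include <cassert>

// Hypothetical helper: overwrite r_version in place before the structure is
// handed to the debugger. Relies only on the assumed layout: an int version
// field at offset 0 of a standard-layout structure.
inline void patchTrapHandlerAbiVersion(void* rDebug, int version) {
  assert(rDebug != nullptr);
  *static_cast<int*>(rDebug) = version;
}
```

Because only the first field is touched, the sketch needs no header that defines the full structure, which is the point of relying on the stable layout.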
Update the 2nd level trap handler to follow updates from the
ROCr-runtime. The trap handlers are stripped of the parts dedicated to
architectures unsupported by CLR.
Bump the r_debug.r_version to track the ABI changes in the trap handler.
Fix PyTorch 2.5 issues by defining the reduce sync operations for type __half in amd_hip_fp16.h instead of
amd_warp_sync_functions.h, which is problematic when the definition of __half is not included before that header.
Only define types not supported by CUDA if HIP_ENABLE_EXTRA_WARP_SYNC_TYPES is defined, to avoid portability issues.
Explicitly nulling the pointer causes us to report the error below
instead of keeping a dangling pointer around that will most likely lead
to a subsequent segfault.
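The pattern can be sketched as follows. Resource, releaseAndNull() and useResource() are hypothetical stand-ins for illustration, not the types touched by this change.

```cpp
#include <cassert>

// Hypothetical stand-in for the released object.
struct Resource {
  int value = 0;
};

// Release the object and null the caller's pointer so later checks see "no object".
inline void releaseAndNull(Resource*& ptr) {
  delete ptr;
  ptr = nullptr;
}

// A later user can report an error instead of dereferencing freed memory.
inline bool useResource(const Resource* ptr, int& out) {
  if (ptr == nullptr) {
    return false;  // error path: nothing to read
  }
  out = ptr->value;
  return true;
}
```

Without the explicit nulling, the second call would read through a dangling pointer instead of taking the error path.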
This now has host conversions too, which come directly from Christopher's
work on fcbx.
Signed-off-by: Christopher M. Riedl
* add const to func parameter
* do not depend on builtins, use gfx950 detection
Currently, we check whether there is enough system RAM even when we do not allocate on the host device, which is incorrect.
We should not check for this size on Windows because PAL checks for memory allocation. See SWDEV-467263.
Co-authored-by: Jimbo Xie <jiabaxie@amd.com>
Fix a rocFFT data validation issue when the dynamic queue is on.
ReleaseHwQueue() can be called only when there is no command in the HostQueue.
The check of that condition needs to be protected by a lock.
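A minimal sketch of checking and releasing under one lock. HostQueueSketch and its members are hypothetical and only model the ordering constraint; they are not the CLR HostQueue implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a host queue that borrows a hardware queue.
class HostQueueSketch {
 public:
  // Submitting a command acquires a hardware queue on demand.
  void submit() {
    std::lock_guard<std::mutex> guard(lock_);
    ++pending_;
    hasHwQueue_ = true;
  }

  // Mark one previously submitted command as finished.
  void complete() {
    std::lock_guard<std::mutex> guard(lock_);
    assert(pending_ > 0);
    --pending_;
  }

  // Check and release under the same lock, so no command can be submitted
  // between the "queue is empty" check and the release.
  bool tryReleaseHwQueue() {
    std::lock_guard<std::mutex> guard(lock_);
    if (pending_ != 0 || !hasHwQueue_) {
      return false;
    }
    hasHwQueue_ = false;
    return true;
  }

  bool hasHwQueue() const {
    std::lock_guard<std::mutex> guard(lock_);
    return hasHwQueue_;
  }

 private:
  mutable std::mutex lock_;
  std::size_t pending_ = 0;
  bool hasHwQueue_ = false;
};
```

If the emptiness check ran outside the lock, another thread could submit a command after the check and before the release, leaving a command on a queue that has just been given up.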