- Resolve signal dependencies for the barrier value packet if there is
more than one dependent signal. A barrier value packet accounts for
only one dependent signal.
- Improve logging
Change-Id: Ia506ad5d80b91d598f92e7b539f41756e9b4b64b
[ROCm/clr commit: 2d450e8b06]
The logic can analyze the AQL queue state and
find a failed AQL packet with the kernel's name
Change-Id: I1a478fa2c25462cd07a194784958bdf22454b897
[ROCm/clr commit: ea0b092af8]
wait() is redesigned with two paths:
fast path: Spin while waiting for the notify signal. If the signal
hasn't been received after some number of loops, go to the slow path.
slow path: Use condition_variable's wait().
Improve the monitor wrapper for better performance.
Fix some bugs left over from the name-removal patch.
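The two-path wait described above can be sketched roughly as follows.
This is a minimal illustration, not the runtime's monitor code: the
class name TwoPathEvent, the spin_limit parameter, and the bool return
value (true when the fast path was enough) are all assumptions made for
this example.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical helper; not the actual monitor wrapper in the runtime.
class TwoPathEvent {
 public:
  // spin_limit is an assumed tuning knob: how many polls before blocking.
  explicit TwoPathEvent(int spin_limit = 1000) : spin_limit_(spin_limit) {}

  void notify() {
    {
      // Set the flag under the mutex so a waiter that is about to block
      // cannot miss the wakeup.
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

  // Returns true if the signal was observed on the fast path,
  // false if the caller had to fall back to the slow path.
  bool wait() {
    // Fast path: poll the flag a bounded number of times.
    for (int i = 0; i < spin_limit_; ++i) {
      if (signaled_.load(std::memory_order_acquire)) return true;
    }
    // Slow path: block on the condition variable until signaled.
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] {
      return signaled_.load(std::memory_order_acquire);
    });
    return false;
  }

 private:
  const int spin_limit_;
  std::atomic<bool> signaled_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

Spinning first avoids the cost of blocking when the notification
arrives quickly, while the bounded loop keeps a long wait from burning
a CPU core.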
Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24
[ROCm/clr commit: f2ff56af9c]
This change fixes random segfaults in graph tests that have been
seen since the change that made internal callbacks non-blocking.
The callback thread that decreases the GraphExec ref count
may now run after the runtime shutdown. This can cause a segfault
because the hip::device that is accessed in GraphExec destructor
is already destroyed during runtime shutdown. This patch ensures
that the hip::device object stays alive until after the
callback thread completes.
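One common way to get this kind of lifetime guarantee is shared
ownership, sketched below. This is only an assumption about the general
technique; the patch itself may keep the device alive differently. The
Device and GraphExec types here are hypothetical stand-ins, not the
real hip::device or GraphExec classes, and the log exists only to make
the destruction order visible.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

using EventLog = std::vector<std::string>;

// Hypothetical stand-in for hip::device.
struct Device {
  explicit Device(EventLog& log) : log_(log) {}
  ~Device() { log_.push_back("device destroyed"); }
  EventLog& log_;
};

// Hypothetical stand-in for GraphExec. It co-owns the device, so the
// device cannot be destroyed before this object's destructor has run.
struct GraphExec {
  GraphExec(std::shared_ptr<Device> device, EventLog& log)
      : device_(std::move(device)), log_(log) {}
  // The destructor body runs while device_ is still valid.
  ~GraphExec() { log_.push_back("graph exec released"); }
  std::shared_ptr<Device> device_;
  EventLog& log_;
};

// Simulates runtime shutdown happening before the deferred callback
// drops the last GraphExec reference.
inline EventLog shutdown_before_callback() {
  EventLog log;
  auto device = std::make_shared<Device>(log);
  auto exec = std::make_unique<GraphExec>(device, log);
  device.reset();  // runtime shutdown drops its own reference
  log.push_back("runtime shutdown");
  exec.reset();    // the callback releases the GraphExec afterwards
  return log;
}
```

In this sketch the device is destroyed only after the GraphExec
destructor body has finished, even though shutdown ran first.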
Change-Id: I75a6ac01f27a0b2250bbd10ed389ebfb322927af
[ROCm/clr commit: 21c223f8df]
Fixes #123. find_program doesn't follow CMP0074 and thus ignores LLVM_ROOT and Clang_ROOT. This change adds LLVM_ROOT and Clang_ROOT to the search path of find_program for llvm-mc and clang in hiprtc to mimic the previous add_package behaviour.
Caveat: CMake-specific variables like CMAKE_PREFIX_PATH take precedence over paths specified with HINTS for find_program. There is no way to change the ordering unless we skip CMake-specific variables altogether using NO_CMAKE_PATH and NO_CMAKE_ENVIRONMENT_PATH.
Change-Id: I1fedb60cda09744416e19b3c6e3e0c5c9045f8e7
[ROCm/clr commit: 272ef9a7bf]
thread_rank() gives the thread index in a block. Limit the range to
the current warp size.
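As a host-side illustration of what limiting the range means, the
sketch below wraps a block-level thread index into a warp-sized range.
The function name lane_rank and the use of a modulo are assumptions for
this example; the actual change in the cooperative groups headers may
be expressed differently.

```cpp
#include <cassert>

// Hypothetical model: maps a thread index within a block to a rank in
// the range [0, warp_size).
inline unsigned lane_rank(unsigned block_thread_index, unsigned warp_size) {
  assert(warp_size > 0);  // a zero warp size would divide by zero
  return block_thread_index % warp_size;
}
```

For example, thread 70 of a block maps to rank 6 for a warp size of 64,
since 70 - 64 = 6.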
Change-Id: Ib5c9831236096485cf99ba7ab0b911a3b10de31c
[ROCm/clr commit: bd7d40a4d8]
This removes almost all uses of the deprecated
__AMDGCN_WAVEFRONT_SIZE macro, which is unavailable
when targeting SPIR-V, and adds a SPIR-V compatible
formulation of warpSize (which should end up as the
sole definition of warpSize once we remove support
for treating it as a compile time constant). It
is incomplete in that the cooperative_groups
implementation will need additional surgery.
Squashed commit of the following:
commit 6840826c3fec8516857dc4f2092d84358550f588
Author: Alex Voicu <alexandru.voicu@amd.com>
Date: Fri Dec 6 23:36:32 2024 +0000
Add deprecation warning for constexpr uses of `warpSize`.
commit a72307a7353034c2de53fd164e016967945fd0d1
Author: Alex Voicu <alexandru.voicu@amd.com>
Date: Fri Dec 6 23:12:14 2024 +0000
Prepare HIP RT for SPIR-V.
commit 5e40dd746ac4f8c93b521ef048ff9d494905ba95
Author: Alex Voicu <alexandru.voicu@amd.com>
Date: Fri Dec 6 22:46:05 2024 +0000
Revert stale change.
commit 231fe91c53dba4cabd832fc84eaa6ddb402271a0
Merge: a48905ec9 12dc02b4f
Author: Alex Voicu <alexandru.voicu@amd.com>
Date: Fri Dec 6 22:37:24 2024 +0000
Merge branch 'amd-staging' of https://github.com/ROCm/clr into amd-staging
commit a48905ec9cfe0e017cc64943195be82b530117d7
Author: Alex Voicu <alexandru.voicu@amd.com>
Date: Tue Sep 17 03:14:56 2024 +0100
Add scaffolding for SPIR-V support.
Change-Id: I2e84bbe90df58a5f9a8709b619905f04fa5b96dc
[ROCm/clr commit: dd4378611a]
- Added missing validation as graph node should not be created
if parameters are invalid
- Fix conversion of input params to graphNode params
Change-Id: I37ab04942b5fb2eb07386850cb7dbbf26f9ca967
[ROCm/clr commit: db8527f655]
If a module is loaded on one device, hipModuleGetFunction and other similar APIs should be able to run successfully from another device.
Change-Id: I96084cbd6c6dcf2a81019779a6ab1842ef2f35d1
[ROCm/clr commit: c46f843b99]
This is to avoid calling the HIP_INIT macro during the shutdown process.
Change-Id: I2e65f6e10491918a17445ee1e8ddd08286070358
[ROCm/clr commit: 5e3a29078d]
Adds UberTrace support for pre-dispatch markers and barrier begin/end markers.
Moves shared definitions out of palgpuopen.hpp into shared header
palcapturemgr.hpp.
Change-Id: I9f464c689e7ff12c54eca043fc1ad65e1836a64f
[ROCm/clr commit: 541c449ce2]
- When using shader copy, make sure to use release scope for the AQL
packet. This is a potential bug, but it is hidden because
hipMemcpyAsync always needs synchronization (which inserts a barrier
with release scope), and for hipMemcpy we use a barrier packet to make
sure it is blocking. Either way, a barrier is always used, which masks
the potential bug.
Change-Id: I57fb7f769c3179e76d712471c0905104c801d7ba
[ROCm/clr commit: c9dd95bf6c]
- Resolve stream once for event record. We should avoid calling
getStream again in addMarker
Change-Id: I78448c4f151ae10a5c8e8c248b2f4078b84191cb
[ROCm/clr commit: a22c45d635]
- When we use blit (compute) copies, two subsequent copies may read
from the same source buffer. The buffer may get modified by the host
in between, and if the source buffer was allocated with the
non-coherent flag, the device may simply use a stale value from the
previous cacheline fetch. This is a corner case.
Change-Id: I2ce261c6f6fa4e5bb608f116548e5cc711ae6f3c
[ROCm/clr commit: b63005d550]