Since the sub-buffer (virtual memory that is mapped to device memory) is associated with device memory, it should use the device context instead of the host context. The original implementation treated the memory object as host memory, which kept hipMemcpyPeer from taking the P2P path.
[ROCm/clr commit: a7492c516d]
Fix incorrect edits done when porting the 2nd level trap handler from
the hsa-runtime.
Change-Id: I7bc5160be47b8f669efe05c4d194bc3c47fc0661
[ROCm/clr commit: c35e9643ec]
* SWDEV-527299 - Support HIP_POINTER_ATTRIBUTE_CONTEXT
As HIP enables UVA by default, it seems we can simply expose the context to support this feature.
[ROCm/clr commit: b434fbe2bd]
Convention is to always link against .so.* at runtime.
Having it link against .so will break on systems that package
the .so files in their dev/devel package.
This issue was found when building ROCm 6.4 for Fedora.
Committing on behalf of GitHub user Mystro256
[ROCm/clr commit: 6b12154583]
Directly use the builtins. Use the elementwise versions since there's
no implied errno, regardless of -f[no]-math-errno.
I didn't change the cases that cast unnecessarily. The bfloat and vector
cases should work directly.
[ROCm/clr commit: 1db9a7d48b]
* SWDEV-520352 - Remove HostThread and legacy monitor
Remove HostThread, semaphore and legacy monitor.
Make the original thread and command queue logic stricter.
Add more comments to make the logic clearer.
Other minor improvements.
Also part of SWDEV-458943.
[ROCm/clr commit: 96cadbc9e9]
Make sure that a newly created FatBinaryInfo is assigned to modules only after extractFatBinary has been called for the object.
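The ordering described above can be sketched in plain C++ as "create, extract, then publish". This is only an illustration under stated assumptions: `FatBinarySketch`, `ModuleSketch` and `attach_fat_binary` are hypothetical stand-ins, not the real clr classes, and the handling of a failed extraction is an assumption that the commit message does not spell out.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fat binary record; not the real clr class.
struct FatBinarySketch {
  explicit FatBinarySketch(std::string image) : image_(std::move(image)) {}

  // Stand-in for the extraction step; returns false on failure.
  bool extract() {
    if (image_.empty()) return false;
    extracted_ = true;
    return true;
  }

  bool extracted() const { return extracted_; }

 private:
  std::string image_;
  bool extracted_ = false;
};

// Hypothetical module that refers to a shared fat binary record.
struct ModuleSketch {
  std::shared_ptr<FatBinarySketch> fat_binary;  // null until published
};

// Create the record, run extraction, and only then assign it to the modules.
// Assumption: on failure the modules are left untouched.
bool attach_fat_binary(std::vector<ModuleSketch>& modules, std::string image) {
  auto info = std::make_shared<FatBinarySketch>(std::move(image));
  if (!info->extract()) return false;
  for (auto& m : modules) m.fat_binary = info;
  return true;
}
```

With this ordering a module can never observe a record whose extraction has not run yet, which is the property the change is after.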
[ROCm/clr commit: 1099e0a131]
- When a command may have two packets (like the device heap
initializer) and there is no signal on the main kernel packet, the
tracking was broken because it marked the first packet's signal as
the HW event of the command.
- If no completion signal is attached to the second packet, clear
the HW event for the command.
[ROCm/clr commit: 072fb0804e]
Support programmatic query and change of scratch limit on
AMD devices.
Change-Id: Id5da355a77366f97868e462847f3916e87fd2af6
[ROCm/clr commit: 1113eff3f9]
* SWDEV-518831 - fix streams' sync issue in mthreads
1. Fix the sync issue between the null stream and non-null streams
in multithreaded use.
2. Remove assert(GetSubmissionBatch() == nullptr) as it
is invalid with multiple threads.
3. Update getActiveQueues() to deal with the state of
being terminated.
[ROCm/clr commit: 27aad09bd4]
Fix monitor hang in CTS integer_ops.
Improve notify().
Does not affect notifyAll() or HIP in direct
dispatch mode.
Change-Id: I95a458358e1cab9c76aefde117db09cdbd1fd3af
[ROCm/clr commit: 78f92901d8]
Add the new CMake option AMD_COMPUTE_WIN to build HIP on Windows
from the public GitHub. AMD_COMPUTE_WIN should point to a special
repo with the PAL static libs.
[ROCm/clr commit: a3effa16f1]
Do not use __ockl_activelane_u32() to calculate the index of the lane within the mask, as that would not work with divergent masks that have other bits on before the associated lane.
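One reading of the problem, sketched on the host in plain C++ rather than as device code: the index of a lane within a mask is the number of mask bits below that lane, so counting against any wider set of lanes gives a different answer. `popcount64` and `lane_rank_in_mask` are hypothetical helpers written for this illustration; they are not the code used in the fix.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (number of set bits).
inline unsigned popcount64(std::uint64_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1) ++n;  // clears the lowest set bit each iteration
  return n;
}

// Hypothetical helper: rank of `lane` among the set bits of `mask`.
// Only bits of `mask` that lie below `lane` are counted.
inline unsigned lane_rank_in_mask(std::uint64_t mask, unsigned lane) {
  assert(lane < 64 && "lane must fit in a 64-bit mask");
  assert(((mask >> lane) & 1u) != 0 && "lane must be a member of mask");
  const std::uint64_t below = (std::uint64_t{1} << lane) - 1;  // bits [0, lane)
  return popcount64(mask & below);
}
```

For example, with a mask covering lanes 5 and 7 (`0xA0`), lane 7 has rank 1. If the count is instead taken over a wider set that also has lanes 1 and 2 on (`0xA6`), the same lane gets 3, which is the kind of mismatch the message describes.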
[ROCm/clr commit: 1a8d766836]