Add VmHeapArray class to reduce the pressure on VA reservation, since
multiple memory pools can be active at the same time.
[ROCm/clr commit: e974f7fde1]
Also removes asserts in cooperative groups shfl functions since
__hip_bfloat16 shfl is present now
Change-Id: I57578b6e68dccc10c2ddcd194e9cc18bc7732ce1
[ROCm/clr commit: 376f23b86a]
Needs further debugging, but the change can be tested for now.
Need to verify whether this fixes all of the issues below:
SWDEV-512754, SWDEV-511675, SWDEV-511055, SWDEV-504085, SWDEV-499503
Also verify original issues
SWDEV-471863, SWDEV-490991
Change-Id: Ic845f851de1b98e8ed9aa0f07afddec3858119e9
[ROCm/clr commit: f1b8ee7b7f]
- For D2H cases, avoid passing dependent signals to SDMA, since the
signals take a while to resolve on the SDMA engine
Change-Id: I569635228af977847f201c82ca897002f8f2f4a8
[ROCm/clr commit: 78d0ff2dbc]
This reverts commit f674ba58f0.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: I86452adfed14655f72d90440a486089743cc6587
[ROCm/clr commit: 5da8ce45ab]
This reverts commit 82f78ce187.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: Ifba0c8a248bc40deaa9c59b7f2901531300e5ea4
[ROCm/clr commit: 4206405514]
This reverts commit 04dc7ca51f.
Reason for revert: 6.4 Preview changes need not be merged to amd-staging as of now
Change-Id: I04af8603053338f08c396e78ff8a6715e641ca19
[ROCm/clr commit: 3fa6049c46]
Using target_link_libraries does not properly link the hipRTC-header.o
into libhiprtc for static build. Change to use target_sources instead.
This does not affect the linkage in the shared build.
Change-Id: I626f9eacc1637b792a50e7ddddb5db09e704ac4a
[ROCm/clr commit: 8f54aeb765]
Also part of SWDEV-510994.
1. Fix atomicMin/Max_system() for float and double.
2. Remove logic for gfx941, which isn't supported.
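
The commit text does not describe how the float/double fix works. As an illustration only, a common way to build a floating-point min/max on top of compare-and-swap is sketched below in host C++ with std::atomic. The names atomic_max_float and atomic_min_float are hypothetical, and this is not the HIP device implementation.

```cpp
#include <atomic>

// Host-side sketch (assumption: not the actual HIP code).
// Returns the value seen before the update, like the atomicMax family does.
inline float atomic_max_float(std::atomic<float>& target, float value) {
  float observed = target.load(std::memory_order_relaxed);
  // Retry only while the stored value is still smaller than `value`.
  // On failure, compare_exchange_weak reloads `observed` with the current value.
  while (observed < value &&
         !target.compare_exchange_weak(observed, value,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
  }
  return observed;
}

// Same pattern with the comparison reversed.
inline float atomic_min_float(std::atomic<float>& target, float value) {
  float observed = target.load(std::memory_order_relaxed);
  while (observed > value &&
         !target.compare_exchange_weak(observed, value,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
  }
  return observed;
}
```

The loop exits without storing when the current value already satisfies the min/max, so a NaN argument leaves the target untouched in this sketch.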
Change-Id: Iacfdc1bc13e8da2f5df8751bb315b37d33cea667
[ROCm/clr commit: d91e1f19d0]
- hipStreamWaitEvent may not resolve streams
- Correct usage of flag passed to streamWait function
Change-Id: I2ee163615d303b98937c1035d60da283cce6f677
[ROCm/clr commit: 940347ad42]
- This change tries to save the extra synchronization packets we may
insert because we didn't track the completion signals for every
command. We track the current enqueued command until it exits the
enqueue stage. We also record the exit scope to know whether we flushed
the caches
- Handle correct release scopes and store completion signal as HW events
- Use a new finishCommand implementation to only wait for the command
passed as the argument
Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc
[ROCm/clr commit: e03e4f3b5d]
Since hipMemMap can be called for multiple device handles on the same virtual memory, hipMemUnmap can be too, meaning that virtual memory can be "partially unmapped".
The unmap function can therefore be called for a specific part of the reserved address range, in which case only the designated subbuffer should be released. If unmap is called on the entire reserved memory, then all subbuffers should be released.
The main point is that every hsa_amd_vmem_map needs a corresponding hsa_amd_vmem_unmap. Otherwise, if the entire memory is unmapped by a single unmap call, HSA will report the memory as "in use" when an attempt is made to delete it.
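
The one-unmap-per-map rule can be sketched as host-side bookkeeping. This is a minimal illustration under stated assumptions, not the runtime's code: the class ReservedRange and its methods are hypothetical, and each recorded entry merely stands in for one hsa_amd_vmem_map / hsa_amd_vmem_unmap pair.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical bookkeeping for one reserved virtual address range.
// Offsets are relative to the start of the reservation.
class ReservedRange {
 public:
  explicit ReservedRange(std::size_t size) : size_(size) {}

  // Records one mapping (stands in for one hsa_amd_vmem_map call).
  // Returns false if the range is empty, out of bounds, or overlaps
  // an existing mapping.
  bool map(std::size_t offset, std::size_t size) {
    if (size == 0 || offset > size_ || size > size_ - offset) return false;
    auto next = mapped_.lower_bound(offset);
    if (next != mapped_.end() && next->first < offset + size) return false;
    if (next != mapped_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second > offset) return false;
    }
    mapped_[offset] = size;
    return true;
  }

  // Releases every mapping that lies fully inside [offset, offset + size).
  // Each erased entry stands in for one hsa_amd_vmem_unmap call, so a
  // whole-range unmap still releases the subbuffers one by one.
  // Returns the number of mappings released.
  std::size_t unmap(std::size_t offset, std::size_t size) {
    if (offset > size_ || size > size_ - offset) return 0;
    const std::size_t end = offset + size;
    std::size_t released = 0;
    auto it = mapped_.lower_bound(offset);
    while (it != mapped_.end() && it->first + it->second <= end) {
      it = mapped_.erase(it);
      ++released;
    }
    return released;
  }

  // Number of mappings that still need an unmap before the
  // reservation can be freed.
  std::size_t liveMappings() const { return mapped_.size(); }

 private:
  std::size_t size_;
  std::map<std::size_t, std::size_t> mapped_;  // offset -> size
};
```

In this sketch the reservation is safe to free only once liveMappings() returns 0, which mirrors the "in use" condition described above.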
Change-Id: I039308eafb820decfb1c09f60347f26cdad1a362
[ROCm/clr commit: 3ec1d2d2f1]
- Use getBuffer/releaseBuffer in BlitManager
- Cleanup XferBuffer as we use ManagedBuffer for both reads/writes
Change-Id: I2661b85dd012763b17a38a743fec1b1d79125f67
[ROCm/clr commit: 37d606d193]
- If any kernel uses the device heap, the launch needs to be preceded by
an init kernel. Save on the extra barrier packet launch/flush between
the init heap kernel and the user kernel
Change-Id: I8ebc6246188200e5f673dc464bc76a53bcb8b7c6
[ROCm/clr commit: ca530c660b]