After setting the new params in hipDrvGraphExecMemsetNodeSetParams, we
need to update the AQL packet as well; otherwise the graph launch
still dispatches the packet built from the original params rather than
the updated ones.
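
A minimal sketch of the issue, assuming a node that caches a pre-built packet next to its params. MemsetParams, CachedPacket, and MemsetNode are hypothetical stand-ins for illustration, not the real clr or HSA types:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; these are not the real clr or HSA types.
struct MemsetParams {
  void* dst;
  std::uint32_t value;
  std::size_t count;
};

// Plays the role of the pre-built AQL packet that a launch dispatches.
struct CachedPacket {
  void* dst;
  std::uint32_t value;
  std::size_t count;
};

class MemsetNode {
 public:
  explicit MemsetNode(const MemsetParams& p) : params_(p) { RebuildPacket(); }

  // Storing the new params alone is not enough: the cached packet must be
  // regenerated too, or the launch keeps using the stale values.
  void SetParams(const MemsetParams& p) {
    params_ = p;
    RebuildPacket();
  }

  // What a graph launch would dispatch.
  const CachedPacket& PacketForLaunch() const { return packet_; }

 private:
  void RebuildPacket() {
    packet_ = CachedPacket{params_.dst, params_.value, params_.count};
  }

  MemsetParams params_;
  CachedPacket packet_{};
};
```

Without the RebuildPacket() call in SetParams, PacketForLaunch() would keep returning the values captured at construction.
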
Change-Id: Ie49a641ba3f66c8085a29f92d88ac6ea6a1c0534
[ROCm/clr commit: ba2ebb3b99]
For HIP, the update should happen only if the compiler reports use of the stack size.
Change-Id: Ic781bcac6fcf586da39ec4aafd4809da3652ede3
[ROCm/clr commit: 4aa52155ee]
* When hipMemset3dAsync is captured, a 3D extent can be set as a parameter (depth > 1). That worked on NVIDIA, but on AMD the wrong portion of the array was filled because, when creating the Memset3D command, the extent dimensions were used to create pitchedPtr instead of the original array width and height.
* Also, when capturing hipMemset3dAsync, NVIDIA allows any of the extent dimensions to be 0, and in that case no work should be done.
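
The two points above can be sketched with a host-side model. Extent3D, PitchedArray, and Memset3D here are hypothetical illustrations, not the HIP types or the clr command; the assumption is that the fill starts at the array origin:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical host-side model, not the HIP structs.
struct Extent3D {
  std::size_t width;   // bytes per row to fill
  std::size_t height;  // rows per slice to fill
  std::size_t depth;   // slices to fill
};

struct PitchedArray {
  unsigned char* ptr;
  std::size_t pitch;   // bytes per row of the ORIGINAL array
  std::size_t height;  // rows per slice of the ORIGINAL array
};

// Fills `extent` starting at the array origin and returns the bytes written.
std::size_t Memset3D(const PitchedArray& dst, int value, const Extent3D& extent) {
  // Any zero dimension means there is no work to do.
  if (extent.width == 0 || extent.height == 0 || extent.depth == 0) return 0;

  std::size_t written = 0;
  for (std::size_t z = 0; z < extent.depth; ++z) {
    for (std::size_t y = 0; y < extent.height; ++y) {
      // Row addressing uses the array's own pitch and height, not the
      // extent's; using the extent here would fill the wrong region.
      unsigned char* row = dst.ptr + (z * dst.height + y) * dst.pitch;
      std::memset(row, value, extent.width);
      written += extent.width;
    }
  }
  return written;
}
```

For a 4x3x2 byte array and a 2x2x2 extent, the second slice starts at byte offset 12 (3 rows of 4 bytes), whereas addressing derived from the extent would place it at offset 4.
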
Change-Id: I46a605bf9ae801cd3348e98d528c21263a8eefce
[ROCm/clr commit: ec60bb1aed]
1. Fix LDSSize type to be uint32_t.
2. Prevent clWaitForEvents from running on completed events whose
HostQueue has been destroyed.
Change-Id: I829e915f56b37db2ba76bb876c9656166534f154
[ROCm/clr commit: 82dff9a67d]
- Create bins, each with its own map and lock. This helps cases
where the hash of one VA differs from that of another, so the two fall
into different bins and there is no lock contention
- Use STL shared mutexes, so that map updates take a unique_lock
while simple reads can use a shared_lock
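
The binning scheme can be sketched as follows, assuming C++17 for std::shared_mutex. BinnedAddressMap, the bin count of 16, and the use of std::hash are illustrative choices, not the clr implementation:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical VA -> size map split into independently locked bins.
class BinnedAddressMap {
 public:
  void Insert(std::uintptr_t va, std::size_t size) {
    Bin& bin = BinFor(va);
    std::unique_lock<std::shared_mutex> lock(bin.mutex);  // exclusive: update
    bin.entries[va] = size;
  }

  bool Erase(std::uintptr_t va) {
    Bin& bin = BinFor(va);
    std::unique_lock<std::shared_mutex> lock(bin.mutex);  // exclusive: update
    return bin.entries.erase(va) != 0;
  }

  std::optional<std::size_t> Find(std::uintptr_t va) const {
    const Bin& bin = BinFor(va);
    std::shared_lock<std::shared_mutex> lock(bin.mutex);  // shared: read only
    auto it = bin.entries.find(va);
    if (it == bin.entries.end()) return std::nullopt;
    return it->second;
  }

 private:
  static constexpr std::size_t kNumBins = 16;  // assumed bin count

  struct Bin {
    mutable std::shared_mutex mutex;
    std::unordered_map<std::uintptr_t, std::size_t> entries;
  };

  static std::size_t IndexFor(std::uintptr_t va) {
    return std::hash<std::uintptr_t>{}(va) % kNumBins;
  }
  Bin& BinFor(std::uintptr_t va) { return bins_[IndexFor(va)]; }
  const Bin& BinFor(std::uintptr_t va) const { return bins_[IndexFor(va)]; }

  std::array<Bin, kNumBins> bins_;
};
```

Two threads that touch addresses hashing to different bins never take the same lock, and concurrent Find calls on the same bin do not block each other.
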
Change-Id: I118818be65c6373700f5e511045babb6a398938a
[ROCm/clr commit: e23ff0520b]
Add an atomic counter to track the outstanding HSA handlers.
Wait on the CPU for the callbacks if the number exceeds the value
in the DEBUG_HIP_BLOCK_SYNC environment variable.
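
A rough sketch of this throttling pattern, assuming the limit has already been read from the environment. HandlerTracker and its method names are hypothetical, and the condition variable is only one possible way to implement the CPU wait:

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical tracker for outstanding completion handlers.
class HandlerTracker {
 public:
  explicit HandlerTracker(std::uint32_t limit) : limit_(limit) {}

  // Called when a command with a completion handler is submitted. Blocks
  // the calling CPU thread while more than `limit` handlers are outstanding.
  void OnSubmit() {
    if (outstanding_.fetch_add(1) + 1 > limit_) {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait(lock, [this] { return outstanding_.load() <= limit_; });
    }
  }

  // Called from the completion callback.
  void OnComplete() {
    outstanding_.fetch_sub(1);
    // Notify under the lock so a waiter cannot miss the wakeup between
    // checking its predicate and going to sleep.
    std::lock_guard<std::mutex> lock(mutex_);
    cv_.notify_all();
  }

  std::uint32_t Outstanding() const { return outstanding_.load(); }

 private:
  const std::uint32_t limit_;
  std::atomic<std::uint32_t> outstanding_{0};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

The submitting thread pays the cost of waiting only when the backlog exceeds the limit; below it, OnSubmit is a single atomic increment.
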
Change-Id: I95dc8c4bf0258c7e59411b7504220709ed6898c5
[ROCm/clr commit: 403f624bf8]
=> The GraphExec instance was destroyed before an async launch completed;
destroy it only after all pending graph launches have finished.
=> Remove the GraphExec destruction that happened at the next sync point
(hipStreamSync, hipDeviceSync, etc.).
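
One way to defer destruction until pending launches finish is reference counting. The sketch below assumes that approach; the GraphExec class and the free functions are hypothetical and do not mirror the clr code:

```cpp
#include <atomic>

// Hypothetical ref-counted executable graph.
class GraphExec {
 public:
  // `destroyed_flag` exists only so callers can observe the destruction.
  static GraphExec* Create(bool* destroyed_flag) {
    return new GraphExec(destroyed_flag);
  }

  void Retain() { refs_.fetch_add(1, std::memory_order_relaxed); }

  void Release() {
    // The last reference to go away destroys the object.
    if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
  }

 private:
  explicit GraphExec(bool* destroyed_flag) : destroyed_(destroyed_flag) {}
  ~GraphExec() { *destroyed_ = true; }

  std::atomic<int> refs_{1};  // the creator holds the first reference
  bool* destroyed_;
};

// Each async launch holds a reference until its completion callback runs.
inline void LaunchAsync(GraphExec* exec) { exec->Retain(); }
inline void OnLaunchComplete(GraphExec* exec) { exec->Release(); }

// The user-facing destroy only drops the creator's reference.
inline void GraphExecDestroy(GraphExec* exec) { exec->Release(); }
```

With this shape, calling GraphExecDestroy while a launch is in flight leaves the object alive until OnLaunchComplete runs, and no sync point has to perform the cleanup.
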
Change-Id: I4df682aae5787fd6e5240a7be936ce50361345d0
[ROCm/clr commit: f9f995c6d0]
Windows aligns fields to 8 bytes even with 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.
Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293
[ROCm/clr commit: 6bb7d1afdc]
- Remove the list of all chunks and use chunk information
embedded in each allocation. That simplifies the Free() logic and
avoids an expensive loop when the number of outstanding
allocations grows significantly.
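
The idea of embedding chunk information in each allocation can be sketched with a small bump allocator. ChunkPool, Chunk, and AllocHeader are hypothetical names, and the single-threaded design is an assumption made for brevity:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t RoundUp(std::size_t n) {
  return (n + kAlign - 1) / kAlign * kAlign;
}

// Chunk header; the usable bytes follow it in the same malloc block.
struct alignas(std::max_align_t) Chunk {
  std::size_t capacity;  // usable bytes after this header
  std::size_t used;      // bump offset
  std::size_t live;      // outstanding allocations in this chunk
};

// Embedded in front of every allocation so Free() finds the owner directly.
struct alignas(std::max_align_t) AllocHeader {
  Chunk* owner;
};

class ChunkPool {  // hypothetical, not thread-safe
 public:
  explicit ChunkPool(std::size_t chunk_bytes) : chunk_bytes_(RoundUp(chunk_bytes)) {}
  ChunkPool(const ChunkPool&) = delete;
  ChunkPool& operator=(const ChunkPool&) = delete;
  ~ChunkPool() {
    if (current_ != nullptr && current_->live == 0) Release(current_);
  }

  void* Alloc(std::size_t size) {
    const std::size_t need = sizeof(AllocHeader) + RoundUp(size);
    if (current_ == nullptr || current_->used + need > current_->capacity) {
      if (current_ != nullptr && current_->live == 0) Release(current_);
      current_ = NewChunk(need > chunk_bytes_ ? need : chunk_bytes_);
      if (current_ == nullptr) return nullptr;
    }
    unsigned char* base =
        reinterpret_cast<unsigned char*>(current_ + 1) + current_->used;
    new (base) AllocHeader{current_};
    current_->used += need;
    ++current_->live;
    return base + sizeof(AllocHeader);
  }

  // No search over a chunk list: the header names the owning chunk.
  void Free(void* p) {
    if (p == nullptr) return;
    auto* header = reinterpret_cast<AllocHeader*>(
        static_cast<unsigned char*>(p) - sizeof(AllocHeader));
    Chunk* chunk = header->owner;
    if (--chunk->live != 0) return;
    if (chunk == current_) {
      chunk->used = 0;  // reuse the active chunk from the start
    } else {
      Release(chunk);   // a retired chunk with no users goes away
    }
  }

  std::size_t ChunkCount() const { return chunk_count_; }

 private:
  Chunk* NewChunk(std::size_t capacity) {
    void* raw = std::malloc(sizeof(Chunk) + capacity);
    if (raw == nullptr) return nullptr;
    ++chunk_count_;
    return new (raw) Chunk{capacity, 0, 0};
  }
  void Release(Chunk* chunk) {
    std::free(chunk);
    --chunk_count_;
  }

  std::size_t chunk_bytes_;
  Chunk* current_ = nullptr;
  std::size_t chunk_count_ = 0;
};
```

Free() runs in constant time regardless of how many allocations or chunks are outstanding, at the cost of one header per allocation.
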
Change-Id: I9ea84d314320ce356ed24dd3180f262e2116c59b
[ROCm/clr commit: ad18146d8f]
1) SW conversions for OCP and FNUZ are enabled on pre-MI300 archs
2) For MI300, only FNUZ is enabled
3) For gfx1200, only OCP is enabled
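
The three rules above amount to a small lookup table. The sketch below encodes them as listed; the GpuFamily enum and the EnabledConversions function are hypothetical names, and how a real build detects the architecture is not shown:

```cpp
// Hypothetical architecture families matching the three cases above.
enum class GpuFamily { PreMI300, MI300, Gfx1200 };

struct ConversionSupport {
  bool ocp;
  bool fnuz;
};

// Returns which conversions are enabled for a family, per the list above.
constexpr ConversionSupport EnabledConversions(GpuFamily family) {
  switch (family) {
    case GpuFamily::PreMI300:
      return {true, true};   // both OCP and FNUZ
    case GpuFamily::MI300:
      return {false, true};  // FNUZ only
    case GpuFamily::Gfx1200:
      return {true, false};  // OCP only
  }
  return {false, false};     // unknown family: nothing enabled
}
```
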
Change-Id: I90373752a2d15eff20d5deec874ed396ba4e1788
[ROCm/clr commit: e729f08704]
Applications may submit commands without waiting
for the GPU. That causes unreleased SW commands to accumulate.
Make sure the runtime flushes the SW queue if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.
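
A minimal sketch of threshold-based flushing, assuming the threshold has already been read from the environment. SoftwareQueue and Command are hypothetical, and Flush() only clears the list where a real runtime would wait for the GPU and release the commands:

```cpp
#include <cstddef>
#include <vector>

struct Command {  // hypothetical placeholder for a submitted command
  int id;
};

class SoftwareQueue {
 public:
  explicit SoftwareQueue(std::size_t max_batch) : max_batch_(max_batch) {}

  // The application never waits, so the queue bounds its own backlog.
  void Submit(const Command& cmd) {
    pending_.push_back(cmd);
    if (pending_.size() > max_batch_) Flush();
  }

  // Stand-in for waiting on the GPU and releasing completed commands.
  void Flush() {
    pending_.clear();
    ++flush_count_;
  }

  std::size_t Pending() const { return pending_.size(); }
  std::size_t FlushCount() const { return flush_count_; }

 private:
  std::size_t max_batch_;
  std::vector<Command> pending_;
  std::size_t flush_count_ = 0;
};
```

With a threshold of 3, the fourth submission without an intervening wait triggers the flush.
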
Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396
[ROCm/clr commit: 8657a77029]
Since we don't distribute the ICD loader, we need to install the distro's ICD loader.
Change-Id: I1ea86bcf7c642a034c53f71130b15de1fa27e31e
[ROCm/clr commit: df9ae754a4]
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
every 50 calls. That allows the runtime to drain the commands and
destroy HSA signals.
Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
[ROCm/clr commit: 364dfb0ed1]
Changed the validation to occur on the sub-object rather than the parent.
Change-Id: I87bf5ef3526d0db9304099ef9ac1a5494e9a01a9
[ROCm/clr commit: 5da72f9d52]
- Use AMD_LOG_LEVEL_SIZE, in MB, to set the size at which the log file is truncated; the default is 2048 MB
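
Reading such a setting could look roughly like the sketch below. ParseLogSizeMB and LogFileLimitBytes are hypothetical helpers, and falling back to the default on malformed input is an assumption rather than documented behavior:

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

constexpr std::uint64_t kDefaultLogSizeMB = 2048;  // default from the text

// Parses a size in MB; falls back to the default when the text is missing,
// malformed, or zero (the fallback policy is an assumption).
std::uint64_t ParseLogSizeMB(const char* text) {
  if (text == nullptr || text[0] < '0' || text[0] > '9') return kDefaultLogSizeMB;
  char* end = nullptr;
  errno = 0;
  const unsigned long long mb = std::strtoull(text, &end, 10);
  if (errno != 0 || *end != '\0' || mb == 0) return kDefaultLogSizeMB;
  return mb;
}

// Truncation limit in bytes, taken from AMD_LOG_LEVEL_SIZE when it is set.
std::uint64_t LogFileLimitBytes() {
  return ParseLogSizeMB(std::getenv("AMD_LOG_LEVEL_SIZE")) * 1024ULL * 1024ULL;
}
```

Keeping the parsing separate from the getenv call makes the fallback behavior easy to check without touching the process environment.
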
Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3
[ROCm/clr commit: 35e03ea0d0]
PAL supports allocating from system memory once device memory is used up
or when an allocation is larger than the device memory.
Change-Id: Iccd3377e95a6cc6d23e45d4738a17af8b9ee32d7
[ROCm/clr commit: b07178618c]