This reverts commit 68e5aeb93d.
Reason for revert: Breaking change that will be merged in at a later date
Change-Id: Idd300492cc08a57c50decc22df287ddcc5463c88
- Default values were being assigned, causing the occupancy calculation to go
wrong because the correct values were not defined for gfx12 ASICs
- Also added these values for gfx1105
Change-Id: I611cc3a8ed8c57f2def637310ce1c3a48c16a574
- A device can have multiple ISAs, as per the HSA spec
- The first ISA is the most specific one, so this change is effectively a NOP
Change-Id: Ib332af21745f2e6a7c25db8986bf7717501059bc
After setting the new params in hipDrvGraphExecMemsetNodeSetParams, we
need to update the AQL packet as well; otherwise, the graph launch still
dispatches the packet with the original params rather than the updated
ones.
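A minimal sketch of the failure mode, assuming a node that caches a prebuilt dispatch packet. The types below (MemsetParams, DispatchPacket, MemsetNode) are hypothetical stand-ins, not the CLR classes; the point is only that SetParams must rebuild the cached packet, or the launch path keeps dispatching the stale one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins: the real HIP/CLR types differ.
struct MemsetParams {
  uint32_t value;
  size_t width;
};

// Simplified "packet" that the launch path dispatches as-is.
struct DispatchPacket {
  uint32_t value;
  size_t width;
};

class MemsetNode {
 public:
  explicit MemsetNode(const MemsetParams& p) : params_(p) { BuildPacket(); }

  // Updating params alone is not enough: the cached packet must be rebuilt,
  // otherwise the next launch dispatches the packet with the old params.
  void SetParams(const MemsetParams& p) {
    params_ = p;
    BuildPacket();
  }

  const DispatchPacket& Packet() const { return packet_; }

 private:
  void BuildPacket() { packet_ = DispatchPacket{params_.value, params_.width}; }

  MemsetParams params_;
  DispatchPacket packet_{};
};
```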
Change-Id: Ie49a641ba3f66c8085a29f92d88ac6ea6a1c0534
* When hipMemset3dAsync is captured, a 3D extent (depth > 1) can be set as a parameter. That worked on NVIDIA, but on AMD the wrong portion of the array was filled because, when creating the Memset3D command, the extent dimensions were used to create the pitchedPtr instead of the original array width and height.
* Also, when capturing hipMemset3dAsync, NVIDIA allows any of the extent dimensions to be 0, in which case no work should be done.
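A host-side sketch of the addressing rule, assuming a pitched buffer described by its allocation pitch and height. PitchedBuffer, Extent and Memset3D are hypothetical names loosely modeled on hipPitchedPtr/hipExtent. The slice stride must come from the allocation (pitch * height), not from the extent being filled, and a zero-sized extent does no work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins loosely modeled on hipPitchedPtr / hipExtent.
struct PitchedBuffer {
  unsigned char* ptr;
  std::size_t pitch;   // bytes per row of the *allocation*
  std::size_t height;  // rows per slice of the *allocation*
};

struct Extent {
  std::size_t width;   // bytes to fill per row
  std::size_t height;  // rows to fill per slice
  std::size_t depth;   // slices to fill
};

// Fill an extent-sized box starting at the buffer origin.
// Returns false (no work done) if any extent dimension is zero.
bool Memset3D(const PitchedBuffer& buf, unsigned char value, const Extent& ext) {
  if (ext.width == 0 || ext.height == 0 || ext.depth == 0) {
    return false;
  }
  // Slice stride from the allocation (pitch * height), NOT from the extent.
  const std::size_t slice_stride = buf.pitch * buf.height;
  for (std::size_t z = 0; z < ext.depth; ++z) {
    for (std::size_t y = 0; y < ext.height; ++y) {
      std::memset(buf.ptr + z * slice_stride + y * buf.pitch, value, ext.width);
    }
  }
  return true;
}
```

With a pitch of 8 and a height of 4, the second slice starts at byte 32 regardless of the extent passed in.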
Change-Id: I46a605bf9ae801cd3348e98d528c21263a8eefce
1. Fix LDSSize type to be uint32_t.
2. Prevent clWaitForEvents from running on completed events whose
HostQueue has been destroyed.
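A simplified sketch of the guard, assuming each event keeps a raw pointer to its owning queue that may dangle after the queue is destroyed. HostQueue and Event here are hypothetical types, not the OpenCL runtime objects: completed events are skipped before the queue pointer is ever dereferenced.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified types; the real OpenCL runtime objects differ.
struct HostQueue {
  int flushes = 0;
  void Flush() { ++flushes; }
};

struct Event {
  bool complete;
  HostQueue* queue;  // may dangle once the queue is destroyed
};

// Only touch the owning queue for events that are still pending.
// Completed events are skipped, so a destroyed queue is never dereferenced.
void WaitForEvents(const std::vector<Event*>& events) {
  for (Event* e : events) {
    if (e->complete) {
      continue;
    }
    e->queue->Flush();
    // ... a real implementation would block here until the event completes ...
    e->complete = true;
  }
}
```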
Change-Id: I829e915f56b37db2ba76bb876c9656166534f154
- Create bins, each with its own map and lock. VAs that hash to
different bins then never contend for the same lock, which reduces
lock contention
- Use STL shared mutexes, so that map updates can take a unique_lock
while simple reads take a shared_lock
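A minimal sketch of the binned design, assuming a VA-keyed map (BinnedVaMap and its methods are hypothetical names). Each bin owns an std::unordered_map and an std::shared_mutex: writers take std::unique_lock, readers take std::shared_lock, and VAs that hash to different bins never share a lock.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// VA -> object map split into independently locked bins.
template <typename Value, std::size_t kNumBins = 16>
class BinnedVaMap {
 public:
  void Insert(std::uintptr_t va, Value v) {
    Bin& bin = BinFor(va);
    std::unique_lock<std::shared_mutex> lock(bin.mutex);  // exclusive for updates
    bin.map[va] = v;
  }

  bool Erase(std::uintptr_t va) {
    Bin& bin = BinFor(va);
    std::unique_lock<std::shared_mutex> lock(bin.mutex);
    return bin.map.erase(va) > 0;
  }

  bool Find(std::uintptr_t va, Value* out) const {
    const Bin& bin = BinFor(va);
    std::shared_lock<std::shared_mutex> lock(bin.mutex);  // shared for reads
    auto it = bin.map.find(va);
    if (it == bin.map.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  struct Bin {
    mutable std::shared_mutex mutex;  // mutable so const readers can lock it
    std::unordered_map<std::uintptr_t, Value> map;
  };

  Bin& BinFor(std::uintptr_t va) {
    return bins_[std::hash<std::uintptr_t>{}(va) % kNumBins];
  }
  const Bin& BinFor(std::uintptr_t va) const {
    return bins_[std::hash<std::uintptr_t>{}(va) % kNumBins];
  }

  std::array<Bin, kNumBins> bins_;
};
```

The bin count of 16 is an arbitrary choice for illustration; requires C++17 for std::shared_mutex.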
Change-Id: I118818be65c6373700f5e511045babb6a398938a
Add an atomic counter to track the outstanding HSA handlers.
Wait on the CPU for the callbacks if the count exceeds the value
of the DEBUG_HIP_BLOCK_SYNC environment variable.
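A minimal throttling sketch, assuming the submit path and the HSA handler can call into a shared object. HandlerThrottle is hypothetical, and reading DEBUG_HIP_BLOCK_SYNC with std::getenv/std::atoi is an assumption about how the limit is obtained. The atomic counter tracks in-flight handlers; a condition variable blocks the submitter until callbacks bring the count back under the limit.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <mutex>
#include <thread>

class HandlerThrottle {
 public:
  explicit HandlerThrottle(int limit) : limit_(limit) {}

  // Assumed parsing of the env variable; falls back when unset.
  static int LimitFromEnv(int fallback) {
    const char* s = std::getenv("DEBUG_HIP_BLOCK_SYNC");
    return s ? std::atoi(s) : fallback;
  }

  // Called when a command with a handler is submitted.
  void OnSubmit() {
    if (outstanding_.fetch_add(1) + 1 > limit_) {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait(lock, [this] { return outstanding_.load() <= limit_; });
    }
  }

  // Called from the handler once the command completes.
  void OnCallback() {
    outstanding_.fetch_sub(1);
    std::lock_guard<std::mutex> lock(mutex_);  // avoids a lost wake-up
    cv_.notify_all();
  }

  int Outstanding() const { return outstanding_.load(); }

 private:
  const int limit_;
  std::atomic<int> outstanding_{0};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

With a limit of 1, a second OnSubmit blocks until another thread delivers one OnCallback.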
Change-Id: I95dc8c4bf0258c7e59411b7504220709ed6898c5
=> The GraphExec instance was destroyed before the async launch completed;
destroy it only after all pending graph launches finish
=> Remove the GraphExec destroy at the next sync point (hipStreamSync,
hipDeviceSync, etc.)
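One way to express "destroy after all pending launches" is reference counting, sketched here under the assumption that every in-flight launch holds its own reference. GraphExec and Stream are hypothetical types, not the CLR implementation: the user-facing destroy only drops the caller's reference, and the object is freed when the last pending launch completes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical executable graph that records when it is destroyed.
struct GraphExec {
  explicit GraphExec(int* destroyed) : destroyed_(destroyed) {}
  ~GraphExec() { ++*destroyed_; }
  int* destroyed_;
};

class Stream {
 public:
  // Each launch keeps the GraphExec alive until it completes.
  void Launch(const std::shared_ptr<GraphExec>& exec) { pending_.push_back(exec); }
  // Stand-in for async completion of all launches on this stream.
  void CompleteAll() { pending_.clear(); }

 private:
  std::vector<std::shared_ptr<GraphExec>> pending_;
};
```

Resetting the caller's pointer while a launch is pending leaves the object alive; it is freed only in CompleteAll().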
Change-Id: I4df682aae5787fd6e5240a7be936ce50361345d0
Windows aligns fields to 8 bytes even in 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.
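For context: MSVC places 8-byte members on 8-byte boundaries in 32-bit builds, while the 32-bit x86 System V ABI (e.g. GCC on Linux) aligns them to 4, so the same struct can have different layouts. A common way to pin the layout is an explicit alignas plus static_assert checks. This is a generic sketch with a hypothetical PoolHeader, not the change made here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical header: alignas(8) forces the 64-bit member to offset 8 on
// every ABI, including 32-bit x86 Linux, where it would otherwise sit at 4.
struct PoolHeader {
  std::uint32_t flags;
  alignas(8) std::uint64_t size;
};

static_assert(offsetof(PoolHeader, size) == 8, "size must be 8-byte aligned");
static_assert(sizeof(PoolHeader) == 16, "layout must match across ABIs");
```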
Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293
- Remove the list of all chunks and use the chunk information embedded
in each allocation. That simplifies the Free() logic and avoids an
expensive loop if the number of outstanding allocations grows
significantly.
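A minimal sketch of embedding the owning chunk in each allocation, assuming a small header placed just before the returned pointer. Chunk, AllocHeader, Allocate and Free are hypothetical; for illustration the memory comes from std::malloc rather than from the chunk itself. Free() reads the header directly instead of searching a chunk list, so it is O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Chunk {
  std::size_t live_allocations = 0;
};

struct AllocHeader {
  Chunk* chunk;  // owning chunk, embedded in the allocation
};

// Header size rounded to max alignment so the user pointer stays aligned.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(sizeof(AllocHeader) <= kHeader, "header must fit");

void* Allocate(Chunk* chunk, std::size_t size) {
  char* raw = static_cast<char*>(std::malloc(kHeader + size));
  if (raw == nullptr) return nullptr;
  new (raw) AllocHeader{chunk};
  ++chunk->live_allocations;
  return raw + kHeader;
}

void Free(void* ptr) {
  if (ptr == nullptr) return;
  char* raw = static_cast<char*>(ptr) - kHeader;
  // Read the embedded chunk pointer directly: no search over a chunk list.
  AllocHeader* hdr = reinterpret_cast<AllocHeader*>(raw);
  --hdr->chunk->live_allocations;
  std::free(raw);
}
```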
Change-Id: I9ea84d314320ce356ed24dd3180f262e2116c59b
1) SW conversions for OCP and FNUZ are enabled on pre-MI300 archs
2) For MI300, only FNUZ is enabled
3) For gfx1200, only OCP is enabled
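The three rules above can be read as a small lookup table. This sketch encodes them with a hypothetical Arch enum and EnabledConversions function; the real runtime's arch detection and flags are not shown here.

```cpp
#include <cassert>

// Hypothetical arch buckets matching the three rules above.
enum class Arch { kPreMi300, kMi300, kGfx1200 };

struct SwConversions {
  bool ocp;
  bool fnuz;
};

constexpr SwConversions EnabledConversions(Arch arch) {
  switch (arch) {
    case Arch::kPreMi300: return {true, true};   // both formats
    case Arch::kMi300:    return {false, true};  // FNUZ only
    case Arch::kGfx1200:  return {true, false};  // OCP only
  }
  return {false, false};
}
```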
Change-Id: I90373752a2d15eff20d5deec874ed396ba4e1788
Applications may submit commands without waiting
for the GPU, which causes unreleased SW commands to accumulate.
Make sure the runtime flushes the SW queue if it grows beyond a
threshold controlled by DEBUG_CLR_MAX_BATCH_SIZE.
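A minimal sketch of a size-triggered flush, assuming the threshold is read once from DEBUG_CLR_MAX_BATCH_SIZE via std::getenv (an assumption about the parsing). SwQueue is hypothetical, and Flush() here just drops the commands where a real runtime would submit them and release them on completion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <deque>

class SwQueue {
 public:
  explicit SwQueue(std::size_t max_batch) : max_batch_(max_batch) {}

  // Assumed parsing of the env variable; falls back when unset.
  static std::size_t MaxBatchFromEnv(std::size_t fallback) {
    const char* s = std::getenv("DEBUG_CLR_MAX_BATCH_SIZE");
    return s ? static_cast<std::size_t>(std::strtoull(s, nullptr, 10)) : fallback;
  }

  void Enqueue(int command) {
    pending_.push_back(command);
    if (pending_.size() > max_batch_) {
      Flush();  // queue grew past the threshold
    }
  }

  std::size_t Pending() const { return pending_.size(); }
  std::size_t Flushes() const { return flushes_; }

 private:
  void Flush() {
    pending_.clear();
    ++flushes_;
  }

  std::size_t max_batch_;
  std::deque<int> pending_;
  std::size_t flushes_ = 0;
};
```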
Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
every 50 calls. That allows the commands to drain and the
HSA signals to be destroyed.
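A sketch of the periodic wait: only the interval of 50 comes from the text above. PeriodicDrain and the wait callback are hypothetical. Every 50th submission calls the supplied wait function, giving completed commands a chance to drain so their signals can be released.

```cpp
#include <cassert>

class PeriodicDrain {
 public:
  static constexpr unsigned kWaitInterval = 50;

  // Invoke wait_for_completion on every kWaitInterval-th submission.
  template <typename WaitFn>
  void OnSubmit(WaitFn&& wait_for_completion) {
    if (++calls_ % kWaitInterval == 0) {
      wait_for_completion();
    }
  }

 private:
  unsigned calls_ = 0;
};
```

Over 120 submissions, the wait runs twice (at calls 50 and 100).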
Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9