Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush
workaround. The default is 0
Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c
On gfx8, gfx9 devices before MI100 and gfx10.0 or gfx10.1
none of the memory ordering workarounds for device kernel arguments
can be applied. Use host kernel arguments on these devices.
Change-Id: I9be6fbfe4b3986eb7d9f83998334df5f03fd4124
The Readback and Avoid HDP Flush memory ordering workaround is
used as a fallback solution only when HDP flush register is invalid
Change-Id: Ic284eba1f95ed22b0270d3abeb904fb902015b1a
- Enable Device kernel args for MI300* for now.
- Fix a perf issue which impacts graph instantiate when dev kernel args
are enabled.
Change-Id: I962e58fd9d8dd1a8db95e601cb03a8e9c7bac97f
- Implement workaround to ensure HDP writes are done by writing and
reading the HDP MMIO register.
- Implement the same workaround for graphs, we no longer need sentinel
write/readback
Change-Id: I0d3027b46a1f61131ec62e3c8c669ff5184fa6b2
- Add the new fillBuffer kernel, which allows to launch a limited
number of workgroups for memory fill operation
- Switch fill memory to 16 bytes write by default
- Allow to limit the workgroups with DEBUG_CLR_LIMIT_BLIT_WG
Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
Pinned copy can cause big performance drops, because slow pinning under Windows.
Use up to 128MB for staging transfers. Change staging buffer size to 4MB.
Linux path should still have the old defaults.
Change-Id: I954edceb3ec89e8e670be116aa2d0a9564c8b11c
Add a env var ROC_USE_FGS_KERNARG to toggle kernel arg placement
By default its in Fine Grain Kernel arg segment for supported asics.
Change-Id: I3d57ed69a1a4db2b392b0438ead499f3ddca4716
- The logic will trace compute, sdma read/write operations and
apply signals when necessary
- ROC_CPU_WAIT_FOR_SIGNAL, ROC_SYSTEM_SCOPE_SIGNAL
and ROC_SKIP_COPY_SYNC were added to control the tracking
Change-Id: I9e8e6174c63bf7784f7ab00964e2918c8667d364
- Correct GSL path to report targets using the TargetID syntax.
- Correct GSL path to check compatibility of code objects when
loading.
- Add concept of an device isa and create a registery used by ROCm,
PAL and GSL.
- Support XNACK and SRAMECC target features consistently for PAL and ROCm.
- Correct logic for NullDevices and asserts to avoid memory coruption.
- Allow all NullDevices to be created for HIP.
- Numerous other code improvements.
Change-Id: I40abf3d2b22249c1492d1af5919665f8184f4e0e
The change reuses HSA signals for dispatches as a wait signal.
Skipping the barrier requires to disable L2 cache for sysmem
allocations and extra tracking for HDP access with the large bar.
ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is
still used by default.
ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr.
The change also consolidated ROCr wait logic under single function.
Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8
Fix a typo with the name define, when compilation wasn't enabled.
Force CPU prefetch if system was forced in runtime
Change-Id: Id4b578f9fa44a45426fdb5d8ecb1da803aa42313
- Expose ROCclr interfaces for HIP usage
- ROCr interfaces aren't available in staging, thus control the
build with AMD_HMM_SUPPORT define
Change-Id: Iadc2bcc230e78d3b0dc22b235189c8cc80843446