When OCL ROCr backend performs CL_MEM_COPY_HOST_PTR it may attempt
to have access to amd::Memory object it's currently creating,
but it's not ready yet. The logic creates a temporary dummy object
to perform a copy transfer. The new change will make sure runtime
skips allocation of the same device::Memory object second time.
Change-Id: I14c6a00a3941fdcaa6aea299e9f096e4c3f5cadf
ROCr is now reporting the actual HW addressing limits for HIP, so OpenCL will have to impose lower limit.
Change-Id: I60c2ce27ed1d1f45f16fb76438965a236ba872c6
The change reuses HSA signals for dispatches as a wait signal.
Skipping the barrier requires to disable L2 cache for sysmem
allocations and extra tracking for HDP access with the large bar.
ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is
still used by default.
ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr.
The change also consolidated ROCr wait logic under single function.
Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8
SWDEV-249719 - root cause: queues with custom CU mask are not inserted
into queuePool_ (i.e., queue of reusable HSA queues) of ROC device class
causing a crash when creating hostcall buffers for printf
Change-Id: Ieee7005d9a5a30b3113394ce23ee65927126d0d6
root cause - cooperative queue is not inserted into queuePool_ (HSA queues) of ROC device calss causing a crash when creating hostcall buffers for printf
Change-Id: I3f9aceb4e5fe6a7c7a2a549a4bb0a3511fe02799
Fix a typo with the name define, when compilation wasn't enabled.
Force CPU prefetch if system was forced in runtime
Change-Id: Id4b578f9fa44a45426fdb5d8ecb1da803aa42313
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables will be
referenced but they are not initialized in that early time. The patch
will use constexpr to initialze global constant varables in compile
time.
Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef
If numa lib is in building system, define ROCCLR_NUMA_SUPPORT to
support numa; otherwise, don't support numa.
Change-Id: I3848d7fdec5a3813ff1edad9b71ff04372dc0b9a
Rocr won't return real hops so that we have to deduce hops from
numa distance as a workaround. This will be subject to change
as driver team will provide a long term solution in rocm3.7
Change-Id: Ifb939ed848db190c3d544bb7f30a5821161921e6
- Expose ROCclr interfaces for HIP usage
- ROCr interfaces aren't available in staging, thus control the
build with AMD_HMM_SUPPORT define
Change-Id: Iadc2bcc230e78d3b0dc22b235189c8cc80843446
If a offset of the pointer is passed to free it may release
the mem object but may not release from MemObjMap. Erase the map
by getting the parent pointer.
Change-Id: I06b92548de2d49b4029efe6b511329225007cc55
Select cpu in terms of the smallest Numa distance for a GPU device.
This will improve performance of hipMemcpy in the mode of
hipMemcpyHostToDevice or hipMemcpyDeviceToHost for small buffer.
`
Change-Id: I2860f1f83b79be0dff7bf5e64cf68ab4448db0a1