Select the CPU with the smallest NUMA distance to a GPU device.
This improves the performance of hipMemcpy in
hipMemcpyHostToDevice or hipMemcpyDeviceToHost mode for small buffers.
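A minimal sketch of the selection step, assuming the NUMA distances from
one GPU to each CPU node are already available as a list. The helper name
closestCpuNode and the plain-vector input are hypothetical and are not
taken from the runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the index of the CPU node with the smallest
// NUMA distance to a GPU, or -1 if the list is empty.
// Ties are resolved in favor of the lowest index.
int closestCpuNode(const std::vector<std::uint32_t>& distances) {
  int best = -1;
  std::uint32_t bestDistance = 0;
  for (std::size_t i = 0; i < distances.size(); ++i) {
    if (best < 0 || distances[i] < bestDistance) {
      bestDistance = distances[i];
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

With distances {20, 10, 40} the helper picks node 1.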
Change-Id: I2860f1f83b79be0dff7bf5e64cf68ab4448db0a1
Optimization for the fence release removed a sync for mem fill.
Add simple const buffer management for the filled pattern to avoid
pattern overwriting with the async fills.
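One way to picture the const buffer management is a small ring of pattern
slots, so that each async fill reads its own copy of the pattern. This is
only a sketch: the class PatternRing and the values of kSlots and
kMaxPattern are hypothetical, and a real implementation would also have to
wait before reusing a slot that an in-flight fill may still read.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical ring of constant-buffer slots that hold fill patterns.
// kSlots and kMaxPattern are illustrative values only.
class PatternRing {
 public:
  static constexpr std::size_t kSlots = 4;
  static constexpr std::size_t kMaxPattern = 16;

  // Copies the pattern into the next slot and returns a pointer to the copy.
  // Returns nullptr if the pattern does not fit into a slot.
  // Note: this sketch does not track whether the slot is still in use.
  const void* acquire(const void* pattern, std::size_t size) {
    if (size > kMaxPattern) return nullptr;
    auto& slot = slots_[next_];
    std::memcpy(slot.data(), pattern, size);
    next_ = (next_ + 1) % kSlots;
    return slot.data();
  }

 private:
  std::array<std::array<std::uint8_t, kMaxPattern>, kSlots> slots_{};
  std::size_t next_ = 0;
};
```

Two consecutive acquire() calls return different slots, so the second
pattern does not overwrite the first one.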
Change-Id: I63773ac09ceec31d5396d24570e4647ff096326b
SWDEV-234947
SWDEV-236298
Instead of forcing a barrier packet, just inject system scope on the next packet.
Change-Id: If9bcee23e08dfe5db731235e2fcb30582cbd4c1c
Eliminates most of the global include_directories. The install header
paths are different from the build directory, so we have to separate
those for the exported target include paths.
Change-Id: I13e4c56c1218cb31c29a316422dc5fd1d09d8b1b
Remove queue limitation since we loop through HW queues now.
Add a DevLogError if we fail to create the hsa_queue. A ticket showed a regression there.
Change-Id: I4f58e405f88e75600a762f6d6352838c969cdb5e
This workaround avoids the performance penalty of the SDMA engine
taking a while to clock up from a lower DPM state. Add env var
GPU_FORCE_BLIT_COPY_SIZE (in KB; 1024 by default for HIP). Forcing
the src and dst agents to be amdgpu makes ROCr take the blit copy path
for what would otherwise have been an SDMA copy.
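The size check can be sketched as below. The names CopyPath and
chooseCopyPath are hypothetical, and whether the real comparison is
inclusive is an assumption; only the KB unit and the 1024 default come
from the description above.

```cpp
#include <cstddef>

// Hypothetical copy-path selector. thresholdKB plays the role of
// GPU_FORCE_BLIT_COPY_SIZE, which is expressed in KB.
enum class CopyPath { Blit, Sdma };

// Assumption: copies at or below the threshold take the blit path.
CopyPath chooseCopyPath(std::size_t bytes, std::size_t thresholdKB) {
  return bytes <= thresholdKB * 1024 ? CopyPath::Blit : CopyPath::Sdma;
}
```

Under these assumptions a 4 KB copy with the default threshold takes the
blit path, while a 2 MB copy stays on SDMA.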
Change-Id: I222f687155f86000d17d66d25182e490b6710463
SWDEV-232580 & SWDEV-232580
Allocate a P2P staging buffer when full P2P access is not available between all devices.
The P2P staging buffer will eventually be used when required.
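A rough sketch of the allocation condition, assuming peer access is
described by a boolean matrix. The function needsP2PStaging and the matrix
layout are hypothetical and only illustrate the "not all devices have full
P2P access" check.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check: canAccess[i][j] is true when device i can directly
// access the memory of device j. Returns true if any pair of distinct
// devices lacks direct access, which is when a staging buffer is needed.
bool needsP2PStaging(const std::vector<std::vector<bool>>& canAccess) {
  for (std::size_t i = 0; i < canAccess.size(); ++i) {
    for (std::size_t j = 0; j < canAccess[i].size(); ++j) {
      if (i != j && !canAccess[i][j]) return true;
    }
  }
  return false;
}
```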
Change-Id: If8490ba7b1c52c432c1e942ae95421b9d2ec7097
This should allow the cmake build for the opencl runtime to work
without manually adding these definitions. The PAL build also adds
these as private defines in its build, so change rocm to match. This
should probably be including these via a config header to benefit other
builds, but this will at least avoid some clutter in the opencl build
for now.
Change-Id: I1044984b87ba3fc72e280e255ceea2dd9e3337ff
Don't use find_path on the header; it's redundant with the interface
include directories on the imported target. Use the target specific
forms for including and linking it.
Change-Id: I3923143c992888ee7d5ee1130084ac2e5eaa0f3a
This is almost never the correct thing to use since it breaks adding
this as a subproject build in a larger build. Switch to refer to
CMAKE_CURRENT_SOURCE_DIR, which is equivalent in a standalone build.
Change-Id: Ib8dbbc0668491f4227389b9a5b27da770b3bc5ce
[ROCm][TCT][HIP] cooperative stream test case is failing.
Make sure lockXfer() in the blit manager returns a valid value.
Port the latest PAL backend logic into the ROCr backend.
This change doesn't fix the issue reported in the ticket.
Change-Id: I54101a824f49a2dcfbbf5414cb5b3af41745306d
- Once a device assertion occurs, abort the host execution as well.
- TODO: This is the initial support. Since we need to drain the hostcall queue
to ensure the device assertion message is flushed out, the hostcall
listener needs an interface to explicitly drain its queue.
Change-Id: I8a04400aa7109bfd054ae5777c41a4abbf0db4a9
1. Enable pitch workaround
2. When we use copy image, we don't need to create the custom pitch image
3. wrtBackImageBuffer_ stores a device memory object, not an amd image object.
Tests:
Conformance kernel read/write tests pass with this code change.
Change-Id: I7dca3127adde6ac83e78dd270a2256ebed55c60d
Duplicate similar blit logic from PAL path
Tests:
1D Array image read/write tests and copy image tests passed
Change-Id: I838bbde252ad0108bfeb82c0c2b669881747c0af