Device binaries that are embedded inside the host binary do not
require a copy. Their lifetime is guaranteed to exceed that of the
loaded executable.
Add a 'make_copy' parameter to amd::Program::addDeviceProgram. If
make_copy is false the original image will be used and will not
get freed when the amd::Program is destroyed.
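
A minimal sketch of the copy-or-borrow idea, not the actual
amd::Program code. The DeviceImage class and its members are
hypothetical names used only for illustration; only the make_copy
flag comes from this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical holder (not the real amd::Program): it either owns a
// private copy of a device image or borrows the caller's pointer.
class DeviceImage {
 public:
  DeviceImage(const void* image, std::size_t size, bool make_copy)
      : size_(size) {
    assert(image != nullptr && size > 0);
    if (make_copy) {
      // Owned: the copy is released when this object is destroyed.
      owned_ = std::make_unique<unsigned char[]>(size);
      std::memcpy(owned_.get(), image, size);
      data_ = owned_.get();
    } else {
      // Borrowed: the caller guarantees the image outlives this object,
      // e.g. because it is embedded in the host binary.
      data_ = image;
    }
  }

  const void* data() const { return data_; }
  std::size_t size() const { return size_; }
  bool ownsCopy() const { return owned_ != nullptr; }

 private:
  std::unique_ptr<unsigned char[]> owned_;  // Empty when borrowed.
  const void* data_ = nullptr;
  std::size_t size_ = 0;
};
```

With make_copy set to false nothing is allocated and nothing is freed
on destruction, which is the behavior described above for embedded
binaries.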
Change-Id: I7973bb0243f5a2d1b639b8a88445cfe6af919dd7
Remove queue limitation since we loop through HW queues now.
Add a DevLogError if we fail to create the hsa_queue. A ticket showed a regression there.
Change-Id: I4f58e405f88e75600a762f6d6352838c969cdb5e
This workaround avoids the performance penalty of the SDMA engine
taking a while to clock up from a lower DPM state. Add the env var
GPU_FORCE_BLIT_COPY_SIZE (in KB; 1024 by default for HIP). Forcing
the Src and Dst agents to be amdgpu makes ROCr take the blit copy
path for what would otherwise have been an SDMA copy.
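
A rough sketch of how such a size threshold could select the copy
path. It assumes that copies at or below the threshold are the ones
forced onto the blit path; the message does not spell out the
direction. readSizeKB, chooseCopyPath and CopyPath are hypothetical
names, not runtime or ROCr APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

enum class CopyPath { Blit, Sdma };

// Reads a size in KB from an environment variable. Falls back to
// defaultKB when the variable is unset or is not a plain number.
inline std::size_t readSizeKB(const char* name, std::size_t defaultKB) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return defaultKB;
  char* end = nullptr;
  const unsigned long long parsed = std::strtoull(value, &end, 10);
  return (end != value && *end == '\0') ? static_cast<std::size_t>(parsed)
                                        : defaultKB;
}

// Assumption: copies no larger than the threshold take the blit path,
// larger copies are left to SDMA.
inline CopyPath chooseCopyPath(std::size_t copyBytes, std::size_t forceBlitKB) {
  assert(forceBlitKB <= static_cast<std::size_t>(-1) / 1024);
  return copyBytes <= forceBlitKB * 1024 ? CopyPath::Blit : CopyPath::Sdma;
}
```

Under this assumption a threshold of 0 sends every non-empty copy to
SDMA.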
Change-Id: I222f687155f86000d17d66d25182e490b6710463
Object libraries are weird, and producing a library by using the
target objects from them doesn't automatically import the interface
properties of the linked targets. These object libraries only have
single uses, so just directly create the final library from the
sources.
Leaves libelf as an object library, since there seems to be some cmake
oddity when trying to link an unexported target to an exported one.
Change-Id: Ic379612c89340c40085c9862cfe111fa4bbff425
SWDEV-232580
Allocate a P2P staging buffer when full P2P access is not available between all devices.
The P2P staging buffer will eventually be used when required.
Change-Id: If8490ba7b1c52c432c1e942ae95421b9d2ec7097
There's a lot of unnecessary system configuration junk here which
isn't used, and is already available through compiler predefines. This
is also blindly placed without really checking the host architecture.
-DLINUX is unused.
-D__AMD64__ is predefined by the compiler, and is also redundant with
__x86_64__ and ATI_BITS_64.
__x86_64__ should also be removed. It's used in libelf, but I'm not
sure if msvc predefines this or not.
-DqLittleEndian is unused, and also doesn't follow macro naming
conventions (plus compilers have their own predefines for checking
this).
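
As a sketch of what relying on compiler predefines can look like
instead of build-system defines: GCC and Clang predefine __x86_64__,
while MSVC predefines _M_X64 for the same target. The constant and
function names below are hypothetical and not part of this change.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pointer width comes from the type system, so no -D flag is needed.
constexpr unsigned kPointerBits = sizeof(void*) * 8;

// GCC and Clang predefine __x86_64__; MSVC predefines _M_X64 instead.
#if defined(__x86_64__) || defined(_M_X64)
constexpr bool kTargetsX86_64 = true;
#else
constexpr bool kTargetsX86_64 = false;
#endif

// Portable runtime check: inspect the byte stored at the lowest address.
inline bool isLittleEndianAtRuntime() {
  const std::uint32_t probe = 1;
  unsigned char lowestAddressByte = 0;
  std::memcpy(&lowestAddressByte, &probe, 1);
  return lowestAddressByte == 1;
}

// Where the compiler provides __BYTE_ORDER__ (GCC and Clang do), it
// should agree with the runtime check. Other compilers skip the check.
inline void checkByteOrderPredefine() {
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
  assert(isLittleEndianAtRuntime() ==
         (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__));
#endif
}
```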
Change-Id: I89f6fc4c88e861623be7f32df41aecbb4e9009ab
This should allow the cmake build for the opencl runtime to work
without manually adding these definitions. The PAL build also adds
these as private defines in its build, so change rocm to match. This
should probably be including these in a config header to benefit other
builds, but this will at least avoid some clutter in the opencl build
for now.
Change-Id: I1044984b87ba3fc72e280e255ceea2dd9e3337ff
Use target specific forms for define/include. Don't set
CMAKE_CXX_FLAGS for the standard, which is already implied from the
parent build.
Change-Id: I4000893376d6685e9889b66ad8451fc493020272
Don't use find_path on the header, it's redundant with the interface
include directories on the imported target. Use the target specific
forms for including and linking it.
Change-Id: I3923143c992888ee7d5ee1130084ac2e5eaa0f3a
This is almost never the correct thing to use since it breaks adding
this as a subproject build in a larger build. Switch to refer to
CMAKE_CURRENT_SOURCE_DIR, which is equivalent in a standalone build.
Change-Id: Ib8dbbc0668491f4227389b9a5b27da770b3bc5ce
[ROCm][TCT][HIP] cooperative stream test case is failing.
Make sure lockXfer() in the blit manager returns a valid value.
Port the latest PAL backend logic into the ROCr backend.
This change doesn't fix the issue reported in the ticket.
Change-Id: I54101a824f49a2dcfbbf5414cb5b3af41745306d
- Once a device assertion occurs, abort the host execution as well.
- TODO: This is only initial support. The hostcall queue must be
drained to ensure the device assertion message is flushed out, so the
hostcall listener needs an interface to explicitly drain its queue.
Change-Id: I8a04400aa7109bfd054ae5777c41a4abbf0db4a9
e.g.:
warning: expression does not compute the number of elements in this
array; element type is '__cpu_mask' (aka 'unsigned long'), not
'uint32_t' (aka 'unsigned int') [-Wsizeof-array-div]
for (uint i = 0; i < sizeof(mask_.__bits) / sizeof(uint32_t); ++i) {
__bits is a __cpu_mask, which is a 64-bit type. These were accessed
through uint32_t pointers so the loop bound should have been
correct. These operations can be done directly on the 64-bit type so
we can leave the array size pattern, and eliminate the casts.
The case in getNextSet should probably be rephrased in terms of
__cpu_mask to avoid the pointer casting, but this is trickier than the
other cases so I used the easy option to quiet the warning.
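
A small sketch of the pattern that keeps -Wsizeof-array-div quiet:
divide by the size of the array's own element and operate on the
64-bit words directly. CpuMask is a hypothetical stand-in for the
glibc cpu_set_t layout, not the runtime's actual type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a CPU affinity mask: an array of 64-bit words.
struct CpuMask {
  using Word = std::uint64_t;
  Word bits[16];
};

// Sets the bit for one CPU index directly on the 64-bit word, no casts.
inline void setCpu(CpuMask& mask, std::size_t cpu) {
  constexpr std::size_t kBitsPerWord = sizeof(CpuMask::Word) * 8;
  assert(cpu < kBitsPerWord * (sizeof(mask.bits) / sizeof(mask.bits[0])));
  mask.bits[cpu / kBitsPerWord] |= CpuMask::Word{1} << (cpu % kBitsPerWord);
}

// The loop bound divides by the element's own size, so it always equals
// the number of array elements regardless of the element type.
inline std::size_t countSetCpus(const CpuMask& mask) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < sizeof(mask.bits) / sizeof(mask.bits[0]); ++i) {
    // Clear the lowest set bit on each iteration until the word is zero.
    for (CpuMask::Word w = mask.bits[i]; w != 0; w &= w - 1) ++count;
  }
  return count;
}
```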
Change-Id: I1332584fad58439ccd9d369589519a9918e1678e
- Problem with CL_DEVICE_GLOBAL_FREE_MEMORY_AMD query.
Check if allocated memory exceeds the total size.
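
A minimal sketch of the kind of check described, assuming the free
amount is derived by subtracting the allocated size from the total.
freeMemoryBytes is a hypothetical helper, not the runtime's code.

```cpp
#include <cstdint>

// Clamp to zero when the allocated size exceeds the total, so the
// unsigned subtraction cannot wrap around to a huge value.
constexpr std::uint64_t freeMemoryBytes(std::uint64_t totalBytes,
                                        std::uint64_t allocatedBytes) {
  return allocatedBytes >= totalBytes ? 0 : totalBytes - allocatedBytes;
}
```

Without the clamp, an allocated size larger than the total would make
the query report an enormous amount of free memory.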
Change-Id: Ieed8829860663bac1acfa41d21309dff4d8772c7