Copying memory from device to host with a CPU agent performs poorly
because the CPU has to read uncached device memory.
Fix it by using a GPU agent.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: Ia3b562758fe73ef9efaa284f47e67bf569cc7b7b
[ROCm/ROCR-Runtime commit: 8501c0bcb1]
ROCr uses the same allocation_map_ list to track both its own internal
memory allocations and allocations made by users of the ROCr library.
In some edge cases, a library user would call hsa_amd_pointer_info on
an invalid pointer, but ROCr would report the pointer as valid because
it belongs to a memory range that ROCr allocated internally. Add a flag
to differentiate between internal and external allocations.
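
The flag-based lookup described above could look roughly like the sketch
below. It is a minimal illustration, not the runtime's code: the
Allocation struct, the AllocationMap alias, and IsValidUserPointer are
hypothetical names, and keying the map by base address is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for one tracked allocation; field names are illustrative.
struct Allocation {
  std::size_t size;
  bool internal;  // true when the runtime itself made the allocation
};

// Keyed by base address (an assumption about how the tracking list is organized).
using AllocationMap = std::map<std::uintptr_t, Allocation>;

// Returns true only if ptr falls inside an allocation made by a library user.
bool IsValidUserPointer(const AllocationMap& allocations, std::uintptr_t ptr) {
  auto it = allocations.upper_bound(ptr);  // first entry whose base is > ptr
  if (it == allocations.begin()) return false;
  --it;  // candidate entry with base <= ptr
  const bool in_range = ptr - it->first < it->second.size;
  return in_range && !it->second.internal;
}
```

With this shape, a pointer that lands in an internally allocated range is
reported as invalid even though it is present in the map.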
Change-Id: I98c52bd85f3985d1ba1b0e3101d2254b003412cf
[ROCm/ROCR-Runtime commit: 59685f4492]
Track and report the size, in bytes, of pending unexecuted blit
commands. To be used in copy ganging.
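
One way to keep such a byte count is sketched below. The class and method
names (PendingBlitBytes, OnSubmit, OnComplete, Pending) are hypothetical,
and the use of an atomic counter is an assumption, not the runtime's
actual bookkeeping.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical tracker; the class and method names are illustrative only.
class PendingBlitBytes {
 public:
  // Call when a copy command of `bytes` is submitted to the blit queue.
  void OnSubmit(std::size_t bytes) {
    pending_.fetch_add(bytes, std::memory_order_relaxed);
  }
  // Call when a previously submitted command of `bytes` has finished executing.
  void OnComplete(std::size_t bytes) {
    pending_.fetch_sub(bytes, std::memory_order_relaxed);
  }
  // Size, in bytes, of commands submitted but not yet executed.
  std::size_t Pending() const {
    return pending_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<std::size_t> pending_{0};
};
```

A copy scheduler could compare Pending() across engines to pick the least
loaded one.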
Change-Id: Ia7453ff88571e927df771c6c819b73c17e67708e
[ROCm/ROCR-Runtime commit: 27596aef0c]
Fix a hang caused by a change in the initialization order of libraries
that have cyclical dependencies and call hsa_init() during their
initialization phase.
This implementation looks for a symbol called "HSA_AMD_TOOL_PRIORITY"
across all loaded shared libraries using dynamic section entries of the
loaded lib instead of using dlopen and dlsym for the same purpose.
Change-Id: I4865f2fd18dd186ec311a432ec38fbb5583805d2
[ROCm/ROCR-Runtime commit: 8aac885318]
Report whether IOMMU V2 is supported.
IOMMU V1 support is not relevant to the user, so it is not reported.
Change-Id: I77389484a87a352da9c2f7b2a5d9de264f90ee53
[ROCm/ROCR-Runtime commit: e30be76f37]
Currently, Wavefront::GetInfo(HSA_WAVEFRONT_INFO_SIZE.. always returns
64. Instead, return the proper wavefront size based on the ISA.
For now, only one wavefront size is returned for each ISA, because the
upper layers have no mechanism to select the correct wavefront size
when an ISA supports more than one. As a temporary measure, 32 is
returned for all gfx1xxx cards even though they also support 64,
because kernels for gfx1xxx are compiled for wavefront-32 by default.
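
The selection rule can be sketched as below. ReportedWavefrontSize is a
hypothetical helper, and treating "gfx1xxx" as "ISA major version 10 or
higher" is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: maps an ISA major version (e.g. 9 for gfx9xx, 10 for
// gfx10xx) to the single wavefront size reported for that ISA.
std::uint32_t ReportedWavefrontSize(std::uint32_t isa_major_version) {
  // Assumption: gfx1xxx means major version >= 10. Those parts support both
  // 32 and 64, but 32 is reported because kernels default to wavefront-32.
  return isa_major_version >= 10 ? 32u : 64u;
}
```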
Change-Id: Ic6c2917b7e6d3704daf742d243f5ec7f49430de9
[ROCm/ROCR-Runtime commit: f7e3782b42]
This reverts commit 993b1dee7e.
Reason for revert: blocked by a new proposal, so reverting the changes.
Change-Id: Id9b8cc1560ba3eea6e484e67df3fdc647da9f37d
[ROCm/ROCR-Runtime commit: dbf8905dd1]
Temporarily force rocrtst to use Code Object V4 while the compiler team
prepares to switch the default Code Object to V5. Will switch back to
the default compiler setting once everything is tested/fixed.
Change-Id: I18e5c6771fffd8c60792fc197501d373c7ec22f3
[ROCm/ROCR-Runtime commit: 0f2fa3ba72]
The libelf1 package contains libelf.so.1, so update the package name.
Improvement: remove the initialization of cmake_install_libdir from the source code.
The build scripts initialize the variable to "lib" and pass it as a build argument.
Change-Id: I16a8cdc4c231487410c1114b818e9d01df4854de
[ROCm/ROCR-Runtime commit: 5c90c762f9]
Add two new agent info fields:
HSA_AMD_AGENT_INFO_UCODE_VERSION
HSA_AMD_AGENT_INFO_SDMA_UCODE_VERSION
Change-Id: I51cb853724b23a26e945e5c1ac32c16d0cb3bc31
[ROCm/ROCR-Runtime commit: ecdebef0b9]
Modify the if-condition checks in GElfImage::pullElf() in amd_elf_image.cpp to
check section types instead of performing a string check.
Change-Id: I1ab92f0a9118fb2382652a1cc900a3150cbee2da
[ROCm/ROCR-Runtime commit: 5727a10a1b]
Thunk keeps an internal cache of system topology that can be used to
speed up subsequent calls to hsaKmtAcquireSystemProperties(). This cache
is cleared by calling hsaKmtReleaseSystemProperties() at the beginning
of BuildTopology().
hsaKmtRuntimeEnable() also calls hsaKmtAcquireSystemProperties() inside
Thunk. Move the call to hsaKmtRuntimeEnable() after BuildTopology() so
that we can reuse Thunk's internal cache.
Parsing the topology can take ~150 ms on systems with a large number of
nodes.
Change-Id: I741709d49d67d244f5fbd707fe8f01ab923bb153
[ROCm/ROCR-Runtime commit: e39ad34d9c]
Simplify the callback method. Also fix the way loaded shared objects were appended to a string vector,
which was not being passed to this callback method.
Change-Id: I68661dd73f61a11c42fa92f670e8e7b6ffcb5711
[ROCm/ROCR-Runtime commit: 8751e65b79]
The file reorganization feature was implemented with backward compatibility.
Backward compatibility support will be deprecated in a future release.
Change the #pragma message to #warning for a smooth transition.
Change-Id: Ibaedc1873bc764d25f74d9ca9416077d084e332d
[ROCm/ROCR-Runtime commit: a34804ed3e]
Previous versions of HIP call hsa_amd_ipc_memory_create with the len
aligned to granularity. Temporarily allow this so that we do not
break backward compatibility. This will be removed after 2 releases.
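
A relaxed length check of this kind might look like the sketch below.
AlignUp and IsAcceptableIpcLength are hypothetical names, and the
assumption that the strict check compares len against the size the user
originally requested is mine, not stated in this message.

```cpp
#include <cassert>
#include <cstddef>

// Rounds value up to the next multiple of granularity.
std::size_t AlignUp(std::size_t value, std::size_t granularity) {
  assert(granularity != 0);
  return (value + granularity - 1) / granularity * granularity;
}

// Hypothetical check: accept the exact requested size, and temporarily also
// that size rounded up to the pool granularity, as older HIP versions pass it.
bool IsAcceptableIpcLength(std::size_t len, std::size_t user_size,
                           std::size_t granularity) {
  return len == user_size || len == AlignUp(user_size, granularity);
}
```

Dropping the second comparison later restores the strict behavior.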
Change-Id: I6b5ac2cad5d32d62c803637cf1a2c6deebc03169
[ROCm/ROCR-Runtime commit: cb71e2d715]
MES devices need GART mappings and therefore need non-paged memory. But
using non-paged memory introduces a performance regression where it can
take over 80 ms to see signal changes if the memory is in the wrong
NUMA node. Currently, we cannot control NUMA affinity when allocating
non-paged memory. Use non-paged memory allocation only on devices that
have the MES scheduler.
Change-Id: Ib27fb01d75247aa4f2bb2aa4503c6af5a98afda0
[ROCm/ROCR-Runtime commit: c1e836b6ab]
Using the previous std::thread method for the SVM profiler task caused
segfaults on thread launch on RHEL 8 when the libhsa-runtime library is
loaded using dlopen.
Change-Id: Ic010cd6ae9bc6e6ed0605de02b93f6aae8ed3e97
[ROCm/ROCR-Runtime commit: 0e4c7336ff]
Transient exec usage is not required for GFX11 and will result in a NULL
return of s_sendmsg_rtn if directly returned to exec_lo.
Directly fetch and mask the doorbell ID to ttmp3 for GFX11 instead.
Change-Id: Ie17ed69d68d84ab18869b1c7871a0ed0482cd661
[ROCm/ROCR-Runtime commit: f9edf73cd7]
Update rocrtst packaging to add dependency on rocm-core so that rocrtst
gets uninstalled when rocm-core package is removed
Depends-On : I1e7ed52d7eed2c190d0b5651e7ded7192d7634b5
Change-Id: I7243dd29950b93a2665720a0062816c574f0f640
[ROCm/ROCR-Runtime commit: 8225271e18]
On Ubuntu, the package depends list did not show libelf. Add it.
Change-Id: I713951bd7181f44d667561aaf437f85c6cd783b0
[ROCm/ROCR-Runtime commit: 76cf5d2edc]
If hsa_amd_agents_allow_access is called for an imported IPC handle,
ignore the request, as this pointer will already have been mapped to
the other GPUs during IPCAttach().
Change-Id: I4bf33ed57e93b5a3ead749d4f87ab6f2750bed58
[ROCm/ROCR-Runtime commit: b4f26534eb]
If a user queries the pointer info on an invalid pointer,
hsaKmtQueryPointerInfo will return error or unknown pointer. The other
fields in HsaPointerInfo are invalid, so we do not return them to the
user.
Also remove the assert and return unknown pointer instead, as the
assert will not trigger in release builds.
hsaKmtQueryPointerInfo may also return unknown pointer for userptrs as
they are not always tracked by thunk. Adjust the code to still treat
these pointers as valid in this case.
Change-Id: Idf5cd8b61cd532d31b072f449839d223369bb138
[ROCm/ROCR-Runtime commit: 18547173e9]
Since all public interface libraries are present in the same folder,
RUNPATH/RPATH is not required in the library itself.
The application shall provide the required RPATH/RUNPATH to load all
libraries.
Change-Id: I1d1ba920bf291eb89bd1f4c0fd0cfd80c7d739bd
[ROCm/ROCR-Runtime commit: ac66865385]
The amount of memory requested by the user may be aligned up internally
to the memory pool granularity. The extra padded memory should not be
considered when validating pointers from the user. Also return the
user-requested size when the user queries pointer information.
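
The distinction between the requested and the padded size can be sketched
as follows. TrackedAllocation, Track, ContainsUserPointer, and
ReportedSize are hypothetical names chosen for illustration, not the
runtime's own types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record that keeps both sizes for one allocation.
struct TrackedAllocation {
  std::uintptr_t base;
  std::size_t requested_size;  // what the user asked for
  std::size_t allocated_size;  // requested_size rounded up to the granularity
};

// Records an allocation, padding the size up to the pool granularity.
TrackedAllocation Track(std::uintptr_t base, std::size_t requested,
                        std::size_t granularity) {
  assert(granularity != 0);
  const std::size_t padded =
      (requested + granularity - 1) / granularity * granularity;
  return TrackedAllocation{base, requested, padded};
}

// Validates against the requested size only, so the padding is excluded.
bool ContainsUserPointer(const TrackedAllocation& a, std::uintptr_t ptr) {
  return ptr >= a.base && ptr - a.base < a.requested_size;
}

// Size reported back when the user queries pointer information.
std::size_t ReportedSize(const TrackedAllocation& a) { return a.requested_size; }
```

For a 100-byte request on a pool with 4096-byte granularity, an address
100 bytes past the base sits inside the padded region and is rejected.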
Change-Id: I28b25448ea03c836b44fafdb34b7330cf6887424
[ROCm/ROCR-Runtime commit: 39632a713e]
For APU ASICs, the default configured size of video memory is
relatively small, while the reserved region has become larger in
recent-generation ASICs. The ratio of max alloc size to pool size may
therefore fall below the expected value, so adjust it.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I0e847c4c13e957cf6e811d3f379842619cf53370
[ROCm/ROCR-Runtime commit: f05770610c]
What we want for libdrm-amdgpu is for it to be a recommended package.
Either libdrm or libdrm-amdgpu can be used, but we recommend the latter.
Using "SUGGESTS" does not seem like a strong enough requirement, but
CPACK does not support RPM recommends. It does, however, allow
customizing the RPM SPEC file template. A template can be generated by
setting:
-DCPACK_RPM_GENERATE_USER_BINARY_SPECFILE_TEMPLATE=1
This template file can be trivially modified to allow adding a line to
implement CPACK_RPM_PACKAGE_RECOMMENDS.
Fixes
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I34467b1ba878827ced9b8db74977967815732552
[ROCm/ROCR-Runtime commit: 1621936e32]
Fix Binary Search sample code as kernel symbol name has a .kd
extension.
Change-Id: Id21d2e432faa40bcd5cf343345502e823678fd0f
[ROCm/ROCR-Runtime commit: d9935e6fba]
Disable automatic dependency detection when generating rocrtst RPMs.
This was adding unnecessary dependency on libhwloc, which is now
provided with the rocrtst package.
This matches behavior for DEB packages where there is no dependency
list for rocrtst.
Change-Id: If4a93f5b4c039b2f45e9445f60f65eefe84e32eb
[ROCm/ROCR-Runtime commit: e2388f242a]