Copying memory from device to host with a CPU agent
performs poorly because the CPU has to read uncached
device memory.
Fix it by using a GPU agent.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: Ia3b562758fe73ef9efaa284f47e67bf569cc7b7b
[ROCm/ROCR-Runtime commit: 8501c0bcb1]
The number of Shader Engines should be the shader array_count divided by
simd_arrays_per_engine, not array_count.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I808d1fedd6b9843500719e902ecf759f5668a7d1
[ROCm/ROCR-Runtime commit: efcc9b275b]
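A minimal sketch of the corrected shader engine calculation described in the commit above. The function and parameter names are illustrative assumptions that mirror the fields named in the message; they are not the runtime's actual identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: derives the number of shader engines from the
// two topology values named in the commit message.
inline uint32_t ShaderEngineCount(uint32_t array_count,
                                  uint32_t simd_arrays_per_engine) {
  // Guard against a division by zero on malformed topology data.
  assert(simd_arrays_per_engine != 0);
  return array_count / simd_arrays_per_engine;
}
```

For example, a device reporting 8 shader arrays with 2 arrays per engine would yield 4 shader engines rather than 8.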
ROCr uses the same allocation_map_ list to track both internal
allocations and allocations made by users of the ROCr library. In some
edge cases, a library user calls hsa_amd_pointer_info on an invalid
pointer, but ROCr reports the pointer as valid because it falls within
a memory range that ROCr allocated internally. Add a flag to
differentiate between internal and external allocations.
Change-Id: I98c52bd85f3985d1ba1b0e3101d2254b003412cf
[ROCm/ROCR-Runtime commit: 59685f4492]
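The idea behind the internal/external flag in the commit above can be sketched as follows. This is a simplified stand-in under stated assumptions: the class, its methods, and the use of std::map are hypothetical and do not reproduce ROCr's actual allocation_map_ layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical record for one tracked memory range.
struct Allocation {
  size_t size;
  bool internal;  // true when the runtime allocated the range for itself
};

// Hypothetical, simplified allocation tracker keyed by base address.
class AllocationMap {
 public:
  void Track(uintptr_t base, size_t size, bool internal) {
    assert(size > 0);
    map_[base] = Allocation{size, internal};
  }

  // Returns true only if ptr lies inside a range allocated by a user.
  bool IsValidUserPointer(uintptr_t ptr) const {
    auto it = map_.upper_bound(ptr);  // first range that starts after ptr
    if (it == map_.begin()) return false;
    --it;  // the only candidate: last range starting at or before ptr
    const bool inside = (ptr - it->first) < it->second.size;
    // Internal ranges are deliberately reported as invalid to users.
    return inside && !it->second.internal;
  }

 private:
  std::map<uintptr_t, Allocation> map_;
};
```

With the flag in place, a pointer that happens to land in a runtime-internal range is no longer reported as a valid user allocation.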
Track and report the size, in bytes, of pending unexecuted blit
commands. To be used in copy ganging.
Change-Id: Ia7453ff88571e927df771c6c819b73c17e67708e
[ROCm/ROCR-Runtime commit: 27596aef0c]
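One way to track pending blit bytes, as described in the commit above, is a counter that grows on submission and shrinks on completion. The sketch below is an assumption about the general shape only; the class and method names are hypothetical and the real blit classes are not shown in the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter of bytes submitted to a blit engine but not yet
// executed.
class PendingBlitBytes {
 public:
  // Called when a copy command of `bytes` bytes is submitted.
  void OnSubmit(uint64_t bytes) {
    pending_.fetch_add(bytes, std::memory_order_relaxed);
  }

  // Called when a previously submitted command has finished executing.
  void OnComplete(uint64_t bytes) {
    const uint64_t before =
        pending_.fetch_sub(bytes, std::memory_order_relaxed);
    assert(before >= bytes);  // cannot complete more than was submitted
    (void)before;             // silence unused warning when NDEBUG is set
  }

  // Size, in bytes, of pending unexecuted copy commands.
  uint64_t Pending() const {
    return pending_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<uint64_t> pending_{0};
};
```

A scheduler could compare Pending() across engines to decide where to place the next copy, which is presumably what copy ganging would use it for.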
KFDTopologyTest.BasicTest duplicates Thunk logic to calculate VGPR size,
so the two values will always match, and SGPR size is a constant. Since
the comparisons provide no benefit, remove them.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I99e7ff6fb69ed07bc0716fdf43946b19c67b9268
[ROCm/ROCR-Runtime commit: 3fb1496fb3]
Fixes a hang caused by a change in the initialization order of libraries
that have cyclical dependencies and call hsa_init() during their
initialization phase.
This implementation looks for a symbol called "HSA_AMD_TOOL_PRIORITY"
across all loaded shared libraries by reading the dynamic section entries
of each loaded library instead of using dlopen and dlsym.
Change-Id: I4865f2fd18dd186ec311a432ec38fbb5583805d2
[ROCm/ROCR-Runtime commit: 8aac885318]
Fixed the VGPR memory size: it was too small for some GPUs, causing a memory overflow.
Refactored macro code into a function.
Thanks to Jay Cornwall for locating the problem and proposing the fix.
Change-Id: Iffedea1c4f341967f02c56d810ff048225b02c16
Signed-off-by: David Belanger <david.belanger@amd.com>
[ROCm/ROCR-Runtime commit: a847a7b80e]
Reporting whether IOMMU V2 is supported.
IOMMU V1 support is not relevant to the user, so it is not reported.
Change-Id: I77389484a87a352da9c2f7b2a5d9de264f90ee53
[ROCm/ROCR-Runtime commit: e30be76f37]
Currently, Wavefront::GetInfo(HSA_WAVEFRONT_INFO_SIZE, ...) always returns
64. Instead, return the proper wavefront size based on the ISA.
For now, we return only one wavefront size per ISA, because the upper
layers have no mechanism to select the correct wavefront size when an
ISA supports more than one. We temporarily return 32 for all gfx1xxx
cards even though they also support 64, because kernels for gfx1xxx are
compiled for wavefront-32 by default.
Change-Id: Ic6c2917b7e6d3704daf742d243f5ec7f49430de9
[ROCm/ROCR-Runtime commit: f7e3782b42]
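The selection rule described in the commit above can be sketched as a function of the ISA major version. This is an illustration under assumptions: the helper name is hypothetical, and mapping "gfx1xxx" to a major version of 10 or higher is an interpretation of the message, not the runtime's actual check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper keyed on the ISA major version, e.g. 9 for gfx90a,
// 10 for gfx1030, 11 for gfx1100.
inline uint32_t DefaultWavefrontSize(uint32_t isa_major_version) {
  assert(isa_major_version > 0);
  // gfx1xxx parts support both wave32 and wave64. 32 is reported because
  // kernels for these parts are compiled for wave32 by default.
  return isa_major_version >= 10 ? 32u : 64u;
}
```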
This is a temporary workaround for GPU hang issues observed on GFX11.
Change-Id: I98fbedbbd1c51fe402c2116b35ca548931a390c9
Signed-off-by: David Belanger <david.belanger@amd.com>
[ROCm/ROCR-Runtime commit: b25867c4b8]
This reverts commit 993b1dee7e.
Reason for revert: the change is blocked by a new proposal, so the changes are being reverted.
Change-Id: Id9b8cc1560ba3eea6e484e67df3fdc647da9f37d
[ROCm/ROCR-Runtime commit: dbf8905dd1]
This reverts commit 1bb6d872ac.
It causes a regression in pytorch benchmark.
Change-Id: I96173dbd061cf38d6f451c02cb181ae51b7f625e
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
[ROCm/ROCR-Runtime commit: 505287412f]
Temporarily force rocrtsts to use Code Object V4 while the compiler team
prepares to switch the default Code Object to V5. Will switch back to the
default compiler setting once everything is tested/fixed.
Change-Id: I18e5c6771fffd8c60792fc197501d373c7ec22f3
[ROCm/ROCR-Runtime commit: 0f2fa3ba72]
The libelf1 package contains libelf.so.1. Updated the package name.
Improvement: removed the initialization of cmake_install_libdir in the source code.
The build scripts initialize the variable to "lib" and pass it as a build argument.
Change-Id: I16a8cdc4c231487410c1114b818e9d01df4854de
[ROCm/ROCR-Runtime commit: 5c90c762f9]
This reverts commit ea19fbb646.
Some OpenMP issues were introduced after the SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I7ef87c5232a3bcbe594c743fa4b4958601845ba5
[ROCm/ROCR-Runtime commit: f2bda56d04]
This reverts commit a89bcd0518.
Some OpenMP issues were introduced after the SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I6566c9f0d39d05ecb92f38159880763f432939a5
[ROCm/ROCR-Runtime commit: d9f86ae02b]
This reverts commit 6789a0f3bd.
Some OpenMP issues were introduced after the SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ib01046571d2c84fa0fd228ecba0dee0eae3f994d
[ROCm/ROCR-Runtime commit: 21e95a4f2a]
Add two new agent info fields:
HSA_AMD_AGENT_INFO_UCODE_VERSION
HSA_AMD_AGENT_INFO_SDMA_UCODE_VERSION
Change-Id: I51cb853724b23a26e945e5c1ac32c16d0cb3bc31
[ROCm/ROCR-Runtime commit: ecdebef0b9]
Modified the if-condition checks in GElfImage::pullElf() in amd_elf_image.cpp to
compare section types instead of doing a string check.
Change-Id: I1ab92f0a9118fb2382652a1cc900a3150cbee2da
[ROCm/ROCR-Runtime commit: 5727a10a1b]
Thunk keeps an internal cache of system topology that can be used to
speed up subsequent calls to hsaKmtAcquireSystemProperties(). This cache
is cleared by calling hsaKmtReleaseSystemProperties() at the beginning
of BuildTopology().
hsaKmtRuntimeEnable() also calls hsaKmtAcquireSystemProperties() inside
Thunk. Move call to hsaKmtRuntimeEnable() after BuildTopology() so that
we can re-use Thunk's internal cache.
Parsing the topology can take ~150 ms on systems with a large number of
nodes.
Change-Id: I741709d49d67d244f5fbd707fe8f01ab923bb153
[ROCm/ROCR-Runtime commit: e39ad34d9c]
Track test status in syslog. This helps correlate
syslog entries with test cases.
Change-Id: I7c0749102db9bc73d6ae3a237ec347a8fefb12e9
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 7db29c4797]
This is handled by __fmm_release calling aperture_release_area.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ib8ed300e1734f03aeb9dfc8074897ece310b8af9
[ROCm/ROCR-Runtime commit: 7787a039bd]
Use a common helper for CPU mappings to reduce duplicate code.
Consistently use MAP_SHARED for all render_fd mappings.
Remove double-mapping for AQL queue buffers on the CPU. This workaround
is only needed on the GPU.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Iff86c8cc9f1e5c982614b3f11129bc2cf8cbba02
[ROCm/ROCR-Runtime commit: 73b0fb3d7c]
The NULL pointer check was the only way for that function to fail. And it
was done after the pointer was accessed. Simplify this by just returning
the result as a return value instead of using a pointer as output
parameter. This way the function can never fail and the caller doesn't
need to do any error handling.
Declare the function in libhsakmt.h instead of duplicating the
declaration in fmm.c.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I91b90d66166fd3b5cdc47c73a9bbc369c45b51fe
[ROCm/ROCR-Runtime commit: 2d53430ce3]
Setting this variable to '0' forcibly disables memory
registration/allocation through the SVM API mechanism.
If it is unset or set to '1', the SVM API is used only if all
GPUs support it.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Icdf7656de09aa9988b567ec6c024953398e9bb48
[ROCm/ROCR-Runtime commit: 8a746bdaed]
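The decision described in the commit above can be sketched as a small predicate. The function name is hypothetical, the environment variable itself is not named here because the message does not name it, and the handling of values other than '0' and '1' is an assumption.

```cpp
#include <cstring>

// Hypothetical helper. env_value is the variable's value as returned by
// std::getenv(), or nullptr when the variable is not set.
inline bool UseSvmApi(const char* env_value, bool all_gpus_support_svm) {
  // '0' force-disables the SVM API path.
  if (env_value != nullptr && std::strcmp(env_value, "0") == 0) {
    return false;
  }
  // Unset or '1': use the SVM API only when every GPU supports it.
  // Assumption: any other value is treated like '1'.
  return all_gpus_support_svm;
}
```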
Detect under-reporting of available memory by initially attempting to
allocate substantially more than reported available memory, and ensure
that the allocation fails. Continue shrinking the attempted allocation
until it succeeds, then fail the test if the successful allocation is
either too much more than or too much less than reported available.
Signed-off-by: Daniel Phillips <daniel.phillips@amd.com>
Change-Id: Ib418f0aa26e8db80590a6c5f2578da56a4b60f2b
[ROCm/ROCR-Runtime commit: e71eb13784]
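The test strategy in the commit above can be sketched as a shrinking search. Everything numeric here is an illustrative assumption: the message does not state the starting factor, the shrink step, or the tolerance, and the function name and callback are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical check. try_alloc(n) returns true if an allocation of n
// bytes succeeds (and is assumed to free it again before returning).
inline bool AvailableMemoryReportIsSane(
    uint64_t reported_available,
    const std::function<bool(uint64_t)>& try_alloc) {
  const uint64_t step = reported_available / 16;       // assumed shrink step
  const uint64_t tolerance = reported_available / 8;   // assumed tolerance
  assert(step > 0);

  // Start substantially above the reported value (assumed: 150%).
  uint64_t attempt = reported_available + reported_available / 2;

  // The oversized attempt must fail; otherwise memory is under-reported.
  if (try_alloc(attempt)) return false;

  // Shrink until an allocation succeeds, then compare with the report.
  while (attempt > step) {
    attempt -= step;
    if (try_alloc(attempt)) {
      return attempt >= reported_available - tolerance &&
             attempt <= reported_available + tolerance;
    }
  }
  return false;  // nothing could be allocated at all
}
```

In a unit test, try_alloc can be a mock that succeeds for any size up to a fixed capacity, which makes both the under-reporting and over-reporting cases easy to exercise.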
When hsaKmtCreateQueue is called for the first time for a node,
doorbells[NodeId].size is still zero, as initialized in
init_process_doorbells, but it is used to calculate the doorbell offset.
This works only by accident: doorbells[NodeId].size is a uint32_t, so
size - 1 is 0xFFFFFFFF, which is zero-extended to 0x00000000FFFFFFFF, and
the result is correct as long as the mmap offset bits are not within the
lower 32 bits.
Bug: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/issues/78
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ia791adfc51363d4704cb50fa4f01137b7dd48a75
[ROCm/ROCR-Runtime commit: 8e69b9c70e]
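The arithmetic behind the accidental behavior in the commit above can be reproduced in isolation. The helper below is a hypothetical reduction of the masking step, assuming the offset is obtained by AND-ing a 64-bit value with (size - 1); it is not the Thunk code itself.

```cpp
#include <cstdint>

// Hypothetical reduction of the doorbell offset calculation.
inline uint64_t MaskWithSize(uint64_t value, uint32_t size) {
  // (size - 1) is evaluated in 32 bits. For size == 0 it wraps to
  // 0xFFFFFFFF and is then zero-extended to 0x00000000FFFFFFFF for the
  // 64-bit AND, so only the lower 32 bits of value survive.
  return value & (size - 1);
}
```

With a correctly initialized size of 0x1000, the mask is 0xFFF and only the in-page offset remains. With size still at zero, any bits of the value above bit 31 are dropped while bits 12 to 31 leak through, which is why the old code only worked when the mmap offset bits were outside the lower 32 bits.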
The scc modifier is disabled in gfx90a's asm, so remove the
shader for gfx90a A+A and keep it for newer ASICs with scc
support.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: Iec3c7ccd5156a855adb2b02feb3db0761876aa2f
[ROCm/ROCR-Runtime commit: 8e8aa024fd]
The file reorganization feature was implemented with backward compatibility.
The backward compatibility support will be deprecated in a future release.
Changed the #pragma message to #warning for a smooth transition.
Change-Id: I21025f4cefb40721f095130263b4247877979d36
[ROCm/ROCR-Runtime commit: 01fd84db5e]
Simplified the callback method. Also fixed the way loaded shared objects were appended to a string vector,
which was not being passed to this callback method.
Change-Id: I68661dd73f61a11c42fa92f670e8e7b6ffcb5711
[ROCm/ROCR-Runtime commit: 8751e65b79]
The file reorganization feature was implemented with backward compatibility.
The backward compatibility support will be deprecated in a future release.
Changed the #pragma message to #warning for a smooth transition.
Change-Id: Ibaedc1873bc764d25f74d9ca9416077d084e332d
[ROCm/ROCR-Runtime commit: a34804ed3e]
When hsa is closed, it would close open fds for /dev/kfd but
not for /dev/dri/renderD*. This caused issues with CRIU
checkpoint, which expects that /dev/kfd will be open if
/dev/dri/renderD* is.
As a workaround for the CRIU behaviour, leave /dev/kfd open
when closing hsa.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: Ie1b2d5b1d8986750b0e560ae2934b7c73cff942e
[ROCm/ROCR-Runtime commit: 88934cec2c]