This reduces thrashing due to graphics submissions only and
significantly speeds up the BasicTest when keeping idle compute
processes evicted. In the BasicTest compute is always idle, so
only one compute eviction and no restore is triggered. Then
graphics submissions complete quickly without thrashing each other.
Change-Id: Iae6da98903b20424a5097f235e1d09cf13e4b41b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: f474cf21cd]
1. umc error injection only accepts parameter "0 0".
2. flush output to file in order to make writing happen
immediately.
Change-Id: I8d3bde287caee6b90b6eec56c760f5a228be7595
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
[ROCm/ROCR-Runtime commit: 47d1c17592]
The path was wrong because it assumed that GPU DRI render node
numbering starts from 0. If there is a VGA device on board,
node 0 will be VGA and node 1 will be the GPU. The fix looks
at the name of the GPU minor node and finds the correct
primary node on which the RAS debugfs entry exists.
Change-Id: Icc5e63ce48698d5d29105c0417e3bec8afa0a7c8
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
[ROCm/ROCR-Runtime commit: d278b2579e]
Check if it is true or not. The string() call would define this to an
empty string, which would pass. This would then leave a trailing -
in the version string, which dpkg would error on during package
installation.
Change-Id: Ifb5fc15f5dde506e96bff7881a5d3f22d983406e
[ROCm/ROCR-Runtime commit: 0016c6ce5b]
API is a stateless lookup of read-only data and is needed to
interpret hsa_init error codes.
Change-Id: If80cba2f697843d08e529da0f790acf3c37127a7
[ROCm/ROCR-Runtime commit: 22de0e7fb9]
Search the local src directories first. Otherwise, when using a
system-installed hsakmt, the build would pick up the installed hsa
headers.
Change-Id: I9746d6e9db1749a130e4d93e024556754a537083
[ROCm/ROCR-Runtime commit: 22d29b55a4]
Remove the HSA_DEBUG environment variable that controlled the
creation of these mappings.
This should allow the debugger to attach to a running process and
access VRAM buffers through ptrace without having to do anything
special.
On processes that create many small VRAM mappings, this may cause
regressions due to the per-process mmap limit. However, the
sub-allocator in ROCr should consolidate most small allocations
into 2MB blocks nowadays, for good TLB efficiency. So this is
unlikely to cause problems.
Change-Id: I929da1be0f6cb51ec00a02f3f241d16083e4d95f
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 64b90261d9]
Joined threads cannot be joined more than once, nor can they be detached.
The thread library's wait and close allow multiple waits and a separate
close, so this fixes the pthread implementation.
Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65
[ROCm/ROCR-Runtime commit: a913549190]
The queue IDs passed to the kernel via kfd_ioctl_dbg_trap_args->ptr
should be a list of uint32_t values. Convert the passed-in 64-bit
HSA_QUEUEID values to 32-bit uint32_t values.
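
A minimal sketch of the narrowing step, for illustration only. QueueId64
and NarrowQueueIds are hypothetical names, and the sketch assumes the
32-bit ID is simply the low bits of the 64-bit value; how the thunk
actually derives the kernel queue ID is not stated here. Passing the
64-bit array unchanged would not work, because the kernel would read it
as twice as many 32-bit elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the 64-bit queue ID type.
using QueueId64 = std::uint64_t;

// Copy each 64-bit ID into a packed array of 32-bit IDs.
// Asserts (in debug builds) that no ID loses bits when narrowed.
std::vector<std::uint32_t> NarrowQueueIds(const QueueId64* ids,
                                          std::size_t count) {
  std::vector<std::uint32_t> out;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    assert(ids[i] <= UINT32_MAX && "queue ID does not fit in 32 bits");
    out.push_back(static_cast<std::uint32_t>(ids[i]));
  }
  return out;
}
```

The resulting vector's data() pointer and size() are what would be
handed to the ioctl argument in this sketch.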
Change-Id: I8718566d9f9ffc90ce0b2ecc129b10c49d73186a
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
[ROCm/ROCR-Runtime commit: 608bc7c3a0]
Small times may be given to time conversion if GPU clocks are used to
accumulate elapsed time. Because HSA APIs deal in absolute time this
leads to large conversion offsets of order system uptime. Variation
in relative clock ratio estimation may be amplified in this case,
destroying elapsed time measurements.
This patch fixes the relative clock ratio used for times which predate
the call to hsa_init. This correlates errors in such times allowing
the elapsed time to be correctly computed.
The effective maximum system uptime before elapsed time conversion becomes
inaccurate is ~3.5 months. GPU event timestamps are good for process uptime
of ~3.5 months. These are limited by double's mantissa precision.
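
As a rough check of the mantissa limit, assuming timestamps are counted
in nanoseconds (an assumption, the unit is not stated above): a double
represents every integer exactly only up to 2^53, which at 1e9 ticks
per second is about 104 days. ExactRangeDays is a hypothetical helper
written for this illustration.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Longest span, in days, over which a double can hold every integer
// tick count exactly, for a clock running at ticks_per_second.
double ExactRangeDays(double ticks_per_second) {
  // 2^DBL_MANT_DIG, i.e. 2^53 for an IEEE-754 double.
  const double max_exact_ticks = std::ldexp(1.0, DBL_MANT_DIG);
  const double seconds_per_day = 86400.0;
  return max_exact_ticks / ticks_per_second / seconds_per_day;
}
```

ExactRangeDays(1e9) evaluates to a little over 104 days, which is in
line with the ~3.5 month figure.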
Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445
[ROCm/ROCR-Runtime commit: 6e2a056e1b]
Exposed via agent info query. Only valid if fine grain PCIe memory is enabled.
Change-Id: Ib4770901592ec047276458926a947737f9b93bb5
[ROCm/ROCR-Runtime commit: 06376e726b]
This can cause build failures on unknown or future compiler versions.
Only enable it if explicitly enabled by an environment variable. This
allows us to continue building with -Werror in internal builds with
known compiler versions.
Change-Id: Ic1cd9d223218cc4e4cddba49df93bb357c1cbd40
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 8f10c9375d]
RESUME was mistakenly used in two places in the suspend/resume
code where it should have been suspend. This fixes that error.
Change-Id: I69be733d7ae7c14ce5ee8af57a307976e4212d62
[ROCm/ROCR-Runtime commit: b0d23aee16]
This is updating to the new suspend and resume API for the
KFD and the thunk. We now support passing in a list of queues
to suspend, and not just all of the queues for the process.
The kfdtest testcase was also updated so it still compiles.
Change-Id: I71d1b178476bd9df0c311bdedaa6a891528cebcf
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
[ROCm/ROCR-Runtime commit: c2c1385e29]
hsaKmtGetQueueInfo needs to return the control stack size, and the
wave state size for the debugger. These changes are needed to support
returning the new values.
Change-Id: Ib4c60e0ea34446c06aef4a86996250989f348a69
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
[ROCm/ROCR-Runtime commit: d21e9d5bbd]
- Run more graphics command submissions with shorter delay between
them
- Synchronize after every graphics command submission
- Include the big VRAM BO in the BOList of the command submission
to trigger more evictions
- In QueueTest, run AMDGPU command submissions concurrently with
compute shader on the user mode queue
- Submit AMDGPU commands to GFX queue instead of compute queue to
avoid deadlocks between user-mode and kernel-mode queues on the
same pipe
- Allocate slightly less memory from KFD to avoid allocation errors
due to fragmentation or memory leaks in previous tests
- Running only two processes maximizes the number of KFD evictions
(probably because of lower chances of evicting non-KFD BOs)
Change-Id: If05d53f5fcf690b6488998a3f933f120ddaa71ee
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: c8d823eb10]
Add a MMIO_REMAP heap type and expose mmio virtual address
through HsaKmtGetNodeMemoryProperties
Change-Id: I1e585e6dfbec8fa7c85f1dda7b89b763a8e2c439
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
[ROCm/ROCR-Runtime commit: 804aa90a22]
HDP coherence registers are remapped at driver level
to an empty page in mmio space (the remapped mmio page).
This change allocates and maps the remapped mmio page into
process space.
Change-Id: I89c405c41870a79c5b58eea0d8e564aa35f55182
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
[ROCm/ROCR-Runtime commit: ae111689f0]
At the moment it is not possible to build ROCr with Clang. This is
a spurious limitation. The present PR addresses it by guarding GCC
only flags and by fixing some additional warnings that Clang triggers;
one of said warnings did outline a rather interesting issue with math
being done on void*s. - AlexVlx
Void ptr arithmetic had already been fixed in amd-master branch.
Change-Id: I5ee97e20b5c40b10dd73facecabe75f02ba46462
[ROCm/ROCR-Runtime commit: e89f9807f1]
Non-paged memory can be IPC-shared even when HSA_USERPTR_FOR_PAGED_MEM
is enabled.
Change-Id: I8b1fa6d7a4a9327c78a77b3679697fbf55397093
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 0c6b9532d4]
Those two tests cover basic queue creation and destruction
without submitting packets to the CP and SDMA user queues. During
bringup, they help separate issues arising in queue creation from
issues in packet execution, which are two very different kinds of
problems.
Because of those two tests, we also rename some existing tests as
follows:
CreateCpQueue -> SubmitPacketCpQueue
CreateSdmaQueue -> SubmitPacketSdmaQueue
CreateMultipleCpQueues -> MultipleCpQueues
CreateMultipleSdmaQueues -> MultipleSdmaQueues
Lastly, move MultipleCpQueues test closer to the CP queue section
rather than leaving it behind the SDMA queue section.
Change-Id: I110fb3f3fb21878339045dd1d1c8c9d61b8988b7
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
[ROCm/ROCR-Runtime commit: 5b18614eaf]
If a Render Node can't be found, we should finish off all child
processes immediately, then return.
Trying to do the check before forking the children results in the test
failing as well, regardless of the status of finding the render node,
which is likely why the forking occurred first in the test's initial
creation. This way we ensure that things finish cleanly before moving to
the next test.
Change-Id: I2e1b62fed25c30ff1f179612127c23960da4ee5e
[ROCm/ROCR-Runtime commit: e109ce541c]
Distributions want to generate debuginfo packages, do not strip them! If
you want to do it during installation use 'make install/strip'!
Change-Id: I3983af24ce4f4ddb189ede0ed0820dfee83b6280
[ROCm/ROCR-Runtime commit: 8ccfa4c75c]
KFD no longer reports MemoryAccessFault.Failure with retry fault
implementation. ROCr ignores the memory event when Failure = 0.
Use the Flags field instead, which will be non-zero when the
event is triggered.
Change-Id: Ie90799a303b0b2f1b476b20ffafdde79ae137182
[ROCm/ROCR-Runtime commit: 56f280c8a7]
Another cmake project like hsa-runtime could just use:
find_package(hsakmt REQUIRED 1.9.0)
Change-Id: Ia1c9a80ef287facdd607382d69649b0718d687b4
[ROCm/ROCR-Runtime commit: b8a1331763]
Makes malloc memory accessible to GPUs so that the memory has the
capabilities of the pool it is locked to.
This admits fine grained locked memory and reserves API space for any future
special CPU pools.
Change-Id: If8c3dd8582a43f19d3d36b3763c1a688cc419ef0
[ROCm/ROCR-Runtime commit: a535e18cc1]
GCC allows arithmetic on void*, treating void as char. Clang and
the language spec do not.
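
A minimal sketch of the portable form, assuming byte offsets are what
is wanted: cast to char* before doing the arithmetic. OffsetBytes is a
hypothetical helper name, not necessarily what the runtime uses.

```cpp
#include <cassert>
#include <cstddef>

// Advance an untyped pointer by a byte count.
// Portable replacement for the GCC extension `ptr + bytes` on void*.
inline void* OffsetBytes(void* ptr, std::size_t bytes) {
  return static_cast<char*>(ptr) + bytes;
}

// Const overload for read-only buffers.
inline const void* OffsetBytes(const void* ptr, std::size_t bytes) {
  return static_cast<const char*>(ptr) + bytes;
}
```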
Change-Id: I939f2432f276979bb81881406e10528597ac6001
[ROCm/ROCR-Runtime commit: e5de33dd9a]
This patch separates the build version (i.e. ROCm version) from the
library version used to set the SONAME of the shared objects. This
prevents the SONAME from getting bumped each time there is a new ROCm
release without any change to the libhsakmt ABI.
1.0.6 was chosen as the library version since this was the
last library version used prior to switching to the ROCm version
numbers.
Change-Id: I7c29ae84d8a362a831e804569d8147ca65155cad
[ROCm/ROCR-Runtime commit: 006c2c248d]
1. The RAS error injection debugfs interface has changed: it now
uses ras_ctrl instead of *_err_inject.
2. Remove ASSERT_SUCCESS for fwrite, because fwrite returns
the number of items written, not an error code.
3. Throw an exception instead of returning to avoid a segmentation
fault.
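
The fwrite point in item 2 can be sketched as follows. WriteAndFlush is
a hypothetical helper, not the test's actual code: it compares the
return value of fwrite against the requested item count rather than
treating it as a status code, and then flushes the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Write `text` to `file`, then flush the stream's buffer.
// Returns true only if every byte was written and the flush succeeded.
bool WriteAndFlush(std::FILE* file, const char* text) {
  const std::size_t len = std::strlen(text);
  // fwrite returns the number of items written (here, bytes),
  // so success means the count matches, not that it is zero.
  const std::size_t written = std::fwrite(text, 1, len, file);
  if (written != len) return false;
  return std::fflush(file) == 0;
}
```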
Change-Id: I6c4d9c2f7e66719faec99abd1552105a08c238a4
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
[ROCm/ROCR-Runtime commit: e5b215570b]