Disable the KFD RAS test cases: they trigger a GPU reset that affects
the running kfdtest, so they can only run successfully as separate
processes.
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
[ROCm/ROCR-Runtime commit: d9a95605cc]
v1: Validate the value pointer before dereferencing it in the GetInfo
method for the MODULE_NAME case.
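A minimal sketch of this kind of guard, assuming a simplified GetInfo that copies a module name into caller-provided storage. The enums, status codes, and signature are hypothetical stand-ins, not the actual ROCr declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical status codes and attributes; not the real ROCr types.
enum class Status { kSuccess, kInvalidArgument };
enum class InfoAttribute { kModuleNameLength, kModuleName };

// Hypothetical GetInfo: every case validates `value` before writing through it.
Status GetInfo(const char* module_name, InfoAttribute attribute, void* value) {
  switch (attribute) {
    case InfoAttribute::kModuleNameLength:
      if (value == nullptr) return Status::kInvalidArgument;
      *static_cast<std::size_t*>(value) = std::strlen(module_name);
      return Status::kSuccess;
    case InfoAttribute::kModuleName:
      // The guard in question: reject a null pointer instead of dereferencing it.
      // The caller is assumed to size the buffer as kModuleNameLength + 1.
      if (value == nullptr) return Status::kInvalidArgument;
      std::memcpy(value, module_name, std::strlen(module_name) + 1);
      return Status::kSuccess;
  }
  return Status::kInvalidArgument;
}
```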
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
[ROCm/ROCR-Runtime commit: f1f34da4f6]
gfx9-generic cannot support sramecc- or sramecc+.
The sramecc feature is only configurable on gfx906.
A code object produced for gfx9-generic can be loaded on gfx906
with any sramecc setting; the compiler produces ISA that works
correctly in both modes (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).
[ROCm/ROCR-Runtime commit: b7361c5ee4]
gfx9-generic cannot support sramecc- or sramecc+.
The sramecc feature is only configurable on gfx906.
A code object produced for gfx9-generic can be loaded on gfx906
with any sramecc setting; the compiler produces ISA that works
correctly in both modes (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).
[ROCm/ROCR-Runtime commit: 3e99bb6150]
Change the largest single buffer to be huge-page aligned, along with
other optimizations.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
[ROCm/ROCR-Runtime commit: afe7965796]
When allocating a userptr buffer in system RAM with a size of 512G or
more, TTM hits a limit and returns an error. Splitting the big buffer
into multiple smaller buffers in vm_object solves this issue.
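A rough sketch of splitting one large userptr range so that no single buffer reaches the 512G TTM limit, with each full-size chunk kept huge-page aligned. The 2 MiB huge-page size and the exact chunk cap are assumptions for illustration; the real vm_object bookkeeping is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kGiB = 1ull << 30;
// Assumption: 2 MiB huge pages.
constexpr uint64_t kHugePage = 2ull << 20;
// Hypothetical cap: the largest huge-page-aligned size strictly below 512G.
constexpr uint64_t kMaxChunk = 512 * kGiB - kHugePage;

// Split one large userptr range into chunk sizes that each stay under the
// TTM limit. Every chunk except possibly the last is kMaxChunk bytes.
std::vector<uint64_t> SplitUserptrRange(uint64_t size) {
  std::vector<uint64_t> chunks;
  while (size > 0) {
    uint64_t chunk = size < kMaxChunk ? size : kMaxChunk;
    chunks.push_back(chunk);
    size -= chunk;
  }
  return chunks;
}
```

For a 1024G request this yields two chunks of 512G minus 2 MiB plus a 4 MiB tail, all below the limit.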
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
[ROCm/ROCR-Runtime commit: 8887d25304]
On large BAR systems, a direct memcpy gives better performance for
small code objects than a blit copy, because of the blit copy's latency.
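The selection logic can be sketched as a simple predicate. The size cutoff and the names here are hypothetical; the change does not state the threshold it uses.

```cpp
#include <cassert>
#include <cstddef>

enum class CopyMethod { kDirectMemcpy, kBlit };

// Hypothetical cutoff; the change does not state a threshold.
constexpr std::size_t kSmallCodeObjectBytes = 1u << 20;

// With large BAR the host can write device memory directly, so small code
// objects are copied with memcpy and skip the fixed latency of a blit.
CopyMethod ChooseCodeObjectCopy(bool large_bar, std::size_t size) {
  if (large_bar && size <= kSmallCodeObjectBytes) return CopyMethod::kDirectMemcpy;
  return CopyMethod::kBlit;
}
```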
[ROCm/ROCR-Runtime commit: da2607024b]
Invalidate only the address range that covers the newly copied
code object. This avoids invalidating the I$ for old code objects and
may therefore increase the I$ hit rate.
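Computing the range to invalidate amounts to rounding the copied region out to cache-line boundaries. A sketch, assuming a hypothetical 64-byte line; the actual invalidation request sent to the GPU is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cache-line granularity for the invalidation.
constexpr uint64_t kCacheLine = 64;

struct Range {
  uint64_t begin;  // inclusive, line aligned
  uint64_t end;    // exclusive, line aligned
};

// Return the smallest line-aligned range covering [base, base + size).
Range CodeObjectInvalidateRange(uint64_t base, uint64_t size) {
  uint64_t begin = base & ~(kCacheLine - 1);
  uint64_t end = (base + size + kCacheLine - 1) & ~(kCacheLine - 1);
  return {begin, end};
}
```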
[ROCm/ROCR-Runtime commit: e969e01f54]
Creating a CP queue or an SDMA queue should fail with an invalid queue
ring buffer or ring buffer size.
Test that unmapping or freeing queue buffers fails before the queue is
destroyed.
Use a child process to test that unmapping the CWSR buffer evicts the
queue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I5dcd51d6b43445d19a986f8b0b82063e20348a5f
[ROCm/ROCR-Runtime commit: bd86fb1e63]
If unmapping from the GPU fails, for example when unmapping a user queue
buffer while the queue is active, we should not free
obj->mapped_node_id_array. Otherwise, a subsequent unmap of the user
queue buffer after the queue is destroyed still fails.
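The ordering can be sketched as follows. Only the field name mapped_node_id_array comes from the change; the struct layout, the node-count field, and the ioctl stand-in are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical object layout; only mapped_node_id_array is from the change.
struct VmObject {
  uint32_t* mapped_node_id_array;
  uint32_t mapped_node_num;  // hypothetical count field
};

// Hypothetical stand-in for the kernel unmap call: fails while the user
// queue is still active.
int UnmapFromGpuIoctl(bool queue_active) { return queue_active ? -1 : 0; }

int UnmapFromGpu(VmObject* obj, bool queue_active) {
  int ret = UnmapFromGpuIoctl(queue_active);
  if (ret != 0) {
    // Keep the node list so a later unmap, after the queue is destroyed,
    // still knows which nodes to unmap from.
    return ret;
  }
  // Only release the bookkeeping once the unmap actually succeeded.
  std::free(obj->mapped_node_id_array);
  obj->mapped_node_id_array = nullptr;
  obj->mapped_node_num = 0;
  return 0;
}
```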
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I32aeb18871c2e971d01900d92916c54680f5c9fa
[ROCm/ROCR-Runtime commit: 3e6f51b715]
Disable KFDLocalMemoryTest.Fragmentation and
KFDEventTest.MeasureInterruptConsumption as
part of the KFD test suite improvement feature.
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
[ROCm/ROCR-Runtime commit: f853dda9ba]
When compiling with -O0, some compilers generate an xchg instruction for
the __atomic_store(...) built-in. Using xchg on MMIO memory is
undefined behavior and may be ignored on certain CPUs.
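One common way to keep such a register write a single plain store is a volatile store preceded by a release fence. This is a sketch assuming GCC/Clang `__atomic` builtins on x86-64; it is not necessarily how the change itself avoids the xchg, and `WriteMmio64` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Write a 64-bit MMIO register (for example a doorbell) with a plain store.
// A naturally aligned volatile 64-bit store is normally emitted as a single
// mov on x86-64, so no xchg is issued to the MMIO address.
inline void WriteMmio64(volatile uint64_t* reg, uint64_t value) {
  // Release fence: in practice a compiler barrier that keeps earlier writes
  // (e.g. packet contents) ahead of the register write; on x86-64 it emits
  // no instruction.
  __atomic_thread_fence(__ATOMIC_RELEASE);
  *reg = value;
}
```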
[ROCm/ROCR-Runtime commit: f011a9506d]
For the native and DTIF backends, consistently use HSAKMT_CALL(...) to
call hsaKmt APIs.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 7ba77fb193]
Use HSA_ENABLE_DTIF to select the DTIF or native thunk code path.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 166b0fa45a]
For inter-device copies, currently only XGMI is exposed. Now SDMA0/1
are exposed for inter-device copies as well, especially since they are
among the recommended engines.
Signed-off-by: Li Ma <li.ma@amd.com>
[ROCm/ROCR-Runtime commit: e38dd98914]
Set rec_sdma_eng_override_ for other GPUs; otherwise DmaCopyOnEngine
uses SDMA for D<->D copies, which triggers an invalid argument error.
[ROCm/ROCR-Runtime commit: 82a88f2e2b]
* Update createMCObjectStreamer() to use new LLVM API
Obsolete interfaces were removed via llvm-project's
f2ff298867d7733122e32eead5a8c524b09dfdb1
* Fix typo: LLVM_VERSION -> LLVM_VERSION_MAJOR
* Fix typo
[ROCm/ROCR-Runtime commit: ac1e6d59c2]
A random driver deadlock in svm_range_evict_svm_bo_worker() is observed
in NPS2/DPX mode. It is seen with xnack off and happens more often on
the partition with less VRAM because of TMR.
Temporarily skip the SVM Evict tests on Family AV when xnack is disabled.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 5e28208cec]
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.
Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU that owns the AQL queue doesn't support
large BAR. Otherwise, ROCr currently allows device-side queues
that can cause faults when the user touches their ring
buffers, and the user will not know why the faults are occurring.
This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).
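The check can be sketched with simplified stand-ins for the GPU agent and AQL queue. Class names, constructor parameters, and the use of `std::runtime_error` are assumptions for illustration; ROCr's real agent and queue classes and its exception type differ.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified GPU agent. Large BAR is inferred from whether the
// KFD exposes a CPU->GPU link, since it hides such links without large BAR.
class GpuAgent {
 public:
  explicit GpuAgent(bool kfd_exposes_cpu_to_gpu_link)
      : large_bar_enabled_(kfd_exposes_cpu_to_gpu_link) {}
  bool LargeBarEnabled() const { return large_bar_enabled_; }

 private:
  bool large_bar_enabled_;
};

// Hypothetical AQL queue: refuse a device-memory ring buffer on a GPU without
// large BAR instead of letting later host accesses fault.
class AqlQueue {
 public:
  AqlQueue(const GpuAgent& owner, bool ring_buffer_in_device_memory) {
    if (ring_buffer_in_device_memory && !owner.LargeBarEnabled()) {
      throw std::runtime_error("device-side AQL queue requires large BAR");
    }
  }
};
```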
[ROCm/ROCR-Runtime commit: f2c482d923]
This builds on a prior change that allowed a user-mode queue's
packet buffer to be allocated in device memory, and additionally
allocates the queue struct in device memory. This provides
additional latency benefits, particularly when dispatches are
performed from the GPU itself. Flags are added to support the
various use cases.
[ROCm/ROCR-Runtime commit: 6e3c375bf1]