This patch changes the type of several loop index variables from int to
uint32_t in fmm.c. The affected functions are:
- __fmm_release
- _fmm_map_to_gpu
- _fmm_unmap_from_gpu
To fix compile warning:
warning: comparison of integer expressions of different signedness:
'int' and 'uint32_t' {aka 'unsigned int'} [-Wsign-compare]
2009 | for (i = 0; i < object->handle_num; i++) {
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
blacklist the KFDEvictTest suite until the defects
SWDEV 535386 and 537002, where these test cases fail
inconsistently, are fixed
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
disable KFD RAS test case as the tests cause GPU reset
which affects the active kfdtest, the tests can only be
run successfully as separate processes
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
when allocating userptr buffer in system ram with size bigger
than or equal 512G, TTM has limit and returns error, to split one
big buffer into multiple small buffers in vm_object will solve
this issue.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Create CP queue and SDMA queue should fail with invalid queue ring
buffer or ring buffer size.
Test unmap or free queue buffers should fail before queue is destroyed.
Use child process to test unmap CWSR buffer will evict queue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I5dcd51d6b43445d19a986f8b0b82063e20348a5f
If unmap from GPU return failed, for example, unmap user queue buffer
while queue is active, we should not free obj->mapped_node_id_array,
otherwise, the following unmap user queue buffer after queue is
destroyed still return failed.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I32aeb18871c2e971d01900d92916c54680f5c9fa
disable KFDLocalMemoryTest.Fragmentation and
KFDEventTest.MeasureInterruptConsumption as
part of the KFD test suite improvement feature
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
* Update createMCObjectStreamer() to use new LLVM API
Obsolete interfaces were removed via llvm-project's
f2ff298867d7733122e32eead5a8c524b09dfdb1
* Fix typo: LLVM_VERSION -> LLVM_VERSION_MAJOR
* Fix typo
Random driver deadlock on svm_range_evict_svm_bo_worker() is obeserved on
NPS2/DPX mode. It's seen with xnack off and happens more often on the
partition with less VRAM because of TMR.
Temporarily skip SVM Evict tests on Family AV when xnack is disabled.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
The max queues per process is 1024 in KFD,
KFDQMTest.OverSubscribeCpQueues fails with multi-gpu mode
on more than 15 gpus, because 65x16=1040 exceeds 1024, so
changing MAX_CP_QUEUES to adapt it will fix the issue.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
The parent process can only be ptraced by 1 process
once, to avoid the error we have to add mutex to
synchronize the ptrace call.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
detect if the loaded driver is upstream or DKMS version and
add a filter for for the tests that fail in upstream driver
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
The debugger override will set the initial request mask to the
previously set request mask so use a different mask to assert
enablement.
Trap on wave start and end also run back to back, so fix the
previous override mask check as well.
In addition, unlike instruction traps, trap on wave start and end
will not require a rewind of the program counter on wave exit.
The over arching goal it so provide an API that pre-silicon models can latch into for software bring up.# Please enter the commit message for your changes. Lines starting
For the case parent goes faster then child, and child hasn't call the second
raise(SIGSTOP), then parent's "waitpid(childPid, &childStatus, 0)" will return,
and the childStatus will be 0x137f, which is SIGSTOP signal id.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
For the case that the child goes to the second raise(SIGSTOP),
and parent sends PTRACE_CONT, than child exits. Parent will assert at
DeviceSnapshot, as in kfd_ioctl, couldn't get the mm from child pid.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Environment variable HSA_HIGH_PRECISION_MODE can be used to control MFMA
precision
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ib78dd9dd8867025e090a3cca96ab6db4f65dea12
BUILD_SHARED_LIBS is a global flag so we don't need to set a default
option for it in both libhsakmt and hsa-runtime, only the top level
CMakeLists file. Also updated README to reflect that libhsakmt is
always built statically and gets linked to libhsa-runtime.
Change-Id: I1511f68a268032bec9758bc731d8074f33ec980f
Convert test to use multi-GPU framework.
Add mutex to fix intermixed log issue and annotate logging with
gpu node number.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ic2beeadb1eb4b5a9a0710ac1dbd60b9bf1d84c33
"s_waitcnt 0" (deprecated in gfx12) is redundant here.
s_endpgm will wait for all outstanding instructions
to complete before executing.
Change-Id: Ia8b4dd0fd8dd713e7ba2cba9db85b7b12cee1dd4
Signed-off-by: Lang Yu <lang.yu@amd.com>
Since GFX950 can support page table fragment up to 18 without
performance loss. So set GFX950 default svm.alignment_order to 18.
Change-Id: Ibcdb7f041fb07a38e924c471beec261ea227ca1d
Signed-off-by: James Zhu <James.Zhu@amd.com>
This patch creates the blacklist for gfx950 by copying gfx942 but adding
KFDGWSTest.Semaphore as GWS support is completely removed from gfx950.
Change-Id: I5d7c17e57b8cfd9fae63780ecc9dd55662cfdade
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Make sure to use allocate the same amount of size for VGPR data in
gfx950 as it is done for gfx940.
Change-Id: I6a0820996389627ccbdfef856e5150c46fac92a1
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
The CWSR area size needs to take into account the size of LDS each
active workgroup can have. The current implementation uses a constant
for that. This patch refactors this to use the HsaNodeProperties of the
device's the CWSR area is for to figure out the size of LDS.
Change-Id: Ib8585b2b7140ec5c99e7b7d62e67f785697c028a
Signed-off-by: Lancelot Six <Lancelot.Six@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>