When the test uses total system memory on some motherboards,
it takes a very long time, which looks like a system hang.
Change-Id: Ic31fe60cfb1363fbc8a2d8f7e1cb2bae0e149ea8
Signed-off-by: gaba <gaba@amd.com>
[ROCm/ROCR-Runtime commit: 8c35523225]
'const' member functions have syntax errors and struct
'data_rdy' has uninitialized members
v1: correct misplaced 'const' for member functions
v2: add initialization for 'data_rdy' in constructor
Change-Id: I29bada475217c9df81f0d0400e7a3f44aa8afe0c
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
[ROCm/ROCR-Runtime commit: c48e8a918e]
When all 64-bits of the signal value are 0, we can skip polling for that
signal.
We need to keep signals as 64-bit numbers as part of the spec, but most
users of ROCr will never set the signal value to more than 32 bits.
When the dependent signals fit in 32 bits, avoid adding an extra
SDMA poll packet, as this adds latency to the SDMA copies.
Change-Id: I37dca65fe3f060dc7164f49b98cb1985023663c4
[ROCm/ROCR-Runtime commit: 0544c2336b]
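
The check described in commit 0544c2336b above can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes that one poll packet covers one 32-bit half of the signal value, which the commit message does not state explicitly.

```cpp
#include <cstdint>

// Hypothetical helper, not ROCr code: number of 32-bit poll packets needed
// to wait on a dependent signal. Assumes one poll packet per 32-bit half.
constexpr unsigned PollPacketsNeeded(uint64_t signal_value) {
  if (signal_value == 0) return 0;          // all 64 bits zero: skip polling
  if ((signal_value >> 32) == 0) return 1;  // upper half zero: one poll only
  return 2;                                 // both halves must be polled
}
```

Under these assumptions, a typical small signal value costs a single poll packet, and only values above 32 bits pay for the second one.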
1. Use s_wait_* instead of s_waitcnt
2. Remove a redundant s_waitcnt
Change-Id: Id0f31db0fc520adadd81eb574ad389f63859303a
Signed-off-by: Lang Yu <lang.yu@amd.com>
[ROCm/ROCR-Runtime commit: 37135aadfa]
Add free() for 'all_gpu_id_array' in
'hsakmt_fmm_destroy_process_apertures()' and
remove it from 'hsakmt_fmm_clear_all_mem()'
Change-Id: I32d2d22e7152f62a3f2e7da4f601f0db7cebd534
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
[ROCm/ROCR-Runtime commit: c066ec13dd]
DisableCpQueueByUpdateWithZeroPercentage needs to destroy the event to
avoid an event leak.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Change-Id: I4fb51b670fbff1edcd7fd61517f5c8a6674003c0
[ROCm/ROCR-Runtime commit: 1f9c080932]
This was missing from a previous commit regarding
dynamically allocated static data structures.
Change-Id: Iae1c674e762f85e3aebf338210ba96942ba80278
[ROCm/ROCR-Runtime commit: eec2130443]
Removed 'args' as a unique pointer, along with its deletion in
'ThreadTrampoline', and declared it as a class member instead.
Change-Id: Ia52058392d0170e8b5e57cfdd2c587f47a6f93f0
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
[ROCm/ROCR-Runtime commit: 89115369cc]
The issue arises in the CatchSignal function, which attempts to write to
the standard error stream upon receiving a signal. However, the standard
error stream may already be locked at this point, as the parent process
also attempts to write to the standard error stream after mapping the GPU
memory. This leads to a deadlock, with the program waiting for the
release of the lock on the standard error stream.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Change-Id: Ie69354f4342b96ffe1f2a87f655687da1cbee4b9
[ROCm/ROCR-Runtime commit: c8031f2a69]
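
The deadlock described in commit c8031f2a69 above is typical of calling stdio functions from a signal handler. One common way to avoid it, shown here as an assumption and not necessarily the fix this commit applies, is to restrict the handler to async-signal-safe calls such as write(2). The handler name, the message, and the choice of SIGUSR1 are hypothetical.

```cpp
#include <signal.h>
#include <unistd.h>

// Set by the handler so the main flow can see that a signal arrived.
static volatile sig_atomic_t g_signal_seen = 0;

// Hypothetical handler: uses only async-signal-safe operations. write(2)
// does not take the stdio lock that fprintf(stderr, ...) would need.
extern "C" void SafeCatchSignal(int /*signum*/) {
  static const char msg[] = "signal caught\n";
  ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
  (void)ignored;
  g_signal_seen = 1;
}

// Installs the handler for SIGUSR1; returns true on success.
bool InstallSafeHandler() {
  struct sigaction sa = {};
  sa.sa_handler = SafeCatchSignal;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGUSR1, &sa, nullptr) == 0;
}
```

Because the handler never touches the stderr stream object, it cannot block on a lock that the interrupted code already holds.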
Evict tests time out on some recent boards. This change fixes those
timeouts, optimizes the evict timeout, and lets the user change the
timeout on the command line.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: I2f40c8ea809c55675b0d0b62296b663481e5fb16
[ROCm/ROCR-Runtime commit: 09b899b079]
Added initializations for 'ext_table' in 'hsa_system_get_major_extension_table()'
Change-Id: I5e46592192b7d7a294d30011481f16e93db11794
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
[ROCm/ROCR-Runtime commit: d91a14ae0c]
This test is disabled until kernel patches are added to handle invalid
user actions gracefully. These patches validate and block operations
like freeing active queue buffers, which can corrupt the driver's state
if unhandled.
Currently, such operations result in driver state corruption, leading
to segmentation faults and subsequent failures during runtime.
Change-Id: If4c321a14df950a639141fc96048889659c14477
[ROCm/ROCR-Runtime commit: 2cf3813f9f]
Some KFD versions can return from hsaKmtWaitOnMultipleEvents_Ext without
waiting and require a second call that does not reinitialize the age array.
Change-Id: I8358c33080084d47c273c2a2827085d0570c8201
[ROCm/ROCR-Runtime commit: 816af44b05]
Initialized 'scratch_base' as a nullptr to avoid
uninitialized read in hsaKmtAllocMemory()
Change-Id: I3b0e67f3fd3b591e1d21d691f0777b1d1a059b73
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
[ROCm/ROCR-Runtime commit: 6f6ee9679c]
Added check and initialized parameters for PtrInfo().
v1: Checking if PtrInfo() returns success.
v2: Initialization for variables being passed to PtrInfo().
Change-Id: If3ec4608c8e58be259b4fd51ad681b9bc34ddff6
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
[ROCm/ROCR-Runtime commit: 610f8a1e0f]
GFX 9.0.8 may not properly support pipe reset capabilities so disable
test for now.
Change-Id: I3061cdad87eb979ba884c194f4229c0cbb144ee2
[ROCm/ROCR-Runtime commit: 0f02ed6ffb]
KFDDBGTest and KFDNegative tests can eat into memory and event resources
for subsequent test iterations if those resources are not freed.
Change-Id: Iea170c20df8d487703441181b6c152b61f02d3db
[ROCm/ROCR-Runtime commit: 26d338df12]
Queue 2's wave blocked queue 1's wave save, which causes unmap
queue preemption to fail. Add nop as suggested by SQ.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Change-Id: Iea7f280e35487059c4499ea999b9e0cdf841d1e1
[ROCm/ROCR-Runtime commit: f047f96161]
WaitSemaphore and PostSemaphore are used in the HybridMutex
implementation. If HybridMutex did not have to call WaitSemaphore when
acquired, then calling PostSemaphore would cause the internal count
inside sem_t to slowly grow to large values and eventually cause
overflow.
Change-Id: I173fc17c874b49926e56991405e9086ea8c138fc
[ROCm/ROCR-Runtime commit: f58aff630c]
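
The imbalance described in commit f58aff630c above arises when posts outnumber waits. A minimal sketch of one way to keep them paired is the well-known "benaphore" pattern, where an atomic counter decides whether the semaphore is touched at all. This is not ROCr's HybridMutex; the class and its method names are hypothetical.

```cpp
#include <atomic>
#include <semaphore.h>

// Hypothetical hybrid mutex (benaphore pattern). count_ is the number of
// threads holding or waiting for the lock, so the semaphore is posted only
// when some thread has committed to waiting on it.
class BalancedHybridMutex {
 public:
  BalancedHybridMutex() { sem_init(&sem_, 0, 0); }
  ~BalancedHybridMutex() { sem_destroy(&sem_); }

  void Acquire() {
    // Previous value > 0 means the lock is held: block on the semaphore.
    if (count_.fetch_add(1) > 0) {
      while (sem_wait(&sem_) != 0) {}  // retry if interrupted
    }
  }

  void Release() {
    // Previous value > 1 means at least one waiter: wake exactly one.
    if (count_.fetch_sub(1) > 1) {
      sem_post(&sem_);
    }
  }

  // Current semaphore count, for inspection only.
  int SemaphoreValue() {
    int value = 0;
    sem_getvalue(&sem_, &value);
    return value;
  }

 private:
  std::atomic<int> count_{0};
  sem_t sem_;
};
```

With this pairing, an uncontended Acquire/Release never calls sem_post, so the count inside sem_t cannot drift upward over time.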
Add support for an abort timeout when hsa_signal_wait_relaxed is called
and the signal does not clear within the timeout.
The timeout is in seconds.
Change-Id: If1db5a8af33c82ddc4b48968c3d8eceb97d0ea6d
[ROCm/ROCR-Runtime commit: 4ec730f1dc]
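
A timeout-bounded wait of the kind described in commit 4ec730f1dc above might look like the sketch below. It is not the ROCr implementation: the helper name is hypothetical, the signal is modeled as a plain std::atomic, and the helper returns false on timeout and leaves the decision to abort to the caller.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: polls `signal` until it reaches 0 or until
// `timeout_sec` seconds have elapsed. Returns true if the signal cleared.
bool WaitForClear(const std::atomic<int64_t>& signal, double timeout_sec) {
  using Clock = std::chrono::steady_clock;
  const auto deadline =
      Clock::now() + std::chrono::duration<double>(timeout_sec);
  while (signal.load(std::memory_order_acquire) != 0) {
    if (Clock::now() >= deadline) return false;  // timed out; caller may abort
    std::this_thread::yield();
  }
  return true;
}
```

A caller that wants the abort behaviour could call std::abort() when the helper returns false.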
Per-queue reset is now supported and flagged in HSA capabilities.
Change-Id: I21e2421da73b9fafae19c903dc3eeeab1f84968d
[ROCm/ROCR-Runtime commit: 1a4adaf7bc]
The runtime and devel packages both provide the hsakmt packages. Only the
devel package needs to provide them.
Change the package replaces/obsoletes fields accordingly.
Change-Id: Ia1a4f128a1f6928faf57faee5f301a77c21acca2
[ROCm/ROCR-Runtime commit: 2970545ded]
To allow non-POD global variables to last until the last thread
has exited, use "new" to allocate the memory instead of static
allocation.
Change-Id: Ica571b61ff8068a52e472c49cb1c44917e60c8c8
[ROCm/ROCR-Runtime commit: 0878deda17]
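
The idiom described in commit 0878deda17 above is commonly written as a function-local pointer initialized with "new" and never deleted. The sketch below uses a hypothetical Registry type as a stand-in for the runtime's non-POD globals.

```cpp
#include <map>
#include <string>

// Hypothetical type standing in for a non-POD global.
using Registry = std::map<std::string, int>;

// The object is created with `new` on first use and intentionally never
// deleted, so no destructor runs at process exit while other threads may
// still be using it.
Registry& GlobalRegistry() {
  static Registry* instance = new Registry();
  return *instance;
}
```

The trade-off is a deliberate one-time leak in exchange for not depending on static destruction order.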
Also restore dead signals cleanup for old path when HSA_WAIT_ANY_DEBUG
is used.
Change-Id: I51a7404991443c9f6cbf57b4b9e9faa694b9538c
[ROCm/ROCR-Runtime commit: 700f1d9abd]
An ASAN run of the release build revealed some elements of
the supported_isas static map were still using stack data. This
change makes it use heap data so it will persist.
Change-Id: Ie51887e88b9e2dec27acfc97ea45a6219fea971c
[ROCm/ROCR-Runtime commit: c7521a5f2a]
SDMA queue resources are limited when all SDMA copies are bottlenecked
into 2 engines. Callers will not be able to make the best decisions
to allocate queue resources fairly, so have ROCr fall back to the old
round-robin behaviour dictated by KFD.
Change-Id: I93d52297976d74e20129c5eb1dcfbfa5aa5067a7
[ROCm/ROCR-Runtime commit: 7f8676e177]
These are mostly AIE related, but there are a couple of others.
Change-Id: I549e004772160ca282d4c94dc9d94dd2ccae8b1c
[ROCm/ROCR-Runtime commit: 08699069d6]
- Add the new path to avoid WaitAny() calls in AsyncEventsLoop() with
HSA_WAIT_ANY_DEBUG key. The new path is selected by default.
The optimization combines all logic of WaitAny() in a single processing
loop and avoids extra memory allocations or ref counting. It also won't
spin on the CPU if all events are busy.
Change-Id: I197ce60d0d023fbb672f700d6e87702686f1f55a
[ROCm/ROCR-Runtime commit: 0fc7369ba5]
On GPUs where EOP is handled in asic, the read_dispatch_id is not always
updated after each packet. Look for the first dispatch packet that needs
scratch memory before allocating scratch.
Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111fdafbe
[ROCm/ROCR-Runtime commit: d90fbee9c4]
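
The scan described in commit d90fbee9c4 above can be sketched as a walk over the queue from the read index toward the write index. The Packet struct and function name are hypothetical simplifications, and the ring size is assumed to be a non-zero power of two.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified packet; real AQL packets carry more fields.
struct Packet {
  bool is_kernel_dispatch;
  uint32_t private_segment_size;  // scratch bytes requested; 0 means none
};

// Scans ids in [read_id, write_id) and returns the id of the first kernel
// dispatch that needs scratch, or write_id if no such packet is found.
// Assumes ring.size() is a non-zero power of two.
uint64_t FirstScratchDispatch(const std::vector<Packet>& ring,
                              uint64_t read_id, uint64_t write_id) {
  const uint64_t mask = ring.size() - 1;
  for (uint64_t id = read_id; id != write_id; ++id) {
    const Packet& p = ring[id & mask];
    if (p.is_kernel_dispatch && p.private_segment_size > 0) return id;
  }
  return write_id;
}
```

Scanning forward means a stale read index only costs a few extra iterations and does not cause scratch to be sized for the wrong packet.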
The current test has 4 processes, and each process allocates and accesses
512 buffers. This requires 2048 waves to access 2048 buffers at the same
time to finish the test. In CPX compute partition mode, each compute node
has fewer waves, which causes random test failures. Change the test to 2
processes that use 1024 waves to access 1024 buffers, with an increased
buffer size. Add a waves_num check to avoid test failures on new ASICs or
the simulator: skip the test if fewer than 1024 waves are available.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I64b5f9172b62cf38f62fbb0b48a801b8a11401c0
[ROCm/ROCR-Runtime commit: e6d4a32c42]
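
The arithmetic behind the skip condition in commit e6d4a32c42 above, as a small sketch. The constant and function names are hypothetical; only the numbers come from the commit message.

```cpp
// Numbers from the commit message: 2 processes, 512 buffers per process,
// one wave per buffer.
constexpr unsigned kProcesses = 2;
constexpr unsigned kBuffersPerProcess = 512;
constexpr unsigned kWavesRequired = kProcesses * kBuffersPerProcess;

// Hypothetical check: the test is skipped when the device cannot run all
// required waves at the same time.
constexpr bool ShouldSkipTest(unsigned waves_available) {
  return waves_available < kWavesRequired;
}
```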
The supported_isas static unordered_map was adding stack
allocated Isa objects. Instead, make the objects statically
allocated, as supported_isas itself is.
Change-Id: I23405e218290d48deea6f984f76c57e7b43e314e
[ROCm/ROCR-Runtime commit: fd99b74287]