Detect whether the loaded driver is the upstream or the DKMS version, and
add a filter for the tests that fail with the upstream driver.
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
[ROCm/ROCR-Runtime commit: 10530fa2a7]
Add a general stage for agents to release their resources on shutdown.
This breaks a circular dependency during shutdown: allocated resources
must be deleted before memory pools, but memory pools must also be
deleted before agents are destroyed.
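A minimal sketch of the resulting three-phase ordering, assuming hypothetical Agent and MemoryPool types (not the ROCR-Runtime classes) and a hypothetical ReleaseResources() hook standing in for the new stage. Each object logs when it acts, so the order is observable:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the ROCR-Runtime classes.
struct Agent {
  explicit Agent(std::vector<std::string>* log) : log_(log) {
    assert(log_ != nullptr);
  }
  // New general stage: free resources that were allocated from the pools.
  void ReleaseResources() { log_->push_back("agent: release resources"); }
  ~Agent() { log_->push_back("agent: destroy"); }
  std::vector<std::string>* log_;
};

struct MemoryPool {
  explicit MemoryPool(std::vector<std::string>* log) : log_(log) {}
  ~MemoryPool() { log_->push_back("pool: destroy"); }
  std::vector<std::string>* log_;
};

// Splitting "release resources" out of agent destruction removes the cycle:
// resources go first, then pools, and agents are destroyed last.
void Shutdown(std::vector<std::unique_ptr<Agent>>& agents,
              std::vector<std::unique_ptr<MemoryPool>>& pools) {
  for (auto& agent : agents) agent->ReleaseResources();  // 1. resources
  pools.clear();                                         // 2. pools
  agents.clear();                                        // 3. agents
}
```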
[ROCm/ROCR-Runtime commit: 947391deac]
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().
Signed-off-by: lyndonli <Lyndon.Li@amd.com>
[ROCm/ROCR-Runtime commit: c34a2798ce]
The debugger override sets the initial request mask to the previously
set request mask, so use a different mask to assert enablement.
Traps on wave start and end also run back to back, so fix the
previous override mask check as well.
In addition, unlike instruction traps, traps on wave start and end
do not require rewinding the program counter on wave exit.
[ROCm/ROCR-Runtime commit: c710a06ee0]
In PcSamplingCreateFromId, convert the number of bytes into a number of
dwords, because DmaFill expects a count of 32-bit words, not raw bytes.
Passing the byte count directly fills four times the intended range,
so this prevents out-of-bounds (OOB) writes on large sampling buffers.
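A minimal sketch of the conversion. FillDwords is a hypothetical stand-in that, like DmaFill, counts 32-bit words; the real DmaFill signature is not reproduced here, and the dword-alignment assertion is an assumption about the buffer sizes involved:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a fill that takes a count of 32-bit words.
void FillDwords(std::uint32_t* dst, std::uint32_t value, std::size_t count) {
  std::fill_n(dst, count, value);
}

// Convert a buffer size in bytes into the dword count the fill expects.
// Passing size_bytes unconverted would touch 4x as many words.
std::size_t BytesToDwords(std::size_t size_bytes) {
  assert(size_bytes % sizeof(std::uint32_t) == 0 && "size must be dword-aligned");
  return size_bytes / sizeof(std::uint32_t);
}
```

With a 4 KiB buffer, the fill receives 1024 words rather than 4096, so it stays inside the allocation.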
[ROCm/ROCR-Runtime commit: 2ae70735e8]
The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.
[ROCm/ROCR-Runtime commit: d4b85b6bf5]
Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a.
This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM
may return before the memory is actually cleared.
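A sketch of the target selection, assuming a hypothetical GfxVersion triple where gfx90a is {9, 0, 0xa} and gfx906 is {9, 0, 6}. The struct and function names are illustrative only:

```cpp
#include <cassert>

// Hypothetical ISA version triple: major, minor, stepping.
struct GfxVersion {
  unsigned major, minor, stepping;
};

// Number of times to poll dependent signals: twice on gfx9.0.x parts other
// than gfx90a (workaround for SDMA_POLL_REGMEM returning early), else once.
unsigned DependentSignalPollCount(const GfxVersion& v) {
  const bool is_gfx90x = v.major == 9 && v.minor == 0;
  const bool is_gfx90a = is_gfx90x && v.stepping == 0xa;
  return (is_gfx90x && !is_gfx90a) ? 2u : 1u;
}
```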
[ROCm/ROCR-Runtime commit: 6903a41b1d]
Reset event_age when signals move. Before this change, event_age
could fall out of sync with hsa_event, causing hangs when event_age
exceeded the true hsa_event age.
[ROCm/ROCR-Runtime commit: d2a89a467b]
If the parent runs ahead of the child and the child has not yet called the
second raise(SIGSTOP), the parent's "waitpid(childPid, &childStatus, 0)" still
returns, with childStatus set to 0x137f, which reports a stop by SIGSTOP.
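Rather than comparing raw status values, a parent can decode them with the standard WIFSTOPPED/WSTOPSIG macros from <sys/wait.h>. StoppedBySignal is a hypothetical helper name. On Linux, 0x137f is "stopped" (low byte 0x7f) with signal 0x13, which is SIGSTOP on x86 and Arm:

```cpp
#include <cassert>
#include <csignal>
#include <sys/wait.h>

// True if a waitpid() status says the child is stopped by `sig`, as opposed
// to having exited or been killed.
bool StoppedBySignal(int status, int sig) {
  return WIFSTOPPED(status) && WSTOPSIG(status) == sig;
}
```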
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
[ROCm/ROCR-Runtime commit: 42f79776cd]
If the child reaches the second raise(SIGSTOP) and the parent then sends
PTRACE_CONT, the child exits. The parent then asserts at DeviceSnapshot,
because kfd_ioctl cannot get the mm from the child's PID.
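One way to guard against this, sketched under the assumption that the parent has a waitpid() status for the child before snapshotting: skip the snapshot when the status says the child is gone. ChildGone is a hypothetical helper; WIFEXITED and WIFSIGNALED are the standard <sys/wait.h> macros:

```cpp
#include <cassert>
#include <csignal>
#include <sys/wait.h>

// True once the child has exited or been killed, in which case the kernel
// can no longer resolve its mm and a device snapshot should not be taken.
bool ChildGone(int status) {
  return WIFEXITED(status) || WIFSIGNALED(status);
}
```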
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
[ROCm/ROCR-Runtime commit: 91ef44d3ec]
Reduce the memory allocated for GFX VRAM, as the KFD Evict test hit
intermittent page faults, which may be caused by the larger GFX CS BO
size.
[ROCm/ROCR-Runtime commit: 85c4b0020a]
Blacklist KFDNegativeTest.BasicPipeReset from gfx950 until MEC can
support pipe reset on GC 9.5.0.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: fcf3f91379]
Remove hard assertions for signal validation in hsa_amd_signal_wait_* operations. Instead, ignore 0/NULL/invalid signals when evaluating the dependency condition, in line with the HSA specification for barrier-AND and barrier-OR packets.
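A minimal model of the relaxed evaluation, assuming a hypothetical DepSignal type where std::nullopt stands for a 0/NULL/invalid handle and a dependency is met when a signal reads 0. The empty barrier-OR case is an assumption here (treated as satisfied so a waiter cannot block forever), not a statement of the HSA rule:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical: a valid signal's current value, or nullopt for 0/NULL/invalid.
using DepSignal = std::optional<std::int64_t>;

// Barrier-AND: satisfied once every valid dependent signal reads 0.
// Invalid entries are skipped instead of asserting.
bool BarrierAndSatisfied(const std::vector<DepSignal>& deps) {
  return std::all_of(deps.begin(), deps.end(), [](const DepSignal& s) {
    return !s.has_value() || *s == 0;
  });
}

// Barrier-OR: satisfied once any valid dependent signal reads 0.
// Assumption: with no valid signals at all, report satisfied.
bool BarrierOrSatisfied(const std::vector<DepSignal>& deps) {
  bool any_valid = false;
  for (const DepSignal& s : deps) {
    if (!s.has_value()) continue;  // ignore 0/NULL/invalid handles
    any_valid = true;
    if (*s == 0) return true;
  }
  return !any_valid;
}
```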
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
[ROCm/ROCR-Runtime commit: e4d027191c]
scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb, so put the field back. A solution is still
needed for alt_scratch_backing_memory_byte_size.
Also, disable alternate scratch completely, as supporting the debugger
requires further changes.
[ROCm/ROCR-Runtime commit: 02b38d0614]
This is primarily used for debugging and negative testing of SDMA queue
reset and should not be used in normal runs.
[ROCm/ROCR-Runtime commit: d047708317]
We cannot guarantee system-scope coherency on systems with only PCIe
connections, so do not expose the extended fine-grain memory pool on these
systems.
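A sketch of the decision, assuming a hypothetical LinkType classification (not the ROCR-Runtime topology types). It encodes the rule literally: the pool is hidden when every link is PCIe:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical link classification for illustration.
enum class LinkType { kPCIe, kXGMI };

// System-scope coherency cannot be guaranteed over PCIe alone, so expose the
// extended fine-grain pool only when at least one non-PCIe link exists.
bool ExposeExtendedFineGrainPool(const std::vector<LinkType>& links) {
  return std::any_of(links.begin(), links.end(),
                     [](LinkType t) { return t != LinkType::kPCIe; });
}
```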
[ROCm/ROCR-Runtime commit: 6dac90c89a]
The negative queue tests generate an exception, which triggers coredump
generation. Set RLIMIT_CORE to 0 so that coredumps are not generated for
these tests.
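A minimal sketch of how a test fixture might do this with POSIX getrlimit/setrlimit. DisableCoreDumps is a hypothetical helper; it lowers only the soft limit and leaves the hard limit unchanged:

```cpp
#include <cassert>
#include <sys/resource.h>

// Lower the soft RLIMIT_CORE limit to 0 so no core file is requested for
// this process; returns false if either call fails.
bool DisableCoreDumps() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0) return false;
  rl.rlim_cur = 0;
  return setrlimit(RLIMIT_CORE, &rl) == 0;
}
```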
[ROCm/ROCR-Runtime commit: 4cb6a6d45d]
Check RLIMIT_CORE before collecting coredump data. If the current limit
is 0, return early instead of spending time collecting coredump data.
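A sketch of the early return. CoredumpAllowed and CollectCoredumpData are hypothetical names, and treating a failed getrlimit() as "allowed" is an assumption chosen so that collection is not silently skipped:

```cpp
#include <cassert>
#include <sys/resource.h>

// True unless the soft RLIMIT_CORE limit is 0.
bool CoredumpAllowed() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0) return true;  // unknown: don't skip
  return rl.rlim_cur != 0;
}

// Illustrative collector: returns 0 without doing any work when disabled.
int CollectCoredumpData() {
  if (!CoredumpAllowed()) return 0;  // early return, nothing collected
  // ... expensive data collection would happen here ...
  return 1;
}
```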
[ROCm/ROCR-Runtime commit: d031af9eb5]