Commit graph

2822 commits

Author SHA1 Message Date
Apurv Mishra b490aec8e6 kfdtest: support for upstream kernel driver
detect whether the loaded driver is the upstream or the DKMS version and
add a filter for the tests that fail with the upstream driver

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>


[ROCm/ROCR-Runtime commit: 10530fa2a7]
2025-03-27 16:55:21 -04:00
Yiannis Papadopoulos 2c731096c6 rocr/aie: Returning error code if query not recognized
[ROCm/ROCR-Runtime commit: 0bd4acb5d4]
2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos c142b04fc1 rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error
[ROCm/ROCR-Runtime commit: e55503e7f8]
2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos 4c0e8b5f70 rocr/aie: Avoiding XdnaDriver class in queue API
[ROCm/ROCR-Runtime commit: f4e1c9b0ba]
2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos bd109ec288 rocr/aie: Remove unused struct from HSA API
[ROCm/ROCR-Runtime commit: 8dcbbf31c7]
2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos a11d693a47 rocr: Remove unused lambda
[ROCm/ROCR-Runtime commit: bf8ab493c4]
2025-03-27 10:33:40 -04:00
Yiannis Papadopoulos 13723c6308 rocr/aie: Resolve parentheses warning
[ROCm/ROCR-Runtime commit: b066e0eefa]
2025-03-27 10:33:40 -04:00
David Yat Sin edcc3a1ed5 rocr: Release agent resources before pools
Adding a general stage for agents to release their resources on
shutdown. This avoids a circular dependency during shutdown because
we have to delete allocated resources before deleting memory pools, but
we also have to delete memory pools before destroying agents.


[ROCm/ROCR-Runtime commit: 947391deac]
2025-03-25 14:25:04 -04:00
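The ordering described above can be sketched in C as three separate passes. This is a minimal model, not ROCr's code: the `agent` and `mem_pool` structs, their fields, and the function names are hypothetical. The asserts stand in for the leak or use-after-free that the wrong order would cause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each agent owns some allocations inside one pool. */
typedef struct {
    int live_allocs; /* allocations still carved out of this pool */
    int destroyed;
} mem_pool;

typedef struct {
    mem_pool *pool;   /* pool the agent allocated its resources from */
    int owned_allocs; /* how many of those allocations the agent holds */
    int destroyed;
} agent;

/* Stage 1: the agent frees everything it allocated from its pool. */
static void agent_release_resources(agent *a) {
    a->pool->live_allocs -= a->owned_allocs;
    a->owned_allocs = 0;
}

/* Stage 2: a pool may only be deleted once nothing lives in it. */
static void pool_destroy(mem_pool *p) {
    assert(p->live_allocs == 0);
    p->destroyed = 1;
}

/* Stage 3: an agent may only be deleted after its pool is gone. */
static void agent_destroy(agent *a) {
    assert(a->pool->destroyed);
    a->destroyed = 1;
}

static void runtime_shutdown(agent *agents, size_t n_agents,
                             mem_pool *pools, size_t n_pools) {
    for (size_t i = 0; i < n_agents; ++i)
        agent_release_resources(&agents[i]);
    for (size_t i = 0; i < n_pools; ++i)
        pool_destroy(&pools[i]);
    for (size_t i = 0; i < n_agents; ++i)
        agent_destroy(&agents[i]);
}
```

Folding stage 1 into stage 3 (releasing resources only when each agent is destroyed) would trip one of the two asserts whichever way the loops are ordered, which is the circular dependency the commit breaks.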
Yiannis Papadopoulos 7a2b25e1ea rocr: Release vmem handles before agent destruction
[ROCm/ROCR-Runtime commit: a66130bc48]
2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos 427962679e rocr: Return success status in IsModelEnabled()
[ROCm/ROCR-Runtime commit: 765563b786]
2025-03-25 10:05:16 -04:00
lyndonli e9c934c116 rocr: Remove redundant Refresh() call
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().

Signed-off-by: lyndonli <Lyndon.Li@amd.com>


[ROCm/ROCR-Runtime commit: c34a2798ce]
2025-03-25 09:13:59 -04:00
Jonathan Kim 20d9a9a15a kfdtest: fix trap on wave start and end
The debugger override will set the initial request mask to the
previously set request mask so use a different mask to assert
enablement.
Trap on wave start and end also run back to back, so fix the
previous override mask check as well.

In addition, unlike instruction traps, trap on wave start and end
will not require a rewind of the program counter on wave exit.


[ROCm/ROCR-Runtime commit: c710a06ee0]
2025-03-24 20:44:27 -04:00
Adel Johar 6195f65f9e Docs: Add more variables to env_variables.rst
[ROCm/ROCR-Runtime commit: d8d27d4fd6]
2025-03-20 11:59:58 -04:00
Lang Yu cd239c7bcf rocrtst: fix rocrtst.Test_Example
VerifyResult always returns true. That's not expected.

Signed-off-by: Lang Yu <lang.yu@amd.com>


[ROCm/ROCR-Runtime commit: 89926f5b0b]
2025-03-20 12:57:52 +08:00
Shweta Khatri b570f22aca rocr: Fix PcSamplingCreateFromId to pass 32-bit dword count to DmaFill
In PcSamplingCreateFromId, convert number of bytes into number of
dwords because DmaFill expects a count of 32-bit words, not raw bytes.
This prevents OOB writes on large sampling buffers.


[ROCm/ROCR-Runtime commit: 2ae70735e8]
2025-03-19 14:42:41 -04:00
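A minimal sketch of the fix, assuming a hypothetical `dma_fill()` that, like the `DmaFill` described above, takes its count in 32-bit words. It is a plain CPU loop, not ROCr's SDMA path; the point is that the byte size is converted once, at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DmaFill: writes `value` into `count` 32-bit
   words starting at `dst`. The count is in dwords, not bytes. */
static void dma_fill(uint32_t *dst, uint32_t value, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = value;
}

/* Clear a sampling buffer whose size is known in bytes. Passing
   size_bytes straight through would write four times as many words
   as the buffer holds. */
static void clear_sampling_buffer(uint32_t *buf, size_t size_bytes) {
    assert(size_bytes % sizeof(uint32_t) == 0);
    dma_fill(buf, 0, size_bytes / sizeof(uint32_t));
}
```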
Lao, Darren c03e4cfe4d rocr: Change ISA grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>

[ROCm/ROCR-Runtime commit: cd4d236185]
2025-03-19 13:44:17 -04:00
Tim Gu bf1a60e2f9 Update build instructions
[ROCm/ROCR-Runtime commit: 0a28e0a54a]
2025-03-18 19:54:20 -04:00
randyh62 407704bf61 fix license include path
[ROCm/ROCR-Runtime commit: e2f3e8c0de]
2025-03-18 16:29:10 -04:00
David Yat Sin d94b4becd8 Revert rocr: Only expose ext-fine-grain pool on xgmi-hive systems
This reverts commit 0097218f2b.


[ROCm/ROCR-Runtime commit: ce0244ac03]
2025-03-18 16:28:36 -04:00
jordans 938b34da24 hsakmt: Initial Commit for the HSA KMT Model
The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.


[ROCm/ROCR-Runtime commit: d4b85b6bf5]
2025-03-18 16:22:17 -04:00
David Yat Sin 9e8859636e rocr: Workaround for SDMA POLL_REGMEM on gfx9.0
Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a.
This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM
may return before the memory is actually cleared.


[ROCm/ROCR-Runtime commit: 6903a41b1d]
2025-03-17 17:59:15 -04:00
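A sketch of how the workaround could be expressed when building the dependency polls. Everything here is hypothetical: the `gfx_version` triple (with gfx90a encoded as stepping 0xa), the `poll_cmd` struct, and the assumption that the extra pass simply repeats the same wait-for-zero polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ISA version, e.g. gfx906 = {9, 0, 6}, gfx90a = {9, 0, 0xa}. */
typedef struct {
    unsigned major, minor, stepping;
} gfx_version;

/* Hypothetical poll command: wait until the value at addr equals ref. */
typedef struct {
    uint64_t addr;
    int64_t ref;
} poll_cmd;

/* gfx9.0 parts other than gfx90a poll every dependent signal twice. */
static unsigned dep_poll_passes(gfx_version v) {
    int is_gfx90x = v.major == 9 && v.minor == 0;
    return (is_gfx90x && v.stepping != 0xa) ? 2u : 1u;
}

/* Emit one wait-for-zero poll per dependent signal, per pass. Returns the
   number of commands written, or 0 if `cap` is too small. */
static size_t build_dep_polls(poll_cmd *out, size_t cap,
                              const uint64_t *sig_addrs, size_t n_sigs,
                              gfx_version v) {
    size_t passes = dep_poll_passes(v);
    if (n_sigs * passes > cap)
        return 0;
    size_t n = 0;
    for (size_t p = 0; p < passes; ++p)
        for (size_t i = 0; i < n_sigs; ++i)
            out[n++] = (poll_cmd){ .addr = sig_addrs[i], .ref = 0 };
    return n;
}
```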
Mallya, Ameya Keshava ecb119aec3 Added release trigger for further releases
Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>

[ROCm/ROCR-Runtime commit: 5d254c6fb0]
2025-03-14 13:52:00 -07:00
Stella Laurenzo 5a3b9a1fdf rocr: Search for libnuma with find_package before find_library.
This avoids a false dependence on a system library when not desired.


[ROCm/ROCR-Runtime commit: c36ccaaf4b]
2025-03-14 08:16:13 -07:00
Hila, Nino b998485d78 Update palamida.yml
Signed-off-by: Hila, Nino <Nino.Hila@amd.com>

[ROCm/ROCR-Runtime commit: 98a5ebc3f1]
2025-03-13 20:08:56 -04:00
Hila, Nino caf1fa2d14 Create palamida.yml
Signed-off-by: Hila, Nino <Nino.Hila@amd.com>

[ROCm/ROCR-Runtime commit: 0e2064e6a7]
2025-03-13 16:07:18 -04:00
Benjamin Welton e62422520a rocr: Reset event_age when signals move
Resets event_age when signals move. Prior to this PR, event_age
can become unaligned with hsa_event, causing hangs if the event_age
exceeds the true hsa_event age.


[ROCm/ROCR-Runtime commit: d2a89a467b]
2025-03-13 11:32:16 -04:00
Emily Deng af293c4a61 kfdtest: Fix the childStatus is 0x7f error for KFDDBGTest.HitMemoryViolation
If the parent runs ahead of the child and the child has not yet called the second
raise(SIGSTOP), the parent's "waitpid(childPid, &childStatus, 0)" returns
with childStatus 0x137f, which encodes a stop by SIGSTOP (0x13) rather than an exit.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>


[ROCm/ROCR-Runtime commit: 42f79776cd]
2025-03-13 13:38:46 +08:00
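The race is easier to see with the wait-status macros from `<sys/wait.h>`. On Linux, 0x137f decodes as `WIFSTOPPED()` with `WSTOPSIG()` equal to 0x13 (19, which is SIGSTOP on x86), so the child is stopped, not exited. The sketch below is not kfdtest code: it uses `WUNTRACED` instead of ptrace so that stops are reported without a tracer, and it keeps waiting until it sees a real exit. The function name and exit code 7 are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stops itself twice and then exits with code 7.
   Returns how many stops the parent observed before the exit, storing
   the exit code in *exit_code, or -1 on error. */
static int count_stops_before_exit(int *exit_code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP); /* first stop */
        raise(SIGSTOP); /* second stop: must not be mistaken for an exit */
        _exit(7);
    }
    int stops = 0;
    for (;;) {
        int status;
        if (waitpid(pid, &status, WUNTRACED) < 0)
            return -1;
        if (WIFSTOPPED(status)) {
            /* e.g. 0x137f on Linux: stopped by SIGSTOP, still alive */
            assert(WSTOPSIG(status) == SIGSTOP);
            ++stops;
            kill(pid, SIGCONT); /* let the child continue */
            continue;
        }
        if (WIFEXITED(status)) {
            *exit_code = WEXITSTATUS(status);
            return stops;
        }
        return -1; /* terminated by a signal */
    }
}
```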
Emily Deng 46bb10ff2d kfdtest: Fix DeviceSnapshot return fail error for KFDDBGTest.HitMemoryViolation
If the child has reached the second raise(SIGSTOP) and the parent sends
PTRACE_CONT, the child exits. The parent then asserts in DeviceSnapshot,
because kfd_ioctl can no longer get the mm for the child pid.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>


[ROCm/ROCR-Runtime commit: 91ef44d3ec]
2025-03-13 13:38:46 +08:00
Apurv Mishra 1e279a19c3 kfdtest: limit GFX VRAM allocation to 1/4 sys mem
reduce the memory allocated for GFX VRAM, as the
KFD Evict test hit intermittent page faults,
possibly due to the larger GFX CS BO size


[ROCm/ROCR-Runtime commit: 85c4b0020a]
2025-03-12 13:54:04 -04:00
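A sketch of the sizing rule, assuming a hypothetical helper that clamps a requested GFX VRAM allocation to a quarter of physical memory; kfdtest's actual sizing code may differ. `sysconf(_SC_PHYS_PAGES)` is a widely supported Linux extension rather than strict POSIX.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Total physical memory in bytes, or 0 if it cannot be determined. */
static uint64_t system_memory_bytes(void) {
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0)
        return 0;
    return (uint64_t)pages * (uint64_t)page_size;
}

/* Clamp a GFX VRAM allocation to a quarter of system memory. */
static uint64_t cap_gfx_vram_alloc(uint64_t requested, uint64_t sys_mem) {
    uint64_t limit = sys_mem / 4;
    return requested < limit ? requested : limit;
}
```

A caller would pass `system_memory_bytes()` as `sys_mem`; keeping the clamp a pure function makes it testable without depending on the host.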
Yiannis Papadopoulos 566269e8b7 rocr/aie: Changing variable names
[ROCm/ROCR-Runtime commit: c7936334cf]
2025-03-11 19:35:21 -04:00
Yiannis Papadopoulos 8e111ff2f0 rocr/aie: Handle non-HSA_STATUS_SUCCESS during VisitRegion
[ROCm/ROCR-Runtime commit: fb33e2e724]
2025-03-11 19:35:21 -04:00
Apurv Mishra 77f4bbfdf1 kfdtest: add blacklist for RHEL9 system
add tests to exclude when running kfdtest
on RHEL9 systems, tested with Navi 31

Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>


[ROCm/ROCR-Runtime commit: de8f8f076d]
2025-03-11 16:40:25 -04:00
Longlong Yao 007795951b rocr: export pointer type for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>


[ROCm/ROCR-Runtime commit: a254e35fd6]
2025-03-11 10:16:58 -04:00
Longlong Yao ef1740b88b libhsakmt: set node_id to 0 for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>


[ROCm/ROCR-Runtime commit: 5916467552]
2025-03-11 10:16:58 -04:00
Amber Lin fffdffc3ce kfdtest: Temporarily blacklist KFDNegativeTest
Blacklist KFDNegativeTest.BasicPipeReset from gfx950 until MEC can
support pipe reset on GC 9.5.0.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>


[ROCm/ROCR-Runtime commit: fcf3f91379]
2025-03-10 10:37:19 -07:00
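Blacklists such as the RHEL9 and gfx950 ones above are typically applied as a googletest negative filter (`--gtest_filter=-A.b:C.d`). The helper below is a hypothetical sketch of building such a filter string from a list of test names; it does not reproduce how kfdtest actually stores or applies its blacklists.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join test names into a googletest negative filter, e.g.
   {"A.x", "B.y"} -> "-A.x:B.y". Returns the string length, or -1 if
   `out` is too small. An empty list yields "". */
static int build_exclusion_filter(char *out, size_t cap,
                                  const char *const *tests, size_t n) {
    size_t len = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        int w = snprintf(out + len, cap - len, "%s%s",
                         i == 0 ? "-" : ":", tests[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return -1; /* truncated */
        len += (size_t)w;
    }
    return (int)len;
}
```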
zichguan-amd 1d51406e80 Throw exception when runtime not initialized for hsa_amd_signal_wait_*
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>


[ROCm/ROCR-Runtime commit: 3415a500c7]
2025-03-07 15:17:10 -05:00
zichguan-amd b172fbd538 rocr: Allow 0/NULL/invalid signal handles for wait operations to be no-op
Remove hard assertions for signal validation in hsa_amd_signal_wait_* operations; instead, ignore 0/NULL/invalid signals in the dependency condition evaluation to align with the HSA spec for barrier-AND and barrier-OR packets.

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>


[ROCm/ROCR-Runtime commit: e4d027191c]
2025-03-07 15:17:10 -05:00
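A sketch of dependency evaluation that skips null handles instead of asserting on them. The `dep_signal` struct is hypothetical. As we read the HSA runtime specification's description of `dep_signal`, a 0 handle counts as satisfied for barrier-AND and as never satisfied for barrier-OR, so in both cases it simply does not affect the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dependency: a signal handle plus its current value.
   A handle of 0 means "no signal". */
typedef struct {
    uint64_t handle;
    int64_t value;
} dep_signal;

/* barrier-AND: ready when every non-null dependency has reached 0. */
static bool barrier_and_ready(const dep_signal *deps, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (deps[i].handle == 0)
            continue; /* null handle: treated as satisfied */
        if (deps[i].value != 0)
            return false;
    }
    return true;
}

/* barrier-OR: ready when any non-null dependency has reached 0. */
static bool barrier_or_ready(const dep_signal *deps, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (deps[i].handle == 0)
            continue; /* null handle: never satisfies the OR */
        if (deps[i].value == 0)
            return true;
    }
    return false;
}
```

One consequence of this choice: a barrier-OR whose dependencies are all null is never satisfied, while a barrier-AND with only null dependencies is satisfied immediately.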
David Yat Sin e130172218 rocr: Put back scratch_backing_memory_byte_size
The scratch_backing_memory_byte_size field is not used by CP, but it is
currently used by rocgdb. Put the field back; a solution is still needed
for alt_scratch_backing_memory_byte_size.

Also, completely disable alternate scratch, as some changes are needed to
support the debugger.


[ROCm/ROCR-Runtime commit: 02b38d0614]
2025-03-06 16:23:38 -05:00
Jonathan Kim 8cbb23183c kfdtest: Add KFD SDMA queue reset testing
The KFD can perform per-SDMA queue reset, similar to compute queue reset.
Add a test.


[ROCm/ROCR-Runtime commit: c879fdefcf]
2025-03-06 14:04:42 -05:00
Jonathan Kim 36c69a6cff kfdtest: Add KFD SDMA queue reset testing
The KFD can perform per-SDMA queue reset, similar to compute queue reset.
Add a test.


[ROCm/ROCR-Runtime commit: ee890e7d2b]
2025-03-06 14:04:42 -05:00
Jonathan Kim 06b2c3aeb6 kfdtest: Allow user to modify packet size for SDMA write packets
This is primarily used for debug and negative testing for SDMA queue
reset and shouldn't be used for normal run cases.


[ROCm/ROCR-Runtime commit: d047708317]
2025-03-06 14:04:42 -05:00
Jonathan Kim 297e8f729e kfdtest: Add create SDMA queue by target engine
KFD supports SDMA queue creation by target engine.
Enable this for testing.


[ROCm/ROCR-Runtime commit: 9e57ce48e8]
2025-03-06 14:04:42 -05:00
Jonathan Kim 303cdb8f7e kfdtest: Add SDMA poll memory register packet support
The SDMA can wait by polling user memory. This is being added to
support per-SDMA queue reset testing.


[ROCm/ROCR-Runtime commit: a957b24153]
2025-03-06 14:04:42 -05:00
Jonathan Kim 599a20ee2d hsakmt: Expose per-SDMA queue reset capabilities
Expose a new capabilities field that flags per-SDMA queue reset
support.


[ROCm/ROCR-Runtime commit: e3d09e30dc]
2025-03-06 14:04:42 -05:00
Su, Daniel b213a6aa3f External CI: change trigger from amd-master to amd-mainline
Signed-off-by: Su, Daniel <Daniel.Su@amd.com>

[ROCm/ROCR-Runtime commit: 70b44c576c]
2025-03-05 16:24:29 -05:00
David Yat Sin 0097218f2b rocr: Only expose ext-fine-grain pool on xgmi-hive systems
We cannot guarantee system-scope coherency on systems with only PCIe
connections, so do not expose extended fine-grain memory pool on these
systems.


[ROCm/ROCR-Runtime commit: 6dac90c89a]
2025-03-05 10:41:38 -05:00
Lao, Darren de8e56a964 rocr: Change grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>


[ROCm/ROCR-Runtime commit: 0cd46b6582]
2025-03-04 16:19:51 -05:00
David Yat Sin 732c3cfa8f rocrtst: Disable RLIMIT for negative queue tests
The negative queue tests generate an exception that triggers coredump
generation. Disable RLIMIT so that no coredumps are generated for
these tests.


[ROCm/ROCR-Runtime commit: 4cb6a6d45d]
2025-03-04 10:29:34 -05:00
David Yat Sin 35faa9783a rocr: Check RLIMIT_CORE before generating coredump
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.


[ROCm/ROCR-Runtime commit: d031af9eb5]
2025-03-04 10:29:34 -05:00
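Both sides of these two changes can be sketched with the POSIX `getrlimit()`/`setrlimit()` calls. The function names are hypothetical: the test side lowers only the soft `RLIMIT_CORE` limit to 0 (an unprivileged process may always lower its soft limit), and the runtime side returns early when the soft limit is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/* Test side: set the soft core-file size limit to 0.
   The hard limit is left unchanged. */
static bool disable_core_dumps(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return false;
    rl.rlim_cur = 0;
    return setrlimit(RLIMIT_CORE, &rl) == 0;
}

/* Runtime side: skip collecting coredump data when the limit is 0. */
static bool should_collect_coredump(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return true; /* limit unknown: fall back to collecting */
    return rl.rlim_cur != 0;
}
```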
David Yat Sin 0a8ce4b90d rocr: Only set asan flag on GPU agents
[ROCm/ROCR-Runtime commit: 3944da1d76]
2025-03-03 14:51:19 -05:00