Commit Graph

2853 Commits

Author SHA1 Message Date
Searles, Mark ac1e6d59c2 Update createMCObjectStreamer() to use new LLVM API (#156) (#157)
* Update createMCObjectStreamer() to use new LLVM API

Obsolete interfaces were removed via llvm-project's
f2ff298867d7733122e32eead5a8c524b09dfdb1

* Fix typo: LLVM_VERSION -> LLVM_VERSION_MAJOR

* Fix typo
2025-05-05 13:18:05 -07:00
Apurv Mishra aa0a32a166 kfdtest: Update ROCr homepage in CMakeLists.txt
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
2025-05-01 11:22:49 -04:00
David Yat Sin 4ed5950beb rocr: Fix logic for scratch reclaim
Fix logic error that can cause scratch memory to be reclaimed while a
dispatch is still using it.
2025-04-29 17:23:45 -04:00
Amber Lin 5e28208cec kfdtest: Skip SVMEvict with xnack=0
Random driver deadlock on svm_range_evict_svm_bo_worker() is observed on
NPS2/DPX mode. It's seen with xnack off and happens more often on the
partition with less VRAM because of TMR.

Temporarily skip SVM Evict tests on Family AV when xnack is disabled.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2025-04-25 12:45:36 -04:00
Tony Gutierrez f2c482d923 rocr: Add large_bar_enabled var to the GPU agent
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.

Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occurring.

This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).
2025-04-23 15:53:29 -04:00
Tony Gutierrez 6e3c375bf1 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
2025-04-23 15:53:29 -04:00
Tony Gutierrez 11d1d2cd25 rocr: Remove empty shared.cpp 2025-04-23 15:53:29 -04:00
Tony Gutierrez adbc0495e2 rocr/libhsakmt: Add coarse-grain allocator to GPU 2025-04-23 15:53:29 -04:00
Saleel Kudchadker 57c0c643ce rocr: return preferred SDMA engine mask
- Add a new AMD extension API to return preferred SDMA engine mask.
This can be used in conjunction with the copy_on_engine API to get
optimal bandwidth.
2025-04-22 13:28:38 -07:00
Amber Lin bdb6e43b54 Revert "kfdtest: Temporarily blacklist KFDNegativeTest"
This reverts commit fcf3f91379.

MEC v18 starts to support pipe reset
2025-04-21 14:14:10 -04:00
Yiannis Papadopoulos 7c8fa87160 rocr/aie: Remove redundant cache flushes for already loaded PDIs 2025-04-17 09:48:41 -05:00
Shane Xiao 6a63170b38 rocr: Add rec sdma engines with limited XGMI SDMA engine
This patch adds recommended SDMA engine support when XGMI SDMA
engines are limited. It uses one PCIe SDMA engine for
gpu <-> gpu copies, which helps improve all-to-all copy
performance.

Signed-off-by: Shane Xiao <shane.xiao@amd.com>
2025-04-11 23:54:15 +08:00
Jonathan Kim 4c3a0698f8 kfdtest: fix trap on start for gfx 9 and 11
Similar to GFX 12, GFX 9 and 11 need to exit without forwarding
the PC.
2025-04-10 14:48:19 -04:00
David Yat Sin c1b7aa39ed rocr: refactor PC Sampling PRED_EXEC op
Refactor PRED_EXEC op command size calculation.
Fix issue when copy size is less than 32MB.
2025-04-08 17:26:29 -04:00
Yiannis Papadopoulos 2d2c47bdef rocr/aie: Increment write pointer upon packet submission 2025-04-08 15:36:40 -05:00
Eric Huang df6048429c kfdtest: fix max queues on multi-gpu mode
The maximum number of queues per process in KFD is 1024.
KFDQMTest.OverSubscribeCpQueues fails in multi-GPU mode
on more than 15 GPUs because 65x16=1040 exceeds 1024, so
MAX_CP_QUEUES is changed to adapt to this and fix the issue.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
2025-04-08 12:57:00 -04:00
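The arithmetic in the message above can be checked with a minimal sketch. The only figures taken from the message are the 1024-queue per-process limit and the 65x16 product. Reading 65 as a per-GPU queue count is an assumption, and the names `kMaxQueuesPerProcess`, `ExceedsLimit`, and `QueuesPerGpu` are hypothetical, not identifiers from kfdtest.

```cpp
#include <cassert>
#include <cstdint>

// Per-process queue limit stated in the commit message.
constexpr std::uint32_t kMaxQueuesPerProcess = 1024;

// Hypothetical helper: true when the total queue count across all GPUs
// would exceed the per-process limit.
constexpr bool ExceedsLimit(std::uint32_t queues_per_gpu, std::uint32_t num_gpus) {
  return static_cast<std::uint64_t>(queues_per_gpu) * num_gpus > kMaxQueuesPerProcess;
}

// Hypothetical helper: largest per-GPU queue count that keeps the total
// at or below the limit (0 when there are no GPUs).
constexpr std::uint32_t QueuesPerGpu(std::uint32_t num_gpus) {
  return num_gpus == 0 ? 0 : kMaxQueuesPerProcess / num_gpus;
}

// Compile-time check of the figures quoted in the message (assumed
// interpretation: 65 queues on each of 16 GPUs).
static_assert(ExceedsLimit(65, 16), "65 x 16 = 1040 exceeds 1024");
static_assert(!ExceedsLimit(65, 15), "65 x 15 = 975 fits within 1024");
```

Under these assumptions, scaling the per-GPU count with `QueuesPerGpu` keeps a 16-GPU run at exactly the limit; the actual fix in kfdtest may choose a different value.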
Eric Huang d3265234e9 kfdtest: fix ptrace error on multi-gpu mode
The parent process can be ptraced by only one process
at a time. To avoid the error, add a mutex to
synchronize the ptrace call.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
2025-04-08 09:58:28 -04:00
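The serialization described above might look roughly like the following sketch. It is not the kfdtest implementation: `g_ptrace_mutex` and `WithPtraceLock` are hypothetical names, and the lambda stands in for the real ptrace call. The mutex here only serializes threads within one process; if the callers are separate processes, a cross-process lock would be needed instead.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical global lock guarding every ptrace attach/detach sequence
// issued from this process.
std::mutex g_ptrace_mutex;

// Runs fn while holding the lock, so only one caller at a time can be
// inside the guarded section. Returns whatever fn returns.
template <typename Fn>
auto WithPtraceLock(Fn&& fn) -> decltype(fn()) {
  std::lock_guard<std::mutex> lock(g_ptrace_mutex);
  return std::forward<Fn>(fn)();
}

// Small self-check: the wrapper passes the callable's result through.
inline void SelfCheck() {
  assert(WithPtraceLock([] { return 7; }) == 7);
}
```

A caller would wrap the whole attach, wait, and detach sequence in one `WithPtraceLock` call rather than locking each step separately, so that another tracer cannot slip in between steps.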
Choudhary, Rahul 5b4c717208 Update workflow to use mainline branch 2025-04-07 09:36:52 -04:00
Choudhary, Rahul a0b80c825c Update rocm_ci_caller.yml updating push trigger to amd-mainline
Signed-off-by: Choudhary, Rahul <Rahul.Choudhary@amd.com>
2025-04-07 09:36:52 -04:00
Yiannis Papadopoulos c63e01724c rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition. 2025-04-03 15:13:20 -05:00
Lancelot SIX e0359e5d35 rocr: Replace tabs with spaces in trap handler source codes
Use spaces consistently to format the trap handler code.  This patch
does not introduce any change in the trap handler.  Using `git show -w`
on this patch shows an empty diff.

Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a
2025-04-03 09:44:23 +01:00
David Yat Sin 2a433e2b96 rocr: Fix PC Sampling PRED_EXEC num dwords count
Fix incorrect value for number of dwords in the PRED_EXEC command.
2025-04-01 15:53:45 -04:00
Mallya, Ameya Keshava 39e8911fbc Adding !verify features
Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>
2025-03-31 13:05:52 -07:00
Lancelot SIX 6a4785f650 Fix Stochastic sampling trap handler
The trap handler should read the PERF_SNAPSHOT_DATA after all of
PERF_SNAPSHOT_DATA, PERF_SNAPSHOT_PC_LO and PERF_SNAPSHOT_PC_HI.  This
patch fixes this.

Change-Id: I7f78e16d7a0d8bfebb34906b4dff73c2eaeb5658
2025-03-31 10:20:19 +01:00
Lancelot SIX eece210a5c trap_handler.s: Clear PERF_SNAPSHOT/HOST_TRAP before returning
Make sure to clear the HOST_TRAP and PERF_SNAPSHOT bits before returning
from the second level trap handler.  As those bits are sticky, this
ensures future re-entry to the trap handler (for context save for
example) will not be confused with a sampling trap.

Change-Id: I05e5e58779a650b324ac6e30d574dc6931340f13
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
2025-03-31 10:20:19 +01:00
Mallya, Ameya Keshava 05c81b6855 Added KWS check for amd-mainline
Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>
2025-03-28 08:27:05 -07:00
Apurv Mishra 10530fa2a7 kfdtest: support for upstream kernel driver
Detect whether the loaded driver is the upstream or DKMS version and
add a filter for the tests that fail with the upstream driver.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
2025-03-27 16:55:21 -04:00
Yiannis Papadopoulos 0bd4acb5d4 rocr/aie: Returning error code if query not recognized 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos e55503e7f8 rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos f4e1c9b0ba rocr/aie: Avoiding XdnaDriver class in queue API 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos 8dcbbf31c7 rocr/aie: Remove unused struct from HSA API 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos bf8ab493c4 rocr: Remove unused lambda 2025-03-27 10:33:40 -04:00
Yiannis Papadopoulos b066e0eefa rocr/aie: Resolve parentheses warning 2025-03-27 10:33:40 -04:00
David Yat Sin 947391deac rocr: Release agent resources before pools
Adding a general stage for agents to release their resources on
shutdown. This avoids a circular dependency during shutdown because
we have to delete allocated resources before deleting memory pools, but
we also have to delete memory pools before destroying agents.
2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos a66130bc48 rocr: Release vmem handles before agent destruction 2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos 765563b786 rocr: Return success status in IsModelEnabled() 2025-03-25 10:05:16 -04:00
lyndonli c34a2798ce rocr: Remove redundant Refresh() call
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().

Signed-off-by: lyndonli <Lyndon.Li@amd.com>
2025-03-25 09:13:59 -04:00
Jonathan Kim c710a06ee0 kfdtest: fix trap on wave start and end
The debugger override will set the initial request mask to the
previously set request mask so use a different mask to assert
enablement.
Trap on wave start and end also run back to back, so fix the
previous override mask check as well.

In addition, unlike instruction traps, trap on wave start and end
will not require a rewind of the program counter on wave exit.
2025-03-24 20:44:27 -04:00
Adel Johar d8d27d4fd6 Docs: Add more variables to env_variables.rst 2025-03-20 11:59:58 -04:00
Lang Yu 89926f5b0b rocrtst: fix rocrtst.Test_Example
VerifyResult always returns true. That's not expected.

Signed-off-by: Lang Yu <lang.yu@amd.com>
2025-03-20 12:57:52 +08:00
Shweta Khatri 2ae70735e8 rocr: Fix PcSamplingCreateFromId to pass 32-bit dword count to DmaFill
In PcSamplingCreateFromId, convert number of bytes into number of
dwords because DmaFill expects a count of 32-bit words, not raw bytes.
This prevents OOB writes on large sampling buffers.
2025-03-19 14:42:41 -04:00
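The unit conversion this message describes can be sketched as below. `BytesToDwords` is a hypothetical helper, not the code in PcSamplingCreateFromId, and the sketch assumes the buffer size is a multiple of 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts a size in bytes to a count of 32-bit words.
// Assumes num_bytes is a multiple of 4, which is checked in debug builds.
inline std::size_t BytesToDwords(std::size_t num_bytes) {
  assert(num_bytes % sizeof(std::uint32_t) == 0);
  return num_bytes / sizeof(std::uint32_t);
}
```

Passing the raw byte count where a dword count is expected would request four times as many 32-bit writes as the buffer can hold, which matches the out-of-bounds behavior the message describes.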
Lao, Darren cd4d236185 rocr: Change ISA grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>
2025-03-19 13:44:17 -04:00
Tim Gu 0a28e0a54a Update build instructions 2025-03-18 19:54:20 -04:00
randyh62 e2f3e8c0de fix license include path 2025-03-18 16:29:10 -04:00
David Yat Sin ce0244ac03 Revert rocr: Only expose ext-fine-grain pool on xgmi-hive systems
This reverts commit 6dac90c89a.
2025-03-18 16:28:36 -04:00
jordans d4b85b6bf5 hsakmt: Initial Commit for the HSA KMT Model
The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.
2025-03-18 16:22:17 -04:00
David Yat Sin 6903a41b1d rocr: Workaround for SDMA POLL_REGMEM on gfx9.0
Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a.
This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM
may return before the memory is actually cleared.
2025-03-17 17:59:15 -04:00
Mallya, Ameya Keshava 5d254c6fb0 Added release trigger for further releases
Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>
2025-03-14 13:52:00 -07:00
Stella Laurenzo c36ccaaf4b rocr: Search for libnuma with find_package before find_library.
This avoids a false dependence on a system library when not desired.
2025-03-14 08:16:13 -07:00
Hila, Nino 98a5ebc3f1 Update palamida.yml
Signed-off-by: Hila, Nino <Nino.Hila@amd.com>
2025-03-13 20:08:56 -04:00