rocm-systems

Author	SHA1	Message	Date
Searles, Mark	ac1e6d59c2	Update createMCObjectStreamer() to use new LLVM API (#156) (#157) * Update createMCObjectStreamer() to use new LLVM API Obsolete interfaces were removed via llvm-project's f2ff298867d7733122e32eead5a8c524b09dfdb1 * Fix typo: LLVM_VERSION -> LLVM_VERSION_MAJOR * Fix typo	2025-05-05 13:18:05 -07:00
Apurv Mishra	aa0a32a166	kfdtest: Update ROCr homepage in CMakeLists.txt Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>	2025-05-01 11:22:49 -04:00
David Yat Sin	4ed5950beb	rocr: Fix logic for scratch reclaim Fix logic error that can cause scratch memory to be reclaimed while a dispatch is still using it.	2025-04-29 17:23:45 -04:00
Amber Lin	5e28208cec	kfdtest: Skip SVMEvict with xnack=0 Random driver deadlock on svm_range_evict_svm_bo_worker() is observed on NPS2/DPX mode. It's seen with xnack off and happens more often on the partition with less VRAM because of TMR. Temporarily skip SVM Evict tests on Family AV when xnack is disabled. Signed-off-by: Amber Lin <Amber.Lin@amd.com>	2025-04-25 12:45:36 -04:00
Tony Gutierrez	f2c482d923	rocr: Add large_bar_enabled var to the GPU agent Adds a bool to the GPU agent and a public member method to check if the GPU supports large BAR. This is needed so we can check if large BAR is supported when a user tries to allocate an AQL queue in device memory on a given GPU agent. Also adds an exception to the AQL queue if device-side AQL queues are requested and the GPU owner of the AQL doesn't support large BAR. Otherwise, ROCr will currently allow device-side queues that can cause faults when the user tries to touch their ring buffers, and the user will not know why the faults are occurring. This relies on the fact that the KFD does not expose any links from the CPU to the GPU if large BAR is not enabled (though links from the GPU to the CPU may still be exposed by the KFD).	2025-04-23 15:53:29 -04:00
Tony Gutierrez	6e3c375bf1	rocr: Flags to alloc queue buf/struct in dev mem This builds on a prior change that allowed for allocating a user-mode queue's packet buffer in device memory to also allocate the queue struct in device memory. This provides additional latency benefits particularly for cases where dispatches are performed from the GPU itself. Flags are added to support the various use cases.	2025-04-23 15:53:29 -04:00
Tony Gutierrez	11d1d2cd25	rocr: Remove empty shared.cpp	2025-04-23 15:53:29 -04:00
Tony Gutierrez	adbc0495e2	rocr/libhsakmt: Add coarse-grain allocator to GPU	2025-04-23 15:53:29 -04:00
Saleel Kudchadker	57c0c643ce	rocr: return preferred SDMA engine mask - Add a new AMD extension API to return preferred SDMA engine mask. This can be used in conjunction with copy_on_engine API to get optimal bandwidth.	2025-04-22 13:28:38 -07:00
Amber Lin	bdb6e43b54	Revert "kfdtest: Temporarily blacklist KFDNegativeTest" This reverts commit `fcf3f91379`. MEC v18 starts to support pipe reset	2025-04-21 14:14:10 -04:00
Yiannis Papadopoulos	7c8fa87160	rocr/aie: Remove redundant cache flushes for already loaded PDIs	2025-04-17 09:48:41 -05:00
Shane Xiao	6a63170b38	rocr: Add rec sdma engines with limited XGMI SDMA engine This patch adds recommended SDMA engine support for systems with limited XGMI SDMA engines. It will use one PCIe SDMA to do gpu <-> gpu copies, which will help improve all-to-all copy performance. Signed-off-by: Shane Xiao <shane.xiao@amd.com>	2025-04-11 23:54:15 +08:00
Jonathan Kim	4c3a0698f8	kfdtest: fix trap on start for gfx 9 and 11 Similar to GFX 12, GFX 9 and 11 need to exit without forwarding the PC.	2025-04-10 14:48:19 -04:00
David Yat Sin	c1b7aa39ed	rocr: refactor PC Sampling PRED_EXEC op Refactor PRED_EXEC op command size calculation. Fix issue when copy size is less than 32MB.	2025-04-08 17:26:29 -04:00
Yiannis Papadopoulos	2d2c47bdef	rocr/aie: Increment write pointer upon packet submission	2025-04-08 15:36:40 -05:00
Eric Huang	df6048429c	kfdtest: fix max queues on multi-gpu mode The max queues per process is 1024 in KFD. KFDQMTest.OverSubscribeCpQueues fails in multi-gpu mode on more than 15 gpus because 65x16=1040 exceeds 1024, so changing MAX_CP_QUEUES to adapt to it will fix the issue. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>	2025-04-08 12:57:00 -04:00
Eric Huang	d3265234e9	kfdtest: fix ptrace error on multi-gpu mode The parent process can only be ptraced by one process at a time. To avoid the error, we have to add a mutex to synchronize the ptrace call. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>	2025-04-08 09:58:28 -04:00
Choudhary, Rahul	5b4c717208	Update workflow to use mainline branch	2025-04-07 09:36:52 -04:00
Choudhary, Rahul	a0b80c825c	Update rocm_ci_caller.yml updating push trigger to amd-mainline Signed-off-by: Choudhary, Rahul <Rahul.Choudhary@amd.com>	2025-04-07 09:36:52 -04:00
Yiannis Papadopoulos	c63e01724c	rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition.	2025-04-03 15:13:20 -05:00
Lancelot SIX	e0359e5d35	rocr: Replace tabs with spaces in trap handler source codes Use spaces consistently to format the trap handler code. This patch does not introduce any change in the trap handler. Using `git show -w` on this patch shows an empty diff. Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a	2025-04-03 09:44:23 +01:00
David Yat Sin	2a433e2b96	rocr: Fix PC Sampling PRED_EXEC num dwords count Fix incorrect value for number of dwords in the PRED_EXEC command.	2025-04-01 15:53:45 -04:00
Mallya, Ameya Keshava	39e8911fbc	Adding !verify features Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>	2025-03-31 13:05:52 -07:00
Lancelot SIX	6a4785f650	Fix Stochastic sampling trap handler The trap handler should read the PERF_SNAPSHOT_DATA after all of PERF_SNAPSHOT_DATA, PERF_SNAPSHOT_PC_LO and PERF_SNAPSHOT_PC_HI. This patch fixes this. Change-Id: I7f78e16d7a0d8bfebb34906b4dff73c2eaeb5658	2025-03-31 10:20:19 +01:00
Lancelot SIX	eece210a5c	trap_handler.s: Clear PERF_SNAPSHOT/HOST_TRAP before returning Make sure to clear the HOST_TRAP and PERF_SNAPSHOT bits before returning from the second level trap handler. As those bits are sticky, this ensures future re-entry to the trap handler (for context save for example) will not be confused with a sampling trap. Change-Id: I05e5e58779a650b324ac6e30d574dc6931340f13 Signed-off-by: Lancelot SIX <lancelot.six@amd.com>	2025-03-31 10:20:19 +01:00
Mallya, Ameya Keshava	05c81b6855	Added KWS check for amd-mainline Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>	2025-03-28 08:27:05 -07:00
Apurv Mishra	10530fa2a7	kfdtest: support for upstream kernel driver Detect if the loaded driver is the upstream or DKMS version and add a filter for the tests that fail in the upstream driver. Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>	2025-03-27 16:55:21 -04:00
Yiannis Papadopoulos	0bd4acb5d4	rocr/aie: Returning error code if query not recognized	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	e55503e7f8	rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	f4e1c9b0ba	rocr/aie: Avoiding XdnaDriver class in queue API	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	8dcbbf31c7	rocr/aie: Remove unused struct from HSA API	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	bf8ab493c4	rocr: Remove unused lambda	2025-03-27 10:33:40 -04:00
Yiannis Papadopoulos	b066e0eefa	rocr/aie: Resolve parentheses warning	2025-03-27 10:33:40 -04:00
David Yat Sin	947391deac	rocr: Release agent resources before pools Adding a general stage for agents to release their resources on shutdown. This avoids a circular dependency during shutdown because we have to delete allocated resources before deleting memory pools, but we also have to delete memory pools before destroying agents.	2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos	a66130bc48	rocr: Release vmem handles before agent destruction	2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos	765563b786	rocr: Return success status in IsModelEnabled()	2025-03-25 10:05:16 -04:00
lyndonli	c34a2798ce	rocr: Remove redundant Refresh() call The initial call to Refresh() in the constructor is unnecessary as it's handled in Runtime::Load(). Signed-off-by: lyndonli <Lyndon.Li@amd.com>	2025-03-25 09:13:59 -04:00
Jonathan Kim	c710a06ee0	kfdtest: fix trap on wave start and end The debugger override will set the initial request mask to the previously set request mask so use a different mask to assert enablement. Trap on wave start and end also run back to back, so fix the previous override mask check as well. In addition, unlike instruction traps, trap on wave start and end will not require a rewind of the program counter on wave exit.	2025-03-24 20:44:27 -04:00
Adel Johar	d8d27d4fd6	Docs: Add more variables to env_variables.rst	2025-03-20 11:59:58 -04:00
Lang Yu	89926f5b0b	rocrtst: fix rocrtst.Test_Example VerifyResult always returns true. That's not expected. Signed-off-by: Lang Yu <lang.yu@amd.com>	2025-03-20 12:57:52 +08:00
Shweta Khatri	2ae70735e8	rocr: Fix PcSamplingCreateFromId to pass 32-bit dword count to DmaFill In PcSamplingCreateFromId, convert number of bytes into number of dwords because DmaFill expects a count of 32-bit words, not raw bytes. This prevents OOB writes on large sampling buffers.	2025-03-19 14:42:41 -04:00
Lao, Darren	cd4d236185	rocr: Change ISA grid dimensions Signed-off-by: Lao, Darren <Darren.Lao@amd.com>	2025-03-19 13:44:17 -04:00
Tim Gu	0a28e0a54a	Update build instructions	2025-03-18 19:54:20 -04:00
randyh62	e2f3e8c0de	fix license include path	2025-03-18 16:29:10 -04:00
David Yat Sin	ce0244ac03	Revert rocr: Only expose ext-fine-grain pool on xgmi-hive systems This reverts commit `6dac90c89a`.	2025-03-18 16:28:36 -04:00
jordans	d4b85b6bf5	hsakmt: Initial Commit for the HSA KMT Model The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.	2025-03-18 16:22:17 -04:00
David Yat Sin	6903a41b1d	rocr: Workaround for SDMA POLL_REGMEM on gfx9.0 Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a. This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM may return before the memory is actually cleared.	2025-03-17 17:59:15 -04:00
Mallya, Ameya Keshava	5d254c6fb0	Added release trigger for further releases Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>	2025-03-14 13:52:00 -07:00
Stella Laurenzo	c36ccaaf4b	rocr: Search for libnuma with find_package before find_library. This avoids a false dependence on a system library when not desired.	2025-03-14 08:16:13 -07:00
Hila, Nino	98a5ebc3f1	Update palamida.yml Signed-off-by: Hila, Nino <Nino.Hila@amd.com>	2025-03-13 20:08:56 -04:00
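
The queue-limit arithmetic described in commit `df6048429c` above (65 queues on each of 16 GPUs gives 1040, which exceeds the KFD per-process limit of 1024) can be sketched as follows. This is only an illustration of the arithmetic, not the actual kfdtest change: the constant names and the `queues_per_gpu` helper are hypothetical, and capping the per-GPU count at the integer quotient is an assumed way of "adapting" the value.

```python
# Limit stated in the commit message: KFD allows 1024 queues per process.
KFD_MAX_QUEUES_PER_PROCESS = 1024
# Per-GPU queue count implied by "65x16=1040" in the commit message.
DEFAULT_QUEUES_PER_GPU = 65


def queues_per_gpu(num_gpus: int) -> int:
    """Hypothetical helper: largest per-GPU queue count (at most the default)
    whose total across all GPUs stays within the per-process limit."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return min(DEFAULT_QUEUES_PER_GPU, KFD_MAX_QUEUES_PER_PROCESS // num_gpus)


if __name__ == "__main__":
    for gpus in (8, 15, 16, 32):
        per_gpu = queues_per_gpu(gpus)
        print(f"{gpus} GPUs: {per_gpu} queues each, {per_gpu * gpus} total")
```

With 15 GPUs the default of 65 still fits (975 total); from 16 GPUs onward the per-GPU count has to shrink for the total to stay at or below 1024.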

2853 Commits