Commit graph

3030 commits

Author SHA1 Message Date
German Andryeyev ee1158b7b8 rocr: Fix Windows build and Ctz implementation (#1634) 2025-11-03 12:07:11 -05:00
systems-assistant[bot] 740b27528f kfdtest: Enable GPU selection via CLI for multi-GPU tests (#245)
* kfdtest: Enable GPU selection via CLI for multi-GPU tests

Replaced environment variable-based GPU selection with
GPU selection via command-line parameter --concurrentnodes (-c)
Modified g_TestGPUsNum to be passed in via command-line
parameter --testnodenum (t)

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>

* kfdtest: Enable GPU selection via CLI for multi-GPU tests
Replaced environment variable-based GPU selection with
GPU selection via command-line parameter --concurrentnodes (-c)
Modified g_TestGPUsNum to be passed in via command-line
parameter --testnodenum (t)

---------

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Co-authored-by: Alysa Liu <Alysa.Liu@amd.com>
2025-11-03 09:27:38 -05:00
arvindcheru fb1d32c15c SWDEV-530465 Update share/doc/<pkgnm> License Folder for hsa-rocr (#923)
* SWDEV-530465 Update share/doc/<pkgnm> License Folder for hsa-rocr
* Review Comments Updated - reverted to usage of DOCDIR
2025-10-31 23:21:22 -04:00
Yiannis Papadopoulos 37bbc9062a rocr/aie: Detect AIE architecture and marketing name (#1459)
* rocr/aie: Detect AIE architecture and marketing name

* rocr/aie: Modernize code, update comments
2025-10-31 09:10:18 -05:00
Yiannis Papadopoulos 82d68fc772 rocrtst: Assume that AIE agent memory is system RAM (#1231) 2025-10-31 09:10:00 -05:00
David Yat Sin 6497fa0339 rocr: Fix wrong args in memory copy functions (#1520)
Fix incorrect arguments passed into system_region->Lock
2025-10-27 14:12:06 -05:00
David Yat Sin f7b180ee7d rocr: SW workaround for gfx90x SDMA poll (#1469)
Workaround for rare issue on gfx90x asics when SDMA_OP_POLLREGMEM
returns before polled memory has value of 0.
Removing previous SW workaround to double-poll as it was not reliable.
2025-10-27 09:33:20 -04:00
David Yat Sin db01d95ebc Users/dayatsin/swdev 519413 hsa amd pointer info return err shutdown (#1509)
* rocr: hsa_amd_pointer_info return err on shutdown

Decrement ref count before starting to unload to make sure API
calls during shutdown return error.

Delete blit objects during agent destructor.

* Add support for HSA_AMD_SYSTEM_SHUTDOWN_EVENT

Add support for new event to indicate shut down within the
hsa_amd_register_system_event_handler API.
2025-10-27 09:32:52 -04:00
Rahul Manocha 4f075902fc SWDEV-555347 - Remove lock contention in async events loop (#878)
* SWDEV-555347 - Remove lock contention in async events loop

* SWDEV-555347 - Introduce Pool of AsyncEventItems

* create generic mempool for AsyncEventItem

* Use BaseShared allocate and free for async event pool

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-10-24 08:43:00 -07:00
pghoshamd 95f721f8a5 Check emulator mode at runtime (#1432)
* Check emulator mode at runtime

* Reduce emu mode function call to one time and use result

* Move function to main.cc

* Address feedback

* EmuMode check improvement; convert to AoS

* replace g_isEmuMode with func call

* Add mode check func for every sample
2025-10-24 10:11:19 -04:00
systems-assistant[bot] bebe65f104 rocr: fix nullptr dereference (#262)
* rocr: fix nullptr dereference

Return early in the case that malloc fails to avoid dereferencing of a
null pointer on eventDescrp.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix potential nullptr dereference

returns early if sym->section() fails to properly acquire the object.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>
2025-10-21 13:49:01 -04:00
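The first fix in bebe65f104 above (return early when malloc fails) follows a common pattern that can be sketched as below. This is only an illustration: `EventDescriptor`, `kDefaultEventType`, and `AllocEventDescriptors` are hypothetical names, and the real layout of eventDescrp is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical descriptor type, used for illustration only.
struct EventDescriptor {
  int event_type;
  unsigned long long event_id;
};

constexpr int kDefaultEventType = 1;  // assumed value, not taken from ROCr

// Allocates `count` zeroed descriptors. Returns nullptr on allocation failure
// instead of writing through a null pointer.
EventDescriptor* AllocEventDescriptors(std::size_t count) {
  assert(count > 0 && "caller must request at least one descriptor");
  auto* descs = static_cast<EventDescriptor*>(
      std::calloc(count, sizeof(EventDescriptor)));
  if (descs == nullptr) {
    return nullptr;  // Return early: nothing below may touch descs.
  }
  descs[0].event_type = kDefaultEventType;
  return descs;
}
```

The caller is expected to check the result for nullptr and release it with `std::free`.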
David Yat Sin e2f3bd2429 Changes for RDMA with VMM (#801)
* rocr: Add support for VMM and RDMA

Add extra CPU mapping so that kernel-mode drivers can look up the memory
mapping by virtual address.

* Update projects/rocr-runtime/runtime/hsa-runtime/core/runtime/runtime.cpp

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

* Update projects/rocr-runtime/runtime/hsa-runtime/core/inc/runtime.h

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

* rocr: Honor uncache flag in memory_lock_to_pool()

Also, combined several flag options used in apis into a
single integer.

Signed-off-by: Chris Freehill <cfreehil@amd.com>

* rocr: Fix hsa_amd_pointer_info on CPU agents

Fix hsa_amd_pointer_info query returning allowed on VMM pointers for CPU
agents when CPU mapping was mapped with PROT_NONE.

---------

Signed-off-by: Chris Freehill <cfreehil@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
Co-authored-by: Chris Freehill <cfreehil@amd.com>
Co-authored-by: cfreeamd <166262151+cfreeamd@users.noreply.github.com>
2025-10-21 12:19:02 -04:00
randyh62 fd5ad25615 Add note for setting the HSA_SCRATCH_SINGLE_LIMIT (#1391) 2025-10-19 17:38:06 -07:00
cfreeamd 911a2f42c1 Revert "rocr: Don't assert in hsa_shut_down when no agents (#1115)" (#1312)
This reverts commit fb8ab442b6.
2025-10-17 08:36:06 -07:00
David Bélanger 02294e3852 kfdtest: Fix ExtendedCuMasking on GPUs with inactive CUs (#726)
Modify the code that computes the adjusted CU mask array to take
into account additional cases for inactive CUs.

Signed-off-by: David Belanger <david.belanger@amd.com>
2025-10-17 08:26:12 -07:00
cfreeamd 9df655088f thunk: Correct kfd_ioctl_create_queue_args comment (#1235) 2025-10-17 08:25:51 -07:00
Sunday Clement b9b8b6110b rocrtst: Add SVM Prefetch test (#360)
This test prefetches SVM memory and then verifies that the memory is sourced
from the expected NUMA node.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-10-17 09:43:46 -04:00
Sunday Clement c23c320b4d rocr: Make IPC Handles Unique (#795)
Query IPC handles on shared memory export/import for any metadata as a
means to uniquely identify handles that happen to be backed by buffers
that point to the same memory.
2025-10-16 14:37:02 +05:30
Alysa Liu 4342579645 libhsakmt: Fix memory leak for events_page metadata (#807) 2025-10-15 14:52:40 -04:00
Alysa Liu d5cbdc104d rocrtst: Add Memory_Async_Copy_On_Engine Test (#885)
Increase test coverage involving:
hsa_amd_memory_get_preferred_copy_engine()
hsa_amd_memory_copy_engine_status()
hsa_amd_memory_async_copy_on_engine()
2025-10-15 14:51:54 -04:00
axie_amdeng dde482d224 rocr: uninitialized size variable caused huge memory/space allocation (#1232)
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
2025-10-14 16:57:10 -04:00
David Yat Sin 7f79d0febc rocr: Set signal memory allocations to NonPaged (#1219)
Set memory allocation to non-paged to avoid issues caused when CP tries
to access signals after page has been migrated.
2025-10-10 17:35:15 -04:00
David Yat Sin 7f2ef6a602 rocr: Return error on signal alloc failure (#1310)
Return HSA_STATUS_ERROR_OUT_OF_RESOURCES when signal allocation fails.
2025-10-10 14:06:31 -04:00
German Andryeyev 7ca2497378 rocr: Add AQL queue support under Windows (#1211)
Add 2 extra caps into the thunk interface to indicate
the queue object creation and PM4 emulation
2025-10-07 17:55:08 -04:00
cfreeamd fb8ab442b6 rocr: Don't assert in hsa_shut_down when no agents (#1115)
* rocr: Don't assert in hsa_shut_down when no agents

Instead, print error message and return an error. Prior to
this patch, the assertion would occur when hsa_shut_down() is
called more than once.

* rocr: Reorder Unload  ASAN clean-up on shut down
2025-10-02 17:20:53 -07:00
cfreeamd 402aa7e253 rocr: Support batching in InterceptQueue store (#1194)
* rocr: Support batching in InterceptQueue store

* Fix comment, loop bounds
2025-10-02 10:37:40 -07:00
cfreeamd 55feeefcff Revert "rocr: Remove QueueProxy (#700)" (#1167)
This reverts commit c34c9826c3,
which was causing test failures.
2025-10-01 18:24:43 -07:00
David Yat Sin cd48105282 rocr: Fix ext-fine-grain flag on host memory (#1067)
Fix for extended-fine-grain flag not set in thunk when
allocating host memory.
2025-09-25 11:10:43 -04:00
Sunday Clement f3e1db176a rocrtst: Reduce host memory limit to 70% (#905)
* rocrtst: Reduce host memory limit to 70%

Reducing the upper bound for rocrtstFunc.Memory_Max_Mem to 70% from
90% to help reduce test execution time.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocrtst: Add ROCRTST_LIMIT_POOL_SIZE env var

Add environment variable to override the memory pool sizes when running
tests.

Co-authored-by: David Yat Sin <David.YatSin@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
2025-09-22 09:39:00 -04:00
hkasivis 5e7210980e Users/hkasivis/add ais support v2.1 (#928)
* libhsakmt: Update hsakmt_fmm_get_handle to support address range

Currently, hsakmt_fmm_get_handle works only if the address is the allocated
(starting) value. Update it so it can find the handle if the address falls in
the valid allocated range. This is useful for the AMD Infinity Storage
feature, where data needs to be transferred to any memory within the
allocated range.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* libhsakmt: Introduce AMD Infinity Storage (AIS) API

Add hsaKmtAisReadWriteFile() API to support AMD Infinity Storage. The
API moves data directly from GPU VRAM to a file.

v2: Add in/out ioctl arguments to provide more status information to
user space. Modify hsaKmt API also accordingly.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* rocr: Initial implementation of AMD Infinity Storage (AIS)

Implement the first two APIs: hsa_amd_ais_file_write and hsa_amd_ais_file_read

v2: Change API from hsa_amd_ to hsa_amd_ais_
    Change API to take in handle instead of fd for compatibility across
     different platforms

Original Author: Chris Freehill <Chris.Freehill@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

---------

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-09-20 11:30:05 -04:00
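The range lookup described in the first change of 5e7210980e above (find the handle when an address falls anywhere inside an allocation, not only at its start) can be sketched with an ordered map keyed by start address. This is a minimal illustration, not the libhsakmt implementation; `AllocationTable`, `Insert`, and `FindHandle` are hypothetical names, and the handle is modeled as a plain integer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical record for one allocation.
struct Allocation {
  std::uint64_t size;    // length of the allocation in bytes
  std::uint32_t handle;  // stand-in for the real handle type
};

class AllocationTable {
 public:
  void Insert(std::uint64_t start, std::uint64_t size, std::uint32_t handle) {
    assert(size > 0);
    by_start_[start] = Allocation{size, handle};
  }

  // Returns true and writes the handle if addr lies in [start, start + size)
  // of some allocation; returns false otherwise.
  bool FindHandle(std::uint64_t addr, std::uint32_t* handle_out) const {
    auto it = by_start_.upper_bound(addr);  // first allocation starting after addr
    if (it == by_start_.begin()) return false;
    --it;  // last allocation starting at or before addr
    if (addr - it->first >= it->second.size) return false;
    *handle_out = it->second.handle;
    return true;
  }

 private:
  std::map<std::uint64_t, Allocation> by_start_;
};
```

The `upper_bound` step makes the lookup logarithmic in the number of allocations, assuming the allocations do not overlap.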
Tony G c34c9826c3 rocr: Remove QueueProxy (#700)
Because the base QueueWrapper class copies the wrapped queue's
amd_queue_v2_t queue descriptor struct the QueueProxy seems
superfluous as it will have the same effect as calling the
underlying methods on the wrapped queue itself.

Additionally, because the QueueProxy needs to access the wrapped
queue's queue descriptor it breaks the Queue API which is meant
to abstract the underlying agent's queue implementation.

This makes it easier to generalize the core::Queue as well as
the InterceptQueue.

Signed-off-by: Tony Gutierrez <anthony.gutierrez@amd.com>
2025-09-19 09:07:28 -07:00
German Andryeyev 913743d433 Add windows build support into ROCr (#912)
Make sure ROCr can be compiled under Windows. Extra setup for the Windows build environment is required. The change should not introduce any functional changes under Linux.
2025-09-19 10:10:17 -04:00
David Yat Sin 96a0d16eda rocr: Fix hsa_amd_pointer_info regression (#719)
Fix for hsa_amd_pointer_info returning only
HSA_EXT_POINTER_TYPE_RESERVED_ADDR for SVM allocations.
2025-09-19 10:09:22 -04:00
Sunday Clement 7c8e575f5d Fix Undefined behavior from signed bit shifts (#871)
* libhsakmt: fix UB due to signed integer literal in 1 << 31

Bit shift operations on signed numbers should not shift into or beyond
the sign bit as this results in Undefined Behaviour.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* libhsakmt: Fix UB due to signed integer literal in 1 << x

Bit shifting a signed integer into or beyond the sign bit is undefined behavior.

BUG: SWDEV-532853

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix UB in various places due to signed integer in bit shift

Bit shifting signed integers into or beyond the sign bit is undefined.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Change signed integer literals to unsigned

Changing the signed integers in the macro expressions throughout the file
to avoid overflow.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Flora Cui <flora.cui@amd.com>
2025-09-18 09:09:30 -04:00
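The pattern behind 7c8e575f5d above can be sketched as follows. In C, and in older C++ standards, shifting a signed `1` into the sign bit of a 32-bit `int` is not guaranteed to be well defined; giving the literal an unsigned type sidesteps the question. `Bit` and `kHighFlag` are hypothetical names chosen for this example, not identifiers from the repository.

```cpp
#include <cassert>
#include <cstdint>

// Problematic form (shown only as a comment): 1 << 31
// The literal 1 has type int, so the shift targets the sign bit.

// Safer form: shift an unsigned 32-bit value. Valid for n in [0, 31].
constexpr std::uint32_t Bit(unsigned n) {
  return std::uint32_t{1} << n;
}

// The same idea applied to a flag constant (hypothetical name).
constexpr std::uint32_t kHighFlag = 1u << 31;

static_assert(Bit(31) == 0x80000000u, "bit 31 is representable when unsigned");
static_assert(kHighFlag == Bit(31), "both spellings agree");
```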
Sunday Clement db63d4c38b hsakmt: Update udmabuf.h License Identifier Header (#873)
Fix typos, and update the license header to include SPDX license
identifier.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-09-16 10:36:02 -04:00
systems-assistant[bot] f1fabcfd64 rocr: Error Handling Issues (#264)
* rocr: Fix Incorrect Assertion Check

The wrong variable is used in the assertion statement, should be error
checking for the value of paramEndLoc after it is modified by the call
to find().

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix Potential Undefined Behaviour

In the event that the SvmProfileControl destructor is called and
event == -1 is true then the call to close(event) is effectively
close(-1) which is undefined behaviour. This has been changed to only
call close() on valid file descriptors.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Add Error Check on Bytes Read

In the case that there is an incomplete read the call to copyTo() will
now return an error.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix Exception Error

Destructors are implicitly marked with noexcept being true by default,
so if it's not explicitly marked false in the destructor or the
functions it calls, any thrown exceptions will cause the program to
crash.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>
2025-09-16 09:43:45 -04:00
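Two of the fixes in f1fabcfd64 above (call close() only on valid file descriptors, and keep destructors from throwing) can be illustrated together with a small RAII wrapper. This is a sketch under assumptions: `ScopedFd` is a hypothetical class, not the SvmProfileControl code, and it assumes a POSIX system.

```cpp
#include <cassert>
#include <type_traits>
#include <unistd.h>

// Owns a file descriptor; -1 (or any negative value) means "no descriptor".
class ScopedFd {
 public:
  explicit ScopedFd(int fd = -1) noexcept : fd_(fd) {}
  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;

  // Implicitly noexcept: nothing called from here may throw.
  ~ScopedFd() { Reset(); }

  bool valid() const noexcept { return fd_ >= 0; }
  int get() const noexcept { return fd_; }

  // Closes the descriptor only if it is valid, then marks it as released.
  void Reset() noexcept {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

 private:
  int fd_;
};

static_assert(std::is_nothrow_destructible<ScopedFd>::value,
              "destructor must not throw");
```

A default-constructed `ScopedFd` never reaches `close()`, which is the guard the commit message describes.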
Alysa Liu 2b2b8329b5 rocr: Add copyright for new files (#886)
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-09-11 10:56:31 -04:00
Benjamin Welton ed5b2ac165 Fix deadlock in InterceptQueue::Submit when packet count exceeds queue capacity (#855)
InterceptQueue::Submit had an "all-or-nothing" packet submission policy that
could cause infinite retry loops when the number of packets to submit exceeded
the available queue slots. When 504+ packets needed submission to a ~500-slot
queue, the system would:
1. Set submitted_count=0 (submit nothing)
2. Add retry barrier packet
3. Trigger async handler via StoreRelaxed
4. Attempt to submit overflow packets
5. Fail again due to same space constraints
6. Repeat

Solution:
Added partial packet submission capability during overflow processing while
preserving the original "all-or-nothing" behavior for normal operations.
When processing overflow packets and insufficient space exists for all packets,
the system now submits as many packets as possible rather than none.

The fix:
- Detects overflow processing via !overflow_.empty()
- Allows partial submission: submitted_count = free_slots - barrier_reservation
- Maintains atomicity guarantees for normal packet rewrites
- Prevents infinite retry loops by ensuring forward progress

This resolves deadlocks in high-throughput scenarios while maintaining
backward compatibility and the original design intent for packet rewrite
atomicity.
2025-09-09 14:06:29 -07:00
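The submission policy described in ed5b2ac165 above can be sketched as a pure function. This is an interpretation of the commit message, not the actual InterceptQueue::Submit code: `PacketsToSubmit` and its parameters are hypothetical, and `draining_overflow` stands in for the `!overflow_.empty()` check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns how many of `pending` packets to submit right now.
// - Everything fits: submit all of it.
// - Normal operation, not enough room: all-or-nothing, so submit nothing.
// - Overflow processing, not enough room: submit what fits after reserving
//   slots for the retry barrier, so the queue keeps making forward progress.
std::size_t PacketsToSubmit(std::size_t pending, std::size_t free_slots,
                            std::size_t barrier_reservation,
                            bool draining_overflow) {
  if (pending <= free_slots) {
    return pending;
  }
  if (!draining_overflow) {
    return 0;
  }
  const std::size_t usable =
      free_slots > barrier_reservation ? free_slots - barrier_reservation : 0;
  return std::min(pending, usable);
}
```

With 504 pending packets, 500 free slots, and one slot reserved for the barrier, the normal path submits 0 packets while the overflow path submits 499, which breaks the retry loop listed in the commit message.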
Sunday Clement e9bb77614e rocrtst: Test for shader access after async_copy (#645)
New test that does a memory_copy, and right after has the shader access
the data. This verifies that the memory is coherent and that all the
probes and flushes were done correctly by the memory_copy.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-09-09 15:03:56 -04:00
Flora Cui e7cb108a5e [rocr-runtime] Add support for WSL DXG devices (#854)
* rocr/rocdxg: add rocdxg support

* rocr/dxg: set flags for dxg env

* rocr: ring doorbell for dtif/dxg

* rocr/dxg: sdma changes

1. align command size to 64
2. call hsaKmtQueueRingDoorbell
3. disable gcr && hdp flush


Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-09-09 10:16:57 +08:00
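The first SDMA change in e7cb108a5e above (align command size to 64) is a round-up to a multiple of 64. A minimal sketch follows; `AlignUp` is a hypothetical helper, and the log does not say whether the unit is bytes or dwords, so the example leaves the unit open.

```cpp
#include <cassert>
#include <cstdint>

// Rounds value up to the next multiple of alignment.
// alignment must be a nonzero power of two (64 in the commit above).
constexpr std::uint32_t AlignUp(std::uint32_t value, std::uint32_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

static_assert(AlignUp(65, 64) == 128, "65 rounds up to the next multiple of 64");
```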
hkasivis a5713c85bb Users/hkasivis/sync kfd ioctl header (#848)
* libhsakmt: Update ioctl version to 1.18

Sync with kernel ioctl version.

Also explicitly set the ioctl flag to KFD_PROC_FLAG_MFMA_HIGH_PRECISION

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* libhsakmt: Sync ioctl header by adding kfd_ioctl_profiler

Sync with kernel ioctl version. Add kfd_ioctl_profiler.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

---------

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-09-07 20:04:31 -04:00
estewart08 bc35beafbf rocr: Remove extra LibElf find_package (#767)
This should have been removed when the libelf config search
was added.
2025-09-03 20:04:05 -04:00
systems-assistant[bot] 83a10986a4 SWDEV-539130 - Log blit copy duration (#258)
Co-authored-by: Pengda Xie <pengda.xie@amd.com>
2025-09-03 10:01:47 -07:00
SaleelK 230a22b395 rocr: Workaround for peak SDMA b/w on gfx94x (#626)
* Ideally SDMA0/1/2 are the engines to use for H2D/D2H due to physical
  PCIE proximity
* Allow using same src/dst agent for SDMA query apis
2025-09-03 09:33:29 -04:00
jonatluu 6bc1ea966f fix lintian warning (#696)
* fix lintian warning

* fix lintian warning
2025-08-27 13:53:54 -04:00
shwetakhatri-amd 79400a1f23 rocr: GFX12+ - Fix trap handler to process SW trap ID correctly (#736)
When stochastic sampling is not active, the trap handler incorrectly
branches to .check_exceptions, bypassing the software trap ID checks
and in turn not advancing the PC. Fixed the issue to always check software
traps regardless of PC sampling state.

Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
2025-08-25 19:20:37 -04:00
cfreeamd a013e141b7 Revert "rocr: river interface changes" (#724)
This commit reverts the following related commits which cause
test failures:

6d15779b3e rocr/driver: add PC sampling support to driver interface
56cb9390ff rocr/driver: add PC sampling support to driver interface
76bf829f09 rocr/driver: add ASAN header page management to Driver class
a47c060d6a rocr/driver: add ASAN header page management to Driver class
02d7eaf3b7 rocr: add memory sharing call to Driver interface
9312468655 rocr: add memory sharing call to Driver interface
2025-08-25 12:44:26 +05:30
David Yat Sin a1597a358a rocr: Expose flag to allocate uncached memory (#674)
Add new flag for clients to directly request uncached memory
2025-08-22 09:52:39 -04:00
David Yat Sin 87b348c51d rocr: Fix hsa_amd_pointer_info regression (#638)
Fix regression when hsa_amd_pointer_info is called on a pointer that was
allocated using non-VMM APIs. The helper function VMemoryPtrInfo should
return error when the address is not found so that PtrInfo does the
lookup via Thunk.
2025-08-21 10:25:50 -04:00
hkasivis 53ba025a2e libhsakmt: Don't use MADV_DONTFORK for paged memory (#356)
Also, the advice parameter of the madvise() system call is not a bitmask,
so fix that as well.

v2: Use MAP_SHARED instead of MAP_PRIVATE. This avoids MMU notifiers and
    evictions.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-08-15 09:22:20 -04:00
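The madvise() point in 53ba025a2e above can be sketched as follows. Because the advice argument is a single value rather than a bitmask, OR-ing two advice constants together produces an unrelated number; each piece of advice needs its own call. `ApplyAdvice` is a hypothetical helper written for this example and is not the libhsakmt code.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Applies each advice value with a separate madvise() call.
// Returns false as soon as one call fails (errno is left set by madvise).
bool ApplyAdvice(void* addr, std::size_t len, const int* advice,
                 std::size_t count) {
  assert(addr != nullptr && advice != nullptr);
  for (std::size_t i = 0; i < count; ++i) {
    // One value per call; never madvise(addr, len, a | b).
    if (madvise(addr, len, advice[i]) != 0) {
      return false;
    }
  }
  return true;
}
```

A caller passes an array of advice values, for example `{MADV_NORMAL, MADV_SEQUENTIAL}`, and the page-aligned region they apply to.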