Commit graph

2654 commits

Author SHA1 Message Date
Konstantin Zhuravlyov ec3d4aa5e9 loader: add gfx12-generic support
Change-Id: I0bf5d48ec357278bdb7a9c4eae61a7b7995411f0
2024-11-11 16:27:47 -05:00
Konstantin Zhuravlyov cf9c2efbbd loader: add gfx1153 support
Change-Id: Ie3f0ecf1c6631d95cbff5e14ddc48e751f4c356d
2024-11-11 16:27:39 -05:00
Konstantin Zhuravlyov 7d9a51e22a loader/nfc: reorder cases when switching on targets, specific first, generic second
Change-Id: I47f38c1691b9b6ff589f7ff445143997b0801dc6
2024-11-11 16:27:34 -05:00
Konstantin Zhuravlyov 4344f012b6 loader: add missing support for gfx700
Change-Id: Ia08e93b0e2d300a183a7a5fb92604cd801b2d52a
2024-11-11 16:27:27 -05:00
Ranjith Ramakrishnan 2970545ded Correct the provides field of hsa-rocr and hsa-rocr-devel packages
Both the runtime and devel packages currently provide the hsakmt packages. Only the devel package needs to provide them.
Change the package replaces/obsoletes fields accordingly.

Change-Id: Ia1a4f128a1f6928faf57faee5f301a77c21acca2
2024-11-08 13:51:10 -05:00
Konstantin Zhuravlyov d9404a52ed amd_hsa_elf.h: bring EF_AMDGPU_MACH_* in sync with llvm-project
- formatting
  - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X56
  - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X57
  - add EF_AMDGPU_MACH_AMDGCN_GFX1153
  - add EF_AMDGPU_MACH_AMDGCN_GFX12_GENERIC

Change-Id: Ibad464c659137c0c98fa9fa9d1f293ea62684ee6
2024-11-07 18:03:27 -05:00
Chris Freehill 0878deda17 rocr: Dynamically allocate static global memory
To allow non-POD global variables to last until the last thread
has exited, use "new" to allocate the memory instead of static
allocation.

Change-Id: Ica571b61ff8068a52e472c49cb1c44917e60c8c8
2024-11-07 09:53:31 -05:00
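The idea in the commit above can be illustrated with a minimal sketch. It assumes a hypothetical global registry; the names Registry, GlobalRegistry and LookupOrDefault are illustrative and do not come from ROCr. The object is allocated with "new" on first use and intentionally never deleted, so no destructor runs at process exit while other threads may still be using it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical non-POD global; stands in for whatever ROCr actually stores.
using Registry = std::map<std::string, int>;

// Allocates the registry with `new` on first use and never deletes it.
// Because no destructor runs during static destruction, threads that are
// still running at process exit can keep using the object safely.
Registry& GlobalRegistry() {
  static Registry* instance = new Registry();  // intentionally leaked
  return *instance;
}

// Looks up a key, returning `fallback` when the key is absent.
int LookupOrDefault(const std::string& key, int fallback) {
  assert(!key.empty());  // illustrative precondition: keys are non-empty
  const Registry& registry = GlobalRegistry();
  Registry::const_iterator it = registry.find(key);
  return it == registry.end() ? fallback : it->second;
}
```

The trade-off in this pattern is a deliberate one-time leak in exchange for avoiding use-after-destruction at shutdown.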
Jaydeep Patel 700f1d9abd rocr: Decrement counter only if event is popped
Also restore dead signals cleanup for old path when HSA_WAIT_ANY_DEBUG
is used.

Change-Id: I51a7404991443c9f6cbf57b4b9e9faa694b9538c
2024-11-07 01:03:09 -05:00
AravindanC 1a0de862aa Update static package dependency of rocrtst
Change-Id: Ic12a6f2ec3bd03d871815810cc79488e7d5c57ab
2024-11-06 07:06:37 -08:00
Yiannis Papadopoulos 2837825b14 rocr: Adding pointer to the owner driver in Agent class
Change-Id: If913d7c7e4caf6d6e6eee3a858a27c6027c2923f
2024-10-31 12:29:10 -04:00
Chris Freehill c7521a5f2a rocr: Fix supported_isas transient memory issue
An ASAN run of the release build revealed some elements of
the supported_isas static map were still using stack data. This
change makes it use heap data so it will persist.

Change-Id: Ie51887e88b9e2dec27acfc97ea45a6219fea971c
2024-10-31 11:59:29 -04:00
Chris Freehill 4256630fd0 rocr: Fix several rocrtst memory errors
Change-Id: I9049a3905fb26cf9b8ad0839684a70771a49f616
2024-10-30 20:36:25 -04:00
Jonathan Kim 7f8676e177 rocr: revert back to old copy behaviour with no xgmi sdma engines
SDMA queue resources are limited when all SDMA copies are bottlenecked
into 2 engines. Callers will not be able to make the best decisions
to allocate queue resources fairly, so have ROCr fall back to the old
round-robin behaviour dictated by KFD.

Change-Id: I93d52297976d74e20129c5eb1dcfbfa5aa5067a7
2024-10-29 16:01:01 -04:00
Chris Freehill 0c18ff22e1 rocr: Generic ISA targets support
Change-Id: I6a0341ec9c1ec1e710143676b80a8a3c1a78f725
2024-10-28 08:54:06 -05:00
Chris Freehill 08699069d6 rocr: Quiet some ROCr compile warnings
These are mostly AIE related, but there are a couple of others.

Change-Id: I549e004772160ca282d4c94dc9d94dd2ccae8b1c
2024-10-28 09:08:14 -04:00
German Andryeyev 0fc7369ba5 rocr: Disable WaitAny() in AsyncEventsLoop()
- Add a new path that avoids WaitAny() calls in AsyncEventsLoop() with the
HSA_WAIT_ANY_DEBUG key. The new path is selected by default.
The optimization combines all the logic of WaitAny() in a single processing loop
and avoids extra memory allocations or ref counting. Also, it won't spin
on the CPU if all events are busy.

Change-Id: I197ce60d0d023fbb672f700d6e87702686f1f55a
2024-10-25 14:37:02 -04:00
David Yat Sin d90fbee9c4 rocr: find first dispatch pkt that needs scratch
On GPUs where EOP is handled in the ASIC, the read_dispatch_id is not always
updated after each packet. Look for the first dispatch packet that needs
scratch memory before allocating scratch.

Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111fdafbe
2024-10-25 14:36:40 -04:00
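The scan described in the commit above might look roughly like the following sketch. The Packet struct and the function name are hypothetical simplifications, not ROCr's real queue types. Because the read index may lag behind the packets actually consumed, the code walks forward from it and stops at the first dispatch packet that requests scratch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified packet; real AQL packets carry many more fields.
struct Packet {
  bool is_dispatch;               // true for kernel dispatch packets
  uint32_t private_segment_size;  // non-zero means scratch is required
};

// Scans packet ids in [read_id, write_id) on a ring buffer and returns the id
// of the first dispatch packet that needs scratch. Returns write_id when no
// such packet exists in the range.
uint64_t FirstPacketNeedingScratch(const std::vector<Packet>& ring,
                                   uint64_t read_id, uint64_t write_id) {
  assert(!ring.empty());
  for (uint64_t id = read_id; id < write_id; ++id) {
    const Packet& pkt = ring[id % ring.size()];  // ids wrap around the ring
    if (pkt.is_dispatch && pkt.private_segment_size > 0) return id;
  }
  return write_id;
}
```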
Philip Yang e6d4a32c42 kfdtest: Update KFDSVMEvictTest.QueueTest for CPX mode
The current test has 4 processes. Each process allocates and accesses 512
buffers, which requires 2048 waves to access 2048 buffers at the same time to
finish the test. In CPX compute partition mode, each compute node has
fewer waves, which causes random test failures. Change the test to 2 processes
that use 1024 waves to access 1024 buffers with the increased buffer size.

Add a waves_num check to avoid test failures on new ASICs or simulators:
skip the test if fewer than 1024 waves are available.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I64b5f9172b62cf38f62fbb0b48a801b8a11401c0
2024-10-24 12:57:30 -04:00
Yiannis Papadopoulos c7785a6da1 rocr/aie: Remove unused set container and error when using AIE agents in MemoryRegion
Change-Id: Icf1e56412c840810a679f376293a616068841b8c
2024-10-23 09:42:32 -04:00
Chris Freehill fd99b74287 rocr: supported_isas map elements should persist
The supported_isas static unordered_map was adding stack
allocated Isa objects. Instead, make the objects statically
allocated, as supported_isas itself is.

Change-Id: I23405e218290d48deea6f984f76c57e7b43e314e
2024-10-22 18:09:03 -05:00
David Yat Sin e1865f7b16 kfdtest: Inherit CXX flags
Change-Id: I2e902ec3e6fd582c53a6d95cd49fe2b18f56b8ca
2024-10-17 14:17:08 -04:00
Chris Freehill 9b13bcd0ac rocr: Ensure globals are initialized at first use
When ROCr is built as a static library, global variables
were often not initialized to valid values at their first
use. This change addresses that problem.

Change-Id: I550fa41feb3bc04b9cc686bcfb4acf2a7b651a88
2024-10-16 23:19:48 -04:00
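The commit above does not say which technique it uses. One common way to guarantee that a global is valid at its first use is the construct-on-first-use idiom, sketched below with hypothetical Config and Logger types. A function-local static is initialized the first time control reaches it, so it cannot be read before construction regardless of link order.

```cpp
#include <cassert>

// Hypothetical configuration object that other globals depend on.
struct Config {
  int log_level;
  Config() : log_level(3) {}
};

// Function-local static: constructed on the first call (thread-safe since
// C++11), so callers never observe an uninitialized Config.
Config& GetConfig() {
  static Config config;
  return config;
}

// Hypothetical dependent global that reads Config during construction.
struct Logger {
  int level;
  Logger() : level(GetConfig().log_level) {
    assert(level > 0 && "Config must be constructed before Logger reads it");
  }
};

Logger& GetLogger() {
  static Logger logger;
  return logger;
}
```

With plain namespace-scope globals in separate translation units, the Logger constructor could run before Config is constructed; routing access through the accessor functions removes that ordering dependency.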
David Yat Sin 80da7d5ee4 Revert "hsakmt: Only set exec flag when requested"
This reverts commit 75143555fa.

Reason for revert:
This is currently breaking some tools. It will be put back as soon as the tools update their code.

Change-Id: I05c82d443f3a274a618d05e6dc5a87943f5dc7a4
2024-10-16 20:31:27 -04:00
David Yat Sin d58c9dea0a rocr: Add executable flag for memory allocations
Change-Id: I8307cd3562c3ab9c12fef8c457a59916e33b7923
2024-10-15 16:52:00 +00:00
David Yat Sin ead3aafcda rocrtst: Fix VirtMemory_Basic_Test permissions
Fix VirtMemory_Basic_Test permissions to account for the earlier change
in hsa_amd_vmem_set_access behavior that was made by this
patch:

rocr/vmm: Only modify permissions for specified agents

Change-Id: I97230600b9b9144459b08ca3da3a5bfbdbb98231
2024-10-11 10:41:11 -04:00
jokim 1d6ff45673 rocr: Workaround segfault on GFX9 devices older than GFX90a
Devices older than GFX90a hit a segfault on queue unmap when an
SDMA queue has been assigned a fixed engine.  Bypass fixing the
engine for these devices for now.

Change-Id: I7d2f882d2377f004a7bb65f3b397396db07ce6d3
2024-10-10 14:41:10 -04:00
Kent Russell ccd80d19ba kfdtest: Fix in-tree scriptless build
If you build thunk following the instructions in the thunk's README,
there is no /lib folder in the build folder. Adjust the include path,
and clean up the docs to reflect that. The header include is already
defined in the CMake file as ../../include, so we don't use
LIBHSAKMT_PATH for that linking, just the lib location.

Change-Id: I73435d59adb9d01f527a28b1935086260e9d3d70
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-10-08 14:42:33 -04:00
Shweta Khatri 8bc4efc8ca hsakmt: pmc_table.c: Fix Coverity reported warnings
Eliminate out-of-bounds access in get_block_properties

Change-Id: I3abee1e36fafdda053d4bc4a611698d676b01d5c
2024-10-07 14:15:26 -04:00
Shweta Khatri 52e7fd1480 hsakmt: debug.c: Fix Coverity reported warnings
Fix potential memory leak reported by Coverity

Change-Id: Iacbaa99be3f4fe7fae5fb6a10bd41dfc34b96059
2024-10-07 14:14:26 -04:00
Shweta Khatri c9454794b6 hsakmt: fmm.c: Fix Coverity reported warnings
Fixed multiple issues related to memory management, atomicity,
and error handling across various functions: handle null checks,
use-after-free, unchecked returns, and memory leaks.

Change-Id: Ia7c76320cc20e24001052fbba2dd0600bd412140
2024-10-07 13:54:03 -04:00
David Yat Sin dbae8da515 rocr: Fix memory leak on non-visible GPUs
Fix memory leak for memory region objects when a GPU is masked using
ROCR_VISIBLE_DEVICES.

Change-Id: I610842a18adbc3cdc854b12650844e271bc00592
2024-10-04 17:40:47 -04:00
Jonathan Kim 0ae064fe2d rocr: Use new extended graphics handle registration call on IPC import
To correctly map to all GPUs after an import, use the new extended
registration call that can import a virtual address without having to
specify a target node.

Change-Id: Ifca8f6f6ee24fa99b2af357dcc3ea1de3ab234f7
2024-10-03 14:06:37 -04:00
Jonathan Kim 03463ed2c0 hsakmt: Enable graphics handle registration with a virtual address
Currently, registering graphics memory without specifying a target
node returns a memory handle that's not a virtual address.

As a result, ROCr is forced to register with a target node for
IPC usage.

Mapping memory without specifying a target node afterwards will
result in mapping to the target node that was imported, because the
previous import call flags this node as the target for future mappings.

For ROCr IPC usage, ROCr wants to map to all GPU nodes if the target node
is not specified.

Allow the caller to register graphics handles that return a virtual
address without having to specify the target node, so that the caller
can make a subsequent map call to all GPUs.

Change-Id: I5a935092b885cc3568e4f3a5dd951c7ec6c84fca
2024-10-03 14:06:31 -04:00
Ranjith Ramakrishnan f27ae44b8c cmake component grouping should not be ignored
In a static build, the dev and binary components are grouped to generate the static package.
Remove the line that was ignoring the component grouping.

Change-Id: Ie0ca9db109f2002891260985634f2e6b1ea7f236
2024-10-01 14:22:21 -07:00
Shweta Khatri 9f43c9fd51 hsakmt: spm.c: Fix Coverity reported warnings
Fix unused ret value and initialize gpu_id

Change-Id: Ib3acc7db4bbab519318d0970786a5dc641dcc9eb
2024-09-30 19:46:51 -04:00
David Yat Sin 73f6bfa747 rocr/vmm: Only modify permissions for specified agents
When hsa_amd_vmem_set_access is called, do not remove permissions for
unspecified agents. Also update the documentation in the header to clarify
this.

Change-Id: I3bb4cf08ba399f85cc67b17fd13a4a40d862415f
2024-09-30 17:41:58 -04:00
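The behavior described in the commit above can be modeled with a small sketch. The types and the SetAccess function here are hypothetical and are not the real hsa_amd_vmem_set_access signature. The point is only that entries for agents not named in the request keep their existing permissions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical model of per-agent permissions for one mapped range.
enum class Access { None, ReadOnly, ReadWrite };
typedef int AgentId;

struct AccessDesc {
  AgentId agent;
  Access access;
};

// Updates permissions only for the agents listed in `descs`. Agents that are
// not mentioned keep whatever permission they already had.
void SetAccess(std::map<AgentId, Access>& table,
               const std::vector<AccessDesc>& descs) {
  for (std::size_t i = 0; i < descs.size(); ++i) {
    assert(descs[i].agent >= 0);  // illustrative validity check
    table[descs[i].agent] = descs[i].access;
  }
}
```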
Mukul Joshi b81e45f03c kfdtest: Update KFDPerformanceTest.P2PBandWidthTest for CPX mode
Currently, KFDPerformanceTest.P2PBandWidthTest cannot work if there are
more than 16 KFD nodes in the system. This limit was put in to match the
number of SDMA queues supported on a single node.
This patch updates the test to make it run on systems with more than
16 KFD nodes.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I561d0cdef664cae84fb9c13a801052e2001256e5
2024-09-30 11:28:33 -04:00
Jonathan Kim 909b82d463 rocr: Fix race condition in IPC DMABUF socket server
Socket server accept calls do not guarantee synchronous actions
post-accept. This can result in a race condition.

To resolve this, first limit the socket server's listen backlog to a
single connection. This will force competing clients to busy-retry
until timeout.

Second, make the DMABUF IPC file descriptor send-receive and import
calls into an atomic routine per connection.

By making these fixes, not only do we resolve potential races, but
we also guarantee that any exporter process will create at most one
file descriptor that will only last for the duration of the import
transaction. This alleviates any concern about running into system
limits for the number of open file descriptors per process.

Change-Id: I6d8b14795a680d89a2707e082fa027d525792e05
2024-09-27 14:40:59 -04:00
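The client side of the scheme in the commit above (retry until timeout while the server handles one connection at a time) could be sketched as follows. RetryUntilTimeout and its parameters are hypothetical. The actual ROCr client code, socket calls, and timeout values are not shown in the commit message, and this sketch pauses briefly between attempts rather than spinning.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls `try_connect` repeatedly until it succeeds or `timeout` elapses.
// `try_connect` stands in for a connect attempt against a server whose
// listen backlog admits only one pending connection at a time.
bool RetryUntilTimeout(const std::function<bool()>& try_connect,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds pause) {
  assert(timeout.count() >= 0);
  const std::chrono::steady_clock::time_point deadline =
      std::chrono::steady_clock::now() + timeout;
  do {
    if (try_connect()) return true;      // connected; caller proceeds
    std::this_thread::sleep_for(pause);  // brief pause before retrying
  } while (std::chrono::steady_clock::now() < deadline);
  return false;  // gave up after the deadline
}
```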
Jonathan Kim 32bb0764b7 rocr: Fix IPC DMA Buf fragment handling and enable for development
Discarding blocks for reallocation on IPC export (for better memory
performance) triggers memory violations with DMA Buf exports, so bypass
this for now, as no application performance drops have been observed
with the bypass.

The raw fragment should be passed to the DMA Buf export call as well
since offsets will be implicitly applied in the Thunk/KFD for
export/import calls.

Also, use the agent information directly from the pointer
information so that the export call doesn't have to scan memory to find
this.  Pass the node ID in the handle so that the import call doesn't
have to make two thunk imports to fetch the node ID for GPU memory
imports.

Finally, allow the user to use DMA Buf IPC via
HSA_ENABLE_IPC_MODE_LEGACY=0 for developer testing as legacy mode will
be applied by default.

Change-Id: Ie8fe267f8768fa5df37126078406f7065f69ff4e
2024-09-27 14:40:42 -04:00
Kent Russell d64e33520f rocrtst: Various codeql fixes
Fix some potentially unreleased memory, null value checks, files not
closed, and other such issues reported by codeql

Change-Id: Ia679aff97a773a642d8c8cbadeae30955554a62e
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-09-27 09:56:18 -04:00
Samuel Zhang 5a1b6bf14d SWDEV-484614 - KFDSVMRangeTest.HMMProfilingEvent/1 random fail in VM
In a VM with 6 vCPUs, CPU scheduling of
queue_delayed_work(system_freezable_wq) is slower than on bare metal.
The HSA_SMI_EVENT_QUEUE_RESTORE event from case HMMProfilingEvent/0 was
executed late and caused HMMProfilingEvent/1 to fail.

The fix is to listen only for the HSA_SMI_EVENT_MIGRATE_START event and ignore
all other events.

Change-Id: I534e49b030bd4c534bc7a63eb431f4907659c8cd
2024-09-26 13:37:08 +08:00
Jonathan Kim 027af8dacd hsakmt: Update hsa capabilities with precise ALU ops
Update the HSA capabilities field with precise ALU ops bit support
for GPU debugging.

Change-Id: I796f2c2e0559577828aba510c401ed5187e10179
2024-09-24 13:45:04 -04:00
Jonathan Kim a926a070ee hsakmt: Update thunk doc comments for debug firmware support caps
Update commentary on HWS scheduler support bit for GPU debugging in
the HSA capabilities node properties field.

Change-Id: I59c519d74a528d5ecf5817ef94e75091314bd844
2024-09-23 16:34:31 -04:00
Shweta Khatri 681610937a hsakmt: queues.c: Fix Coverity reported warnings
Move variable declarations inline and add NULL checks to prevent errors

Change-Id: Ia5bf5e245bcc0f756a15bc799b55c5e2a8459f89
2024-09-23 15:07:28 -04:00
Longlong Yao 3b829d0e62 rocrtst: fix resource leak
Change-Id: Ib57dccad0b639539e1076daba31eef278f2cf638
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2024-09-23 15:04:43 -04:00
Young Hui b530e0f619 docs: move .readthedocs.yaml to the root of repo
Change-Id: I6c5afb806c47c029359a2dee2a7e73c6d076cfb1
2024-09-23 15:49:19 +00:00
Young Hui 75b674f0ad docs: path adjustments to allow documentation to build again
- adding doc files to .gitignore

Change-Id: Ia6b2358bb1f236298ad1d705c1bed0636026632d
2024-09-23 15:49:13 +00:00
Shweta Khatri 857200e28c hsakmt: events.c: Fix Coverity reported warnings
Fix data race by protecting events_page access with mutex in event create
Fix potential NULL dereference in hsaKmtWaitOnMultipleEvents_Ext
Fix unchecked return value in hsaKmtCreateEvent function

Change-Id: I434bef43666e5205a8b061259569c1d99a952752
2024-09-23 11:35:02 -04:00
Shweta Khatri 303c02690d hsakmt: hsakmttypes.h: Fix Coverity reported warnings
Eliminated declared but not referenced variables to fix warnings

Change-Id: I80032a699fb59ce4635c5001f669d009ba60e588
2024-09-23 11:34:44 -04:00
Shweta Khatri 659fa04d8c hsakmt: topology.c: Fix Coverity reported warnings
Refactor fscanf_str to use fgets for safer string handling, remove unused code

Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111feaf1e
2024-09-23 11:34:28 -04:00
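As an illustration of the fgets-based approach mentioned in the commit above, here is a minimal sketch of a bounded line reader. ReadLine is a hypothetical helper and is not the actual fscanf_str from topology.c. Unlike an unbounded fscanf("%s"), fgets never writes more than the given buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Reads one line from `file` into `buf` (capacity `size` bytes, including the
// terminating NUL) and strips a trailing newline if present.
// Returns false on EOF or read error.
bool ReadLine(std::FILE* file, char* buf, std::size_t size) {
  assert(file != nullptr && buf != nullptr && size > 0);
  if (std::fgets(buf, static_cast<int>(size), file) == nullptr) return false;
  buf[std::strcspn(buf, "\n")] = '\0';  // cut at the newline, if any
  return true;
}
```

Input longer than the buffer is truncated rather than overflowing; the remainder stays in the stream for the next read.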