1089 Commits

Author SHA1 Message Date
David Yat Sin 5da1889fb7 rocr: Avoid deadlock due to queue signal not updated
Make sure waiting_ count for queue signal is always > 0 so that we
always call hsaKmtWaitOnEvent to force hsaKmtWaitOnEvent to return.

Remove incorrect warning print when running in debug mode.

Call internal Signal::WaitAny instead of AMD::hsa_amd_signal_wait_any
to avoid extra function calls.

Change-Id: I9e41b704643e4e8ee7402b1379b1c30ff4c544ef
2024-12-16 10:25:19 -05:00
Chris Freehill e93efba9cc rocr: Check generic feature compatibility separately
Check that generic ISAs are compatible with an agent separately
from where feature compatibility is checked.

Change-Id: I403012db5536ff1f2faf93cf013db03ef07ac1c8
2024-12-11 16:08:44 -05:00
Eddie Richter e9cc839b2b rocr/aie: AIE Queue Processing
Change-Id: I681c971ba7229037ca85d5529838aa7bbe5820e2
2024-12-10 10:50:02 -05:00
Yiannis Papadopoulos c343a9dc60 rocr/aie: Add AIEAgent missing info
Change-Id: I32e9acc7b8b7dee4e9ff5524fec5c440bb8ece0e
2024-12-07 00:04:54 +00:00
Apurv Mishra c48e8a918e rocr: initialize 'data_rdy' & correct 'const' functions
'const' member functions have syntax errors and struct
'data_rdy' has uninitialized members

v1: correct misplaced 'const' for member functions
v2: add initialization for 'data_rdy' in constructor

Change-Id: I29bada475217c9df81f0d0400e7a3f44aa8afe0c
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-12-06 09:17:02 -05:00
David Yat Sin 0544c2336b rocr: Avoid polling for SDMA signals
When all 64 bits of the signal value are 0, we can skip polling for that
signal.
We need to keep signals as 64-bit numbers as part of the spec, but most
users of ROCr will never set the signal value to more than 32 bits.
When the dependent signals fit in 32 bits, avoid adding an extra
SDMA poll packet, as this adds latency to the SDMA copies.

Change-Id: I37dca65fe3f060dc7164f49b98cb1985023663c4
2024-12-04 16:45:04 -05:00
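One reading of the SDMA polling optimization above, as a minimal sketch rather than ROCr's actual code. It assumes that a single SDMA poll packet compares one 32-bit word against zero, so a full 64-bit signal would need two packets. The helper name `PollPacketsNeeded` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of ROCr.
// Assumption: one SDMA poll packet checks a single 32-bit word.
inline int PollPacketsNeeded(std::uint64_t signal_value) {
  if (signal_value == 0) return 0;          // dependency already satisfied, no poll
  if ((signal_value >> 32) == 0) return 1;  // upper word is zero, poll the low word only
  return 2;                                 // poll both 32-bit halves
}
```

Under this assumption, the common case of a small signal value costs at most one poll packet per dependent signal.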
Chris Freehill f32e264933 rocr: Add gfx9-4-generic support
Change-Id: I4ebfbf0dcffa5b784d7fbfda7398d44dcc47aaef
2024-12-03 19:33:57 -05:00
taosang2 df250a49a5 rocr: Support different address modes
Support different address modes in X, Y, Z directions

Change-Id: If1db5a8af33c92ddc4b48968c3d8eceb97daea6a
2024-12-02 09:07:56 -05:00
David Yat Sin 147abb6ca0 rocr: Move _loader_debug_state to rocr namespace
This avoids exposing the symbol to the default namespace

Change-Id: I2fe5fbab4b59f271effacab93eeb2d95c236ae02
2024-11-29 10:44:23 -05:00
Chris Freehill eec2130443 rocr: Dynamically allocate supported_isas map
This was missing from a previous commit regarding
dynamically allocated static data structures.

Change-Id: Iae1c674e762f85e3aebf338210ba96942ba80278
2024-11-27 11:11:22 -05:00
Apurv Mishra 89115369cc rocr: declare 'args' as class member in 'os_thread'
Removed 'args' as a unique pointer and deletion in
'ThreadTrampoline', then declared as a class member.

Change-Id: Ia52058392d0170e8b5e57cfdd2c587f47a6f93f0
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-11-27 10:27:40 -05:00
Apurv Mishra d91a14ae0c rocr: initialized missing fields in ext_table
Added initializations for 'ext_table' in 'hsa_system_get_major_extension_table()'

Change-Id: I5e46592192b7d7a294d30011481f16e93db11794
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
2024-11-26 10:45:29 -05:00
German Andryeyev 816af44b05 rocr: Add logic to track the age of events
Some KFD versions can return from hsaKmtWaitOnMultipleEvents_Ext without
any wait and require a second call without age array init.

Change-Id: I8358c33080084d47c273c2a2827085d0570c8201
2024-11-25 14:55:22 -05:00
Apurv Mishra 6f6ee9679c rocr: uninitialized pointer read in InitScratchPool
Initialized 'scratch_base' as a nullptr to avoid
uninitialized read in hsaKmtAllocMemory()

Change-Id: I3b0e67f3fd3b591e1d21d691f0777b1d1a059b73
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-11-25 14:02:37 -05:00
Apurv Mishra 610f8a1e0f rocr: Uninitialized scalar variables and pointer
Added check and initialized parameters for PtrInfo().

v1: Checking if PtrInfo() returns success.
v2: Initialization for variables being passed to PtrInfo().

Change-Id: If3ec4608c8e58be259b4fd51ad681b9bc34ddff6
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
2024-11-22 16:23:29 -05:00
Konstantin Zhuravlyov 4c7a9a0f67 loader: add gfx9-4-generic support
Change-Id: Icb148f7a78a4ce0fc661e35d0df605e05db2de3d
2024-11-14 12:47:46 -05:00
David Yat Sin f58aff630c rocr: Fix sem_post overflow errors
WaitSemaphore and PostSemaphore are used in the HybridMutex
implementation. If HybridMutex did not have to call WaitSemaphore when
acquired, then calling PostSemaphore would cause the internal count
inside sem_t to slowly grow to large values and eventually cause
overflow.

Change-Id: I173fc17c874b49926e56991405e9086ea8c138fc
2024-11-13 21:57:26 -05:00
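The overflow described above can be avoided by posting the semaphore only when a waiter actually exists. The sketch below is a generic benaphore-style lock built on POSIX `sem_t`; it is not the ROCr HybridMutex implementation, and the class name `SketchHybridMutex` and its members are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <semaphore.h>

// Hypothetical illustration, not the ROCr HybridMutex.
class SketchHybridMutex {
 public:
  SketchHybridMutex() { sem_init(&sem_, /*pshared=*/0, /*value=*/0); }
  ~SketchHybridMutex() { sem_destroy(&sem_); }
  SketchHybridMutex(const SketchHybridMutex&) = delete;
  SketchHybridMutex& operator=(const SketchHybridMutex&) = delete;

  void lock() {
    // A previous value above 0 means another thread holds the lock: block.
    if (users_.fetch_add(1) > 0) sem_wait(&sem_);
  }

  void unlock() {
    // Post only if at least one other thread is blocked (or about to block)
    // in lock(). An uncontended unlock never touches the semaphore, so its
    // internal count cannot grow without bound.
    if (users_.fetch_sub(1) > 1) sem_post(&sem_);
  }

  // Current semaphore count, exposed for inspection in this sketch only.
  int semaphore_value() {
    int value = 0;
    sem_getvalue(&sem_, &value);
    return value;
  }

 private:
  std::atomic<int> users_{0};  // holder plus waiters
  sem_t sem_;
};
```

Every `sem_post` here is matched by exactly one `sem_wait`, so repeated uncontended lock and unlock cycles leave the semaphore count at zero.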
David Yat Sin 4ec730f1dc rocr: Add HSA_SIGNAL_WAIT_ABORT_TIMEOUT
Add support for an abort timeout when hsa_signal_wait_relaxed is called and
the signal does not clear within the timeout.
The timeout is in seconds.

Change-Id: If1db5a8af33c82ddc4b48968c3d8eceb97d0ea6d
2024-11-13 21:57:02 -05:00
Konstantin Zhuravlyov ec3d4aa5e9 loader: add gfx12-generic support
Change-Id: I0bf5d48ec357278bdb7a9c4eae61a7b7995411f0
2024-11-11 16:27:47 -05:00
Konstantin Zhuravlyov cf9c2efbbd loader: add gfx1153 support
Change-Id: Ie3f0ecf1c6631d95cbff5e14ddc48e751f4c356d
2024-11-11 16:27:39 -05:00
Konstantin Zhuravlyov 7d9a51e22a loader/nfc: reorder cases when switching on targets, specific first, generic second
Change-Id: I47f38c1691b9b6ff589f7ff445143997b0801dc6
2024-11-11 16:27:34 -05:00
Konstantin Zhuravlyov 4344f012b6 loader: add missing support for gfx700
Change-Id: Ia08e93b0e2d300a183a7a5fb92604cd801b2d52a
2024-11-11 16:27:27 -05:00
Konstantin Zhuravlyov d9404a52ed amd_hsa_elf.h: bring EF_AMDGPU_MACH_* in sync with llvm-project
- formatting
  - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X56
  - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X57
  - add EF_AMDGPU_MACH_AMDGCN_GFX1153
  - add EF_AMDGPU_MACH_AMDGCN_GFX12_GENERIC

Change-Id: Ibad464c659137c0c98fa9fa9d1f293ea62684ee6
2024-11-07 18:03:27 -05:00
Chris Freehill 0878deda17 rocr: Dynamically allocate static global memory
To allow non-POD global variables to last until the last thread
has exited, use "new" to allocate the memory instead of static
allocation.

Change-Id: Ica571b61ff8068a52e472c49cb1c44917e60c8c8
2024-11-07 09:53:31 -05:00
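The idea of allocating a global with "new" so that it outlives static destruction can be sketched as follows. This is a generic C++ idiom shown under assumed names (`Registry` is hypothetical), not the code from the commit.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical accessor for a non-POD global.
// The map is heap-allocated on first use and intentionally never deleted,
// so no destructor runs at exit while other threads may still be using it.
inline std::map<std::string, int>& Registry() {
  static auto* registry = new std::map<std::string, int>();
  return *registry;
}
```

The function-local static pointer is initialized once in a thread-safe way (C++11 and later), which also guarantees the object exists before its first use. The trade-off is that the memory is reclaimed by the operating system rather than by a destructor.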
Jaydeep Patel 700f1d9abd rocr: Decrement counter only if event is popped
Also restore dead signals cleanup for old path when HSA_WAIT_ANY_DEBUG
is used.

Change-Id: I51a7404991443c9f6cbf57b4b9e9faa694b9538c
2024-11-07 01:03:09 -05:00
Yiannis Papadopoulos 2837825b14 rocr: Adding pointer to the owner driver in Agent class
Change-Id: If913d7c7e4caf6d6e6eee3a858a27c6027c2923f
2024-10-31 12:29:10 -04:00
Chris Freehill c7521a5f2a rocr: Fix supported_isas transient memory issue
An ASAN run of the release build revealed some elements of
the supported_isas static map were still using stack data. This
change makes it use heap data so it will persist.

Change-Id: Ie51887e88b9e2dec27acfc97ea45a6219fea971c
2024-10-31 11:59:29 -04:00
Jonathan Kim 7f8676e177 rocr: revert back to old copy behaviour with no xgmi sdma engines
SDMA queue resources are limited when all SDMA copies are bottlenecked
into 2 engines. Callers will not be able to make the best decisions
to allocate queue resources fairly, so have ROCr fall back to the old
round-robin behaviour dictated by KFD.

Change-Id: I93d52297976d74e20129c5eb1dcfbfa5aa5067a7
2024-10-29 16:01:01 -04:00
Chris Freehill 0c18ff22e1 rocr: Generic ISA targets support
Change-Id: I6a0341ec9c1ec1e710143676b80a8a3c1a78f725
2024-10-28 08:54:06 -05:00
Chris Freehill 08699069d6 rocr: Quiet some ROCr compile warnings
These are mostly AIE related, but there are a couple of others.

Change-Id: I549e004772160ca282d4c94dc9d94dd2ccae8b1c
2024-10-28 09:08:14 -04:00
German Andryeyev 0fc7369ba5 rocr: Disable WaitAny() in AsyncEventsLoop()
- Add the new path to avoid WaitAny() calls in AsyncEventsLoop() with the
HSA_WAIT_ANY_DEBUG key. The new path is selected by default.
The optimization combines all logic of WaitAny() in a single processing loop
and avoids extra memory allocations or ref counting. Also, it won't spin
on the CPU if all events are busy.

Change-Id: I197ce60d0d023fbb672f700d6e87702686f1f55a
2024-10-25 14:37:02 -04:00
David Yat Sin d90fbee9c4 rocr: find first dispatch pkt that needs scratch
On GPUs where EOP is handled in asic, the read_dispatch_id is not always
updated after each packet. Look for the first dispatch packet that needs
scratch memory before allocating scratch.

Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111fdafbe
2024-10-25 14:36:40 -04:00
Yiannis Papadopoulos c7785a6da1 rocr/aie: Remove unused set container and error when using AIE agents in MemoryRegion
Change-Id: Icf1e56412c840810a679f376293a616068841b8c
2024-10-23 09:42:32 -04:00
Chris Freehill fd99b74287 rocr: supported_isas map elements should persist
The supported_isas static unordered_map was adding stack
allocated Isa objects. Instead, make the objects statically
allocated, as supported_isas itself is.

Change-Id: I23405e218290d48deea6f984f76c57e7b43e314e
2024-10-22 18:09:03 -05:00
Chris Freehill 9b13bcd0ac rocr: Ensure globals are initialized at first use
When ROCr is built as a static library, global variables
were often not initialized to valid values at their first
use. This change addresses that problem.

Change-Id: I550fa41feb3bc04b9cc686bcfb4acf2a7b651a88
2024-10-16 23:19:48 -04:00
David Yat Sin d58c9dea0a rocr: Add executable flag for memory allocations
Change-Id: I8307cd3562c3ab9c12fef8c457a59916e33b7923
2024-10-15 16:52:00 +00:00
jokim 1d6ff45673 rocr: Workaround segfault on GFX9 devices older than GFX90a
Devices older than GFX90a hit a segfault on queue unmap when an
SDMA queue has been assigned a fixed engine.  Bypass fixing the
engine for these devices for now.

Change-Id: I7d2f882d2377f004a7bb65f3b397396db07ce6d3
2024-10-10 14:41:10 -04:00
David Yat Sin dbae8da515 rocr: Fix memory leak on non-visible GPUs
Fix memory leak for memory region objects when a GPU is masked using
ROCR_VISIBLE_DEVICES.

Change-Id: I610842a18adbc3cdc854b12650844e271bc00592
2024-10-04 17:40:47 -04:00
Jonathan Kim 0ae064fe2d rocr: Use new extended graphics handle registration call on IPC import
To correctly map to all GPUs after an import, use the new extended
registration call that can import a virtual address without having to
specify a target node.

Change-Id: Ifca8f6f6ee24fa99b2af357dcc3ea1de3ab234f7
2024-10-03 14:06:37 -04:00
David Yat Sin 73f6bfa747 rocr/vmm: Only modify permissions for specified agents
When hsa_amd_vmem_set_access is called, do not remove permissions for
unspecified agents. Also updating documentation in header to clarify
this.

Change-Id: I3bb4cf08ba399f85cc67b17fd13a4a40d862415f
2024-09-30 17:41:58 -04:00
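The behavior described above (update only the agents named in the call, leave all others untouched) can be illustrated with a small permission table. This is a sketch under assumed types; `AgentId`, `Access`, and `SetAccess` are hypothetical and do not reflect the real `hsa_amd_vmem_set_access` signature.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for agent handles and access permissions.
using AgentId = int;
enum class Access { kNone, kReadOnly, kReadWrite };

// Apply the requested permissions. Agents that do not appear in
// `requested` keep whatever permission they already had.
inline void SetAccess(std::map<AgentId, Access>& table,
                      const std::vector<std::pair<AgentId, Access>>& requested) {
  for (const auto& entry : requested) {
    table[entry.first] = entry.second;
  }
}
```

The alternative design, rebuilding the table from the request alone, would silently revoke access for every agent the caller did not mention.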
Jonathan Kim 909b82d463 rocr: Fix race condition in IPC DMABUF socket server
Socket server accept calls do not guarantee synchronous actions
post-accept. This can result in a race condition.

To resolve this, first limit the socket server's listen backlog to a
single connection. This will force competing clients to busy-retry
until timeout.

Second, make the DMABUF IPC file descriptor send-receive and import
calls into an atomic routine per connection.

With these fixes, not only do we resolve potential races, but
we also guarantee that any exporter process will create at most one
file descriptor that will only last for the duration of the import
transaction. This alleviates any concern about running into system
limits for the number of open file descriptors per process.

Change-Id: I6d8b14795a680d89a2707e082fa027d525792e05
2024-09-27 14:40:59 -04:00
Jonathan Kim 32bb0764b7 rocr: Fix IPC DMA Buf fragment handling and enable for development
Discarding blocks for reallocation on IPC export for better memory
performance triggers memory violations with DMA Buf exports, so bypass
this for now, as application performance drops haven't been observed
with the bypass.

The raw fragment should be passed to the DMA Buf export call as well
since offsets will be implicitly applied in the Thunk/KFD for
export/import calls.

Also, use the agent information directly from the pointer
information so that the export call doesn't have to scan memory to find
this.  Pass the node ID in the handle so that the import call doesn't
have to make two thunk imports to fetch the node ID for GPU memory
imports.

Finally, allow the user to use DMA Buf IPC via
HSA_ENABLE_IPC_MODE_LEGACY=0 for developer testing as legacy mode will
be applied by default.

Change-Id: Ie8fe267f8768fa5df37126078406f7065f69ff4e
2024-09-27 14:40:42 -04:00
Young Hui b530e0f619 docs: move .readthedocs.yaml to the root of repo
Change-Id: I6c5afb806c47c029359a2dee2a7e73c6d076cfb1
2024-09-23 15:49:19 +00:00
Young Hui 75b674f0ad docs: path adjustments to allow documentation to build again
- adding doc files to .gitignore

Change-Id: Ia6b2358bb1f236298ad1d705c1bed0636026632d
2024-09-23 15:49:13 +00:00
Yiannis Papadopoulos 48fdc17179 rocr/aie: Correct reporting of dev heap size
Storing the correct dev heap size in the memory region.

Change-Id: I14b053330c187da1d7d0213256625e50795b9902
2024-09-20 12:44:23 -04:00
James Xu f3664fd124 rocr: Add nullptr check in IterateExecutables
When an entry is deleted from the array, it's set to nullptr
but not removed. Most other functions that
iterate over the array check if the entry is nullptr
but this loop in IterateExecutables did not.

Change-Id: I763b361eea59f6df201bb86ead0234e95f2cf79c
2024-09-19 19:44:53 +00:00
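The tombstone pattern described above, where deleted entries stay in the array as nullptr, requires every loop to skip empty slots. A minimal sketch follows; the `Executable` struct and the `IterateLive` function are hypothetical and are not the ROCr loader types.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a loaded executable.
struct Executable {
  int id;
};

// Visit every live entry and return how many were visited.
// Deleted entries are left in place as nullptr and must be skipped
// before dereferencing.
inline int IterateLive(const std::vector<Executable*>& slots,
                       const std::function<void(const Executable&)>& callback) {
  int visited = 0;
  for (const Executable* entry : slots) {
    if (entry == nullptr) continue;  // slot was deleted but not removed
    callback(*entry);
    ++visited;
  }
  return visited;
}
```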
David Yat Sin 7f3dcd4e0b rocr: Add extended fine-grain memory on host
Change-Id: Id9317fee89b51a5097459255e0a3092820eff430
2024-09-19 19:44:53 +00:00
David Yat Sin 0af7a54ebe rocr: Return err when freeing invalid pointer
Return false if trying to free a NULL pointer (or invalid size)
internally in ROCr. This is to detect errors within ROCr when trying
to free NULL pointers. If a user of ROCr tries to free a NULL
pointer, this condition should be caught at the beginning of the
Runtime::FreeMemory(...) function and return HSA_STATUS_SUCCESS. This
matches the behavior of the free(...) function and the delete operator,
which silently ignore calls when passed a NULL pointer.

Change-Id: I84bc26928b35023e19cd9f214b42c6ee9508029c
2024-09-19 19:44:53 +00:00
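The split between a forgiving public entry point and a strict internal helper can be sketched as below. All names here (`Status`, `InternalFree`, `PublicFree`) are hypothetical; the real code uses `Runtime::FreeMemory(...)` and `HSA_STATUS_SUCCESS`, whose exact signatures are not shown in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical status codes standing in for hsa_status_t values.
enum class Status { kSuccess, kInvalidArgument };

// Internal helper: a NULL pointer or zero size indicates a bug in the
// runtime itself, so report failure instead of hiding it.
inline bool InternalFree(void* ptr, std::size_t size) {
  if (ptr == nullptr || size == 0) return false;
  std::free(ptr);
  return true;
}

// Public entry point: freeing NULL is a harmless no-op, as with free().
inline Status PublicFree(void* ptr, std::size_t size) {
  if (ptr == nullptr) return Status::kSuccess;
  return InternalFree(ptr, size) ? Status::kSuccess : Status::kInvalidArgument;
}
```

Catching the user-facing NULL case early keeps the public API compatible with `free(...)` semantics while still letting internal callers detect their own mistakes.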
David Yat Sin 561c44a4a9 rocr: extend agents_allow_access to support VMM
Extend hsa_amd_agents_allow_access API to handle memory allocations done
via VMM APIs.

Change-Id: I4ae51d3e42dd104e98d513b1da86133d312a7203
2024-09-19 19:44:53 +00:00
David Yat Sin 8f1b05660a rocr: refactor VMemorySetAccess function
Refactor VMemorySetAccess so that it can be re-used in the following
patch.

Change-Id: I341241da7a59724bb3611172f0d26b0689d7bb46
2024-09-19 19:44:53 +00:00