1261 Commits

Author SHA1 Message Date
Min Zhou a82f2f3134 rocr: delete duplicated conditional expression
Change-Id: Idc8b1a8ca2975f33191a448f03cabf3fc4f8f8a6
2025-01-28 10:48:44 -05:00
Yiannis Papadopoulos 1d8a77db34 rocr/aie: AIE agent memory pools correct size and user data pool
Change-Id: I831711a7d1cdc36cbc9ed30bd74d0dc984228ce7
2025-01-28 10:48:16 -05:00
Yiannis Papadopoulos 26bfa0b8f6 rocr/aie: Add dma-buf import support for AIEAgents via the Driver interface
Change-Id: I70f8d8772dda7c06944d75042cb3034ddd89aff4
2025-01-27 15:22:46 -05:00
Shweta Khatri 6361466baa rocr: Use view3dAs2dArray flag, for thick/3D swizzle modes.
Added the HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag to
enable/disable this. The default value is false (view3dAs2dArray = 1).
Enabling this flag will enable support for swizzles that do 3D
interleaving. Note that all features of 3D images are supported
with 2D swizzles; it's just that the access patterns are different
and therefore cache hit-rates may be better or worse, depending
on how it's used. Volumetric algorithms do better with 3D and apps
that tend to access a single slice at a time do better with 2D.

Change-Id: Id8574a6710fe4333a1ee331e5ce9195a81434198
2025-01-27 09:28:33 -05:00
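
A minimal sketch of how the flag in 6361466baa might map onto view3dAs2dArray. The helper names and the rule that only the string "1" enables the flag are assumptions made for illustration; the log only states that the default is false and that the default corresponds to view3dAs2dArray = 1.

```cpp
#include <cstring>

// Hypothetical parser. Assumption: the flag is enabled only when set to "1".
bool Enable3dSwizzle(const char* raw) {
  return raw != nullptr && std::strcmp(raw, "1") == 0;
}

// Flag unset or false -> view3dAs2dArray = 1 (2D swizzles, as in the log).
// Flag enabled        -> view3dAs2dArray = 0 (3D interleaving; inferred).
unsigned View3dAs2dArray(const char* raw) {
  return Enable3dSwizzle(raw) ? 0u : 1u;
}
```

A caller would typically pass in the result of std::getenv("HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG"), which is a null pointer when the variable is unset.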
Tony Gutierrez 8a38f121ea rocr: Add WaitMultiple to core Signal
Replaces WaitAny with WaitMultiple to more closely align with the
underlying driver API for waiting on multiple events.

WaitMultiple adds a single parameter, wait_on_all, to the WaitAny
interface providing a single function for waiting on multiple
events when we only need AND and OR semantics for the signal
checking logic.

Change-Id: I68a4a45d48151d9d69aef02fd8f7263b9e6c0e75
2025-01-27 09:21:43 -05:00
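
The AND/OR semantics that wait_on_all selects in 8a38f121ea can be sketched as a single non-blocking check. ToySignal and ConditionsMet are hypothetical names, and treating a signal as satisfied when its value is zero is an assumption; the real interface also takes conditions, compare values and a timeout.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a signal; assumed satisfied once its value is zero.
struct ToySignal {
  std::atomic<int64_t> value{0};
  bool Satisfied() const { return value.load(std::memory_order_acquire) == 0; }
};

// One pass of the checking logic, without blocking or timeouts.
// wait_on_all == true  -> AND: every signal must be satisfied.
// wait_on_all == false -> OR: a single satisfied signal is enough.
bool ConditionsMet(const std::vector<const ToySignal*>& signals, bool wait_on_all) {
  if (signals.empty()) return false;  // assumption: an empty wait never succeeds
  for (const ToySignal* s : signals) {
    const bool ok = s->Satisfied();
    if (ok && !wait_on_all) return true;   // OR: first hit decides
    if (!ok && wait_on_all) return false;  // AND: first miss decides
  }
  return wait_on_all;  // AND: no miss found; OR: no hit found
}
```

Folding both modes into one loop is what lets a single entry point replace a separate "wait any" function.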
David Yat Sin dab8f2fc65 rocr: Add support for gfx950
<squashed with patch for gfx950 generic targets>

Signed-off-by: Chris Freehill <Chris.Freehill@amd.com>

Change-Id: Ifec6d93cf46c7fbf736c6572882299e279260af6
2025-01-26 13:04:58 -05:00
Ben Vanik 7d64fe49fa rocr: Fix HostQueue to obey the alignment requirement
Change-Id: I06542e9ff94e826ca0abba0328b301fec50a95ea
2025-01-24 12:08:11 -05:00
David Yat Sin 7ea25ebb85 rocr: Add thread priority for AsyncEventHandler
Set priority to maximum for signal event handler and minimum for
exceptions event handler.

Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc
2025-01-24 10:08:12 -05:00
Ben Vanik 9971e7b004 rocr: Fixing non-portable inline attribute on hsa_flag_* utilities.
Change-Id: Ie1c53fef407a71b5ec4c6eaf3a3ed00871184408
2025-01-23 15:09:21 -05:00
Tony Gutierrez 15107afb11 rocr: Generalize driver discovery
Generalize the driver discovery and move driver-specific
functionality to the concrete driver implementations.
Currently, this process is tightly coupled to the hsakmt
which is GPU and OS specific.

Change-Id: Ie1c53fef407a71b5ec4c6eaf3a3ed00871184409
2025-01-23 15:09:14 -05:00
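
A rough sketch of the shape this generalization suggests: a driver interface with virtual Open() and Close(), and a discovery loop that knows nothing about any specific driver. Every class and function name here is hypothetical and is not taken from the ROCr sources.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; only the virtual Open()/Close() pair mirrors the log.
class Driver {
 public:
  virtual ~Driver() = default;
  virtual bool Open() = 0;  // true when the backing device is usable
  virtual void Close() = 0;
  virtual std::string Name() const = 0;
};

// Two fake concrete drivers: one whose device is present, one whose is not.
class FakeGpuDriver : public Driver {
 public:
  bool Open() override { return true; }
  void Close() override {}
  std::string Name() const override { return "fake-gpu"; }
};

class FakeAbsentDriver : public Driver {
 public:
  bool Open() override { return false; }
  void Close() override {}
  std::string Name() const override { return "fake-absent"; }
};

// Generic discovery: try every candidate and keep the ones that open.
std::vector<std::unique_ptr<Driver>> DiscoverDrivers(
    std::vector<std::unique_ptr<Driver>> candidates) {
  std::vector<std::unique_ptr<Driver>> found;
  for (auto& d : candidates) {
    if (d->Open()) found.push_back(std::move(d));
  }
  return found;
}
```

Driver-specific probing lives behind Open(), so the loop stays the same when a new driver type is added.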
Tony Gutierrez 77fa5af618 rocr: Make Open() and Close() virtual in Driver
Change-Id: Iac054c08383b080ca2b2ec6d65019bf2f083b763
2025-01-23 15:09:06 -05:00
Tony Gutierrez 8bbc44d51b rocr: Forward declare Driver in the Agent class
Change-Id: Ib27081bf31446af92602f723f352fb75ec3f378e
2025-01-23 15:08:59 -05:00
Longlong Yao 5d8fba133d rocr: add AMD_KERNEL_CODE_PROPERTIES_ENABLE_WAVEFRONT_SIZE32
Change-Id: I158705499f4ab0b1231d698d66902eb4ab1ececa
Signed-off-by: LonglongYao <Longlong.Yao@amd.com>
2025-01-22 13:02:31 -05:00
Swati Rawat 77c2a21a92 Update index.rst
Change-Id: I493e3dc3782608e4d0d712569a6e6fd3b376cdbe
2025-01-21 10:05:28 -05:00
Chris Freehill b1d6cacf79 rocr: Remove RuntimeCleanup and use of loaded()
The recent static initialization changes cause this clean up to
happen when it previously never did. The result of ~RuntimeCleanup()
being executed is that the static global "loaded_" is set to false,
which in turn prevents hsa_init() from executing again. Clean up
already happens when hsa_shut_down() occurs.

Change-Id: Ib5cefb80d82880c1945e04eb6ec246bc2c7d2324
2025-01-13 09:18:13 -05:00
Flora Cui 2cc279dbbc rocr: try DefaultSignal if interrupt is disabled
Reviewed-by: Shane Xiao <shane.xiao@amd.com>
Change-Id: I5d3a3813f56990f3aca61be23215faeb0a9629cb
Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-01-02 11:09:20 +08:00
Shane Xiao 2d40493c31 rocr: Fix missed read lock in ExecutableImpl::FindHostAddress
Change-Id: Ide9b5cc3aa235d3768ebbfd8dc1560bf70fd0743
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Qiang Yu <qiang.yu@amd.com>
2024-12-30 06:43:25 -05:00
Tim Huang e515b0bca5 rocr: add ISA target support for GC version 11.5.3
This adds support for GC version 11.5.3.

Change-Id: I1d55e33198620d3493967558c25c636d5f7ab347
Signed-off-by: Tim Huang <tim.huang@amd.com>
2024-12-30 01:44:53 -05:00
Chris Freehill 67b0082443 rocr: Dynamically allocate IsaMap
This is to avoid use after free at the program's end, when statics
are destructed.

Change-Id: Id6bf26f25a58d13bdf1ee99c852adae8add76569
2024-12-20 09:20:09 -05:00
Flora Cui ac64c54d74 rocr: skip exception_signal_ handling on exit
if .supports_exception_debugging is not enabled.

Change-Id: I944fe7aa4f3068964f47e23f5259c3802d1e9556
Signed-off-by: Flora Cui <flora.cui@amd.com>
2024-12-19 04:14:32 -05:00
Apurv Mishra 699d0140be rocr: multiple uninitialized and unused variables
Minor modifications to multiple source and header
files based on Coverity report

Change-Id: I4a73d0f56640983c4d5124e13c8c280245cca672
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-12-18 10:11:13 -05:00
Apurv Mishra 441bd9fe6c rocr: refactor of runtime.cpp based on Coverity
Add return checks, initialization and clean
redundant memory operations

fix 1: check return value of 'setsockopt' for error
fix 2: check return value of 'PtrInfo' for error
fix 3: move 'tool_names' instead of copying
fix 4: call 'munmap' for 'va' only once
fix 5: use 'ssize' for possible return values of -1 (err)
fix 6: add missing initialization in constructors
fix 7: add initialization for some scalars and pointers

Change-Id: I07d90e36d4e1fe48c4de4f44e18083e5ed4c5fbc
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-12-18 10:06:55 -05:00
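
Fix 5 in 441bd9fe6c concerns calls that can return -1. A minimal sketch of that pattern, assuming 'ssize' refers to POSIX ssize_t; ReadSome is a hypothetical helper and not code from runtime.cpp.

```cpp
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>

// Keep the result of read() in a signed type so the -1 error value is not
// silently converted into a huge unsigned byte count.
bool ReadSome(int fd, void* buf, std::size_t len, std::size_t* bytes_read) {
  const ssize_t n = ::read(fd, buf, len);
  if (n < 0) return false;  // error: leave *bytes_read untouched
  *bytes_read = static_cast<std::size_t>(n);
  return true;
}
```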
David Yat Sin 5da1889fb7 rocr: Avoid deadlock due to queue signal not updated
Make sure waiting_ count for queue signal is always > 0 so that we
always call hsaKmtWaitOnEvent to force hsaKmtWaitOnEvent to return.

Remove incorrect warning print when running in debug mode.

Call internal Signal::WaitAny instead of AMD::hsa_amd_signal_wait_any
to avoid extra function calls.

Change-Id: I9e41b704643e4e8ee7402b1379b1c30ff4c544ef
2024-12-16 10:25:19 -05:00
Chris Freehill e93efba9cc rocr: Check generic feature compatibility separately
Check that generic ISAs are compatible with an agent separately
from where feature compatibility is checked.

Change-Id: I403012db5536ff1f2faf93cf013db03ef07ac1c8
2024-12-11 16:08:44 -05:00
Eddie Richter e9cc839b2b rocr/aie: AIE Queue Processing
Change-Id: I681c971ba7229037ca85d5529838aa7bbe5820e2
2024-12-10 10:50:02 -05:00
Yiannis Papadopoulos c343a9dc60 rocr/aie: Add AIEAgent missing info
Change-Id: I32e9acc7b8b7dee4e9ff5524fec5c440bb8ece0e
2024-12-07 00:04:54 +00:00
Apurv Mishra c48e8a918e rocr: initialize 'data_rdy' & correct 'const' functions
'const' member functions have syntax errors and struct
'data_rdy' has uninitialized members

v1: correct misplaced 'const' for member functions
v2: add initialization for 'data_rdy' in constructor

Change-Id: I29bada475217c9df81f0d0400e7a3f44aa8afe0c
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-12-06 09:17:02 -05:00
David Yat Sin 0544c2336b rocr: Avoid polling for SDMA signals
When all 64 bits of the signal value are 0, we can skip polling for that
signal.
We need to keep signals as 64-bit numbers as part of the spec, but most
users of ROCr will never set the signal value to more than 32 bits.
When the dependent signals fit in 32 bits, avoid adding an extra
SDMA poll packet, as this adds latency to the SDMA copies.

Change-Id: I37dca65fe3f060dc7164f49b98cb1985023663c4
2024-12-04 16:45:04 -05:00
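
The decision described in 0544c2336b can be modeled as a small function. This is a toy model under one stated assumption: that a full 64-bit wait is split into two 32-bit polls. PollPacketsNeeded is a hypothetical name.

```cpp
#include <cstdint>

// Number of poll packets a dependent signal would need in this toy model.
// Assumption: polls cover 32 bits each, so a full 64-bit wait needs two.
int PollPacketsNeeded(uint64_t signal_value) {
  if (signal_value == 0) return 0;          // all 64 bits are zero: skip polling
  if ((signal_value >> 32) == 0) return 1;  // fits in 32 bits: no extra packet
  return 2;                                 // upper half is set: poll both halves
}
```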
Chris Freehill f32e264933 rocr: Add gfx9-4-generic support
Change-Id: I4ebfbf0dcffa5b784d7fbfda7398d44dcc47aaef
2024-12-03 19:33:57 -05:00
taosang2 df250a49a5 rocr: Support different address modes
Support different address modes in X, Y, Z directions

Change-Id: If1db5a8af33c92ddc4b48968c3d8eceb97daea6a
2024-12-02 09:07:56 -05:00
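
To illustrate what independent address modes per direction mean, the sketch below resolves an out-of-range coordinate separately on each axis. The enum, the structs and the two modes shown are hypothetical and are not the runtime's own types.

```cpp
#include <cassert>

// Hypothetical subset of addressing modes.
enum class AddressMode { kClampToEdge, kRepeat };

// Map one coordinate into [0, extent) according to the mode for that axis.
int ResolveCoord(int coord, int extent, AddressMode mode) {
  assert(extent > 0);
  switch (mode) {
    case AddressMode::kRepeat: {
      const int m = coord % extent;  // may be negative for negative coord
      return m < 0 ? m + extent : m;
    }
    case AddressMode::kClampToEdge:
      break;
  }
  if (coord < 0) return 0;
  return coord >= extent ? extent - 1 : coord;
}

struct Coord3 { int x, y, z; };
struct AddressModes3 { AddressMode x, y, z; };

// Each axis uses its own mode instead of one mode shared by X, Y and Z.
Coord3 Resolve(const AddressModes3& modes, Coord3 c, Coord3 extent) {
  return {ResolveCoord(c.x, extent.x, modes.x),
          ResolveCoord(c.y, extent.y, modes.y),
          ResolveCoord(c.z, extent.z, modes.z)};
}
```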
David Yat Sin 147abb6ca0 rocr: Move _loader_debug_state to rocr namespace
This avoids exposing the symbol to the default namespace

Change-Id: I2fe5fbab4b59f271effacab93eeb2d95c236ae02
2024-11-29 10:44:23 -05:00
Chris Freehill eec2130443 rocr: Dynamically allocate supported_isas map
This was missing from a previous commit regarding
dynamically allocated static data structures.

Change-Id: Iae1c674e762f85e3aebf338210ba96942ba80278
2024-11-27 11:11:22 -05:00
Apurv Mishra 89115369cc rocr: declare 'args' as class member in 'os_thread'
Removed 'args' as a unique pointer and deletion in
'ThreadTrampoline', then declared as a class member.

Change-Id: Ia52058392d0170e8b5e57cfdd2c587f47a6f93f0
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-11-27 10:27:40 -05:00
Apurv Mishra d91a14ae0c rocr: initialized missing fields in ext_table
Added initializations for 'ext_table' in 'hsa_system_get_major_extension_table()'

Change-Id: I5e46592192b7d7a294d30011481f16e93db11794
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
2024-11-26 10:45:29 -05:00
German Andryeyev 816af44b05 rocr: Add logic to track the age of events
Some KFD versions can return from hsaKmtWaitOnMultipleEvents_Ext without
any wait and require the second call without age array init.

Change-Id: I8358c33080084d47c273c2a2827085d0570c8201
2024-11-25 14:55:22 -05:00
Apurv Mishra 6f6ee9679c rocr: uninitialized pointer read in InitScratchPool
Initialized 'scratch_base' as a nullptr to avoid
uninitialized read in hsaKmtAllocMemory()

Change-Id: I3b0e67f3fd3b591e1d21d691f0777b1d1a059b73
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-11-25 14:02:37 -05:00
Apurv Mishra 610f8a1e0f rocr: Uninitialized scalar variables and pointer
Added check and initialized parameters for PtrInfo().

v1: Checking if PtrInfo() returns success.
v2: Initialization for variables being passed to PtrInfo().

Change-Id: If3ec4608c8e58be259b4fd51ad681b9bc34ddff6
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
2024-11-22 16:23:29 -05:00
Konstantin Zhuravlyov 4c7a9a0f67 loader: add gfx9-4-generic support
Change-Id: Icb148f7a78a4ce0fc661e35d0df605e05db2de3d
2024-11-14 12:47:46 -05:00
David Yat Sin f58aff630c rocr: Fix sem_post overflow errors
WaitSemaphore and PostSemaphore are used in the HybridMutex
implementation. If HybridMutex did not have to call WaitSemaphore when
acquired, then calling PostSemaphore would cause the internal count
inside sem_t to slowly grow to large values and eventually cause
overflow.

Change-Id: I173fc17c874b49926e56991405e9086ea8c138fc
2024-11-13 21:57:26 -05:00
David Yat Sin 4ec730f1dc rocr: Add HSA_SIGNAL_WAIT_ABORT_TIMEOUT
Add support for an abort timeout when hsa_signal_wait_relaxed is called and
the signal does not clear within the timeout.
The timeout is in seconds.

Change-Id: If1db5a8af33c82ddc4b48968c3d8eceb97d0ea6d
2024-11-13 21:57:02 -05:00
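
A minimal sketch of a wait loop with an abort timeout expressed in seconds. WaitWithAbortTimeout and WaitResult are hypothetical names, the signal is assumed to clear at zero, and the sketch reports the timeout to the caller; what the runtime itself does on expiry is not stated in the log.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

enum class WaitResult { kSignaled, kAbortTimeout };

// Spin until the signal reaches zero or abort_timeout has elapsed.
WaitResult WaitWithAbortTimeout(const std::atomic<int64_t>& signal,
                                std::chrono::seconds abort_timeout) {
  const auto deadline = std::chrono::steady_clock::now() + abort_timeout;
  while (signal.load(std::memory_order_acquire) != 0) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return WaitResult::kAbortTimeout;  // signal did not clear in time
    }
  }
  return WaitResult::kSignaled;
}
```

Using steady_clock keeps the deadline immune to wall-clock adjustments while the wait is in progress.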
Konstantin Zhuravlyov ec3d4aa5e9 loader: add gfx12-generic support
Change-Id: I0bf5d48ec357278bdb7a9c4eae61a7b7995411f0
2024-11-11 16:27:47 -05:00
Konstantin Zhuravlyov cf9c2efbbd loader: add gfx1153 support
Change-Id: Ie3f0ecf1c6631d95cbff5e14ddc48e751f4c356d
2024-11-11 16:27:39 -05:00
Konstantin Zhuravlyov 7d9a51e22a loader/nfc: reorder cases when switching on targets, specific first, generic second
Change-Id: I47f38c1691b9b6ff589f7ff445143997b0801dc6
2024-11-11 16:27:34 -05:00
Konstantin Zhuravlyov 4344f012b6 loader: add missing support for gfx700
Change-Id: Ia08e93b0e2d300a183a7a5fb92604cd801b2d52a
2024-11-11 16:27:27 -05:00
Konstantin Zhuravlyov d9404a52ed amd_hsa_elf.h: bring EF_AMDGPU_MACH_* in sync with llvm-project
- formatting
  - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X56
  - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X57
  - add EF_AMDGPU_MACH_AMDGCN_GFX1153
  - add EF_AMDGPU_MACH_AMDGCN_GFX12_GENERIC

Change-Id: Ibad464c659137c0c98fa9fa9d1f293ea62684ee6
2024-11-07 18:03:27 -05:00
Chris Freehill 0878deda17 rocr: Dynamically allocate static global memory
To allow non-POD global variables to last until the last thread
has exited, use "new" to allocate the memory instead of static
allocation.

Change-Id: Ica571b61ff8068a52e472c49cb1c44917e60c8c8
2024-11-07 09:53:31 -05:00
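
The idiom described in 0878deda17 is commonly written as a function-local pointer that is allocated once and never deleted. A minimal sketch, with Registry as a hypothetical name and the map contents chosen only for illustration:

```cpp
#include <map>
#include <string>

// Allocated on first use and intentionally never deleted, so no destructor
// runs during static destruction and late threads can still use the object.
std::map<std::string, int>& Registry() {
  static auto* registry = new std::map<std::string, int>();
  return *registry;
}
```

The trade-off is a deliberate one-time leak at process exit in exchange for avoiding use-after-free when other statics or threads outlive the object.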
Jaydeep Patel 700f1d9abd rocr: Decrement counter only if event is popped
Also restore dead signals cleanup for old path when HSA_WAIT_ANY_DEBUG
is used.

Change-Id: I51a7404991443c9f6cbf57b4b9e9faa694b9538c
2024-11-07 01:03:09 -05:00
Yiannis Papadopoulos 2837825b14 rocr: Adding pointer to the owner driver in Agent class
Change-Id: If913d7c7e4caf6d6e6eee3a858a27c6027c2923f
2024-10-31 12:29:10 -04:00
Chris Freehill c7521a5f2a rocr: Fix supported_isas transient memory issue
An ASAN run of the release build revealed some elements of
the supported_isas static map were still using stack data. This
change makes it use heap data so it will persist.

Change-Id: Ie51887e88b9e2dec27acfc97ea45a6219fea971c
2024-10-31 11:59:29 -04:00
Jonathan Kim 7f8676e177 rocr: revert back to old copy behaviour with no xgmi sdma engines
SDMA queue resources are limited when all SDMA copies are bottlenecked
into 2 engines.  Callers will not be able to make the best decisions
to allocate queue resources fairly, so have ROCr fall back to the old
round-robin behaviour dictated by KFD.

Change-Id: I93d52297976d74e20129c5eb1dcfbfa5aa5067a7
2024-10-29 16:01:01 -04:00