Commit graph

1138 Commits

Author SHA1 Message Date
Yiannis Papadopoulos c7936334cf rocr/aie: Changing variable names 2025-03-11 19:35:21 -04:00
Yiannis Papadopoulos fb33e2e724 rocr/aie: Handle non-HSA_STATUS_SUCCESS during VisitRegion 2025-03-11 19:35:21 -04:00
Longlong Yao a254e35fd6 rocr: export pointer type for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-03-11 10:16:58 -04:00
zichguan-amd 3415a500c7 Throw exception when runtime not initialized for hsa_amd_signal_wait_*
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-03-07 15:17:10 -05:00
zichguan-amd e4d027191c rocr: Allow 0/NULL/invalid signal handles for wait operations to be no-op
Remove hard assertions for signal validation on hsa_amd_signal_wait_* operations, instead ignore 0/NULL/invalid signals in the dependency condition evaluation to align with HSA specs for barrier-AND and barrier-OR packets.

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-03-07 15:17:10 -05:00
David Yat Sin 02b38d0614 rocr: Put back scratch_backing_memory_byte_size
The scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb. Putting the field back, but we need to find a
solution for alt_scratch_backing_memory_byte_size.

Also, completely disabling alternate scratch as we need some changes to
support debugger.
2025-03-06 16:23:38 -05:00
David Yat Sin 6dac90c89a rocr: Only expose ext-fine-grain pool on xgmi-hive systems
We cannot guarantee system-scope coherency on systems with only PCIe
connections, so do not expose extended fine-grain memory pool on these
systems.
2025-03-05 10:41:38 -05:00
Lao, Darren 0cd46b6582 rocr: Change grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>
2025-03-04 16:19:51 -05:00
David Yat Sin d031af9eb5 rocr: Check RLIMIT_CORE before generating coredump
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.
2025-03-04 10:29:34 -05:00
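
The early-return check described in this commit can be sketched with the POSIX getrlimit() call. This is a minimal illustration, not the ROCr implementation: the helper names CoreDumpAllowed and MaybeCollectCoreDump are hypothetical, and treating a failed getrlimit() as "proceed" is an assumption made here for the sketch.

```cpp
#include <sys/resource.h>

// Hypothetical helper (not a ROCr function): reports whether the current
// soft RLIMIT_CORE limit allows a core file to be written at all.
bool CoreDumpAllowed() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) {
    // Assumption for this sketch: if the limit cannot be queried,
    // do not block collection.
    return true;
  }
  // A soft limit of 0 means no core file would be produced.
  return lim.rlim_cur != 0;
}

// Hypothetical entry point: skips the (potentially expensive) collection
// step when the limit is 0, and otherwise runs the supplied collector.
bool MaybeCollectCoreDump(bool (*collect)()) {
  if (!CoreDumpAllowed()) {
    return false;  // early return, nothing collected
  }
  return collect();
}
```

With `ulimit -c 0` in effect, MaybeCollectCoreDump returns false without ever calling the collector.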
David Yat Sin 3944da1d76 rocr: Only set asan flag on GPU agents 2025-03-03 14:51:19 -05:00
David Yat Sin 9a950ab788 rocr: Temporarily disable alternate scratch memory
Temporarily disable alternate scratch memory usage by default due to
some stability issues.
2025-03-03 09:27:29 -05:00
Khatri, Shweta 0984a1f0fd rocr: GFX9, GFX10, GFX11: Use view3dAs2dArray flag, for thick/3D swizzle modes. (#58)
A HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag exists already to
enable/disable this. Default value is false (view3dAs2dArray = 1)
Enabling this flag will enable support for swizzles that do 3D
interleaving on GFX9, GFX10 and GFX11. By default support for swizzles that
do 3D interleaving is disabled.
2025-02-26 09:38:17 -05:00
Tony Gutierrez d3a4dc9687 rocr: Remove KMT usage from AMD ext
Use the core Driver in AMD's HSA extension API to make it
agnostic to the underlying OS and kernel-mode driver.
2025-02-25 21:51:52 -05:00
Khatri, Shweta 322a794cf6 rocr: Adding support for Stochastic PC Sampling for gfx94x (#47)
Change-Id: Ide4c2e25b88f1f25ea4ce35a619b93963c0355ee
2025-02-22 00:13:08 -05:00
Tony Gutierrez a9f6bc8d0e rocr: Remove KMT usage from CPU agent
Use the core Driver object in the CPU agent to make it OS/driver
agnostic.

Implement the GetMemoryProperties() and GetCacheProperties() methods
for the KFD driver.
2025-02-21 10:00:38 -05:00
David Yat Sin 107b48fb15 rocr: Add queries for async scratch reclaim
Add support for these 2 new queries:
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX
  Maximum amount of scratch memory allowed on this agent

- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT
  Current limit for scratch memory on this agent
2025-02-19 21:02:00 -05:00
David Yat Sin aa2f98e6f9 rocr: Update for new async scratch reclaim
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.
2025-02-19 21:02:00 -05:00
David Yat Sin 2f8a9b28d0 rocr: Remove unused fields in amd_queue_t
scratch_wave64_lane_byte_size and alt_scratch_wave64_lane_byte_size are
not used by CP FW.
2025-02-19 21:02:00 -05:00
David Yat Sin 13c591d250 rocr: Remove gfx940 and gfx941 support 2025-02-19 12:16:24 -05:00
David Yat Sin fa8be44df9 rocr: Allow IPC signals in hsa_amd_signal_async_handler
Allow IPC signals to be registered with hsa_amd_signal_async_handler.
This forces AsyncEventsLoop to switch to polling instead of interrupts.
2025-02-19 11:19:09 -05:00
Adel Johar b4f8b5c202 Docs: Update environment variables page 2025-02-14 10:15:20 -05:00
Saleel Kudchadker 890399a7cf rocr: Skip uSleep for non-interrupt signals
- When waiting on non-interrupt signals, do not uSleep. This causes
  regressions compared to interrupt signal usage.
- Cleanup code.

Change-Id: I706bda0b13e64ffec0b607c1915d8380a2ce0dea
2025-02-06 23:48:35 -05:00
Luna Nova 166b08346b rocr: set underlying type of hsa_region
Set underlying type of hsa_region_info_t, hsa_amd_region_info_t
to int.

Change-Id: Ibf97a025eec6176d8e28af8009e9bd6795ca061f
2025-02-06 16:25:03 -05:00
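
For readers unfamiliar with fixed underlying types, the following sketch shows the general C++ technique. The enum and its enumerators are hypothetical stand-ins, not the real hsa_region_info_t definition, and how the actual header expresses this while staying usable from C is not stated in the commit message.

```cpp
#include <type_traits>

// Hypothetical enum for illustration only. The ": int" fixes the
// underlying type instead of leaving the choice to the compiler.
enum example_region_info_t : int {
  EXAMPLE_REGION_INFO_SEGMENT = 0,
  EXAMPLE_REGION_INFO_SIZE = 1,
  // A value outside the contiguous range, as an extension might add.
  EXAMPLE_REGION_INFO_EXTENSION = 0xA000
};

// Compile-time check that the underlying type is exactly int.
static_assert(
    std::is_same<std::underlying_type<example_region_info_t>::type,
                 int>::value,
    "underlying type must be int");
```

With a fixed underlying type, every int value can be stored in the enum with defined behavior, which matters when a second enum extends the value range of the first.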
sonadeem ff01f62777 cmake: Fix BUILD_SHARED_LIBS option and README for it
BUILD_SHARED_LIBS is a global flag so we don't need to set a default
option for it in both libhsakmt and hsa-runtime, only the top level
CMakeLists file. Also updated README to reflect that libhsakmt is
always built statically and gets linked to libhsa-runtime.

Change-Id: I1511f68a268032bec9758bc731d8074f33ec980f
2025-01-30 14:17:27 -05:00
Sv. Lockal 5d04bd42f3 Fix build issues for musl libc (#267)
Change-Id: Ia31330b0f96669966712b58986abeca754c2cbb9
2025-01-29 14:31:05 +00:00
Yiannis Papadopoulos 03bd4c9508 rocr: Changing to a device SVM flag
Change-Id: Ib085801d23604eeef0a17a05cf2b298170fb3d24
2025-01-28 17:06:16 +00:00
Yiannis Papadopoulos 144e7674d1 rocr: Use SVM information to separate dev heap
Use SVM information from user accessible memory

Change-Id: I8fad37eb1a90dc1f5827a096552130a3fd6187f4
2025-01-28 17:05:52 +00:00
Min Zhou a82f2f3134 rocr: delete duplicated conditional expression
Change-Id: Idc8b1a8ca2975f33191a448f03cabf3fc4f8f8a6
2025-01-28 10:48:44 -05:00
Yiannis Papadopoulos 1d8a77db34 rocr/aie: AIE agent memory pools correct size and user data pool
Change-Id: I831711a7d1cdc36cbc9ed30bd74d0dc984228ce7
2025-01-28 10:48:16 -05:00
Yiannis Papadopoulos 26bfa0b8f6 rocr/aie: Add dma-buf import support for AIEAgents via the Driver interface
Change-Id: I70f8d8772dda7c06944d75042cb3034ddd89aff4
2025-01-27 15:22:46 -05:00
Shweta Khatri 6361466baa rocr: Use view3dAs2dArray flag, for thick/3D swizzle modes.
Added HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag to
enable/disable this. Default value is false (view3dAs2dArray = 1)
Enabling this flag will enable support for swizzles that do 3D
interleaving. Note that all features of 3D images are supported
with 2D swizzles, it's just that the access patterns are different
and therefore cache hit-rates may be better or worse, depending
on how it's used. Volumetric algorithms do better with 3D and apps
that tend to access a single slice at a time do better with 2D.

Change-Id: Id8574a6710fe4333a1ee331e5ce9195a81434198
2025-01-27 09:28:33 -05:00
Tony Gutierrez 8a38f121ea rocr: Add WaitMultiple to core Signal
Replaces WaitAny with WaitMultiple to more closely align with the
underlying driver API for waiting on multiple events.

WaitMultiple adds a single parameter, wait_on_all, to the WaitAny
interface providing a single function for waiting on multiple
events when we only need AND and OR semantics for the signal
checking logic.

Change-Id: I68a4a45d48151d9d69aef02fd8f7263b9e6c0e75
2025-01-27 09:21:43 -05:00
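
The AND/OR semantics selected by wait_on_all can be sketched as a single condition check over a set of signals. The types and function names below (Cond, Waiter, Satisfied, ConditionsMet) are hypothetical and do not reproduce the ROCr Signal interface. The sketch shows only one evaluation pass, not the blocking wait itself, and the result for an empty list is a choice made here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for a signal value and its wait condition.
enum class Cond { kEq, kNe, kLt, kGte };

struct Waiter {
  int64_t value;    // current signal value
  Cond cond;        // comparison to apply
  int64_t compare;  // value to compare against
};

// Evaluates one signal's condition.
bool Satisfied(const Waiter& w) {
  switch (w.cond) {
    case Cond::kEq:  return w.value == w.compare;
    case Cond::kNe:  return w.value != w.compare;
    case Cond::kLt:  return w.value < w.compare;
    case Cond::kGte: return w.value >= w.compare;
  }
  return false;
}

// One pass of the check. wait_on_all == true gives AND semantics
// (every condition must hold); false gives OR semantics (any one holds).
bool ConditionsMet(const std::vector<Waiter>& waiters, bool wait_on_all) {
  for (const Waiter& w : waiters) {
    const bool ok = Satisfied(w);
    if (wait_on_all && !ok) return false;  // AND: first failure decides
    if (!wait_on_all && ok) return true;   // OR: first success decides
  }
  // Reached when nothing short-circuited: all held (AND) or none held (OR).
  return wait_on_all;
}
```

Passing wait_on_all = false reproduces the behavior of a wait-any check, so a single function covers both cases.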
David Yat Sin dab8f2fc65 rocr: Add support for gfx950
<squashed with patch for gfx950 generic targets>

Signed-off-by: Chris Freehill <Chris.Freehill@amd.com>

Change-Id: Ifec6d93cf46c7fbf736c6572882299e279260af6
2025-01-26 13:04:58 -05:00
Ben Vanik 7d64fe49fa rocr: Fix HostQueue to obey the alignment requirement
Change-Id: I06542e9ff94e826ca0abba0328b301fec50a95ea
2025-01-24 12:08:11 -05:00
David Yat Sin 7ea25ebb85 rocr: Add thread priority for AsyncEventHandler
Set priority to maximum for signal event handler and minimum for
exceptions event handler.

Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc
2025-01-24 10:08:12 -05:00
Ben Vanik 9971e7b004 rocr: Fixing non-portable inline attribute on hsa_flag_* utilities.
Change-Id: Ie1c53fef407a71b5ec4c6eaf3a3ed00871184408
2025-01-23 15:09:21 -05:00
Tony Gutierrez 15107afb11 rocr: Generalize driver discovery
Generalize the driver discovery and move driver-specific
functionality to the concrete driver implementations.
Currently, this process is tightly coupled to the hsakmt
which is GPU and OS specific.

Change-Id: Ie1c53fef407a71b5ec4c6eaf3a3ed00871184409
2025-01-23 15:09:14 -05:00
Tony Gutierrez 77fa5af618 rocr: Make Open() and Close() virtual in Driver
Change-Id: Iac054c08383b080ca2b2ec6d65019bf2f083b763
2025-01-23 15:09:06 -05:00
Tony Gutierrez 8bbc44d51b rocr: Forward declare Driver in the Agent class
Change-Id: Ib27081bf31446af92602f723f352fb75ec3f378e
2025-01-23 15:08:59 -05:00
Longlong Yao 5d8fba133d rocr: add AMD_KERNEL_CODE_PROPERTIES_ENABLE_WAVEFRONT_SIZE32
Change-Id: I158705499f4ab0b1231d698d66902eb4ab1ececa
Signed-off-by: LonglongYao <Longlong.Yao@amd.com>
2025-01-22 13:02:31 -05:00
Swati Rawat 77c2a21a92 Update index.rst
Change-Id: I493e3dc3782608e4d0d712569a6e6fd3b376cdbe
2025-01-21 10:05:28 -05:00
Chris Freehill b1d6cacf79 rocr: Remove RuntimeCleanup and use of loaded()
The recent static initialization changes cause this clean up to
happen when it previously never did. The result of ~RuntimeCleanup()
being executed is that the static global "loaded_" is set to false,
which in turn prevents hsa_init() from executing again. Clean up
already happens when hsa_shut_down() occurs.

Change-Id: Ib5cefb80d82880c1945e04eb6ec246bc2c7d2324
2025-01-13 09:18:13 -05:00
Flora Cui 2cc279dbbc rocr: try DefaultSignal if interrupt is disabled
Reviewed-by: Shane Xiao <shane.xiao@amd.com>
Change-Id: I5d3a3813f56990f3aca61be23215faeb0a9629cb
Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-01-02 11:09:20 +08:00
Shane Xiao 2d40493c31 rocr: Fix missed read lock in ExecutableImpl::FindHostAddress
Change-Id: Ide9b5cc3aa235d3768ebbfd8dc1560bf70fd0743
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Qiang Yu <qiang.yu@amd.com>
2024-12-30 06:43:25 -05:00
Tim Huang e515b0bca5 rocr: add ISA target support for GC version 11.5.3
This adds support for GC version 11.5.3.

Change-Id: I1d55e33198620d3493967558c25c636d5f7ab347
Signed-off-by: Tim Huang <tim.huang@amd.com>
2024-12-30 01:44:53 -05:00
Chris Freehill 67b0082443 rocr: Dynamically allocate IsaMap
This is to avoid use after free at the program's end, when statics
are destructed.

Change-Id: Id6bf26f25a58d13bdf1ee99c852adae8add76569
2024-12-20 09:20:09 -05:00
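
One common way to keep a global table alive past static destruction is to allocate it on the heap once and never delete it. The sketch below illustrates that pattern under stated assumptions: IsaTable, GetIsaTable, and the placeholder entries are hypothetical, and the commit message does not say that the actual change uses exactly this form.

```cpp
#include <map>
#include <string>

// Hypothetical registry type standing in for the real ISA map.
using IsaTable = std::map<std::string, int>;

IsaTable& GetIsaTable() {
  // Allocated on first use and intentionally never deleted. Because the
  // static is a raw pointer, no destructor runs for the map at program
  // exit, so code running during static destruction can still use it.
  static IsaTable* table = new IsaTable{
      {"isa_a", 1},  // placeholder entries, not real ISA names
      {"isa_b", 2},
  };
  return *table;
}
```

The cost is a deliberate one-time leak that the operating system reclaims at process exit; in return, lookups from other objects' destructors no longer touch a destroyed map.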
Flora Cui ac64c54d74 rocr: skip exception_signal_ handling on exit
if .supports_exception_debugging is not enabled.

Change-Id: I944fe7aa4f3068964f47e23f5259c3802d1e9556
Signed-off-by: Flora Cui <flora.cui@amd.com>
2024-12-19 04:14:32 -05:00
Apurv Mishra 699d0140be rocr: multiple uninitialized and unused variables
Minor modifications to multiple source and header
files based on Coverity report

Change-Id: I4a73d0f56640983c4d5124e13c8c280245cca672
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-12-18 10:11:13 -05:00
Apurv Mishra 441bd9fe6c rocr: refactor of runtime.cpp based on Coverity
Add return checks, initialization and clean
redundant memory operations

fix 1: check return value of 'setsockopt' for error
fix 2: check return value of 'PtrInfo' for error
fix 3: move 'tool_names' instead of copying
fix 4: call 'munmap' for 'va' only once
fix 5: use 'ssize' for possible return values of -1 (err)
fix 6: add missing initialization in constructors
fix 7: add initialization for some scalars and pointers

Change-Id: I07d90e36d4e1fe48c4de4f44e18083e5ed4c5fbc
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-12-18 10:06:55 -05:00
David Yat Sin 5da1889fb7 rocr: Avoid deadlock due to queue signal not updated
Make sure waiting_ count for queue signal is always > 0 so that we
always call hsaKmtWaitOnEvent to force hsaKmtWaitOnEvent to return.

Remove incorrect warning print when running in debug mode.

Call internal Signal::WaitAny instead of AMD::hsa_amd_signal_wait_any
to avoid extra function calls.

Change-Id: I9e41b704643e4e8ee7402b1379b1c30ff4c544ef
2024-12-16 10:25:19 -05:00