Commit Graph

790 Commits

Author SHA1 Message Date
Konstantin Zhuravlyov 8a6edb07d9 Cache referenced symbol table when pulling data in relocation section
Change-Id: I6ef21cedde1aca6fd1ec5e5d5634563f030eaab8
2023-06-21 16:35:45 -04:00
Jonathan Kim 92467fd282 Prevent unnecessary SDMA queue creation on copy on status
Unless SDMA blits have actually been used for copies, prevent the DMA
copy status from querying the blit's pending byte status to avoid
creating an unnecessary HW queue.

Change-Id: Ied1fbed73c08f0408f0e3583f9b56f2768c71708
2023-06-21 03:10:53 -04:00
Jonathan Kim 8c60f04a99 Prevent blit copy pending bytes query when out of SDMA resources
Querying pending bytes on a blit kernel is unnecessary when the runtime
runs out of SDMA resources since we are returning an SDMA availability
mask.

Change-Id: I347efba0c85b70ea3ba8749d76a499afc23909e8
2023-06-21 03:10:52 -04:00
Shweta Khatri 77bf357647 Defined a new extended scope memory region
Added HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXT_SCOPE_FINE_GRAINED flag to enable extended scope memory region
where the device-scope atomics act as system-scope atomics

Change-Id: I79fc3207cb630dfc68bed2f8aabd75f35fe80b12
2023-06-20 11:00:05 -04:00
James Zhu 36666f5895 Enable sleep for all waiters
Enable sleep for all waiters when the kernel supports event age tracking.

Change-Id: Icd4e1e8d83b4a54e9f6aaa99691a6573211b3337
Signed-off-by: James Zhu <James.Zhu@amd.com>
2023-06-20 09:32:16 -04:00
James Zhu 5871b28503 Add kernel version flag for event age support
KFD kernel version 1.13 adds support for event age
tracking, which helps eliminate unnecessary busy waiting.

Change-Id: Ib447ed6e0350f3110a4d6b9b80a0388000dd0e72
Signed-off-by: James Zhu <James.Zhu@amd.com>
2023-06-20 09:32:03 -04:00
Jonathan Kim 3e3e11bc5a Ensure HSA_ENABLE_SDMA=0 persists on new copy on engine API
Copy on engine API still needs to respect HSA_ENABLE_SDMA settings.

Change-Id: I26038b1e3082d62687c2e279615557583d20f229
2023-06-19 13:48:59 -04:00
raghavmedicherla 4142a77375 [hsa-runtime] Add support to hsa-runtime to find symbols from ".dynsym" section.
Earlier, hsa-runtime was unable to find symbols in a stripped ELF image because
it had no support for finding symbols in the ".dynsym" section.

Looking for symbols in .dynsym is enabled by LOADER_USE_DYNSYM=1
environment variable

Change-Id: I4f0e8dd0eb053a6066d4d49b670c52e51149531a
2023-06-16 14:40:50 -04:00
David Yat Sin 5e4490f180 Update documentation for IPC handles
Explicitly mention that IPC handles can only be created on GPU agents.

Change-Id: I19bc3578d6e5243c795bf6fbf981ea4bd3bfc2e8
2023-06-14 16:21:26 -04:00
Jonathan Kim bfb94b3b6e Soften trap handler loading failure when exception handling not supported
GFX11 and up including some GFX9 devices will not support
old trap handling without the new exception handling.

Instead of a hard assert failure that runs into a core dump,
let ROCr initialization continue instead.

Change-Id: I309becdc72ef4fb2fafd118c1faf0801407e658e
2023-06-13 13:05:47 -04:00
Laurent Morichetti 6a82b0a038 Fix a race condition in the trap handler
status.priv may be read after returning from the trap handler, which
causes sq_interrupt_word_wave.priv to be 0 even though the s_sendmsg
instruction was initiated when status.priv was 1.

To work around this, added a s_waitcnt lgkmcnt(0) after s_sendmsg
to make sure the message is sent before continuing.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Laurent Morichetti <Laurent.Morichetti@amd.com>
Change-Id: Ieb75005ca1559ef03d0efac80e966f521e41fcb7
2023-06-09 10:03:55 -04:00
Ammar ELWazir fc603d58d2 Adding support to UMC & MMEA System Blocks
Change-Id: I92601f37757e0cff3f1fdc10f2e5e0db51c1ee2d
2023-06-08 21:22:19 +00:00
Jonathan Kim 233413eb08 Remove Tab Indent on SDMA Status Fix
Use spaces not tabs.

Change-Id: Icaeb16158ebaddd8e5ac518103d285d55fe976f3
2023-06-07 16:47:04 -04:00
Xiaomeng Hou 389cd3564b Do not reserve scratch memory on asic with finite vram resource
Change-Id: I0a2207cb01f464ed3e73331637cfa9bd62f03d97
2023-06-06 22:01:31 +08:00
David Yat Sin e4fffa140a Removing __linux__ definition in CMake
Removing this definition as this should already be defined by the compiler.
This is causing compile errors on newer versions of llvm because the
macro is being redefined.

Change-Id: Ica6a06f46a14e16d3f52e83b9b5ee8cfd7359510
2023-06-05 12:23:56 -04:00
Xiaomeng Hou 557da77c4e Correct the SDMA engine mask reported on apu
There is only one SDMA instance on small APUs.

Change-Id: I9d4dda511c40fc78f002be720e5f1909dc5b91e4
2023-06-02 19:10:08 +08:00
David Yat Sin fc3b554121 Change failure to parse CPUID to warning
Change-Id: If42dbcd11ac1be09597e43a8f11caa91cf37903e
2023-05-31 11:46:52 -04:00
David Yat Sin b290d65ec9 Bump interface versions due to hsa_amd_memory_async_copy_on_engine added
Change-Id: Iff36719e800280d58217647bb70d3b5d5fcc91fe
2023-05-26 12:04:06 +00:00
David Yat Sin 41f6d0426d Adding gfx941 and gfx942
Adding support for gfx941 and gfx942 ISAs.
gfx940 ISA will use sc0:1 sc1:1 on load/store operations
gfx942 ISA will use default load/store operations

Change-Id: If1efbef86f59e2cf2d48fe359cd4166405a0a579
2023-05-23 11:13:16 -04:00
David Yat Sin 50e754d08b ASAN: Remap first page of allocations to host mem
When compiling in ASAN mode, remap the first page of device allocations
to system memory. ASAN's memory allocator uses a small amount of extra
memory to store data for housekeeping purposes. But because this memory
is from the GPU memory pool, it might have an uncommon memory type for the
host to access. Mapping this section of memory to the host makes this memory
accessible to ASAN.

Change-Id: I36f659d616a4d15558372592439a8723c5c84a69
Signed-off-by: Bing Ma <Bing.Ma@amd.com>
2023-05-22 20:58:54 -04:00
David Yat Sin a1f3b619a7 Add mutex when reserving scratch
This prevents a race condition when creating queues concurrently.

Change-Id: I5ea9714926fe06e1719fcb2559cb485063355e4f
2023-05-19 11:05:13 -04:00
David Yat Sin a397373cea Add HSA_ENABLE_PEER_SDMA env variable
Add support for HSA_ENABLE_PEER_SDMA env variable that can be used to
disable use of SDMA engines for device-to-device transfers. Note that
setting HSA_ENABLE_SDMA=0 will disable all SDMA transfers and override
HSA_ENABLE_PEER_SDMA values.

Change-Id: I737b3c2b2efcf3ff237f98bc748f49b8252ed24a
2023-05-18 00:10:20 +00:00
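The precedence described in the entry above (HSA_ENABLE_SDMA=0 overrides any HSA_ENABLE_PEER_SDMA value) can be sketched as follows. This is only an illustration, not the ROCr implementation: the helper names `IsEnabled` and `PeerSdmaAllowed` are hypothetical, and treating an unset variable as "enabled" is an assumption of the sketch.

```cpp
// Hypothetical helper: a variable counts as disabled only when set to exactly "0".
// Treating an unset variable (nullptr) as enabled is an assumption of this sketch.
constexpr bool IsEnabled(const char* value) {
  return value == nullptr || !(value[0] == '0' && value[1] == '\0');
}

// Hypothetical policy check: peer (device-to-device) SDMA is usable only when
// both the global switch and the peer-specific switch allow it, so
// HSA_ENABLE_SDMA=0 wins regardless of HSA_ENABLE_PEER_SDMA.
constexpr bool PeerSdmaAllowed(const char* hsa_enable_sdma,
                               const char* hsa_enable_peer_sdma) {
  return IsEnabled(hsa_enable_sdma) && IsEnabled(hsa_enable_peer_sdma);
}

static_assert(!PeerSdmaAllowed("0", "1"), "global switch overrides peer switch");
static_assert(!PeerSdmaAllowed("1", "0"), "peer switch disables peer SDMA only");
static_assert(PeerSdmaAllowed("1", "1"), "both enabled");
```

In a real program the two arguments would typically come from `std::getenv("HSA_ENABLE_SDMA")` and `std::getenv("HSA_ENABLE_PEER_SDMA")`.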
Ranjith Ramakrishnan ad002f1e7b Use the RUNPATH provided by build scripts
RUNPATH in libraries will be : $ORIGIN
RUNPATH in binaries will be : $ORIGIN/../lib

Change-Id: Iafa66a8e02cc8c5783903d40927b63652042d2f1
2023-05-17 09:10:50 -04:00
David Yat Sin 39feb83b88 Update documentation for hsa_amd_pointer_info
Update documentation for hsa_amd_pointer_info to clarify which fields
are invalid when the allocation type is HSA_EXT_POINTER_TYPE_UNKNOWN.

Change-Id: Idaed985962c4a98d281ebe01bef8ec2459da3985
2023-05-16 18:36:54 -04:00
David Yat Sin 38e832a682 Reserve scratch on first queue allocation
Some workloads running on multi-GPU create 1 process per GPU. So each
process creates a GPU agent on every GPU, but will only create queues on
one GPU. This would cause unnecessary scratch reservation.

Change-Id: I50a216f0bcc0b5f707f3943147390b0ecec1ac22
2023-05-15 17:10:57 -04:00
Graham Sider bd63e5045c Fix scratch allocation occupancy reduction loop
If the required scratch allocation is too large, ROCr will attempt to
reduce it by lowering the dispatch's targeted occupancy. The reduction
loop however was prone to overflow if waves_per_cu was not a multiple of
waves_per_group. Ensure no overflow by aligning waves_per_cu to
waves_per_group.

On GC 9.4.3 dGPU, dispatches with a large grid size and a
waves_per_group of e.g. 16 may require reducing occupancy such that
waves_per_cu is less than waves_per_group to ensure the allocation size
is small enough. Allow this while also ensuring the tmpring scratch wave
count is kept divisible by the number of SEs per XCC.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480
2023-05-15 14:55:42 +00:00
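The alignment fix described in the entry above can be illustrated with a minimal sketch. It is not the ROCr code: `AlignDown`, `ReduceOccupancy`, and the `bytes_per_wave` and `max_bytes` parameters are hypothetical, and the sketch assumes the failure mode was unsigned wraparound when subtracting waves_per_group from a value that is not a multiple of it. It does not model the GC 9.4.3 case in which waves_per_cu may drop below waves_per_group.

```cpp
#include <cassert>
#include <cstdint>

// Round value down to the nearest multiple of alignment.
uint32_t AlignDown(uint32_t value, uint32_t alignment) {
  assert(alignment != 0);
  return value - (value % alignment);
}

// Hypothetical reduction loop: lower the wave count one group at a time until
// the scratch size (waves * bytes_per_wave) fits within max_bytes.
uint32_t ReduceOccupancy(uint32_t waves_per_cu, uint32_t waves_per_group,
                         uint64_t bytes_per_wave, uint64_t max_bytes) {
  // Aligning first guarantees the subtraction below lands exactly on 0
  // instead of wrapping around past it.
  uint32_t waves = AlignDown(waves_per_cu, waves_per_group);
  while (waves > 0 && waves * bytes_per_wave > max_bytes) {
    waves -= waves_per_group;
  }
  return waves;  // 0 means no occupancy level fits
}
```

Without the `AlignDown` step, a starting value such as 40 with a group size of 16 would step 40, 24, 8 and then wrap to a very large unsigned number.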
David Yat Sin 3477fbc661 Do not report reserved scratch cache as available
Scratch cache reserved memory is only available for scratch memory use
so do not report this memory as available to the user via the
HSA_AMD_AGENT_INFO_MEMORY_AVAIL api.

Change-Id: I52f96e62536458bcaa52b9f4be5de856d5680dc4
2023-05-15 09:45:31 -04:00
David Yat Sin f0000da7b3 Removing invalid gfx entries
Change-Id: I1a9a9a064f5f65ecc3e124c5dd7d6baf6b5ccb5c
2023-05-12 11:59:27 -04:00
Saleel Kudchadker adf6512dad Report XGMI SDMA upon query
Report XGMI SDMA engines when queried for H2D/D2H.

Change-Id: I4fb7b24bc15d1745b3844485bdeab71282a787a5
2023-05-11 12:20:41 -04:00
David Yat Sin 9b35ce5b3b Fix incorrect check for image support
Change-Id: I77476204d40c245c9d9091853264a4e9fbb80725
2023-05-10 20:13:54 +00:00
Ranjith Ramakrishnan fbcbcd9e73 Set the default value of ROCM_HEADER_WRAPPER_WERROR to OFF
Using wrapper header files will result in #warning message by default

Change-Id: I87739cabb365b9370b1182cf23ca9b54d99149c3
2023-05-10 00:47:33 -04:00
Sam Wu 57b3fcde51 add sphinx configurations
Change-Id: I1a66a02b18fb699415a87a6473eb72c097a13b5f
2023-05-08 15:58:01 -06:00
David Yat Sin a180c9ee78 Add env var to override SRAM ECC
Add HSA_ENABLE_SRAMECC environment variable that can be used to
override SRAM ECC mode reported by KFD

Change-Id: I2b95511820a2d3d146a76b03070659c0695b61fd
2023-04-27 16:16:05 -04:00
David Yat Sin f024d21e3d Add query for number of XCCs per agent
Change-Id: I4b694b4904ba0326c998356388a62c19a972a7ff
2023-04-27 16:15:59 -04:00
Mike Li 46b667e530 Return failure with any IMAGE attribute for gfx940
The gfx940 does not support IMAGE instructions. Any get_info with
IMAGE attributes should return failure.

Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: I12005628f92780f551ab6f8b41526c66b54c6a59
2023-04-27 16:15:51 -04:00
Mike Li 9554e95de0 Scratch memory changes to support multi-xcc
Change-Id: I115ba4cfe250c59cb7421217cfe0fad6302f25b3
2023-04-27 16:15:30 -04:00
Laurent Morichetti f31b312611 Update the trap handler for gfx940
gfx940 uses ttmp11 to hold the queue packet index so the first level
trap handler uses ttmp13 instead to save ib_sts.

Repurpose ttmp11[31] to mean that the ttmps are initialized. The issue
was that the debugger could not tell whether ttmp6 was written by the
trap handler when determining the stop reason.

If ttmp11[31]=0, then the trap handler has not been executed and ttmp6
should be assumed to be 0.  If ttmp11[31]=1, then ttmp6 holds the
trap_id, if an s_trap instruction caused the exception.

Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Lancelot Six <lancelot.six@amd.com>

Change-Id: I9af903abae044b9ec530306229caf3b883f3ee46
2023-04-27 16:15:14 -04:00
Mike Li de4d1ce424 Add gfx940 to AmdHsaCode
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: Ib4f7c801c3d3bac9a04c880c5bf86b72bfa3404f
2023-04-27 16:09:26 -04:00
Mike Li bd98a1e5bf Added gfx940 ISA
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: Icb1830fe186abc69fe7ee709b7f12b882cab9e87
2023-04-27 16:08:58 -04:00
Alex Sierra e82025bffa Use mkstemp instead of tempnam for temp file
tempnam has been marked as obsolete.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ie64d9a351bf386da00a96ceff059f685e11f2cca
2023-04-17 15:38:59 -04:00
Lancelot SIX 183f5d90aa linux os_thread: improve error handling
On Linux, the os_thread abstraction is built on top of pthread.  Many of
the pthread calls might fail and return error codes.  The error
conditions are only checked via assertions (if ever checked) which means
that when doing a release build, no error condition is checked.  The
same goes for dlsym/dlinfo and clock_gettime.

This commit improves the situation by checking the error conditions
and acting accordingly.  When the error condition is detected in a
function with a means to indicate some error to its caller, then this
patch prints some error message and returns.  If there is no way to
propagate the error up the call stack, print some error message and
abort the process.

For the os_info::os_info ctor, the only user is CreateThread, which
checks that the built thread is Valid().  If not, nullptr is returned to
the caller.

It could be possible to use exceptions when functions cannot pass
errors, but for now I only use abort as it is what assert would do in a
debug build.

Change-Id: I815703c3b95777cc29bb89a7d654ac879c14a759
2023-04-17 09:48:11 -04:00
Lancelot SIX 72219b8237 Runtime::GetSystemInfo: Suppress parentheses warning
When building with g++-11.3.0, I have the following warning:

    /home/.../core/runtime/runtime.cpp: In member function ‘hsa_status_t rocr::core::Runtime::GetSystemInfo(hsa_system_info_t, void*)’:
    /home/.../core/runtime/runtime.cpp:693:56: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
      693 |           kfd_version.KernelInterfaceMajorVersion == 1 &&
          |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
      694 |               kfd_version.KernelInterfaceMinorVersion >= 12)
          |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This patch adds the parentheses as suggested.  This silences the
compiler warning.

No functional change expected.

Change-Id: I69c1a73a432b0f2393dbaf36d4424cf0056c535f
2023-04-17 09:43:02 -04:00
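The warning in the entry above arises because `&&` binds more tightly than `||`, so adding parentheses changes nothing functionally and only makes the grouping explicit. A minimal sketch follows. The `KfdVersion` struct and `AtLeast1_12` function are hypothetical, and the left-hand side of the `||` is an assumption, since the quoted diagnostic shows only the `&&` operands.

```cpp
// Hypothetical stand-in for the version structure named in the diagnostic.
struct KfdVersion {
  unsigned KernelInterfaceMajorVersion;
  unsigned KernelInterfaceMinorVersion;
};

// Explicit parentheses around the '&&' group keep -Wparentheses quiet.
// The result is the same as the unparenthesized form because '&&' already
// has higher precedence than '||'.
constexpr bool AtLeast1_12(KfdVersion v) {
  return v.KernelInterfaceMajorVersion > 1 ||  // assumed left-hand operand
         (v.KernelInterfaceMajorVersion == 1 &&
          v.KernelInterfaceMinorVersion >= 12);
}

static_assert(AtLeast1_12(KfdVersion{1, 12}), "1.12 qualifies");
static_assert(!AtLeast1_12(KfdVersion{1, 11}), "1.11 does not");
static_assert(AtLeast1_12(KfdVersion{2, 0}), "2.0 qualifies");
```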
David Yat Sin f43a284b8e Change error reported when receiving code 128
We used to report HSA_STATUS_ERROR_INVALID_ISA when receiving error code
128, but there are several other reasons why we could be exceeding
number of VGPRs, so updating the error code.

Change-Id: I6a6980d5b07b09c93d00dee5207a0d52399bc77e
2023-04-14 09:12:07 -04:00
David Yat Sin 511855d344 Fix assertion when _GLIBCXX_ASSERTIONS is enabled
On some platforms, e.g. Arch Linux, the -D_GLIBCXX_ASSERTIONS compile flag
is enabled by default, causing a runtime assertion.
Avoid assertion by using std::vector accessor function data().

Change-Id: I118cdf102c3e353f32c618823e363ee1059f3453
2023-04-11 11:40:10 +00:00
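One common way to hit a libstdc++ assertion under -D_GLIBCXX_ASSERTIONS is taking `&v[0]` on an empty std::vector, because `operator[]` is bounds-checked in that mode while `data()` is not. Whether this was the exact pattern in the fixed code is an assumption. The sketch below uses a hypothetical `CopyBytes` helper to show the `data()` form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy count bytes from src into a new vector.
std::vector<char> CopyBytes(const char* src, std::size_t count) {
  std::vector<char> out(count);
  // out.data() is well defined even when out is empty. Writing &out[0] here
  // would index a nonexistent element when count == 0 and abort at run time
  // in builds that define _GLIBCXX_ASSERTIONS.
  char* dst = out.data();
  if (count != 0) {
    std::memcpy(dst, src, count);
  }
  assert(out.size() == count);
  return out;
}
```

Calling `CopyBytes(nullptr, 0)` returns an empty vector without touching any element.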
David Yat Sin c5bf7eb112 Fix for overwriting pointer info size
Fix for overwriting pointer info size provided by caller of
hsa_amd_pointer_info.

Change-Id: I2e5d73ab9ba1a32bc9b4d112bc29b4a99fd8b3b5
2023-04-06 16:35:37 -04:00
David Yat Sin 8ebf5f9c48 Adding scratch memory reservation
Some applications will keep trying to allocate device memory until the
allocation fails. This causes all device memory to be used up and we are
then unable to allocate scratch memory for dispatches. Reserve enough
memory for 1 small scratch allocation.

Change-Id: I968400d41540ba1aca8f28581f229693eec02225
2023-04-06 15:13:36 +00:00
Konstantin Zhuravlyov a5932ef5ef Loader: Skip vdso.so code objects in GetUriFromMemoryInExecutableFile
Change-Id: Ie2cac880c406ed90d6fa614707fa8df7b87458da
2023-03-17 09:57:15 -04:00
Lang Yu aec7200cb2 Switch to completion signal wait for amd_aql_pm4_ib processing
Wait on completion signal for amd_aql_pm4_ib processing
on ASICs with gfx version >= 9.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: Ia704d9cc5b2535dcf8564a30f694262b113f77a2
2023-03-16 20:23:53 -04:00
Jonathan Kim fc8f3f9fd5 Fix Invalid Engine Offset Check
Engine offset that is the maximum number of engines is still valid
as offset enum 0 is occupied by blit copies so raise the limit by 1.

Change-Id: I6fcab106290e6647702efe297a4281861da4e0b8
2023-03-16 09:50:10 -04:00
Shweta Khatri 83a307c449 By default, disable mwaitx feature.
This can be enabled by setting HSA_ENABLE_MWAITX=1

Change-Id: I4be00892780beeb8b14c3c5f34aa10b158921bff
2023-03-15 19:57:25 -04:00