Commit graph

996 commits

Author SHA1 Message Date
Sreekant Somasekharan bf22d10ceb rocrtst: Fix RoundToPowerOf2 function
The behavior of a shift is undefined if the right operand is negative,
or greater than or equal to the width of the promoted left operand.
In release builds with address sanitizer enabled, the compiler
optimizes on this undefined behavior and produces an unsupported queue
size value, since the current method shifts up to 128 bits on a 64-bit
value.

Change-Id: Iddcc15b43d2331bc8bf5fc3aa4725f76844655ec
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>


[ROCm/ROCR-Runtime commit: ea2f832a43]
2023-06-19 19:17:49 -04:00
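Note on bf22d10ceb: the shift rule described above can be illustrated with a minimal sketch of a power-of-two round-up that never shifts by 64 or more. The name RoundUpToPowerOf2 and the round-up direction are assumptions made for illustration; this is not the rocrtst code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the rocrtst implementation: rounds v up to the
// next power of two using only shift counts below the 64-bit operand width.
inline uint64_t RoundUpToPowerOf2(uint64_t v) {
  if (v <= 1) return 1;
  // Anything above 2^63 has no 64-bit power-of-two upper bound.
  assert(v <= (uint64_t{1} << 63) && "result would not fit in 64 bits");
  --v;
  // Smear the highest set bit into every lower position. The shift counts
  // are 1, 2, 4, 8, 16, 32; the loop stops before reaching 64, because
  // shifting a 64-bit value by 64 or more is undefined behavior.
  for (unsigned shift = 1; shift < 64; shift <<= 1) {
    v |= v >> shift;
  }
  return v + 1;
}
```

Bounding the loop by the operand width, rather than by a fixed constant such as 128, keeps the function well defined regardless of optimization level.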
Jonathan Kim 63463b14c3 Ensure HSA_ENABLE_SDMA=0 persists on new copy on engine API
Copy on engine API still needs to respect HSA_ENABLE_SDMA settings.

Change-Id: I26038b1e3082d62687c2e279615557583d20f229


[ROCm/ROCR-Runtime commit: 3e3e11bc5a]
2023-06-19 13:48:59 -04:00
raghavmedicherla 2758da98cd [hsa-runtime] Add support to hsa-runtime to find symbols from ".dynsym" section.
Earlier, hsa-runtime was unable to find symbols in a stripped ELF image because
it had no support for looking up symbols in the ".dynsym" section.

Looking for symbols in .dynsym is enabled by the LOADER_USE_DYNSYM=1
environment variable.

Change-Id: I4f0e8dd0eb053a6066d4d49b670c52e51149531a


[ROCm/ROCR-Runtime commit: 4142a77375]
2023-06-16 14:40:50 -04:00
David Yat Sin 8c3acb3974 Update documentation for IPC handles
Explicitly mention that IPC handles can only be created on GPU agents.

Change-Id: I19bc3578d6e5243c795bf6fbf981ea4bd3bfc2e8


[ROCm/ROCR-Runtime commit: 5e4490f180]
2023-06-14 16:21:26 -04:00
Jonathan Kim 1772d866c9 Soften trap handler loading failure when exception handling not supported
GFX11 and up, as well as some GFX9 devices, will not support
old trap handling without the new exception handling.

Rather than failing a hard assert and dumping core,
let ROCr initialization continue.

Change-Id: I309becdc72ef4fb2fafd118c1faf0801407e658e


[ROCm/ROCR-Runtime commit: bfb94b3b6e]
2023-06-13 13:05:47 -04:00
Laurent Morichetti 3736a0ffeb Fix a race condition in the trap handler
status.priv may be read after returning from the trap handler, which
causes sq_interrupt_word_wave.priv to be 0 even though the s_sendmsg
instruction was initiated when status.priv was 1.

To work around this, add an s_waitcnt lgkmcnt(0) after s_sendmsg
to make sure the message is sent before continuing.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Laurent Morichetti <Laurent.Morichetti@amd.com>
Change-Id: Ieb75005ca1559ef03d0efac80e966f521e41fcb7


[ROCm/ROCR-Runtime commit: 6a82b0a038]
2023-06-09 10:03:55 -04:00
Ammar ELWazir 5675ed837a : Adding support to UMC & MMEA System Blocks
Change-Id: I92601f37757e0cff3f1fdc10f2e5e0db51c1ee2d


[ROCm/ROCR-Runtime commit: fc603d58d2]
2023-06-08 21:22:19 +00:00
Jonathan Kim 21f24c1348 Remove Tab Indent on SDMA Status Fix
Use spaces not tabs.

Change-Id: Icaeb16158ebaddd8e5ac518103d285d55fe976f3


[ROCm/ROCR-Runtime commit: 233413eb08]
2023-06-07 16:47:04 -04:00
Xiaomeng Hou 99d3d2afbd Do not reserve scratch memory on asic with finite vram resource
Change-Id: I0a2207cb01f464ed3e73331637cfa9bd62f03d97


[ROCm/ROCR-Runtime commit: 389cd3564b]
2023-06-06 22:01:31 +08:00
David Yat Sin c83eee3f2b Removing __linux__ definition in CMake
Remove this definition, as it should already be defined by the compiler.
It causes compile errors on newer versions of LLVM because the
macro is being redefined.

Change-Id: Ica6a06f46a14e16d3f52e83b9b5ee8cfd7359510


[ROCm/ROCR-Runtime commit: e4fffa140a]
2023-06-05 12:23:56 -04:00
Graham Sider 5ec7dcd4c4 Revert "Disable Queue_Validation_InvalidGroupMemory"
This reverts commit 7a157d8e55.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I8424c96d5e5c3c9a9e7711ecff7c5372190b0d2d


[ROCm/ROCR-Runtime commit: e2c3c3e510]
2023-06-05 09:41:02 -04:00
Graham Sider 74f9ba24e0 rocrtst: Remove extra clear_code_object() calls
A patch was made in gfx940 npi branch to move the kernel object file
loading to outside the rocrtstNeg.Queue_Validation_* main queue creation
and submission loops, and added a clear_code_object() after the loop.

Another patch was made to the non-npi branch which adds a
clear_code_object() inside the loop. When the npi branch patch was
merged, this was causing the code object to be cleared at the end of
the first loop. Remove these clear_code_object() calls.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id4188e78411e81c5071bf715c1f02491f571ab79


[ROCm/ROCR-Runtime commit: dbe2a82e35]
2023-06-05 09:41:02 -04:00
Xiaomeng Hou 381ea164ba Correct the SDMA engine mask reported on apu
There is only one SDMA instance on small APUs.

Change-Id: I9d4dda511c40fc78f002be720e5f1909dc5b91e4


[ROCm/ROCR-Runtime commit: 557da77c4e]
2023-06-02 19:10:08 +08:00
David Yat Sin 9c54cdaaf1 Change failure to parse CPUID to warning
Change-Id: If42dbcd11ac1be09597e43a8f11caa91cf37903e


[ROCm/ROCR-Runtime commit: fc3b554121]
2023-05-31 11:46:52 -04:00
David Yat Sin 3661d76c74 Bump interface versions due to hsa_amd_memory_async_copy_on_engine added
Change-Id: Iff36719e800280d58217647bb70d3b5d5fcc91fe


[ROCm/ROCR-Runtime commit: b290d65ec9]
2023-05-26 12:04:06 +00:00
Graham Sider f0eeb60222 rocrtst: Throw on LocateKernelFile open() failures
Throw runtime error instead of returning empty string when open() fails
in LocateKernelFile()

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Iafa360fbc2d3c9b01b9fe7ea4c11d70bd254ccce


[ROCm/ROCR-Runtime commit: 0772e8d618]
2023-05-24 14:31:26 -04:00
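Note on f0eeb60222: a minimal sketch of the "throw instead of returning an empty string" pattern. LocateFileOrThrow is a hypothetical stand-in; the real LocateKernelFile() signature and search logic are not shown in this log.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <stdexcept>
#include <string>

// Hypothetical stand-in for LocateKernelFile(). It only checks that the
// given path can be opened and reports failure by throwing.
std::string LocateFileOrThrow(const std::string& path) {
  int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0) {
    // Surface the failure to the caller instead of returning "".
    throw std::runtime_error("open() failed for " + path);
  }
  close(fd);
  return path;
}
```

With an exception, a test that cannot find its kernel file stops at the point of failure rather than continuing with an empty path.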
David Yat Sin 3345ada378 Adding gfx941 and gfx942
Adding support for gfx941 and gfx942 ISAs.
gfx940 ISA will use sc0:1 sc1:1 on load/store operations
gfx942 ISA will use default load/store operations

Change-Id: If1efbef86f59e2cf2d48fe359cd4166405a0a579


[ROCm/ROCR-Runtime commit: 41f6d0426d]
2023-05-23 11:13:16 -04:00
David Yat Sin 959c897604 ASAN: Remap first page of allocations to host mem
When compiling in ASAN mode, remap the first page of device allocations
to system memory. ASAN's memory allocator uses a small amount of extra
memory to store data for housekeeping purposes. But because this memory
comes from the GPU memory pool, it might have a memory type that is
unusual for the host to access. Mapping this section of memory to the
host makes it accessible to ASAN.

Change-Id: I36f659d616a4d15558372592439a8723c5c84a69
Signed-off-by: Bing Ma <Bing.Ma@amd.com>


[ROCm/ROCR-Runtime commit: 50e754d08b]
2023-05-22 20:58:54 -04:00
David Yat Sin 255a645c3b Add mutex when reserving scratch
This prevents a race condition when creating queues concurrently.

Change-Id: I5ea9714926fe06e1719fcb2559cb485063355e4f


[ROCm/ROCR-Runtime commit: a1f3b619a7]
2023-05-19 11:05:13 -04:00
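Note on 255a645c3b: a minimal sketch of a reservation that happens once, on the first queue creation, and is guarded by a mutex so that concurrent queue creation cannot reserve twice. ScratchPool and its members are hypothetical names, not the ROCr classes.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical model of a per-agent scratch reservation.
class ScratchPool {
 public:
  // Intended to be called from every queue-creation path.
  // Only the first caller performs the reservation.
  void ReserveOnce(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (reserved_) return;  // an earlier queue already reserved scratch
    reserved_bytes_ = bytes;
    reserved_ = true;
    ++reservation_count_;
  }

  std::size_t reserved_bytes() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return reserved_bytes_;
  }

  int reservation_count() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return reservation_count_;
  }

 private:
  mutable std::mutex mutex_;
  bool reserved_ = false;
  std::size_t reserved_bytes_ = 0;
  int reservation_count_ = 0;
};
```

Checking and setting the flag under the same lock is what closes the window in which two threads could both see "not reserved yet".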
David Yat Sin 14052ab9d0 Add HSA_ENABLE_PEER_SDMA env variable
Add support for HSA_ENABLE_PEER_SDMA env variable that can be used to
disable use of SDMA engines for device-to-device transfers. Note that
setting HSA_ENABLE_SDMA=0 will disable all SDMA transfers and override
HSA_ENABLE_PEER_SDMA values.

Change-Id: I737b3c2b2efcf3ff237f98bc748f49b8252ed24a


[ROCm/ROCR-Runtime commit: a397373cea]
2023-05-18 00:10:20 +00:00
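Note on 14052ab9d0: a minimal sketch of the precedence described above, where HSA_ENABLE_SDMA=0 overrides any HSA_ENABLE_PEER_SDMA value. The helper names, the "enabled when unset" default, and the rule that only "0" disables are assumptions for illustration; ROCr's actual flag parsing is not shown in this log.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: reads a boolean-like environment variable.
// Assumption: unset or empty means "use the default", "0" means disabled.
bool EnvFlag(const char* name, bool default_value) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return default_value;
  return std::string(value) != "0";
}

// Peer (device-to-device) SDMA is usable only if SDMA as a whole is
// enabled, so the global switch wins over the peer-specific one.
bool PeerSdmaAllowed(bool enable_sdma, bool enable_peer_sdma) {
  return enable_sdma && enable_peer_sdma;
}

// Convenience wrapper that reads both variables from the environment.
bool PeerSdmaAllowedFromEnv() {
  return PeerSdmaAllowed(EnvFlag("HSA_ENABLE_SDMA", true),
                         EnvFlag("HSA_ENABLE_PEER_SDMA", true));
}
```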
Ranjith Ramakrishnan 82b4216e40 Use the RUNPATH provided by build scripts
RUNPATH in libraries will be : $ORIGIN
RUNPATH in binaries will be : $ORIGIN/../lib

Change-Id: Iafa66a8e02cc8c5783903d40927b63652042d2f1


[ROCm/ROCR-Runtime commit: ad002f1e7b]
2023-05-17 09:10:50 -04:00
David Yat Sin b8e97a8d1b Update documentation for hsa_amd_pointer_info
Update documentation for hsa_amd_pointer_info to clarify which fields
are invalid when the allocation type is HSA_EXT_POINTER_TYPE_UNKNOWN.

Change-Id: Idaed985962c4a98d281ebe01bef8ec2459da3985


[ROCm/ROCR-Runtime commit: 39feb83b88]
2023-05-16 18:36:54 -04:00
David Yat Sin 7ecdefb7ca Reserve scratch on first queue allocation
Some multi-GPU workloads create one process per GPU. Each process
creates a GPU agent on every GPU, but only creates queues on one GPU.
This would cause unnecessary scratch reservation.

Change-Id: I50a216f0bcc0b5f707f3943147390b0ecec1ac22


[ROCm/ROCR-Runtime commit: 38e832a682]
2023-05-15 17:10:57 -04:00
Graham Sider 53b5692d07 Fix scratch allocation occupancy reduction loop
If the required scratch allocation is too large, ROCr will attempt to
reduce it by lowering the dispatch's targeted occupancy. The reduction
loop however was prone to overflow if waves_per_cu was not a multiple of
waves_per_group. Ensure no overflow by aligning waves_per_cu to
waves_per_group.

On GC 9.4.3 dGPU, dispatches with a large grid size and a
waves_per_group of e.g. 16 may need their occupancy reduced until
waves_per_cu is less than waves_per_group so that the allocation size
is small enough. Allow this while also ensuring the tmpring scratch wave
count is kept divisible by the number of SEs per XCC.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480


[ROCm/ROCR-Runtime commit: bd63e5045c]
2023-05-15 14:55:42 +00:00
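Note on 53b5692d07: a minimal sketch of why aligning waves_per_cu to waves_per_group keeps an unsigned reduction loop from wrapping around. The function name, parameters, and size formula are simplified assumptions; the sketch covers only the alignment fix from the first paragraph, not the GC 9.4.3 special case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduction loop. Returns the largest aligned waves_per_cu
// whose scratch size fits in max_bytes, or 0 if nothing fits.
uint32_t ReduceWavesPerCu(uint32_t waves_per_cu, uint32_t waves_per_group,
                          uint64_t bytes_per_wave, uint64_t max_bytes) {
  assert(waves_per_group != 0);
  // Align down first. Repeated subtraction of waves_per_group then lands
  // exactly on zero instead of stepping past it and wrapping to a huge
  // unsigned value.
  waves_per_cu -= waves_per_cu % waves_per_group;
  while (waves_per_cu > 0 && waves_per_cu * bytes_per_wave > max_bytes) {
    waves_per_cu -= waves_per_group;
  }
  return waves_per_cu;
}
```

For example, starting from 40 waves with groups of 16, the loop visits 32, 16 and 0, never a value below zero.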
David Yat Sin 2d924e337d Do not report reserved scratch cache as available
Scratch cache reserved memory is only available for scratch memory use
so do not report this memory as available to the user via the
HSA_AMD_AGENT_INFO_MEMORY_AVAIL api.

Change-Id: I52f96e62536458bcaa52b9f4be5de856d5680dc4


[ROCm/ROCR-Runtime commit: 3477fbc661]
2023-05-15 09:45:31 -04:00
David Yat Sin e1ded285a9 Removing invalid gfx entries
Change-Id: I1a9a9a064f5f65ecc3e124c5dd7d6baf6b5ccb5c


[ROCm/ROCR-Runtime commit: f0000da7b3]
2023-05-12 11:59:27 -04:00
David Yat Sin 7a157d8e55 Disable Queue_Validation_InvalidGroupMemory
Temporarily disabling rocrtstNeg.Queue_Validation_InvalidGroupMemory
until it is fixed.

Change-Id: Ifc1973a960c8d0bae27e2628e4bfddc60f70325d


[ROCm/ROCR-Runtime commit: 7b74271d5e]
2023-05-12 11:03:26 -04:00
Saleel Kudchadker 5630103f4a Report XGMI SDMA upon query
Report XGMI SDMA engines when queried for H2D/D2H.

Change-Id: I4fb7b24bc15d1745b3844485bdeab71282a787a5


[ROCm/ROCR-Runtime commit: adf6512dad]
2023-05-11 12:20:41 -04:00
David Yat Sin 35e72e3d97 Fix incorrect check for image support
Change-Id: I77476204d40c245c9d9091853264a4e9fbb80725


[ROCm/ROCR-Runtime commit: 9b35ce5b3b]
2023-05-10 20:13:54 +00:00
Ranjith Ramakrishnan dd9fdba22c Set the default value of ROCM_HEADER_WRAPPER_WERROR to OFF
Using wrapper header files will result in #warning message by default

Change-Id: I87739cabb365b9370b1182cf23ca9b54d99149c3


[ROCm/ROCR-Runtime commit: fbcbcd9e73]
2023-05-10 00:47:33 -04:00
Sam Wu 56ec0e6412 add sphinx configurations
Change-Id: I1a66a02b18fb699415a87a6473eb72c097a13b5f


[ROCm/ROCR-Runtime commit: 57b3fcde51]
2023-05-08 15:58:01 -06:00
Graham Sider e2fc46c189 rocrtst: Move kernel object loading outside of loops
Negative queue validation tests were doing many redundant from-file
kernel object loads in a loop. This was creating many simultaneous open
file handles within many dynamically allocated CodeObject objects. While
the CodeObject class implements RAII on the file handles to clean up on
destruction, clear_code_object() only gets called on the destruction of
the TestBase-derived test objects (these being a suite abstraction).

Due to this we were hitting file open() EMFILE errors (too many open
files) in gfx94x CPX mode. Move LoadKernelFromObjFile outside of the
test loops and clear_code_object() for each test on each agent.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I6f9d23fd122720c49a58c22698f097906d2fc97c


[ROCm/ROCR-Runtime commit: 7a4c9273d7]
2023-04-27 16:16:12 -04:00
David Yat Sin 11541cc283 Add env var to override SRAM ECC
Add HSA_ENABLE_SRAMECC environment variable that can be used to
override SRAM ECC mode reported by KFD

Change-Id: I2b95511820a2d3d146a76b03070659c0695b61fd


[ROCm/ROCR-Runtime commit: a180c9ee78]
2023-04-27 16:16:05 -04:00
David Yat Sin 101755c207 Add query for number of XCCs per agent
Change-Id: I4b694b4904ba0326c998356388a62c19a972a7ff


[ROCm/ROCR-Runtime commit: f024d21e3d]
2023-04-27 16:15:59 -04:00
Mike Li 15114271be Return failure with any IMAGE attribute for gfx940
The gfx940 does not support IMAGE instructions. Any get_info with
IMAGE attributes should return failure.

Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: I12005628f92780f551ab6f8b41526c66b54c6a59


[ROCm/ROCR-Runtime commit: 46b667e530]
2023-04-27 16:15:51 -04:00
Mike Li e495a4b16a Do not use the function part of the location_id
The function IDs used to be 0 on previous asics, but on gfx94x and newer
asics these bits are set. These bits are used by user applications to
uniquely identify the locations of GPU nodes. These extra bits break
hwloc and are not needed for rocrtst.

Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: I1202f504645b0662d009b9c0926eebb7ddc08d73


[ROCm/ROCR-Runtime commit: d7fa654338]
2023-04-27 16:15:43 -04:00
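Note on e495a4b16a: a minimal sketch of dropping the function part of a location ID. It assumes location_id uses the usual PCI bus/device/function packing, with the function number in the low 3 bits; the helper name is hypothetical.

```cpp
#include <cstdint>

// Assumption: the low 3 bits of location_id hold the PCI function number,
// as in the standard bus (8 bits) / device (5 bits) / function (3 bits)
// packing.
constexpr uint32_t kPciFunctionMask = 0x7;

// Hypothetical helper: clears the function bits so that IDs differing only
// in function number compare equal.
constexpr uint32_t StripPciFunction(uint32_t location_id) {
  return location_id & ~kPciFunctionMask;
}
```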
Mike Li dae51188d8 Scratch memory changes to support multi-xcc
Change-Id: I115ba4cfe250c59cb7421217cfe0fad6302f25b3


[ROCm/ROCR-Runtime commit: 9554e95de0]
2023-04-27 16:15:30 -04:00
Laurent Morichetti 3603303bc7 Update the trap handler for gfx940
gfx940 uses ttmp11 to hold the queue packet index so the first level
trap handler uses ttmp13 instead to save ib_sts.

Repurpose ttmp11[31] to mean that the ttmps are initialized. The issue
was that the debugger could not tell whether ttmp6 was written by the
trap handler when determining the stop reason.

If ttmp11[31]=0, then the trap handler has not been executed and ttmp6
should be assumed to be 0.  If ttmp11[31]=1, then ttmp6 holds the
trap_id, if an s_trap instruction caused the exception.

Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Lancelot Six <lancelot.six@amd.com>

Change-Id: I9af903abae044b9ec530306229caf3b883f3ee46


[ROCm/ROCR-Runtime commit: f31b312611]
2023-04-27 16:15:14 -04:00
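Note on 3603303bc7: a minimal sketch of how a debugger-side reader might apply the ttmp11[31] rule described above. The function and constant names are hypothetical, and the register values are assumed to have been read already.

```cpp
#include <cstdint>

// Bit 31 of ttmp11 marks the ttmp registers as initialized by the trap
// handler.
constexpr uint32_t kTtmpsInitializedBit = 1u << 31;

// Hypothetical helper: returns the ttmp6 value a debugger should use.
// If the trap handler has not run, ttmp6 is treated as 0.
constexpr uint32_t EffectiveTtmp6(uint32_t ttmp11, uint32_t ttmp6) {
  return (ttmp11 & kTtmpsInitializedBit) ? ttmp6 : 0;
}
```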
Mike Li 547d2aa3c8 Add gfx940 to AmdHsaCode
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: Ib4f7c801c3d3bac9a04c880c5bf86b72bfa3404f


[ROCm/ROCR-Runtime commit: de4d1ce424]
2023-04-27 16:09:26 -04:00
Mike Li fe9b01e916 Added gfx940 ISA
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: Icb1830fe186abc69fe7ee709b7f12b882cab9e87


[ROCm/ROCR-Runtime commit: bd98a1e5bf]
2023-04-27 16:08:58 -04:00
Alex Sierra bd8c4079da use mkstemp instead tempnam for temp file
tempnam has been marked as obsolete.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ie64d9a351bf386da00a96ceff059f685e11f2cca


[ROCm/ROCR-Runtime commit: e82025bffa]
2023-04-17 15:38:59 -04:00
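Note on bd8c4079da: a minimal sketch of creating a temporary file with mkstemp, which creates and opens the file in one step instead of only producing a name as tempnam does. The helper name and the /tmp template are assumptions for illustration.

```cpp
#include <stdlib.h>  // mkstemp (POSIX)
#include <unistd.h>  // close, unlink

#include <string>

// Hypothetical helper: creates a unique temporary file and returns its open
// file descriptor, or -1 on failure. On success *path_out holds the name.
int CreateTempFile(std::string* path_out) {
  // mkstemp rewrites the trailing XXXXXX in place, so it needs a writable
  // buffer. &tmpl[0] gives one that is also null-terminated.
  std::string tmpl = "/tmp/rocr_example_XXXXXX";
  int fd = mkstemp(&tmpl[0]);
  if (fd != -1) {
    *path_out = tmpl;
  }
  return fd;
}
```

The caller owns the descriptor and is expected to close() it and unlink() the path when done.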
Lancelot SIX 5313f40ae2 linux os_thread: improve error handling
On Linux, the os_thread abstraction is built on top of pthread.  Many of
the pthread calls might fail and return error codes.  The error
conditions are only checked via assertions (if ever checked) which means
that when doing a release build, no error condition is checked.  The
same goes for dlsym/dlinfo and clock_gettime.

This commit improves the situation by checking the error conditions
and acting accordingly.  When the error condition is detected in a
function with a means to indicate an error to its caller, then this
patch prints an error message and returns.  If there is no way to
propagate the error up the call stack, print an error message and
abort the process.

For the os_info::os_info ctor, the only user is CreateThread, which
checks that the built thread is Valid().  If not, nullptr is returned to
the caller.

It could be possible to use exceptions when functions cannot pass
errors, but for now I only use abort as it is what assert would do in a
debug build.

Change-Id: I815703c3b95777cc29bb89a7d654ac879c14a759


[ROCm/ROCR-Runtime commit: 183f5d90aa]
2023-04-17 09:48:11 -04:00
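Note on 5313f40ae2: a minimal sketch of the "check the return code, log, and report failure to the caller" pattern, using clock_gettime as the example call. ReadMonotonicNs is a hypothetical helper, not part of os_thread.

```cpp
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

// Hypothetical helper: reads the monotonic clock in nanoseconds.
// The error is checked at run time rather than with an assert, so the
// check is still present in release builds.
bool ReadMonotonicNs(uint64_t* out_ns) {
  timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
    fprintf(stderr, "clock_gettime failed: %s\n", strerror(errno));
    return false;  // this function can signal failure to its caller
  }
  *out_ns = static_cast<uint64_t>(ts.tv_sec) * 1000000000ull +
            static_cast<uint64_t>(ts.tv_nsec);
  return true;
}
```

A function with no way to report failure would, following the commit message, print the error and call abort() instead of returning false.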
Lancelot SIX 68167b62ba Runtime::GetSystemInfo: Supress parentheses warning
When building with g++-11.3.0, I have the following warning:

    /home/.../core/runtime/runtime.cpp: In member function ‘hsa_status_t rocr::core::Runtime::GetSystemInfo(hsa_system_info_t, void*)’:
    /home/.../core/runtime/runtime.cpp:693:56: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
      693 |           kfd_version.KernelInterfaceMajorVersion == 1 &&
          |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
      694 |               kfd_version.KernelInterfaceMinorVersion >= 12)
          |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This patch adds the parentheses as suggested.  This silences the
compiler warning.

No functional change expected.

Change-Id: I69c1a73a432b0f2393dbaf36d4424cf0056c535f


[ROCm/ROCR-Runtime commit: 72219b8237]
2023-04-17 09:43:02 -04:00
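Note on 68167b62ba: a minimal sketch of why adding the parentheses is not a functional change. The left operand of || is an assumption, since the warning above only shows the && part; the function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical version check modeled on the warning text.
constexpr bool MeetsMinimumVersion(uint32_t major, uint32_t minor) {
  // && already binds tighter than ||, so the parentheses only make the
  // existing grouping explicit and silence -Wparentheses.
  return major > 1 || (major == 1 && minor >= 12);
}
```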
David Yat Sin 6812573d06 Change error reported when receiving code 128
We used to report HSA_STATUS_ERROR_INVALID_ISA when receiving error code
128, but there are several other reasons why we could be exceeding
number of VGPRs, so updating the error code.

Change-Id: I6a6980d5b07b09c93d00dee5207a0d52399bc77e


[ROCm/ROCR-Runtime commit: f43a284b8e]
2023-04-14 09:12:07 -04:00
David Yat Sin 6c4528ba33 Fix assertion when _GLIBCXX_ASSERTIONS is enabled
On some platforms, e.g. Arch Linux, the -D_GLIBCXX_ASSERTIONS compile flag
is enabled by default, causing a runtime assertion.
Avoid the assertion by using the std::vector accessor function data().

Change-Id: I118cdf102c3e353f32c618823e363ee1059f3453


[ROCm/ROCR-Runtime commit: 511855d344]
2023-04-11 11:40:10 +00:00
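Note on 6c4528ba33: a minimal sketch of the data() pattern. It assumes the assertion came from indexing a vector that can be empty (for example &v[0]); the log does not show the original expression, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical consumer of a raw buffer.
std::size_t SumBytes(const unsigned char* bytes, std::size_t count) {
  std::size_t sum = 0;
  for (std::size_t i = 0; i < count; ++i) sum += bytes[i];
  return sum;
}

std::size_t SumVector(const std::vector<unsigned char>& v) {
  // &v[0] calls operator[] with index 0, which fails the bounds check added
  // by _GLIBCXX_ASSERTIONS when v is empty. data() is valid for an empty
  // vector, as long as the returned pointer is not dereferenced.
  return SumBytes(v.data(), v.size());
}
```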
David Yat Sin f84f83702c Fix for overwriting pointer info size
Fix for overwriting pointer info size provided by caller of
hsa_amd_pointer_info.

Change-Id: I2e5d73ab9ba1a32bc9b4d112bc29b4a99fd8b3b5


[ROCm/ROCR-Runtime commit: c5bf7eb112]
2023-04-06 16:35:37 -04:00
David Yat Sin d476ff16eb Adding scratch memory reservation
Some applications will keep trying to allocate device memory until the
allocation fails. This causes all device memory to be used up and we are
then unable to allocate scratch memory for dispatches. Reserve enough
memory for 1 small scratch allocation.

Change-Id: I968400d41540ba1aca8f28581f229693eec02225


[ROCm/ROCR-Runtime commit: 8ebf5f9c48]
2023-04-06 15:13:36 +00:00
Konstantin Zhuravlyov 536f0aa118 Loader: Skip vdso.so code objects in GetUriFromMemoryInExecutableFile
Change-Id: Ie2cac880c406ed90d6fa614707fa8df7b87458da


[ROCm/ROCR-Runtime commit: a5932ef5ef]
2023-03-17 09:57:15 -04:00
Lang Yu 44b940e033 Switch to completion signal wait for amd_aql_pm4_ib processing
Wait on completion signal for amd_aql_pm4_ib processing
on ASICs with gfx version >= 9.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: Ia704d9cc5b2535dcf8564a30f694262b113f77a2


[ROCm/ROCR-Runtime commit: aec7200cb2]
2023-03-16 20:23:53 -04:00
Jonathan Kim ad1a3fc9c4 Fix Invalid Engine Offset Check
An engine offset equal to the maximum number of engines is still valid,
because offset enum 0 is occupied by blit copies. Raise the limit by 1.

Change-Id: I6fcab106290e6647702efe297a4281861da4e0b8


[ROCm/ROCR-Runtime commit: fc8f3f9fd5]
2023-03-16 09:50:10 -04:00
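Note on ad1a3fc9c4: a minimal sketch of the off-by-one in the offset check. It assumes offset 0 selects the blit path and offsets 1 through num_engines select engines; the function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical validity check. Because offset 0 is taken by blit copies,
// the highest engine sits at offset == num_engines, so the comparison has
// to be <= rather than <.
constexpr bool IsValidEngineOffset(uint32_t offset, uint32_t num_engines) {
  return offset <= num_engines;
}
```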