Update the `hsa.h` header to use the gcc / clang `__BYTE_ORDER__`
macros where available to more accurately autodetect endianness for
the target.
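
A minimal sketch of this style of detection, for illustration only. The
EXAMPLE_* macro names and the little-endian fallback are assumptions made
here, not the names or behavior used in `hsa.h`; only `__BYTE_ORDER__`,
`__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__` are the real
gcc / clang predefined macros.

```cpp
#include <cstdint>
#include <cstring>

// Compile-time detection using the gcc / clang predefined macros.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define EXAMPLE_LITTLE_ENDIAN 1  // hypothetical name
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define EXAMPLE_BIG_ENDIAN 1  // hypothetical name
#else
// Assumption for this sketch: fall back to little endian when the
// compiler does not provide the macros.
#define EXAMPLE_LITTLE_ENDIAN 1
#endif

// What the preprocessor decided.
inline bool compiled_as_little_endian() {
#if defined(EXAMPLE_LITTLE_ENDIAN)
  return true;
#else
  return false;
#endif
}

// Runtime cross-check: inspect the first byte of a known 32-bit value.
inline bool runtime_is_little_endian() {
  const std::uint32_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

On a compiler that provides the macros, both functions should agree.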
Change-Id: I7312f3badcba9287a30eb14882b91e2a247acc5f
[ROCm/ROCR-Runtime commit: 4971150576]
This reverts commit 4c8a849772.
That change is required for the runtime to generate reliable core dump
files, but the feature has been disabled for now by 816b46868a. Until it
is needed, revert the ABI change in the trap handler to maintain
compatibility with older debuggers.
Change-Id: I77a1562dc7962befe2bf88442df858e2d2b1c5ab
[ROCm/ROCR-Runtime commit: 6f828d8609]
If using hsakmt as a shared library
Change-Id: I66a1849a46bd7009813d49824d0d059e8a511038
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
[ROCm/ROCR-Runtime commit: 42581d4172]
Add an APU check to the LargestVramBufferTest criteria.
Change-Id: Ic69093f8ebed8be0b1c58787e2a294d86fb49bb0
[ROCm/ROCR-Runtime commit: 808a4428b6]
This reverts commit 9aa39b0979.
This commit disables the core dump feature. Apparently, gfx1101 SA1
waves cannot enter the trap handler because they receive an invalid
address. However, core dump at the debugger has been moved to ROCm
6.2.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I7915caf58118658e5e7f435f91a0a6216d2fdb42
[ROCm/ROCR-Runtime commit: 5e3be9c28a]
On some systems, pthread_attr_setaffinity_np does not exist, so we need
to call pthread_setaffinity_np on the thread after pthread_create.
Provided by Julian Samaroo on GitHub:
https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/143
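
The pattern can be sketched roughly as follows for Linux. This is not the
code from the pull request; WorkerState, worker and
create_thread_with_affinity are hypothetical names. The worker waits on a
flag so that the thread is guaranteed to still be alive when its affinity
is set.

```cpp
#include <pthread.h>
#include <sched.h>
#include <atomic>

// Hypothetical state shared with the worker thread.
struct WorkerState {
  std::atomic<bool> release{false};
};

// Worker that idles until released, so it outlives the affinity call.
void* worker(void* arg) {
  auto* state = static_cast<WorkerState*>(arg);
  while (!state->release.load(std::memory_order_acquire)) sched_yield();
  return nullptr;
}

// Create the thread with default attributes, then apply the CPU mask to
// the running thread instead of using pthread_attr_setaffinity_np.
// Returns 0 on success or an errno-style error code.
int create_thread_with_affinity(pthread_t* thread, WorkerState* state,
                                const cpu_set_t* mask) {
  int err = pthread_create(thread, nullptr, worker, state);
  if (err != 0) return err;
  return pthread_setaffinity_np(*thread, sizeof(cpu_set_t), mask);
}
```

The caller releases the worker and joins it once the affinity has been
applied.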
Change-Id: I4649f94333f2d7b0a5993b370a4bfc48d92acecb
[ROCm/ROCR-Runtime commit: 6333fdecf3]
When XNACK is on, the shader code in this test triggers GPU page faults that
migrate data from system RAM to VRAM. Use the SVM range granularity to move all
data from the system buffer to VRAM, reducing system RAM pressure and avoiding
OOM on systems that have less system RAM.
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Change-Id: I219472210756be319491f7827f7209fe32726f81
[ROCm/ROCR-Runtime commit: 1a7162731e]
For "Intel Meteor Lake Mobile", the cache info is not in sysfs.
That means /sys/devices/system/node/node%d/%s/cache does not exist,
but the system works fine.
Change-Id: Ie7c04426791a84c2288ff21df093226828a5f629
Signed-off-by: Gang Ba <Gang.Ba@amd.com>
[ROCm/ROCR-Runtime commit: 4bf73f521b]
Add query to return flags for GPU agent memory properties and AQL
extensions.
Implement a flag to indicate that the GPU agent is an APU.
Change-Id: Ic04c51290b2b9763e14989c117f35a2e22297453
[ROCm/ROCR-Runtime commit: c86837d8d6]
When inspecting waves on architectures where SPI may not initialize TTMP
registers, the debugger cannot reliably know if the trap handler was
entered and if it saved valuable information in TTMP registers.
This patch uses the status.skip_export bit (unused by the compute
shaders) to indicate that it got executed before halting a wave.
This is done except for gfx940, where ttmp11[31] can be used (as long as
TTMP registers are always initialized by SPI for this architecture). It
would be possible to be more selective, since architectures that always
initialize TTMP registers do not require this step, but always doing
it makes maintenance simpler.
Change-Id: I314db6b37772f7daa8bd405e6662a86658d3f5e0
[ROCm/ROCR-Runtime commit: c5db063b2f]
Extracts and creates a core dump ELF file from a fault event, using
the core dump front end.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ibbbe41b3d13dd3fcb90161e927d48c329cf513a9
[ROCm/ROCR-Runtime commit: 803e37ded5]
Member added to KFDVersion to report if KFD supports core dump
mechanism. This is done through hsaKmtRuntimeEnable API call while
the topology is being built. It also dictates if core dump will be
generated by either KFD or hsa-runtime.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2e9d4166563402f78613d728446feb692c52d9d1
[ROCm/ROCR-Runtime commit: 54604654bd]
Core dump generation takes ulimit into account to generate a file of
the proper size.
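
As a rough illustration, the limit can be read with getrlimit and applied
to the intended dump size. This sketch assumes the relevant limit is the
soft RLIMIT_CORE value and that an oversized dump is truncated; the commit
does not say which policy the runtime uses. query_core_limit and
clamp_to_core_limit are hypothetical helpers.

```cpp
#include <sys/resource.h>
#include <cstdint>

// Read the soft core-file size limit (what `ulimit -c` reports, although
// the shell prints it in blocks rather than bytes).
bool query_core_limit(rlim_t* soft_limit) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
  *soft_limit = lim.rlim_cur;
  return true;
}

// Number of bytes that may be written for a dump of `wanted` bytes.
// A limit of 0 means no core file should be produced at all.
std::uint64_t clamp_to_core_limit(std::uint64_t wanted, rlim_t soft_limit) {
  if (soft_limit == RLIM_INFINITY) return wanted;
  return wanted < soft_limit ? wanted
                             : static_cast<std::uint64_t>(soft_limit);
}
```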
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I61d991fc003b173f9075b66bff6a931447720695
[ROCm/ROCR-Runtime commit: 91f2a70817]
This API consists of a single function, called from a fault event in the
hsa-runtime, that generates a core dump.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ib1b90d5beb13f93c4e8ebd21fd61705ebb12ca5d
[ROCm/ROCR-Runtime commit: 514b222368]
SegmentBuilder classes are used to get core dump data from the GPUs.
So far, they use thunk API calls and smaps to collect all data from
the hardware.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2ad70ca5a951885181d3142653b186b0f6be739e
[ROCm/ROCR-Runtime commit: 1083d5c35f]
- fix logic for using HSA_TOOLS_LIB when rocprofiler-register support is enabled
- report tool load failure for rocprofiler-register
Change-Id: Ife23aa3e6ed19174376cd694764583b73f8976cd
[ROCm/ROCR-Runtime commit: 27eb0516bb]
The alternate scratch memory is used for dispatches that have a low
number of waves but relatively large wave size.
This allows us to keep the tmpring_size.bits.WAVES field of the main
scratch to full occupancy.
Change-Id: I32d240fac4b7d38200d1eebc1b0fdc8a823920d3
[ROCm/ROCR-Runtime commit: a7a3358067]
For devices where the CP FW supports asynchronous scratch reclaim, ROCr
is able to claw-back scratch memory that was assigned to an AQL queue.
With that ability, ROCr does not have to rely on using USO
(use-scratch-once) when assigning large amounts of memory to a queue.
If we reach a situation where we are running low on device memory, ROCr
will attempt to claw-back the scratch memory.
Change-Id: Iddf8ec84e37ab8b9fdc58bafbe2b61fe2acb6eb7
[ROCm/ROCR-Runtime commit: dca8f3a21d]
Separate the event handler and scratch handler portions of the code into
separate functions.
Change-Id: Ifdb7461e816b0f2d3c1c0a74d6f020b4d6fc736c
[ROCm/ROCR-Runtime commit: 64070a9acc]
Update queue structure to add members required for asynchronous reclaim
mechanism and dual-scratch. CP will set the AMD_QUEUE_CAPS_ASYNC_RECLAIM
bit on queue-connect to indicate whether the new features are supported.
The new members are ignored by previous versions of CP FW.
Change-Id: Ic8e9ef41c5b1d04f09b43bc9b44b31527863d10f
[ROCm/ROCR-Runtime commit: 0344c8c0b6]
For gfx11, the trap_handler fails to recognize a trap id 3 and report
the exception to the debugger if the debugger is attached.
This is because the 2nd level trap handler looks for the DEBUG_ENABLED
bit in ttmp13 instead of ttmp11. This bit is set by the 1st level trap
handler and is part of the 1st/2nd level trap handler ABI.
Change-Id: Ib36361f53d9bcbbed52320d8c3a9ab2c0b28c7cd
[ROCm/ROCR-Runtime commit: 6916ce358a]
This reverts commit a8e34eaec8.
gfx1150/1151 is merged into mainline now.
Change-Id: Id179949318a37888c74abb5a8610d95bc2f22906
[ROCm/ROCR-Runtime commit: 991bbdcf24]
Skip the extended-scope memory pool, as its allocation is very close to
fine-grain/coarse-grain, just with different PTE flags.
Only test coarse grain on CPU agents other than the first CPU agent.
Stop bisecting the max size once we are within 5% of the total size for
these pools, to speed up this test on large memory pools.
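
The early-exit bisection can be sketched as below. This is an
illustration of the idea, not the test's code; find_max_alloc, try_alloc
and granule are hypothetical names, and try_alloc stands in for an
allocate-then-free attempt on the pool.

```cpp
#include <cstdint>
#include <functional>

// Search for the largest size in [0, total] that try_alloc accepts.
// `lo` is the largest size known to succeed, `hi` the smallest known to
// fail. The search stops early once `lo` is within 5% of `total`.
std::uint64_t find_max_alloc(
    std::uint64_t total,
    const std::function<bool(std::uint64_t)>& try_alloc,
    std::uint64_t granule) {
  if (try_alloc(total)) return total;
  std::uint64_t lo = 0;
  std::uint64_t hi = total;
  while (hi - lo > granule) {
    if (total - lo <= total / 20) break;  // close enough: within 5%
    const std::uint64_t mid = lo + (hi - lo) / 2;
    if (try_alloc(mid)) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return lo;
}
```

When the real limit is far below the pool size, the search still runs to
full precision; the shortcut only applies near the top of the range,
which is where large pools make each attempt expensive.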
Change-Id: I77d1b45a1752ef092dda7c7f27723ea0a292a612
[ROCm/ROCR-Runtime commit: cb5a29955b]
SDMA4.4 and SDMA5.2+ have increased their available copy size to 2^30
bytes, represented by an exponent as bits set in the COUNT field of the
linear copy.
Also note that the full 2^22 byte limit is available from SDMA4 onwards,
as it has corrected the 0x3fffe0 HW limitation from SDMA3.
As the copy limit has increased, this can change system performance,
so provide the env var HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE=0 to fall
back to the original 0x3fffe0 limit for debugging purposes.
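
To show the effect of the limit on the number of linear-copy commands,
here is a small sketch that splits a transfer into chunks. The constants
come from the text above; split_copy is a hypothetical helper, and the
sketch does not model how the COUNT field is encoded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kLegacyMaxCopyBytes = 0x3fffe0;     // original limit
constexpr std::uint64_t kExtendedMaxCopyBytes = 1ull << 30; // 2^30 bytes

// Split a transfer of `bytes` into chunk sizes of at most `max_chunk`.
std::vector<std::uint64_t> split_copy(std::uint64_t bytes,
                                      std::uint64_t max_chunk) {
  assert(max_chunk > 0);
  std::vector<std::uint64_t> chunks;
  while (bytes > 0) {
    const std::uint64_t n = std::min(bytes, max_chunk);
    chunks.push_back(n);
    bytes -= n;
  }
  return chunks;
}
```

A 1 GiB transfer fits in one chunk under the 2^30 limit, while under the
0x3fffe0 limit it takes 257 chunks, the last one being 8192 bytes.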
Change-Id: I0fb6e5378f68e5b8a00ff559271691a943ee06ee
[ROCm/ROCR-Runtime commit: 81c64228e0]
To trace memcpy asynchronously, both the dst and src agents need to have
profiling enabled, but the API for enabling profiling was only enabling
it for GPU agents. CPU agents did not have profiling enabled, so the
signal owner could not be known, and
hsa_amd_profiling_get_async_copy_time would fail with an HSA status
error because it could not read the agent for the given signal.
Change-Id: Ie165e0e39b8fcd6992a55695b9ffcead10a8e812
[ROCm/ROCR-Runtime commit: ae1da390bd]
- Update CMakeLists.txt
  - find_package for rocprofiler-register
    - this is an optional package until rocprofiler-register is added to the CI
  - define HSA_VERSION_{MAJOR,MINOR,PATCH} ppdefs
- Update runtime.cpp
  - include <rocprofiler-register/rocprofiler-register.h>
  - if rocprofiler-register succeeds, do not support v1 unless explicitly requested
Change-Id: I8f48bbf3f6b52fb91ddade2f198491a1256035fe
[ROCm/ROCR-Runtime commit: f9cf1852e5]
Remove override that forces ROCr image blit source and ROCr test to use
code object version 4 now that mainline has been updated to version 5.
Change-Id: I94681e86835c0e382475306ead4cd4132a2ee78f
[ROCm/ROCR-Runtime commit: 2f847cf05f]
Add handler to handle HW exception events reported by underlying
drivers. These events are generally caused by GPU resets and need the
application to abort.
As a future improvement, we can provide additional information
about the exception (e.g. mode-reset level).
Change-Id: If3fb5f19f9fce181a9d3b5e34a5506725856e7b0
[ROCm/ROCR-Runtime commit: 750212e50e]
Add new structures for HW Exception events and copy data from KFD to
expose to upper layers.
Change-Id: Icd5eb98997c47620e3b86277ab6d3abb7ed7d56f
[ROCm/ROCR-Runtime commit: 01ff2f7934]
AqlPacket::string should check the packet type is in range of the array
used to print its name.
Change-Id: I33dabbd941d086929526d842c9dbc0bd7305acd5
[ROCm/ROCR-Runtime commit: 7955fb01ec]
An AQL packet header field is stored using an atomic release, and needs
to be read using atomic acquire if it may be written by another thread.
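
The pairing can be illustrated with std::atomic. ExamplePacket is a
simplified, hypothetical stand-in for an AQL packet, not the runtime's
layout; it only assumes that the packet type sits in the low 8 bits of
the header and that type 1 means "invalid".

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uint16_t kTypeMask = 0xff;
constexpr std::uint16_t kTypeInvalid = 1;

// Hypothetical, simplified packet: a header plus one payload word.
struct ExamplePacket {
  std::atomic<std::uint16_t> header{kTypeInvalid};
  std::uint64_t payload = 0;
};

// Writer: fill in the body first, then publish the header with release
// so the payload write is visible to any reader that sees the header.
void publish(ExamplePacket& packet, std::uint64_t payload,
             std::uint16_t header) {
  packet.payload = payload;
  packet.header.store(header, std::memory_order_release);
}

// Reader: load the header with acquire before touching the body.
bool try_consume(const ExamplePacket& packet, std::uint64_t* out) {
  const std::uint16_t header = packet.header.load(std::memory_order_acquire);
  if ((header & kTypeMask) == kTypeInvalid) return false;
  *out = packet.payload;
  return true;
}
```

A relaxed or plain read of the header would allow the reader to observe a
valid header and still see a stale payload.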
Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749
[ROCm/ROCR-Runtime commit: 395ad3b77b]
Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add
support to intercept queue to invoke a callback for these packets.
Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149
[ROCm/ROCR-Runtime commit: 23b4ce501d]