93 Commits

Author SHA1 Message Date
cfreeamd 5172701708 rocr: Correct gpu dumped core contents (#2851)
Includes several tests (rocrtst) for this capability.
2026-01-30 09:38:09 -08:00
pghoshamd d2a1fc945e SWDEV-569319 Fix dangling reference warning (#2509)
* SWDEV-569319 Fix dangling reference warning

* fix nullptr warning

* use emplace

* return regular pointer
2026-01-13 15:39:03 -06:00
pghoshamd 637b0d71f0 SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers (#2146)
* SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers

* Replace KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex

* Replaced unique_locks with lock_guards

* More changes

* Replace new and deletes with smart pointers

* Replaced some more with shared ptrs

* Replacements with smart pointers - pt 2

* missed change
2026-01-06 10:59:34 -05:00
Mario Limonciello bc5d48e76c Run pre-commit's whitespace related hooks on projects/rocr-runtime (#2130)
* Run pre-commit's whitespace related hooks on projects/rocr-runtime

In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Add missing semicolon which would block compilation on big endian CPUs

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-08 07:56:50 -06:00
cfreeamd 24c2a84e3f rocr: GPU core file location support (#1732)
* rocr: WIP Support dump of GPU core file

* WIP new core dump tests compile

* WIP: anony namespaces, test updates, progress

Added disabled Fault test. Other non-disabled coredump tests don't work.

* WIP: address code review feedback

* WIP: gpu core dump rocrtst works; combined

* WIP: remove rocrtst changes for this commit
2025-11-20 18:50:51 -08:00
David Yat Sin 48cb61f378 rocr: Separate Linux coredump implementation (#1588)
Remove libamdhsacode/win32/elf.h due to license restrictions.

Separate Linux coredump implementation because we do not have the ELF
definitions on Windows.

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-07 14:52:08 -05:00
German Andryeyev 913743d433 Add windows build support into ROCr (#912)
Make sure ROCr can be compiled under Windows. Extra setup for the Windows build environment is required. The change should not introduce any functional changes under Linux.
2025-09-19 10:10:17 -04:00
systems-assistant[bot] f1fabcfd64 rocr: Error Handling Issues (#264)
* rocr: Fix Incorrect Assertion Check

The wrong variable is used in the assertion statement; it should
check the value of paramEndLoc after it is modified by the call
to find().

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix Potential Undefined Behaviour

In the event that the SvmProfileControl destructor is called and
event == -1 is true then the call to close(event) is effectively
close(-1) which is undefined behaviour. This has been changed to only
call close() on valid file descriptors.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Add Error Check on Bytes Read

In the case that there is an incomplete read the call to copyTo() will
now return an error.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix Exception Error

Destructors are implicitly noexcept(true) by default, so unless a
destructor is explicitly marked noexcept(false), any exception thrown
by it or by the functions it calls will terminate the program.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>
2025-09-16 09:43:45 -04:00
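
The file descriptor fix described in the commit above can be sketched as a small RAII wrapper. This is a minimal illustration, not the runtime's code: the class name `ScopedFd` and its methods are hypothetical. It calls close() only when the descriptor is valid, and the destructor does not throw.

```cpp
#include <unistd.h>

// Hypothetical RAII owner for a POSIX file descriptor (illustration only).
class ScopedFd {
 public:
  explicit ScopedFd(int fd = -1) : fd_(fd) {}

  // Destructors are noexcept by default; nothing here throws.
  ~ScopedFd() {
    if (fd_ >= 0) {  // never call close() on an invalid descriptor
      close(fd_);
    }
  }

  // Non-copyable so the descriptor is closed exactly once.
  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;

  bool valid() const { return fd_ >= 0; }
  int get() const { return fd_; }

 private:
  int fd_;
};
```

A default-constructed `ScopedFd` holds -1 and its destructor does nothing, which mirrors the "only call close() on valid file descriptors" rule stated in the commit message.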
Sunday Clement 1eaee1649a rocr: Fix Unintended Sign Extension
ehdr->e_shentsize and ehdr->e_shnum are both 16-bit unsigned integers,
so they are implicitly promoted to signed int during the
multiplication. They must be explicitly cast to a larger unsigned
type; otherwise, if the signed product is large enough, the value is
sign-extended, resulting in incorrect values.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: d00ca2e9b7]
2025-06-09 15:16:10 -04:00
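
The promotion issue described in the commit above can be illustrated with a short sketch. The struct `FakeEhdr` and the function `SectionTableSize` are hypothetical stand-ins, not the runtime's types; only the field names `e_shentsize` and `e_shnum` come from the ELF header.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two 16-bit ELF header fields involved.
struct FakeEhdr {
  uint16_t e_shentsize;  // size of one section header entry
  uint16_t e_shnum;      // number of section header entries
};

// Widening one operand to uint64_t before multiplying makes the whole
// product unsigned 64-bit, so it cannot overflow int or be sign-extended.
uint64_t SectionTableSize(const FakeEhdr& ehdr) {
  return static_cast<uint64_t>(ehdr.e_shentsize) * ehdr.e_shnum;
}
```

Without the cast, both operands would be promoted to `int` first, and a product above `INT_MAX` would overflow before being widened to the 64-bit result.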
Alysa Liu d1c3b7262d rocr: Add proper file descriptor cleanup
Ensure file descriptor 'in' is properly closed in error cases
when calling _lseek() during readFrom() operations.
Fix potential resource leak when errors occur during file operations.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 167602edfb]
2025-06-04 22:37:21 -04:00
Alysa Liu 65f5ce6f0a rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate.
Changed function headers to pass string arguments by reference where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: ae6851dbb4]
2025-06-02 11:18:36 -04:00
Alysa Liu 88dd451c64 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 369d89ade3]
2025-06-02 11:18:36 -04:00
Alysa Liu 296e60d882 rocr: Add check for 'value' pointer
Replaces assertion check assert(value) with explicit null pointer check
Returns HSA_STATUS_ERROR_INVALID_ARGUMENT on null values

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 625425326d]
2025-05-27 12:18:04 -04:00
Alysa Liu 8cbabdbbe3 rocr: Unchecked return value as arg
v1: Add value pointer validation before
dereferencing in GetInfo method for MODULE_NAME case.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: f1f34da4f6]
2025-05-27 12:18:04 -04:00
Yifan Zhang 3ab8b5a98b coredump: call KFD_IOC_DBG_TRAP_DISABLE in error path.
KFD assumes kfd_dbg_trap_enable/disable are called in pairs; otherwise
there will be a kfd_process ref leak in KFD.


[ROCm/ROCR-Runtime commit: ccd91bcd19]
2025-05-27 13:54:00 +08:00
Aaron Liu 6cf184a0d4 rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...)
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 1b79caa214]
2025-05-13 16:44:31 -04:00
David Yat Sin 35faa9783a rocr: Check RLIMIT_CORE before generating coredump
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.


[ROCm/ROCR-Runtime commit: d031af9eb5]
2025-03-04 10:29:34 -05:00
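
The early-return check described in the commit above might look roughly like the following. This is a minimal sketch using the POSIX getrlimit() call; the function name `CoreDumpDisabled` is hypothetical, and treating a getrlimit() failure as "not disabled" is an assumption made for the example.

```cpp
#include <sys/resource.h>

// Returns true when the soft RLIMIT_CORE limit is 0, meaning no core file
// can be written and collecting core dump data would be wasted work.
bool CoreDumpDisabled() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) {
    return false;  // assumption: on failure, do not skip the core dump
  }
  return lim.rlim_cur == 0;
}
```

A caller would test `CoreDumpDisabled()` first and return before gathering any GPU state when it is true.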
David Yat Sin 1474a6c774 rocr: Remove gfx940 and gfx941 support
[ROCm/ROCR-Runtime commit: 13c591d250]
2025-02-19 12:16:24 -05:00
Min Zhou ee1ff92026 rocr: delete duplicated conditional expression
Change-Id: Idc8b1a8ca2975f33191a448f03cabf3fc4f8f8a6


[ROCm/ROCR-Runtime commit: a82f2f3134]
2025-01-28 10:48:44 -05:00
David Yat Sin d0ae8b2eb5 rocr: Add support for gfx950
<squashed with patch for gfx950 generic targets>

Signed-off-by: Chris Freehill <Chris.Freehill@amd.com>

Change-Id: Ifec6d93cf46c7fbf736c6572882299e279260af6


[ROCm/ROCR-Runtime commit: dab8f2fc65]
2025-01-26 13:04:58 -05:00
Apurv Mishra 23ab95b5f2 rocr: multiple uninitialized and unused variables
Minor modifications to multiple source and header
files based on Coverity report

Change-Id: I4a73d0f56640983c4d5124e13c8c280245cca672
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>


[ROCm/ROCR-Runtime commit: 699d0140be]
2024-12-18 10:11:13 -05:00
Konstantin Zhuravlyov bee079fc24 loader: add gfx9-4-generic support
Change-Id: Icb148f7a78a4ce0fc661e35d0df605e05db2de3d


[ROCm/ROCR-Runtime commit: 4c7a9a0f67]
2024-11-14 12:47:46 -05:00
Konstantin Zhuravlyov 5133b16637 loader: add gfx12-generic support
Change-Id: I0bf5d48ec357278bdb7a9c4eae61a7b7995411f0


[ROCm/ROCR-Runtime commit: ec3d4aa5e9]
2024-11-11 16:27:47 -05:00
Konstantin Zhuravlyov a384ada964 loader: add gfx1153 support
Change-Id: Ie3f0ecf1c6631d95cbff5e14ddc48e751f4c356d


[ROCm/ROCR-Runtime commit: cf9c2efbbd]
2024-11-11 16:27:39 -05:00
Konstantin Zhuravlyov 048a6dc0bd loader/nfc: reorder cases when switching on targets, specific first, generic second
Change-Id: I47f38c1691b9b6ff589f7ff445143997b0801dc6


[ROCm/ROCR-Runtime commit: 7d9a51e22a]
2024-11-11 16:27:34 -05:00
Konstantin Zhuravlyov 68f7fb4fa7 loader: add missing support for gfx700
Change-Id: Ia08e93b0e2d300a183a7a5fb92604cd801b2d52a


[ROCm/ROCR-Runtime commit: 4344f012b6]
2024-11-11 16:27:27 -05:00
Lancelot SIX 808e8e6900 rocr/amd_core_dump: Fix "arithmetic on a pointer to void"
A recent patch introduced a build failure when building with Clang:

    [ 65%] Building CXX object runtime/hsa-runtime/CMakeFiles/hsa-runtime64.dir/libamdhsacode/amd_core_dump.cpp.o
    […]/runtime/hsa-runtime/libamdhsacode/amd_core_dump.cpp:271:29: error: arithmetic on a pointer to void
      271 |       read = pread(fd_, buf + done, buf_size - done,
          |                         ~~~ ^
    1 error generated.

This patch fixes this by making sure the "void *" pointer is converted
to "char *" before doing arithmetic on it.

Change-Id: Ib1663ed30abce76e05f06d042975eccd7d729823


[ROCm/ROCR-Runtime commit: 3475a45137]
2024-08-21 17:19:28 -04:00
Lancelot Six 84135d4f49 coredump: Print diagnostic in stderr when errors are detected
This patch adds output (to stderr) to indicate which step in the core
dump creation failed, to improve debuggability.

Change-Id: I349692e278c2d744136d7fba7f7c2e5a7ada0c06
Signed-off-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 3646064a0e]
2024-08-19 13:20:20 -04:00
Lancelot Six 96545e914b coredump: Improve error handling when reading VRAM
It is possible for the runtime to receive an interrupt while trying to
access VRAM data using /proc/self/mem.  In such a case, pread(2) would
return -1 and set errno to EINTR.  This is not an error case; the
pread(2) call just needs to be restarted. However, the current
implementation would treat it as an error.

This patch changes the implementation to correctly retry on EINTR.
While at it, this patch also handles cases where pread(2) reads less
data than originally requested.

Change-Id: I6a72fc5eda4afd90319f0d24b35c9eac6d1ff41c
Signed-off-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 3e0d3d6d61]
2024-08-19 12:20:22 -04:00
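
The retry behaviour described in the commit above can be sketched as a loop around pread(2). This is an illustration under stated assumptions, not the code in amd_core_dump.cpp: the function name `PreadFull` and its boolean return convention are hypothetical.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Reads exactly `size` bytes from `fd` starting at `offset`.
// Retries when interrupted (EINTR) and continues after short reads.
// Returns true on success, false on a real error or premature end of file.
bool PreadFull(int fd, void* buf, size_t size, off_t offset) {
  char* out = static_cast<char*>(buf);  // char* so pointer arithmetic is valid
  size_t done = 0;
  while (done < size) {
    ssize_t n = pread(fd, out + done, size - done,
                      offset + static_cast<off_t>(done));
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted: restart the call
      return false;                  // any other errno is a real error
    }
    if (n == 0) return false;        // end of file before `size` bytes
    done += static_cast<size_t>(n);
  }
  return true;
}
```

Casting the buffer to `char*` before adding `done` also avoids the "arithmetic on a pointer to void" build error that a later commit in this history fixes.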
Yifan Zhang 491275f838 Add support for GC 11.5.2
Change-Id: Iad8604881dc66108933ac2155fef3b74bca9ac3f
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>


[ROCm/ROCR-Runtime commit: 71494a920b]
2024-06-25 12:50:03 -05:00
Sreekant Somasekharan 94950deac7 Initial GFX1201 changes.
Add target gfx1201 to several files.

Change-Id: I5cae7dba00ed58f8fbfa6e7147275bd7d5feaed0
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>


[ROCm/ROCR-Runtime commit: 24463635f9]
2024-06-25 12:27:09 -05:00
David Belanger adb5e2cabf Initial GFX12 changes.
Add target gfx1200 to several files.
Add cases for GFX12 in a few switch statements.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ib90032f5b9d5a3306060f13a43d970108a1399df
Signed-off-by: Chris Freehill <cfreehil@amd.com>


[ROCm/ROCR-Runtime commit: 2f14acd9c1]
2024-06-25 12:27:09 -05:00
Konstantin Zhuravlyov ec66509986 Rename existing relocation types to legacy/v1 (NFC)
Change-Id: Ided7f656c34131b8067a19c0d3b2955fc8823628


[ROCm/ROCR-Runtime commit: b2c32ad6cb]
2024-03-26 18:46:50 -04:00
pvanhout 8e43aaab04 [libamdhsacode] Support COV6/Generic Targets
Change-Id: I4680577eb56dc436fbc134b169f172dd476bff37


[ROCm/ROCR-Runtime commit: a93c18dc90]
2024-03-12 07:37:32 -04:00
David Yat Sin 7a6c962b36 Fix compile error on certain gcc versions
Change-Id: I8a4fab76d1dcc576eb7706ab45fc786c0cab274a


[ROCm/ROCR-Runtime commit: 5b28a1bc17]
2024-02-13 15:25:34 -05:00
Alex Sierra ba0e2d3664 core dump: ulimit check mechanism added
Core dump generation takes ulimit into account to generate a file of
the proper size.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I61d991fc003b173f9075b66bff6a931447720695


[ROCm/ROCR-Runtime commit: 91f2a70817]
2023-12-05 23:19:14 -05:00
Alex Sierra f4f6a49cbd core dump: Front end core dump API
This API consists of one function to be called from a fault event in the
hsa-runtime to generate a core dump.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ib1b90d5beb13f93c4e8ebd21fd61705ebb12ca5d


[ROCm/ROCR-Runtime commit: 514b222368]
2023-12-05 23:19:14 -05:00
Alex Sierra 663e42663b core dump: SegmentBuilder classes added
SegmentBuilder classes are used to get core dump data from the GPUs.
So far, it uses thunk API calls and smaps to collect all data from
the Hardware.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2ad70ca5a951885181d3142653b186b0f6be739e


[ROCm/ROCR-Runtime commit: 1083d5c35f]
2023-12-05 23:19:14 -05:00
Lang Yu 43ae931ad5 Revert "Revert "Add support for GC 11.5.0 and 11.5.1""
This reverts commit a8e34eaec8.

gfx1150/1151 is merged into mainline now.

Change-Id: Id179949318a37888c74abb5a8610d95bc2f22906


[ROCm/ROCR-Runtime commit: 991bbdcf24]
2023-12-04 15:03:31 +00:00
David Yat Sin a8e34eaec8 Revert "Add support for GC 11.5.0 and 11.5.1"
Reverting this as current mainline compiler branch does not support
gfx1150/gfx1151 yet. Will bring back later.

This reverts commit 75ce1848cf.

Change-Id: I31ff4fb2d5817538094a7ffaeba96dd6a7d660c7


[ROCm/ROCR-Runtime commit: ebc51dd0eb]
2023-07-26 15:03:54 +00:00
Lang Yu 75ce1848cf Add support for GC 11.5.0 and 11.5.1
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I3c4116e78a5c1ddac2389f5fece57485bdb17f68


[ROCm/ROCR-Runtime commit: e877840197]
2023-07-22 16:06:22 +08:00
Konstantin Zhuravlyov e126b5a054 Cache referenced symbol table when pulling data in relocation section
Change-Id: I6ef21cedde1aca6fd1ec5e5d5634563f030eaab8


[ROCm/ROCR-Runtime commit: 8a6edb07d9]
2023-06-21 16:35:45 -04:00
raghavmedicherla 2758da98cd [hsa-runtime] Add support to hsa-runtime to find symbols from ".dynsym" section.
Earlier, hsa-runtime was unable to find symbols from a stripped ELF image because
there was no support for finding symbols from the ".dynsym" section.

Looking for symbols in .dynsym is enabled by the LOADER_USE_DYNSYM=1
environment variable.

Change-Id: I4f0e8dd0eb053a6066d4d49b670c52e51149531a


[ROCm/ROCR-Runtime commit: 4142a77375]
2023-06-16 14:40:50 -04:00
David Yat Sin 3345ada378 Adding gfx941 and gfx942
Adding support for gfx941 and gfx942 ISAs.
gfx940 ISA will use sc0:1 sc1:1 on load/store operations
gfx942 ISA will use default load/store operations

Change-Id: If1efbef86f59e2cf2d48fe359cd4166405a0a579


[ROCm/ROCR-Runtime commit: 41f6d0426d]
2023-05-23 11:13:16 -04:00
David Yat Sin e1ded285a9 Removing invalid gfx entries
Change-Id: I1a9a9a064f5f65ecc3e124c5dd7d6baf6b5ccb5c


[ROCm/ROCR-Runtime commit: f0000da7b3]
2023-05-12 11:59:27 -04:00
Mike Li 547d2aa3c8 Add gfx940 to AmdHsaCode
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: Ib4f7c801c3d3bac9a04c880c5bf86b72bfa3404f


[ROCm/ROCR-Runtime commit: de4d1ce424]
2023-04-27 16:09:26 -04:00
Mike Li fe9b01e916 Added gfx940 ISA
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: Icb1830fe186abc69fe7ee709b7f12b882cab9e87


[ROCm/ROCR-Runtime commit: bd98a1e5bf]
2023-04-27 16:08:58 -04:00
Alex Sierra bd8c4079da use mkstemp instead of tempnam for temp file
tempnam has been marked as obsolete.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ie64d9a351bf386da00a96ceff059f685e11f2cca


[ROCm/ROCR-Runtime commit: e82025bffa]
2023-04-17 15:38:59 -04:00
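
For readers unfamiliar with the replacement named in the commit above, a minimal mkstemp() sketch follows. The function name `MakeTempFile` and the `/tmp/example_` prefix are hypothetical and not taken from the runtime.

```cpp
#include <stdlib.h>
#include <string>
#include <unistd.h>

// Creates and opens a uniquely named temporary file in one atomic step.
// Returns the open descriptor (or -1 on failure) and stores the generated
// name in `path_out`.
int MakeTempFile(std::string& path_out) {
  std::string tmpl = "/tmp/example_XXXXXX";  // hypothetical directory and prefix
  int fd = mkstemp(&tmpl[0]);  // mkstemp rewrites the trailing XXXXXX in place
  if (fd >= 0) {
    path_out = tmpl;
  }
  return fd;
}
```

Unlike tempnam(), which only returns a name, mkstemp() also opens the file, so there is no window in which another process could create a file with the same name. The caller is responsible for closing the descriptor and unlinking the file.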
Konstantin Zhuravlyov d861267d20 Loader/NFC: Factor out mach information into the struct
Change-Id: I9304c96336c434570bd5da92cd197ee764945907


[ROCm/ROCR-Runtime commit: 8043fe9ee0]
2023-03-07 14:41:03 -05:00
raghavmedicherla 2b666f57fa [hsa-runtime] Modify elfsection checks in amd_elf_image class
Modified the if-condition checks in GElfImage::pullElf() of amd_elf_image.cpp to
check section types instead of comparing strings.

Change-Id: I1ab92f0a9118fb2382652a1cc900a3150cbee2da


[ROCm/ROCR-Runtime commit: 5727a10a1b]
2022-12-05 14:42:02 -05:00