Gráfico de commits

10 Commits

Autor SHA1 Mensaje Fecha
Yifan Zhang ccd91bcd19 coredump: call KFD_IOC_DBG_TRAP_DISABLE in error path.
KFD assumes kfd_dbg_trap_enable/disable be called in pair, or there will
be kfd_process ref leak in KFD.
2025-05-27 13:54:00 +08:00
Aaron Liu 1b79caa214 rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...)
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
David Yat Sin d031af9eb5 rocr: Check RLIMIT_CORE before generating coredump
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.
2025-03-04 10:29:34 -05:00
Lancelot SIX 3475a45137 rocr/amd_core_dump: Fix "arithmetic on a pointer to void"
A recent patch introduced a build failure when building with Clang:

    [ 65%] Building CXX object runtime/hsa-runtime/CMakeFiles/hsa-runtime64.dir/libamdhsacode/amd_core_dump.cpp.o
    […]/runtime/hsa-runtime/libamdhsacode/amd_core_dump.cpp:271:29: error: arithmetic on a pointer to void
      271 |       read = pread(fd_, buf + done, buf_size - done,
          |                         ~~~ ^
    1 error generated.

This patch fixes this by making sure the "void *" pointer is converting
to "char *" before doing arithmetic on it.

Change-Id: Ib1663ed30abce76e05f06d042975eccd7d729823
2024-08-21 17:19:28 -04:00
Lancelot Six 3646064a0e coredump: Print diagnostic in stderr when errors are detected
This patch adds output (to stderr) to indicate step in the core dump
creation failed to improve debuggability.

Change-Id: I349692e278c2d744136d7fba7f7c2e5a7ada0c06
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 13:20:20 -04:00
Lancelot Six 3e0d3d6d61 coredump: Improve error handling when reading VRAM
It is possible for the runtime to receive an interrupt while trying to
access VRAM data using /proc/self/mem.  In such case, pread(2) would
return -1 and set errno to -EINTR.  This is not an error case, the
pread(2) call just need to be restarted, however current implementation
would tread it as an error.

This patch changes the the implementation to correctly retry on EINTR.
While at it, this patch also handles cases where pread(2) reads less
data than originally requested.

Change-Id: I6a72fc5eda4afd90319f0d24b35c9eac6d1ff41c
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 12:20:22 -04:00
David Yat Sin 5b28a1bc17 Fix compile error on certain gcc versions
Change-Id: I8a4fab76d1dcc576eb7706ab45fc786c0cab274a
2024-02-13 15:25:34 -05:00
Alex Sierra 91f2a70817 core dump: ulimit check mechanism added
Core dump generation considers ulimit to generate the proper size
file.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I61d991fc003b173f9075b66bff6a931447720695
2023-12-05 23:19:14 -05:00
Alex Sierra 514b222368 core dump: Front end core dump API
This API consists in one function to be called from a fault event at the
hsa-runtime to generate a core dump.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ib1b90d5beb13f93c4e8ebd21fd61705ebb12ca5d
2023-12-05 23:19:14 -05:00
Alex Sierra 1083d5c35f core dump: SegmentBuilder classes added
SegmentBuilder classes are used to get core dump data from the GPUs.
So far, it uses thunk API calls and smaps to collect all data from
the Hardware.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2ad70ca5a951885181d3142653b186b0f6be739e
2023-12-05 23:19:14 -05:00