Commit Graph

2930 Commits

Author SHA1 Message Date
David Yat Sin dc9bce3b9b Force t1_ update when profiling is enabled
Fixes issue where t1_ counters may not be updated when doing dispatch
profiling, causing a divide by 0.

Change-Id: I91060ac3f9fd2183d277e6e7cd810398a453a87f


[ROCm/ROCR-Runtime commit: 3d1563ee68]
2024-01-16 21:29:39 +00:00
David Yat Sin bc492274e7 Increase min KFD version for Virtual mem support
KFD had some fixes for handling of virtual memory APIs. These fixes are
included in interface version 1.15.

Change-Id: Ie701eccf6e032f9ec0a1f4e8a43718964eebddc6


[ROCm/ROCR-Runtime commit: d16c6db2ee]
2024-01-16 21:29:39 +00:00
Joseph Huber 298e6cc495 Improve endianness check
Update the `hsa.h` header to use the gcc / clang `__BYTE_ORDER__`
macros where available to more accurately autodetect endianness for
the target.

Change-Id: I7312f3badcba9287a30eb14882b91e2a247acc5f


[ROCm/ROCR-Runtime commit: 4971150576]
2024-01-16 21:29:39 +00:00
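For the endianness change in 298e6cc495, the following is a minimal sketch of compile-time detection with the gcc / clang `__BYTE_ORDER__` macros. The `EXAMPLE_LITTLE_ENDIAN` macro and the helper function are hypothetical names chosen for illustration; they are not the identifiers used in `hsa.h`, and the little-endian fallback is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro name; the real hsa.h uses its own identifiers. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define EXAMPLE_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define EXAMPLE_LITTLE_ENDIAN 0
#else
/* Assumption: fall back to little endian when the macros are unavailable. */
#define EXAMPLE_LITTLE_ENDIAN 1
#endif

/* Run-time cross-check: look at the first byte in memory of the value 1. */
static int example_runtime_is_little_endian(void) {
  const uint32_t probe = 1u;
  unsigned char first_byte;
  memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

On a gcc or clang target the compile-time result and the run-time probe should agree, which makes the probe a cheap sanity check in a unit test.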
Lancelot SIX 9317d0fbc0 Revert "trap_handler: Set status.skip_export when halting a wave"
This reverts commit 4c8a849772.  This
change is required for the runtime to generate reliable core dump files,
but this feature has been disabled for now by
816b46868a.  Until it is needed, revert
the ABI change in the trap handler to maintain compatibility with older
debuggers.

Change-Id: I77a1562dc7962befe2bf88442df858e2d2b1c5ab


[ROCm/ROCR-Runtime commit: 6f828d8609]
2024-01-16 15:55:59 +00:00
Ranjith Ramakrishnan ed4861e951 Use relative path rather than hard coded path in package config file
Change-Id: Ia35fadeead69f84f4f4d32ab0c04f2f391aba4f4


[ROCm/ROCR-Runtime commit: a7d8b1c287]
2024-01-07 20:51:59 -08:00
Jeremy Newton b2b4d5e7b5 Fix missing global symbol
If using hsakmt as a shared library

Change-Id: I66a1849a46bd7009813d49824d0d059e8a511038
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 42581d4172]
2024-01-04 11:14:39 -05:00
Yifan Zhang 29820a0887 kfdtest: add APUs judgement in LargestVramBufferTest criteria
This patch adds an APU check to the LargestVramBufferTest criteria.

Change-Id: Ic69093f8ebed8be0b1c58787e2a294d86fb49bb0


[ROCm/ROCR-Runtime commit: 808a4428b6]
2023-12-28 17:17:31 +08:00
Yifan Zhang f5125d542a kfdtest: finetune MMBench subtest criteria
Change the proportion to 6/10 on APUs.

Change-Id: I3576cb23d0f14ff6d576a5db4bdeef9446aa10d2
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>


[ROCm/ROCR-Runtime commit: 92c0015787]
2023-12-28 17:12:44 +08:00
Yifan Zhang 1c0d18bce6 kfdtest: finetune LargestVramBuffer subtest criteria
Change the proportion to 3/5 on APUs.

Change-Id: I0dc37fb1c309605811551b88259473603c81f9ae
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>


[ROCm/ROCR-Runtime commit: 72324444cc]
2023-12-27 16:23:17 +08:00
Ruili Ji ff2c3dfcca To fix sdma segment fault for error address
pad_size address shall start from command_addr not
(command_addr + total_command_size)

Change-Id: I3d8491986caf2d4d5dc41b1d90286c21e7c0a457


[ROCm/ROCR-Runtime commit: 4b69351394]
2023-12-25 09:31:13 +08:00
Alex Sierra 816b46868a Revert "core dump: Generates a core dump from a fault event"
This reverts commit 9aa39b0979.
This commit disables the core dump feature. Apparently, gfx1101 SA1 waves
cannot enter the trap handler because they receive an invalid
address. However, core dump at the debugger has been moved to ROCm
6.2.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I7915caf58118658e5e7f435f91a0a6216d2fdb42


[ROCm/ROCR-Runtime commit: 5e3be9c28a]
2023-12-18 17:30:13 -06:00
David Yat Sin dcd5f16de0 Use pthread_setaffinity_np
On some systems, pthread_attr_setaffinity_np does not exist, so we need
to use pthread_setaffinity_np on the thread after pthread_create.

Provided by Julian Samaroo on github

https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/143
Change-Id: I4649f94333f2d7b0a5993b370a4bfc48d92acecb


[ROCm/ROCR-Runtime commit: 6333fdecf3]
2023-12-18 17:41:49 -05:00
David Yat Sin 9481b6290a Fix README for invalid command
`-DCMAKE_INSTALL_PATH` is not valid; use `-DCMAKE_INSTALL_PREFIX` instead.

https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/171/

Suggested-by: fjh1997 on github
Change-Id: Ibb85da7fe755b662fa9a836d6fbe3394d34a0337


[ROCm/ROCR-Runtime commit: 9b2ed66609]
2023-12-18 09:15:05 -05:00
Xiaogang Chen 2a1b821740 kfdtest: use svm range granularity in KFDSVMEvictTest.QueueTest/1
When xnack is on, shader code in this test triggers GPU page faults that migrate
data from system RAM to VRAM. Use svm range granularity to move all data from
the system buffer to VRAM, reducing system RAM pressure and avoiding system RAM OOM on
systems that have less system RAM.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Change-Id: I219472210756be319491f7827f7209fe32726f81


[ROCm/ROCR-Runtime commit: 1a7162731e]
2023-12-15 13:27:22 -06:00
David Yat Sin bb5779b844 Temporarily disabling Queue_Validation_InvalidWorkGroupSize
Change-Id: I88c5900151b9f572f956cd7b428119b614331be9


[ROCm/ROCR-Runtime commit: c81c18bcc5]
2023-12-14 16:46:19 -05:00
gaba dcf8b5fd6c libhsakmt: Fix CPU cache issue
For "Intel Meteor lake Mobile", the cache info is not in sysfs.
    That means /sys/devices/system/node/node%d/%s/cache does not exist,
    but the system works fine.

Change-Id: Ie7c04426791a84c2288ff21df093226828a5f629
Signed-off-by: Gang Ba <Gang.Ba@amd.com>


[ROCm/ROCR-Runtime commit: 4bf73f521b]
2023-12-08 15:29:19 -05:00
David Yat Sin ff3818c725 Add query for agent memory and aql ext properties
Add query to return flags for GPU agent memory properties and AQL
extensions.

Implement flag to determine that GPU agent is an APU

Change-Id: Ic04c51290b2b9763e14989c117f35a2e22297453


[ROCm/ROCR-Runtime commit: c86837d8d6]
2023-12-07 14:41:37 -05:00
Lancelot SIX 4c8a849772 trap_handler: Set status.skip_export when halting a wave
When inspecting waves on architectures where SPI may not initialize TTMP
registers, the debugger cannot reliably know if the trap handler was
entered and if it saved valuable information in TTMP registers.

This patch uses the status.skip_export bit (unused by the compute
shaders) to indicate that it got executed before halting a wave.
This is done except for gfx940, where ttmp11[31] can be used (as long as
TTMP registers are always initialized by SPI for this architecture).  It
could be possible to be more selective as architectures always
initializing TTMP registers do not require this step, but always doing
it makes maintenance simpler.

Change-Id: I314db6b37772f7daa8bd405e6662a86658d3f5e0


[ROCm/ROCR-Runtime commit: c5db063b2f]
2023-12-06 21:20:03 -05:00
David Yat Sin ff72129092 Add queries for HSA Ext interface version
Change-Id: I26860fb1364cd3a33cdc9b284ac807b2702bb241


[ROCm/ROCR-Runtime commit: ed1b0b9b1a]
2023-12-06 13:58:52 -05:00
Alex Sierra 9aa39b0979 core dump: Generates a core dump from a fault event
Extracts and creates a core dump ELF file from a fault event, using
core dump front end.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ibbbe41b3d13dd3fcb90161e927d48c329cf513a9


[ROCm/ROCR-Runtime commit: 803e37ded5]
2023-12-05 23:19:14 -05:00
Alex Sierra 4370aa1364 reports KFD core dump support through hsakmt API
Member added to KFDVersion to report if KFD supports core dump
mechanism. This is done through hsaKmtRuntimeEnable API call while
the topology is being built. It also dictates if core dump will be
generated by either KFD or hsa-runtime.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2e9d4166563402f78613d728446feb692c52d9d1


[ROCm/ROCR-Runtime commit: 54604654bd]
2023-12-05 23:19:14 -05:00
Alex Sierra ba0e2d3664 core dump: ulimit check mechanism added
Core dump generation considers ulimit to generate the proper size
file.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I61d991fc003b173f9075b66bff6a931447720695


[ROCm/ROCR-Runtime commit: 91f2a70817]
2023-12-05 23:19:14 -05:00
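For the ulimit check in ba0e2d3664, one way to respect the core file size limit is to query `RLIMIT_CORE` with the POSIX `getrlimit` call and clamp the intended file size to it. This is a sketch under that assumption, not the runtime's actual code; both function names are hypothetical.

```c
#include <stdint.h>
#include <sys/resource.h>

/* Clamp a wanted size to a resource limit; RLIM_INFINITY means no cap. */
static uint64_t example_clamp_to_limit(uint64_t wanted, rlim_t limit) {
  if (limit == RLIM_INFINITY) return wanted;
  return wanted < (uint64_t)limit ? wanted : (uint64_t)limit;
}

/* Bytes a core file may occupy under the current soft RLIMIT_CORE.
 * Returns 0 if the limit cannot be read (assumption: treat as disabled). */
static uint64_t example_core_dump_budget(uint64_t wanted) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0) return 0;
  return example_clamp_to_limit(wanted, rl.rlim_cur);
}
```

With `ulimit -c 0` in the shell, `example_core_dump_budget` returns 0, so a caller could skip writing the file altogether.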
Alex Sierra f4f6a49cbd core dump: Front end core dump API
This API consists of one function to be called from a fault event at the
hsa-runtime to generate a core dump.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ib1b90d5beb13f93c4e8ebd21fd61705ebb12ca5d


[ROCm/ROCR-Runtime commit: 514b222368]
2023-12-05 23:19:14 -05:00
Alex Sierra 663e42663b core dump: SegmentBuilder classes added
SegmentBuilder classes are used to get core dump data from the GPUs.
So far, it uses thunk API calls and smaps to collect all data from
the Hardware.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2ad70ca5a951885181d3142653b186b0f6be739e


[ROCm/ROCR-Runtime commit: 1083d5c35f]
2023-12-05 23:19:14 -05:00
Giovanni LB 952f750a77 Adding coordinate query to aqlprofile
Change-Id: I9f2fee62a24cf2a4784ba9e8c813b7b7296d034b


[ROCm/ROCR-Runtime commit: 71bc875ccd]
2023-12-05 13:25:30 -05:00
Giovanni LB a633cba5ea : Adding ATT API extension to aqlprofile
Change-Id: Ic511cf871d5d98638d7041ca277f945ae8ced3a5


[ROCm/ROCR-Runtime commit: e8920cacc8]
2023-12-05 13:25:10 -05:00
Jonathan R. Madsen a66287197e rocprofiler-register updates
- fix logic for using HSA_TOOLS_LIB when rocprofiler-register support is enabled
- report tool load failure for rocprofiler-register

Change-Id: Ife23aa3e6ed19174376cd694764583b73f8976cd


[ROCm/ROCR-Runtime commit: 27eb0516bb]
2023-12-04 11:44:58 -06:00
David Yat Sin 38ce852b12 Add RISC-V support
Patch provided by user Xeonacid via github:
https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/172/

Change-Id: I5f9086b536383093e7995b9cfdc19dab213f0265


[ROCm/ROCR-Runtime commit: 251601b20b]
2023-12-04 15:05:22 +00:00
David Yat Sin d81bf9cd57 Use CPU_SET_S instead of CPU_SET
Fix incorrect use of CPU_SET on variable size cpu_set_t

Suggested by Christopher E. Moore on github
https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/130

Change-Id: I710b56683ba07c08dcd83c851bf72e4f127a0ad4


[ROCm/ROCR-Runtime commit: f07b8f2250]
2023-12-04 15:05:22 +00:00
Giovanni LB ecd768797e Extending AQLprofile API to include counter dimensions
Change-Id: If59489a085959f3f765a30e3e445df5151e30350


[ROCm/ROCR-Runtime commit: e0c6c5e5bf]
2023-12-04 15:05:22 +00:00
David Yat Sin 6140d8a66d Implement alternate scratch
The alternate scratch memory is used for dispatches that have a low
number of waves but relatively large wave size.
This allows us to keep the tmpring_size.bits.WAVES field of the main
scratch to full occupancy.

Change-Id: I32d240fac4b7d38200d1eebc1b0fdc8a823920d3


[ROCm/ROCR-Runtime commit: a7a3358067]
2023-12-04 15:05:22 +00:00
David Yat Sin 66b9fdc2d6 Implement async scratch reclaim
For devices where the CP FW supports asynchronous scratch reclaim, ROCr
is able to claw-back scratch memory that was assigned to an AQL queue.
With that ability, ROCr does not have to rely on using USO
(use-scratch-once) when assigning large amounts of memory to a queue.
If we reach a situation where we are running low on device memory, ROCr
will attempt to claw-back the scratch memory.

Change-Id: Iddf8ec84e37ab8b9fdc58bafbe2b61fe2acb6eb7


[ROCm/ROCR-Runtime commit: dca8f3a21d]
2023-12-04 15:05:22 +00:00
David Yat Sin fa600434ee Refactor scratch handler function
Separate the event handler and scratch handler portions of the code into
separate functions.

Change-Id: Ifdb7461e816b0f2d3c1c0a74d6f020b4d6fc736c


[ROCm/ROCR-Runtime commit: 64070a9acc]
2023-12-04 15:05:22 +00:00
David Yat Sin b1942bff27 Re-arrange and rename scratch elements that are used with main scratch
Change-Id: I4c1ff8cf4121a06b586fe49c70400226506bf95e


[ROCm/ROCR-Runtime commit: fa317f8c41]
2023-12-04 15:05:22 +00:00
David Yat Sin 03e87e3d66 Update queue structure to support async reclaim
Update queue structure to add members required for asynchronous reclaim
mechanism and dual-scratch. CP will set the AMD_QUEUE_CAPS_ASYNC_RECLAIM
bit on queue-connect to indicate whether the new features are supported.

The new members are ignored by previous versions of CP FW

Change-Id: Ic8e9ef41c5b1d04f09b43bc9b44b31527863d10f


[ROCm/ROCR-Runtime commit: 0344c8c0b6]
2023-12-04 15:05:22 +00:00
Shweta Khatri 43f1ee386f Revert "Restore default code object version usage for ROCr and ROCr Test"
This reverts commit 6ef7fcedd1290b59190f81df1d25142ecb05d282.

Change-Id: Icc0300c25a89fcb99287d013863a00ace7e12129


[ROCm/ROCR-Runtime commit: acf9e95027]
2023-12-04 15:03:31 +00:00
Lancelot SIX 9ae972cf1e trap_handler: Fix handling of debugtrap for gfx11
For gfx11, the trap_handler fails to recognize a trap id 3 and report
the exception to the debugger if the debugger is attached.

This is because the 2nd level trap handler looks for the DEBUG_ENABLED
bit in ttmp13 instead of ttmp11.  This bit is set by the 1st level trap
handler and is part of the 1st/2nd level trap handler ABI.

Change-Id: Ib36361f53d9bcbbed52320d8c3a9ab2c0b28c7cd


[ROCm/ROCR-Runtime commit: 6916ce358a]
2023-12-04 15:03:31 +00:00
Lang Yu 43ae931ad5 Revert "Revert "Add support for GC 11.5.0 and 11.5.1""
This reverts commit a8e34eaec8.

gfx1150/1151 is merged into mainline now.

Change-Id: Id179949318a37888c74abb5a8610d95bc2f22906


[ROCm/ROCR-Runtime commit: 991bbdcf24]
2023-12-04 15:03:31 +00:00
David Yat Sin 663b461512 rocrtst: Speed-up Memory_Max_Mem test
Skip Extended-scope memory pool as allocation is very close to
fine-grain/coarse-grain but with just different PTE flags.

Only test coarse grain on CPU agent other than the first CPU agent.

Stop bisecting the max size once we are within 5% of the total size for
these pools to speed up this test on large memory pools.

Change-Id: I77d1b45a1752ef092dda7c7f27723ea0a292a612


[ROCm/ROCR-Runtime commit: cb5a29955b]
2023-12-04 15:03:31 +00:00
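The early-exit bisection described in 663b461512 can be sketched as follows. This is one possible reading of the commit message (stop once the search bracket is narrower than 5% of the pool size), not the test's actual code. The callback type, `example_find_max_alloc`, and the fake allocator are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if an allocation of `size` succeeds. */
typedef int (*example_try_alloc_fn)(size_t size, void *ctx);

/* Fake allocator for illustration: succeeds up to the limit stored in ctx. */
static int example_fake_alloc(size_t size, void *ctx) {
  return size <= *(const size_t *)ctx;
}

/* Bisect for the largest successful size, stopping once the bracket
 * [lo, hi] is no wider than 5% of `total`. Returns a size known to succeed
 * (size 0 is assumed to always succeed). */
static size_t example_find_max_alloc(size_t total, example_try_alloc_fn try_alloc,
                                     void *ctx) {
  size_t lo = 0;                  /* largest size known to succeed */
  size_t hi = total;              /* smallest size known to fail */
  size_t tolerance = total / 20;  /* 5% of the pool size */
  if (tolerance == 0) tolerance = 1;  /* guarantees the loop terminates */
  if (try_alloc(total, ctx)) return total;
  while (hi - lo > tolerance) {
    const size_t mid = lo + (hi - lo) / 2;
    if (try_alloc(mid, ctx)) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return lo;
}
```

The trade-off is precision for time: the result may undershoot the true maximum by up to 5% of the pool size, but the number of allocation attempts stays small even for very large pools.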
David Yat Sin 0607f3c34d Increase scratch aperture size to 4GB per XCC
Change-Id: Ia02cea45ce8b782527f44fec539b0ab7cc453200


[ROCm/ROCR-Runtime commit: 642165b1bc]
2023-12-04 15:03:31 +00:00
Jonathan Kim 1931c4f8a4 Increase SDMA copy size
SDMA4.4 and SDMA5.2+ have increased their available copy size to 2^30 bytes
represented by exponent as bits set in the COUNT field of the
linear copy.

Also note that the full 2^22 byte limit is available from SDMA4 onwards
as it has corrected the 0x3fffe0 HW limitation from SDMA3.

As the copy limit has increased, this can change system performance,
so provide env var HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE=0 to fall
back to the original 0x3fffe0 limit for debugging purposes.

Change-Id: I0fb6e5378f68e5b8a00ff559271691a943ee06ee


[ROCm/ROCR-Runtime commit: 81c64228e0]
2023-12-04 15:03:31 +00:00
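To illustrate why the larger limit in 1931c4f8a4 matters, the sketch below counts how many linear copy commands a transfer needs under the original 0x3fffe0 limit versus the 2^30 limit. It models only those two tiers and assumes that the environment variable is compared against the literal string "0"; how the runtime actually parses it is not stated in the commit. All function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_LEGACY_MAX_COPY ((size_t)0x3fffe0)  /* original limit */
#define EXAMPLE_LARGE_MAX_COPY ((size_t)1 << 30)    /* 2^30 bytes */

/* Pick the per-command limit. Assumption: the override is honored only when
 * the variable is exactly "0". */
static size_t example_max_copy_size(int hw_supports_large_copy) {
  const char *value = getenv("HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE");
  if (value != NULL && strcmp(value, "0") == 0) return EXAMPLE_LEGACY_MAX_COPY;
  return hw_supports_large_copy ? EXAMPLE_LARGE_MAX_COPY : EXAMPLE_LEGACY_MAX_COPY;
}

/* Number of copy commands needed to move `bytes` (ceiling division). */
static size_t example_num_copy_commands(size_t bytes, size_t max_per_command) {
  return (bytes + max_per_command - 1) / max_per_command;
}
```

Under these assumptions a 1 GiB copy needs 257 commands at the legacy limit and a single command at the new one, which is why the change can shift performance in either direction and why a fallback switch is useful for debugging.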
Youssef Aly 1c1298c1c0 Enabled profiling for CPU agents for memcpy activities
To be able to trace memcpy asynchronously, both dst and src agents need to have profiling enabled, but the API for enabling profiling was only enabling it for GPU agents. CPU agents didn't have profiling enabled, so the signal owner could not be known, and hsa_amd_profiling_get_async_copy_time would fail with an HSA status error because it couldn't read the agent for the given signal.

Change-Id: Ie165e0e39b8fcd6992a55695b9ffcead10a8e812


[ROCm/ROCR-Runtime commit: ae1da390bd]
2023-12-04 15:01:59 +00:00
Jonathan R. Madsen 880ddd4387 rocprofiler-register support
- Update CMakeLists.txt
  - find_package for rocprofiler-register
    - this is an optional package until rocprofiler-register is added to the CI
  - define HSA_VERSION_{MAJOR,MINOR,PATCH} ppdefs
- Update runtime.cpp
  - include <rocprofiler-register/rocprofiler-register.h>
  - if rocprofiler-register succeeds, do not support v1 unless explicitly requested

Change-Id: I8f48bbf3f6b52fb91ddade2f198491a1256035fe


[ROCm/ROCR-Runtime commit: f9cf1852e5]
2023-12-04 15:01:59 +00:00
Jonathan Kim 73ab40ecd3 Restore default code object version usage for ROCr and ROCr Test
Remove override that forces ROCr image blit source and ROCr test to use
code object version 4 now that mainline has been updated to version 5.

Change-Id: I94681e86835c0e382475306ead4cd4132a2ee78f


[ROCm/ROCR-Runtime commit: 2f847cf05f]
2023-12-04 15:01:44 +00:00
David Yat Sin b177c0e9ca Handle HW_EXCEPTION events
Add handler to handle HW exception events reported by underlying
drivers. These events are generally caused by GPU resets and need the
application to abort.
As an improvement, in the future, we can provide additional information
about the exception (e.g. mode-reset level).

Change-Id: If3fb5f19f9fce181a9d3b5e34a5506725856e7b0


[ROCm/ROCR-Runtime commit: 750212e50e]
2023-11-20 14:49:26 +00:00
David Yat Sin 8151aad0c2 libhsakmt: Handle HW_EXCEPTION events
Add new structures for HW Exception events and copy data from KFD to
expose to upper layers.

Change-Id: Icd5eb98997c47620e3b86277ab6d3abb7ed7d56f


[ROCm/ROCR-Runtime commit: 01ff2f7934]
2023-11-17 04:43:51 +00:00
Shweta Khatri 325f98d229 Updated the test to access PCIe domain info for the agent
Change-Id: I901fd76f91315a0262945659d12349ba7b64ed11


[ROCm/ROCR-Runtime commit: 4890ffe224]
2023-10-26 11:37:12 -04:00
David Yat Sin eb664927dd Add LoongArch64 Support
Patch submitted by user Xinmudotmoe on github

Change-Id: I58fd035b4ec4856f20d63747ababd49fa9764348


[ROCm/ROCR-Runtime commit: 1a7de9588e]
2023-10-26 11:36:16 -04:00
Yifan Zhang cc58a01e06 kfdtest: Change SetGetAttributesTest range granularity
A granularity check was added in KFD with the patch below:

commit 270c7a8375a91fec2fb4e2c253e3955d9b7540b4
Author: Jesse Zhang <jesse.zhang@amd.com>
Date:   Fri Oct 20 09:43:51 2023 +0800

    drm/amdkfd: Fix shift out-of-bounds issue

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index a690dced6860..f2b33fb2afcf 100644

Change-Id: I8cb037e3bf5db0a85661494b77e59984eca4d98d

--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -781,7 +781,7 @@ svm_range_apply_attrs(struct kfd_process *p, struct svm_range *prange,
                        prange->flags &= ~attrs[i].value;
                        break;
                case KFD_IOCTL_SVM_ATTR_GRANULARITY:
-                       prange->granularity = attrs[i].value;
+                       prange->granularity = min_t(uint32_t, attrs[i].value, 0x3F);
                        break;
                default:
                        WARN_ONCE(1, "svm_range_check_attrs wasn't called?");

Test cases have to be modified accordingly; otherwise KFDSVMRangeTest.SetGetAttributesTest
fails.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: Ifff47556bc398da6b18ad26ac545d139b63b0c92


[ROCm/ROCR-Runtime commit: 46fe316348]
2023-10-23 23:21:40 +08:00
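The kernel patch quoted in cc58a01e06 clamps the granularity attribute to 0x3F. Assuming the value is later used as a shift count (the kernel patch title mentions a shift out-of-bounds issue), the clamp keeps the shift within the width of a 64-bit integer. The sketch below uses hypothetical helper names and plain C rather than the kernel's `min_t`.

```c
#include <stdint.h>

/* Same effect as min_t(uint32_t, value, 0x3F) in the quoted kernel patch. */
static uint32_t example_clamp_granularity(uint32_t requested) {
  return requested < 0x3Fu ? requested : 0x3Fu;
}

/* Assumption: granularity is a log2 count, so the unit size is 1 << value.
 * Shifting a 64-bit value by 64 or more is undefined behavior in C. */
static uint64_t example_units_per_granule(uint32_t requested) {
  return (uint64_t)1 << example_clamp_granularity(requested);
}
```

A test that sets a granularity above 0x3F and reads it back will therefore see 0x3F, which is why the expected value in the test had to change.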
Tony Tye d2542da27d Make AqlPacket::string more robust
AqlPacket::string should check the packet type is in range of the array
used to print its name.

Change-Id: I33dabbd941d086929526d842c9dbc0bd7305acd5


[ROCm/ROCR-Runtime commit: 7955fb01ec]
2023-10-18 12:54:36 -04:00
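The kind of range check described in d2542da27d can be sketched in C as a bounds-checked table lookup. The table contents and function name here are illustrative only and do not reproduce the actual AqlPacket::string implementation.

```c
#include <stddef.h>

/* Illustrative packet-type names, indexed by type value. */
static const char *const kExampleTypeNames[] = {
  "VENDOR_SPECIFIC", "INVALID", "KERNEL_DISPATCH",
  "BARRIER_AND", "AGENT_DISPATCH", "BARRIER_OR",
};

/* Return the name for `type`, or "UNKNOWN" when it is outside the table. */
static const char *example_packet_type_name(unsigned type) {
  const size_t count = sizeof(kExampleTypeNames) / sizeof(kExampleTypeNames[0]);
  return type < count ? kExampleTypeNames[type] : "UNKNOWN";
}
```

Deriving the count from the array with `sizeof` keeps the check correct if entries are added later.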