Commit Graph

2959 Commits

Author SHA1 Message Date
Alex Sierra 91f2a70817 core dump: ulimit check mechanism added
Core dump generation considers ulimit to generate the proper size
file.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I61d991fc003b173f9075b66bff6a931447720695
2023-12-05 23:19:14 -05:00
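The commit above does not show how the ulimit check is done. As a rough sketch only, assuming the check reads the soft `RLIMIT_CORE` limit through `getrlimit`, the core file size could be capped as follows. The names `apply_core_limit` and `core_file_budget` are hypothetical and do not come from the runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Cap the requested core file size to the soft limit in `lim`.
 * Hypothetical helper, not the runtime's actual code. */
uint64_t apply_core_limit(uint64_t wanted_bytes, const struct rlimit *lim) {
  assert(lim != NULL);
  if (lim->rlim_cur == RLIM_INFINITY) return wanted_bytes; /* "ulimit -c unlimited" */
  uint64_t cap = (uint64_t)lim->rlim_cur;
  return wanted_bytes < cap ? wanted_bytes : cap;
}

/* Query the process's core size limit and apply it.
 * Returns 0 (write nothing) if the limit cannot be read. */
uint64_t core_file_budget(uint64_t wanted_bytes) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return 0;
  return apply_core_limit(wanted_bytes, &lim);
}
```

A soft limit of 0 (the default on many distributions) yields a budget of 0, so no core file would be written.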
Alex Sierra 514b222368 core dump: Front end core dump API
This API consists of one function to be called from a fault event in the
hsa-runtime to generate a core dump.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ib1b90d5beb13f93c4e8ebd21fd61705ebb12ca5d
2023-12-05 23:19:14 -05:00
Alex Sierra 1083d5c35f core dump: SegmentBuilder classes added
SegmentBuilder classes are used to get core dump data from the GPUs.
So far, it uses thunk API calls and smaps to collect all data from
the Hardware.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2ad70ca5a951885181d3142653b186b0f6be739e
2023-12-05 23:19:14 -05:00
Giovanni LB 71bc875ccd Adding coordinate query to aqlprofile
Change-Id: I9f2fee62a24cf2a4784ba9e8c813b7b7296d034b
2023-12-05 13:25:30 -05:00
Giovanni LB e8920cacc8 Adding ATT API extension to aqlprofile
Change-Id: Ic511cf871d5d98638d7041ca277f945ae8ced3a5
2023-12-05 13:25:10 -05:00
Jonathan R. Madsen 27eb0516bb rocprofiler-register updates
- fix logic for using HSA_TOOLS_LIB when rocprofiler-register support is enabled
- report tool load failure for rocprofiler-register

Change-Id: Ife23aa3e6ed19174376cd694764583b73f8976cd
2023-12-04 11:44:58 -06:00
David Yat Sin 251601b20b Add RISC-V support
Patch provided by user Xeonacid via github:
https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/172/

Change-Id: I5f9086b536383093e7995b9cfdc19dab213f0265
2023-12-04 15:05:22 +00:00
David Yat Sin f07b8f2250 Use CPU_SET_S instead of CPU_SET
Fix incorrect use of CPU_SET on variable size cpu_set_t

Suggested by Christopher E. Moore on github
https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/130

Change-Id: I710b56683ba07c08dcd83c851bf72e4f127a0ad4
2023-12-04 15:05:22 +00:00
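The difference behind the CPU_SET_S fix can be illustrated with glibc's dynamically sized CPU set macros. This is a minimal sketch, not the runtime's code; `mark_cpu_dynamic` is a hypothetical name. A set obtained from `CPU_ALLOC` has a size chosen at run time, so the `_S` variants, which take that size explicitly, must be used instead of `CPU_SET`, which assumes `sizeof(cpu_set_t)`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes the CPU_* macros in <sched.h> on glibc */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Mark `cpu` in a set sized for `num_cpus` CPUs.
 * Returns 1 on success, 0 if the bit did not stick, -1 on allocation failure. */
int mark_cpu_dynamic(int num_cpus, int cpu) {
  assert(num_cpus > 0 && cpu >= 0 && cpu < num_cpus);
  cpu_set_t *set = CPU_ALLOC(num_cpus);
  if (set == NULL) return -1;
  size_t size = CPU_ALLOC_SIZE(num_cpus); /* size in bytes of the allocation */
  CPU_ZERO_S(size, set);
  CPU_SET_S(cpu, size, set);              /* size-aware, unlike CPU_SET */
  int ok = CPU_ISSET_S(cpu, size, set) && CPU_COUNT_S(size, set) == 1;
  CPU_FREE(set);
  return ok;
}
```

For example, `mark_cpu_dynamic(2048, 1500)` addresses a CPU index beyond the 1024 CPUs that a fixed-size `cpu_set_t` typically covers.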
Giovanni LB e0c6c5e5bf Extending AQLprofile API to include counter dimensions
Change-Id: If59489a085959f3f765a30e3e445df5151e30350
2023-12-04 15:05:22 +00:00
David Yat Sin a7a3358067 Implement alternate scratch
The alternate scratch memory is used for dispatches that have a low
number of waves but relatively large wave size.
This allows us to keep the tmpring_size.bits.WAVES field of the main
scratch to full occupancy.

Change-Id: I32d240fac4b7d38200d1eebc1b0fdc8a823920d3
2023-12-04 15:05:22 +00:00
David Yat Sin dca8f3a21d Implement async scratch reclaim
For devices where the CP FW supports asynchronous scratch reclaim, ROCr
is able to claw-back scratch memory that was assigned to an AQL queue.
With that ability, ROCr does not have to rely on using USO
(use-scratch-once) when assigning large amounts of memory to a queue.
If we reach a situation where we are running low on device memory, ROCr
will attempt to claw-back the scratch memory.

Change-Id: Iddf8ec84e37ab8b9fdc58bafbe2b61fe2acb6eb7
2023-12-04 15:05:22 +00:00
David Yat Sin 64070a9acc Refactor scratch handler function
Separate the event handler and scratch handler portions of the code into
separate functions.

Change-Id: Ifdb7461e816b0f2d3c1c0a74d6f020b4d6fc736c
2023-12-04 15:05:22 +00:00
David Yat Sin fa317f8c41 Re-arrange and rename scratch elements that are used with main scratch
Change-Id: I4c1ff8cf4121a06b586fe49c70400226506bf95e
2023-12-04 15:05:22 +00:00
David Yat Sin 0344c8c0b6 Update queue structure to support async reclaim
Update queue structure to add members required for asynchronous reclaim
mechanism and dual-scratch. CP will set the AMD_QUEUE_CAPS_ASYNC_RECLAIM
bit on queue-connect to indicate whether the new features are supported.

The new members are ignored by previous versions of CP FW

Change-Id: Ic8e9ef41c5b1d04f09b43bc9b44b31527863d10f
2023-12-04 15:05:22 +00:00
Shweta Khatri acf9e95027 Revert "Restore default code object version usage for ROCr and ROCr Test"
This reverts commit 6ef7fcedd1290b59190f81df1d25142ecb05d282.

Change-Id: Icc0300c25a89fcb99287d013863a00ace7e12129
2023-12-04 15:03:31 +00:00
Lancelot SIX 6916ce358a trap_handler: Fix handling of debugtrap for gfx11
For gfx11, the trap_handler fails to recognize a trap id 3 and report
the exception to the debugger if the debugger is attached.

This is because the 2nd level trap handler looks for the DEBUG_ENABLED
bit in ttmp13 instead of ttmp11.  This bit is set by the 1st level trap
handler and is part of the 1st/2nd level trap handler ABI.

Change-Id: Ib36361f53d9bcbbed52320d8c3a9ab2c0b28c7cd
2023-12-04 15:03:31 +00:00
Lang Yu 991bbdcf24 Revert "Revert "Add support for GC 11.5.0 and 11.5.1""
This reverts commit ebc51dd0eb.

gfx1150/1151 is merged into mainline now.

Change-Id: Id179949318a37888c74abb5a8610d95bc2f22906
2023-12-04 15:03:31 +00:00
David Yat Sin cb5a29955b rocrtst: Speed-up Memory_Max_Mem test
Skip Extended-scope memory pool as allocation is very close to
fine-grain/coarse-grain but with just different PTE flags.

Only test coarse grain on CPU agent other than the first CPU agent.

Stop bisecting the max size once we are within 5% of the total size for
these pools, to speed up this test on large memory pools.

Change-Id: I77d1b45a1752ef092dda7c7f27723ea0a292a612
2023-12-04 15:03:31 +00:00
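The early-exit bisection described in the Memory_Max_Mem commit above can be sketched as below. This is an illustration under assumptions, not the test's code: `find_max_alloc`, `alloc_probe_fn` and `probe_below_limit` are hypothetical names, and the probe stands in for an attempted allocation of the given size.

```c
#include <assert.h>
#include <stddef.h>

/* Returns non-zero if an allocation of `size` bytes would succeed. */
typedef int (*alloc_probe_fn)(size_t size, void *ctx);

/* Stand-in probe: succeeds for sizes up to the limit stored in *ctx. */
int probe_below_limit(size_t size, void *ctx) {
  return size <= *(const size_t *)ctx;
}

/* Bisect for the largest size that succeeds, but stop as soon as the
 * best known size is within 5% of the pool's total size. */
size_t find_max_alloc(size_t total, alloc_probe_fn probe, void *ctx) {
  assert(probe != NULL);
  if (probe(total, ctx)) return total;
  size_t lo = 0;      /* largest size known to succeed */
  size_t hi = total;  /* smallest size known to fail */
  while (hi - lo > 1 && total - lo > total / 20) {
    size_t mid = lo + (hi - lo) / 2;
    if (probe(mid, ctx)) lo = mid; else hi = mid;
  }
  return lo;
}
```

The early exit matters most when nearly the whole pool is allocatable, which is the common case on large pools: the search ends after a handful of probes instead of narrowing down to a single byte.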
David Yat Sin 642165b1bc Increase scratch aperture size to 4GB per XCC
Change-Id: Ia02cea45ce8b782527f44fec539b0ab7cc453200
2023-12-04 15:03:31 +00:00
Jonathan Kim 81c64228e0 Increase SDMA copy size
SDMA4.4 and SDMA5.2+ have increased their available copy size to 2^30 bytes,
represented by exponent as bits set in the COUNT field of the
linear copy.

Also note that the full 2^22 byte limit is available from SDMA4 onwards
as it has corrected the 0x3fffe0 HW limitation from SDMA3.

As the copy limit has increased, this can change system performance,
so provide env var HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE=0 to fall
back to the original 0x3fffe0 limit for debugging purposes.

Change-Id: I0fb6e5378f68e5b8a00ff559271691a943ee06ee
2023-12-04 15:03:31 +00:00
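The effect of the larger limit on the number of linear copy commands can be sketched as follows. The two limits and the meaning of the `=0` setting come from the commit message above; the function names are hypothetical, the 2^22 tier mentioned for SDMA4 is left out for brevity, and treating every value other than "0" as enabled is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEGACY_COPY_LIMIT UINT64_C(0x3fffe0)     /* original per-command limit */
#define LARGE_COPY_LIMIT  (UINT64_C(1) << 30)    /* SDMA4.4 and SDMA5.2+ */

/* Pick the per-command byte limit. `override_value` is the string read from
 * HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE (NULL if unset); "0" forces the legacy limit. */
uint64_t max_copy_bytes(int hw_supports_large_copy, const char *override_value) {
  int override_enabled = !(override_value != NULL && strcmp(override_value, "0") == 0);
  if (hw_supports_large_copy && override_enabled) return LARGE_COPY_LIMIT;
  return LEGACY_COPY_LIMIT;
}

/* Number of linear copy commands needed to move `total_bytes`. */
uint64_t copy_commands_needed(uint64_t total_bytes, uint64_t limit) {
  assert(limit > 0);
  return (total_bytes + limit - 1) / limit;
}
```

Under these assumptions a 1 GiB copy needs 257 commands at the legacy limit and a single command at the 2^30 limit, which is why the change can shift performance.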
Youssef Aly ae1da390bd Enabled profiling for CPU agents for memcpy activities
To trace memcpy asynchronously, both the dst and src agents need to have profiling enabled, but the API for enabling profiling only enabled it for GPU agents. CPU agents did not have profiling enabled, so the signal owner could not be known, and hsa_amd_profiling_get_async_copy_time would fail with an HSA status error because it could not read the agent for the given signal.

Change-Id: Ie165e0e39b8fcd6992a55695b9ffcead10a8e812
2023-12-04 15:01:59 +00:00
Jonathan R. Madsen f9cf1852e5 rocprofiler-register support
- Update CMakeLists.txt
  - find_package for rocprofiler-register
    - this is an optional package until rocprofiler-register is added to the CI
  - define HSA_VERSION_{MAJOR,MINOR,PATCH} ppdefs
- Update runtime.cpp
  - include <rocprofiler-register/rocprofiler-register.h>
  - if rocprofiler-register succeeds, do not support v1 unless explicitly requested

Change-Id: I8f48bbf3f6b52fb91ddade2f198491a1256035fe
2023-12-04 15:01:59 +00:00
Jonathan Kim 2f847cf05f Restore default code object version usage for ROCr and ROCr Test
Remove override that forces ROCr image blit source and ROCr test to use
code object version 4 now that mainline has been updated to version 5.

Change-Id: I94681e86835c0e382475306ead4cd4132a2ee78f
2023-12-04 15:01:44 +00:00
David Yat Sin 750212e50e Handle HW_EXCEPTION events
Add handler to handle HW exception events reported by underlying
drivers. These events are generally caused by GPU resets and need the
application to abort.
As a future improvement, we can provide additional information
about the exception (e.g. mode-reset level).

Change-Id: If3fb5f19f9fce181a9d3b5e34a5506725856e7b0
2023-11-20 14:49:26 +00:00
David Yat Sin 01ff2f7934 libhsakmt: Handle HW_EXCEPTION events
Add new structures for HW Exception events and copy data from KFD to
expose to upper layers.

Change-Id: Icd5eb98997c47620e3b86277ab6d3abb7ed7d56f
2023-11-17 04:43:51 +00:00
Shweta Khatri 4890ffe224 Updated the test to access PCIe domain info for the agent
Change-Id: I901fd76f91315a0262945659d12349ba7b64ed11
2023-10-26 11:37:12 -04:00
David Yat Sin 1a7de9588e Add LoongArch64 Support
Patch submitted by user Xinmudotmoe on github

Change-Id: I58fd035b4ec4856f20d63747ababd49fa9764348
2023-10-26 11:36:16 -04:00
Yifan Zhang 46fe316348 kfdtest: Change SetGetAttributesTest range granularity
A granularity check was added in KFD with the patch below:

commit 270c7a8375a91fec2fb4e2c253e3955d9b7540b4
Author: Jesse Zhang <jesse.zhang@amd.com>
Date:   Fri Oct 20 09:43:51 2023 +0800

    drm/amdkfd: Fix shift out-of-bounds issue

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index a690dced6860..f2b33fb2afcf 100644

Change-Id: I8cb037e3bf5db0a85661494b77e59984eca4d98d

--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -781,7 +781,7 @@ svm_range_apply_attrs(struct kfd_process *p, struct svm_range *prange,
                        prange->flags &= ~attrs[i].value;
                        break;
                case KFD_IOCTL_SVM_ATTR_GRANULARITY:
-                       prange->granularity = attrs[i].value;
+                       prange->granularity = min_t(uint32_t, attrs[i].value, 0x3F);
                        break;
                default:
                        WARN_ONCE(1, "svm_range_check_attrs wasn't called?");

Test cases have to be modified accordingly, otherwise
KFDSVMRangeTest.SetGetAttributesTest fails.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: Ifff47556bc398da6b18ad26ac545d139b63b0c92
2023-10-23 23:21:40 +08:00
Tony Tye 7955fb01ec Make AqlPacket::string more robust
AqlPacket::string should check the packet type is in range of the array
used to print its name.

Change-Id: I33dabbd941d086929526d842c9dbc0bd7305acd5
2023-10-18 12:54:36 -04:00
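The kind of range check described for AqlPacket::string can be sketched in C as below. This is not the runtime's implementation: `describe_packet_type` is a hypothetical name, and the name table is illustrative, ordered by the HSA packet type numbering (0 = vendor specific through 5 = barrier-OR).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static const char *const kPacketTypeNames[] = {
  "VENDOR_SPECIFIC", "INVALID", "KERNEL_DISPATCH",
  "BARRIER_AND", "AGENT_DISPATCH", "BARRIER_OR",
};

/* Write a printable name for `type` into `out`.
 * Returns 1 if the type is known, 0 if it fell outside the name table. */
int describe_packet_type(uint8_t type, char *out, size_t out_len) {
  assert(out != NULL && out_len > 0);
  size_t count = sizeof(kPacketTypeNames) / sizeof(kPacketTypeNames[0]);
  if (type < count) {
    snprintf(out, out_len, "%s", kPacketTypeNames[type]);
    return 1;
  }
  /* Out of range: report the raw value instead of indexing past the array. */
  snprintf(out, out_len, "UNKNOWN(%u)", (unsigned)type);
  return 0;
}
```

The check matters because the type field is 8 bits wide, so a corrupted or not-yet-written header can carry any value from 0 to 255.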
Tony Tye 395ad3b77b AQL packet header may need to be loaded atomically
An AQL packet header field is stored using an atomic release, and needs
to be read using atomic acquire if it may be written by another thread.

Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749
2023-10-18 12:54:36 -04:00
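The release/acquire pairing that the commit above refers to can be sketched with C11 atomics. The function names are hypothetical; the only layout assumption is that the packet type occupies the low 8 bits of the 16-bit header, as in the HSA packet header definition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { PACKET_HEADER_TYPE_SHIFT = 0, PACKET_HEADER_TYPE_WIDTH = 8 };

/* Producer side: publish the header last, with release semantics, so the
 * rest of the packet body is visible before the header changes. */
void publish_header(_Atomic uint16_t *header, uint16_t value) {
  assert(header != NULL);
  atomic_store_explicit(header, value, memory_order_release);
}

/* Consumer side: read the header with acquire semantics so that later reads
 * of the packet body are ordered after the header read. */
uint8_t read_packet_type(_Atomic uint16_t *header) {
  assert(header != NULL);
  uint16_t h = atomic_load_explicit(header, memory_order_acquire);
  return (uint8_t)((h >> PACKET_HEADER_TYPE_SHIFT) &
                   ((1u << PACKET_HEADER_TYPE_WIDTH) - 1u));
}
```

A plain (non-atomic) read of the header would be a data race whenever another thread may still be writing the packet, which is the situation the commit addresses.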
Tony Tye 23b4ce501d Add AMD_AQL_FORMAT_INTERCEPT_MARKER vendor packet
Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add
support to intercept queue to invoke a callback for these packets.

Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149
2023-10-18 12:54:36 -04:00
Tony Tye b020f66d39 Prevent accessing packets outside intercept queue
When the intercept queue copies packets from the proxy queue to the
wrapped queue, it should not attempt to copy packets that are outside
the proxy queue. This could happen if the user of the proxy queue
advances the write pointer beyond the number of free slots and the
packet rewriter reduces the number of packets.

Change-Id: Id02f5df8aee0ed7269f4de813731d507cf2126b3
2023-10-18 12:54:36 -04:00
Tony Tye b64a845105 Support intercept queue with multiple packet rewriters
If an intercept queue is created and multiple packet rewriters are
registered, and if one of the rewriters invokes the packet writer
multiple times, then on returning from the packet writer the packet
rewriter index needs to be restored. Otherwise the next packet writer
call will start with an index of 0 which will be decremented and result
in out of bounds vector access.

Change-Id: Icb3f6a81ea04f1f7b91551b974a1f48c4f32db60
2023-10-18 12:54:36 -04:00
Tony Tye 9f4d651d14 Intercept queue handling for large rewrites
It is possible that rewriting an initial packet for the intercept
queue produces more packets than the size of the wrapped queue. The code
would never submit such a set of packets as it attempted to submit
all or none. This can result in an infinite loop.

This is corrected to submit what will fit if the rewrite is larger than
the wrapped queue.

Change-Id: I8f03228c2e15151287e25de46eaee998f829c62a
2023-10-18 12:54:36 -04:00
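The submission rule described in the commit above can be sketched as a small decision function. This is an interpretation of the commit message, not the runtime's code; `packets_to_submit` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Decide how many of `pending` rewritten packets to submit now.
 * - Everything fits: submit all of it.
 * - The set is larger than the whole wrapped queue: it can never fit at
 *   once, so submit what fits now instead of waiting forever.
 * - Otherwise: keep the all-or-none behavior and wait for more free slots. */
size_t packets_to_submit(size_t pending, size_t free_slots, size_t queue_size) {
  assert(free_slots <= queue_size);
  if (pending <= free_slots) return pending;
  if (pending > queue_size) return free_slots;
  return 0;
}
```

With a wrapped queue of 8 slots and 4 free, 3 pending packets are all submitted, 6 pending packets wait, and 20 pending packets are submitted 4 at a time.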
Tony Tye d16c392338 Make intercept queue submission obstruction free
The intercept queue submit needs to be obstruction free as it can be
invoked by the runtime async handler helper thread. The code had a busy
wait loop waiting for a free slot to be available to add the retry
barrier packet. Blocking that thread prevents it servicing other async
handlers which may need to execute in order to allow packets on the
hardware queue to be processed to free up a slot.

Change the code to always leave one free slot unless there is a retry
barrier packet already on the queue.

Change-Id: If901c865550258b790b995d58037b0f99f1968cc
2023-10-18 12:54:36 -04:00
Tony Tye ca99795c58 Clarify intercept queue retry packet detection
Describe the assumption being made when checking if there is a retry
barrier packet on the queue. Also enforce the consequential requirement
of the minimum queue size.

Change-Id: I0efaffc5a79b9e2fdab3655b8b74270118a5c2ff
2023-10-18 12:54:36 -04:00
Tony Tye be6b8bb055 Correct intercept queue handling of the overflow queue
The intercept queue was processing all the packets on the proxy queue.
This could result in the rewrite of more than one packet being put on
the overflow queue. If there are a lot of packets on the intercept
queue this could result in the overflow queue having more packets than
the size of the hardware queue. The code to submit the overflow queue
fails if it is unable to put all the packets of the overflow on the
hardware queue. This resulted in an infinite loop. It also resulted in
an assert being reported that packets are being added to the overflow
queue when it is not empty.

Correct this by checking if the overflow queue is non-empty after
rewriting each packet. If it is non-empty then stop processing
additional packets. The additional packets will be processed when the
barrier packet added to the hardware queue is executed due to its async
handler. This barrier packet is added to the hardware queue whenever
packets are saved on the overflow queue.

Change-Id: I2537911d3c3ba1aac61a0a35f1ab97426a66b5a2
2023-10-18 12:54:36 -04:00
Jonathan Kim a36856b02a Use user requested engine ID when forcing SDMA copies
When forcing SDMA copies, the engine ID specified by the requester should
still be used since the requester has a hint of engine availability.

Change-Id: Idefa9494e407e31da510aa4c7c1fa283c85a4f6e
2023-10-18 10:45:02 -04:00
David Yat Sin 22be526230 Fix escape-to-IB packet definition
The Vendor specific header is only 8-bits and this would break the
behavior on big-endian machines. Renaming field to amd_format to match
name in spec sheets.

Change-Id: I65559757657565d3d3ff489d2663a0be42cf8ba5
2023-10-13 13:37:49 +00:00
Rajneesh Bhardwaj 5047eb161f libhsakmt: Use MADV_HUGEPAGE for large allocations
For large memory allocations (>2MB) the thunk should use the
MADV_HUGEPAGE flag for the madvise call to optimize allocation performance
on certain operating systems that rely on the madvise hint when Transparent
Huge Pages is not set to always.

Suggested-by: Joseph Greathouse <joseph.greathouse@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Change-Id: Ic0c753f89a177b0f715942d6e2a7108b08a85f20
2023-10-12 17:01:34 -04:00
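The hint described in the commit above can be sketched with the standard Linux `mmap` and `madvise` calls. This is a minimal illustration, not the thunk's allocator; `alloc_with_thp_hint` and `HUGE_PAGE_THRESHOLD` are hypothetical names, with the 2 MB threshold taken from the commit message.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS and MADV_HUGEPAGE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define HUGE_PAGE_THRESHOLD ((size_t)2 * 1024 * 1024)

/* Map anonymous memory and, for allocations above the threshold, ask the
 * kernel to back it with transparent huge pages. Returns NULL on failure. */
void *alloc_with_thp_hint(size_t bytes) {
  assert(bytes > 0);
  void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return NULL;
#ifdef MADV_HUGEPAGE
  if (bytes > HUGE_PAGE_THRESHOLD) {
    /* Advisory only: the mapping stays usable even if the hint is rejected. */
    (void)madvise(p, bytes, MADV_HUGEPAGE);
  }
#endif
  return p;
}
```

The hint only has an effect when the system's Transparent Huge Pages mode is `madvise`; with `always` it is redundant and with `never` it is ignored.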
Tomasz Kłoczko a226542fc3 install .pc files in libdir
Provided pkgconfig file contains interface description which is arch
dependent. In such cases .pc files should be installed in libdir.

Signed-off-by: Tomasz Kłoczko <kloczek@github.com>
Change-Id: Ibbc85ad4aee1ef014c409dfa63313873b590464b
2023-10-11 15:50:38 -04:00
Torsten Keßler b44cca813d Mark new symbols in ROCm 5.7.0 as global
Change-Id: Ia0391cac7f432f019dea94f98a145dbf8120817d
2023-10-11 15:50:27 -04:00
Philip Yang 85a47fa66b libhsakmt: Set CWSR range granularity
Set CWSR svm range granularity to 0xff, so that KFD will migrate the entire
CWSR range from VRAM back to system memory when recovering from the CPU page
fault if rocgdb accesses the CWSR area. This avoids the partial CWSR range
migration and stall CWSR GPU mapping issue.

This is a temporary workaround, it should be reverted once the KFD is
fixed.

Change-Id: I80a7248244574edba25b13858b7ebcf1c77b8930
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2023-10-06 10:46:40 -04:00
David Yat Sin 73efd3a14e libhsakmt: Fix incorrect flags for ext coherence
Change-Id: I89c838b9fbdb85589691f29806ae15884b25592f
2023-10-04 15:00:58 +00:00
David Yat Sin d021055ada rocrtst: Add tag for extended-scope fine grain
Change-Id: I2a64cf3fb476271b0a5d025fb6989feb40d676bb
2023-10-03 15:36:20 -04:00
David Yat Sin 96b3c4a0aa Allow CPU cache info to be empty
Some new CPUs have a different cache reporting structure, causing the thunk to
leave the cache information empty. Allow the cache information for CPU
agents to be empty, as it is not used by language runtimes.

Change-Id: Ic5e880171ab20aa114b4b62bdb4479eb54066f7b
2023-10-03 13:44:10 +00:00
James Zhu 693e686c4d kfdtest: remove IOMMUv2 performance monitor support
IOMMUv2 is removed from AMDGPU/KFD.

Change-Id: Ia00f9aa879a5f32a42bec914936d105d6845bc60
Signed-off-by: James Zhu <James.Zhu@amd.com>
2023-09-30 09:16:49 -04:00
James Zhu d195deeec4 libhsakmt: remove share resource in performance counter
This share resource is for IOMMUv2 which is removed from
AMDGPU/KFD.

Change-Id: Ia6e9311f1adc56fac2c9e8fa05b24c5ec8c272a5
Signed-off-by: James Zhu <James.Zhu@amd.com>
2023-09-30 08:54:10 -04:00
James Zhu 277d5e27ff libhsakmt: remove iommu_block which supports IOMMUv2 performance
IOMMUv2 is removed from AMDGPU/KFD.

Change-Id: I9fcf20ae9288cb40bb4b696284fc70534fb6484b
Signed-off-by: James Zhu <James.Zhu@amd.com>
2023-09-30 08:54:10 -04:00
James Zhu 274b5b51ca libhsakmt: remove IOMMUv2 performance monitor support
IOMMUv2 is removed from AMDGPU/KFD.

Change-Id: Ib87f501c07d9de90e6b83b98f98daacd5913e98a
Signed-off-by: James Zhu <James.Zhu@amd.com>
2023-09-30 08:54:10 -04:00