Commit graph

1084 commits

Author SHA1 Message Date
Giovanni LB ecd768797e Extending AQLprofile API to include counter dimensions
Change-Id: If59489a085959f3f765a30e3e445df5151e30350


[ROCm/ROCR-Runtime commit: e0c6c5e5bf]
2023-12-04 15:05:22 +00:00
David Yat Sin 6140d8a66d Implement alternate scratch
The alternate scratch memory is used for dispatches that have a low
number of waves but relatively large wave size.
This allows us to keep the tmpring_size.bits.WAVES field of the main
scratch to full occupancy.

Change-Id: I32d240fac4b7d38200d1eebc1b0fdc8a823920d3


[ROCm/ROCR-Runtime commit: a7a3358067]
2023-12-04 15:05:22 +00:00
David Yat Sin 66b9fdc2d6 Implement async scratch reclaim
For devices where the CP FW supports asynchronous scratch reclaim, ROCr
is able to claw-back scratch memory that was assigned to an AQL queue.
With that ability, ROCr does not have to rely on using USO
(use-scratch-once) when assigning large amounts of memory to a queue.
If we reach a situation where we are running low on device memory, ROCr
will attempt to claw-back the scratch memory.

Change-Id: Iddf8ec84e37ab8b9fdc58bafbe2b61fe2acb6eb7


[ROCm/ROCR-Runtime commit: dca8f3a21d]
2023-12-04 15:05:22 +00:00
David Yat Sin fa600434ee Refactor scratch handler function
Separate the event handler and scratch handler portions of the code into
separate functions.

Change-Id: Ifdb7461e816b0f2d3c1c0a74d6f020b4d6fc736c


[ROCm/ROCR-Runtime commit: 64070a9acc]
2023-12-04 15:05:22 +00:00
David Yat Sin b1942bff27 Re-arrange and rename scratch elements that are used with main scratch
Change-Id: I4c1ff8cf4121a06b586fe49c70400226506bf95e


[ROCm/ROCR-Runtime commit: fa317f8c41]
2023-12-04 15:05:22 +00:00
David Yat Sin 03e87e3d66 Update queue structure to support async reclaim
Update queue structure to add members required for asynchronous reclaim
mechanism and dual-scratch. CP will set the AMD_QUEUE_CAPS_ASYNC_RECLAIM
bit on queue-connect to indicate whether the new features are supported.

The new members are ignored by previous versions of CP FW.

Change-Id: Ic8e9ef41c5b1d04f09b43bc9b44b31527863d10f


[ROCm/ROCR-Runtime commit: 0344c8c0b6]
2023-12-04 15:05:22 +00:00
Shweta Khatri 43f1ee386f Revert "Restore default code object version usage for ROCr and ROCr Test"
This reverts commit 6ef7fcedd1290b59190f81df1d25142ecb05d282.

Change-Id: Icc0300c25a89fcb99287d013863a00ace7e12129


[ROCm/ROCR-Runtime commit: acf9e95027]
2023-12-04 15:03:31 +00:00
Lancelot SIX 9ae972cf1e trap_handler: Fix handling of debugtrap for gfx11
For gfx11, the trap_handler fails to recognize a trap id 3 and report
the exception to the debugger if the debugger is attached.

This is because the 2nd level trap handler looks for the DEBUG_ENABLED
bit in ttmp13 instead of ttmp11.  This bit is set by the 1st level trap
handler and is part of the 1st/2nd level trap handler ABI.

Change-Id: Ib36361f53d9bcbbed52320d8c3a9ab2c0b28c7cd


[ROCm/ROCR-Runtime commit: 6916ce358a]
2023-12-04 15:03:31 +00:00
Lang Yu 43ae931ad5 Revert "Revert "Add support for GC 11.5.0 and 11.5.1""
This reverts commit a8e34eaec8.

gfx1150/1151 is merged into mainline now.

Change-Id: Id179949318a37888c74abb5a8610d95bc2f22906


[ROCm/ROCR-Runtime commit: 991bbdcf24]
2023-12-04 15:03:31 +00:00
David Yat Sin 663b461512 rocrtst: Speed-up Memory_Max_Mem test
Skip Extended-scope memory pool as allocation is very close to
fine-grain/coarse-grain but with just different PTE flags.

Only test coarse grain on CPU agent other than the first CPU agent.

Stop bisecting the max size once we are within 5% of the total size for
these pools, to speed up this test on large memory pools.

Change-Id: I77d1b45a1752ef092dda7c7f27723ea0a292a612


[ROCm/ROCR-Runtime commit: cb5a29955b]
2023-12-04 15:03:31 +00:00
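
The bisection cut-off described in the Memory_Max_Mem commit above (663b461512) can be sketched as follows. This is only an illustration, assuming the cut-off means "stop once the search interval is narrower than 5% of the pool size"; `find_max_alloc` and the `can_allocate` callback are hypothetical names and do not come from rocrtst.

```python
def find_max_alloc(total_size, can_allocate, tolerance_percent=5):
    """Bisect for the largest size accepted by can_allocate.

    can_allocate is a hypothetical stand-in for a real allocation attempt.
    It is assumed to be monotonic: if a size fails, every larger size fails.
    Returns (largest size known to succeed, number of attempts made).
    """
    low = 0                      # largest size known (or assumed) to succeed
    high = total_size + 1        # exclusive upper bound, assumed to fail
    threshold = total_size * tolerance_percent // 100
    attempts = 0
    # Stop as soon as the interval is within the tolerance (at least 1 byte).
    while high - low > max(threshold, 1):
        mid = (low + high) // 2
        attempts += 1
        if can_allocate(mid):
            low = mid            # mid works, search higher
        else:
            high = mid           # mid fails, search lower
    return low, attempts
```

With a tolerance of 0 the loop degenerates into an exact bisection, which shows where the saving comes from: the last few halvings are skipped when an approximate answer is good enough.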
David Yat Sin 0607f3c34d Increase scratch aperture size to 4GB per XCC
Change-Id: Ia02cea45ce8b782527f44fec539b0ab7cc453200


[ROCm/ROCR-Runtime commit: 642165b1bc]
2023-12-04 15:03:31 +00:00
Jonathan Kim 1931c4f8a4 Increase SDMA copy size
SDMA4.4 and SDMA5.2+ have increased their available copy size to 2^30 bytes,
represented by exponent as bits set in the COUNT field of the
linear copy.

Also note that the full 2^22 byte limit is available from SDMA4 onwards
as it has corrected the 0x3fffe0 HW limitation from SDMA3.

As the copy limit has increased, this can change system performance
so provide env var HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE=0 to fall
back to the original 0x3fffe0 limit for debugging purposes.

Change-Id: I0fb6e5378f68e5b8a00ff559271691a943ee06ee


[ROCm/ROCR-Runtime commit: 81c64228e0]
2023-12-04 15:03:31 +00:00
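
The effect of the larger SDMA copy limit from commit 1931c4f8a4 above can be illustrated with a small model. This is a simplified sketch, not ROCr code: the helper names are hypothetical, only the two limits named in the message (0x3fffe0 and 2^30) are modelled, and an unset `HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE` is assumed to mean the larger limit is used.

```python
import os

LEGACY_MAX_COPY = 0x3FFFE0     # original limit named in the commit message
EXTENDED_MAX_COPY = 1 << 30    # 2^30 bytes, per the commit message


def max_copy_size(supports_extended, env=os.environ):
    """Pick the per-command copy limit (hypothetical helper).

    Assumption: any value other than "0" leaves the larger limit enabled.
    """
    override = env.get("HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE", "1")
    if supports_extended and override != "0":
        return EXTENDED_MAX_COPY
    return LEGACY_MAX_COPY


def split_copy(total_bytes, limit):
    """Return the sizes of the linear-copy commands needed for total_bytes."""
    sizes = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, limit)
        sizes.append(chunk)
        remaining -= chunk
    return sizes
```

For example, a 3 GiB copy needs 3 commands under the 2^30 limit, while the legacy limit needs several hundred, which is why the message warns that system performance can change.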
Youssef Aly 1c1298c1c0 Enabled profiling for CPU agents for memcpy activities
To trace memcpy asynchronously, both the dst and src agents need to have profiling enabled, but the API for enabling profiling only enabled it for GPU agents. CPU agents did not have profiling enabled, so the signal owner could not be known, and hsa_amd_profiling_get_async_copy_time would fail with an HSA status error because it could not read the agent for the given signal.

Change-Id: Ie165e0e39b8fcd6992a55695b9ffcead10a8e812


[ROCm/ROCR-Runtime commit: ae1da390bd]
2023-12-04 15:01:59 +00:00
Jonathan R. Madsen 880ddd4387 rocprofiler-register support
- Update CMakeLists.txt
  - find_package for rocprofiler-register
    - this is an optional package until rocprofiler-register is added to the CI
  - define HSA_VERSION_{MAJOR,MINOR,PATCH} ppdefs
- Update runtime.cpp
  - include <rocprofiler-register/rocprofiler-register.h>
  - if rocprofiler-register succeeds, do not support v1 unless explicitly requested

Change-Id: I8f48bbf3f6b52fb91ddade2f198491a1256035fe


[ROCm/ROCR-Runtime commit: f9cf1852e5]
2023-12-04 15:01:59 +00:00
Jonathan Kim 73ab40ecd3 Restore default code object version usage for ROCr and ROCr Test
Remove override that forces ROCr image blit source and ROCr test to use
code object version 4 now that mainline has been updated to version 5.

Change-Id: I94681e86835c0e382475306ead4cd4132a2ee78f


[ROCm/ROCR-Runtime commit: 2f847cf05f]
2023-12-04 15:01:44 +00:00
David Yat Sin b177c0e9ca Handle HW_EXCEPTION events
Add handler to handle HW exception events reported by underlying
drivers. These events are generally caused by GPU resets and need the
application to abort.
As an improvement, in the future, we can provide additional information
about the exception (e.g., mode-reset level).

Change-Id: If3fb5f19f9fce181a9d3b5e34a5506725856e7b0


[ROCm/ROCR-Runtime commit: 750212e50e]
2023-11-20 14:49:26 +00:00
Shweta Khatri 325f98d229 Updated the test to access PCIe domain info for the agent
Change-Id: I901fd76f91315a0262945659d12349ba7b64ed11


[ROCm/ROCR-Runtime commit: 4890ffe224]
2023-10-26 11:37:12 -04:00
David Yat Sin eb664927dd Add LoongArch64 Support
Patch submitted by user Xinmudotmoe on github

Change-Id: I58fd035b4ec4856f20d63747ababd49fa9764348


[ROCm/ROCR-Runtime commit: 1a7de9588e]
2023-10-26 11:36:16 -04:00
Tony Tye d2542da27d Make AqlPacket::string more robust
AqlPacket::string should check the packet type is in range of the array
used to print its name.

Change-Id: I33dabbd941d086929526d842c9dbc0bd7305acd5


[ROCm/ROCR-Runtime commit: 7955fb01ec]
2023-10-18 12:54:36 -04:00
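
The range check described in commit d2542da27d above amounts to validating an index before using it to look up a name. A minimal sketch, assuming a name table indexed by packet type; the table and `packet_type_name` are illustrative and are not taken from the runtime source:

```python
# Illustrative name table indexed by packet type; the runtime's actual
# array is not shown in the commit log.
PACKET_TYPE_NAMES = [
    "VENDOR_SPECIFIC",
    "INVALID",
    "KERNEL_DISPATCH",
    "BARRIER_AND",
    "AGENT_DISPATCH",
    "BARRIER_OR",
]


def packet_type_name(packet_type):
    """Return a printable name, falling back for out-of-range types."""
    if 0 <= packet_type < len(PACKET_TYPE_NAMES):
        return PACKET_TYPE_NAMES[packet_type]
    # Out-of-range values (for example a corrupted header) get a safe label
    # instead of indexing past the end of the table.
    return f"UNKNOWN({packet_type})"
```

The explicit lower bound matters in Python because a negative index would silently wrap around; in C++ the same check guards against reading outside the array.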
Tony Tye 9cdf39a706 AQL packet header may need to be loaded atomically
An AQL packet header field is stored using an atomic release, and needs
to be read using atomic acquire if it may be written by another thread.

Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749


[ROCm/ROCR-Runtime commit: 395ad3b77b]
2023-10-18 12:54:36 -04:00
Tony Tye fd757292fb Add AMD_AQL_FORMAT_INTERCEPT_MARKER vendor packet
Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add
support to intercept queue to invoke a callback for these packets.

Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149


[ROCm/ROCR-Runtime commit: 23b4ce501d]
2023-10-18 12:54:36 -04:00
Tony Tye 52d6235a1d Prevent accessing packets outside intercept queue
When the intercept queue copies packets from the proxy queue to the
wrapped queue, it should not attempt to copy packets that are outside
the proxy queue. This could happen if the user of the proxy queue
advances the write pointer beyond the number of free slots and the
packet rewriter reduces the number of packets.

Change-Id: Id02f5df8aee0ed7269f4de813731d507cf2126b3


[ROCm/ROCR-Runtime commit: b020f66d39]
2023-10-18 12:54:36 -04:00
Tony Tye 63c3bafab7 Support intercept queue with multiple packet rewriters
If an intercept queue is created and multiple packet rewriters are
registered, and if one of the rewriters invokes the packet writer
multiple times, then on returning from the packet writer the packet
rewriter index needs to be restored. Otherwise the next packet writer
call will start with an index of 0 which will be decremented and result
in out of bounds vector access.

Change-Id: Icb3f6a81ea04f1f7b91551b974a1f48c4f32db60


[ROCm/ROCR-Runtime commit: b64a845105]
2023-10-18 12:54:36 -04:00
Tony Tye 0f1e43f6f3 Intercept queue handling for large rewrites
It is possible that packet rewriting an initial packet for the intercept
queue produces more packets than the size of the wrapped queue. The code
would never submit such a set of packets as it attempted to submit
all or none. This can result in an infinite loop.

This is corrected to submit what will fit if the rewrite is larger than
the wrapped queue.

Change-Id: I8f03228c2e15151287e25de46eaee998f829c62a


[ROCm/ROCR-Runtime commit: 9f4d651d14]
2023-10-18 12:54:36 -04:00
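
The all-or-none problem fixed in commit 0f1e43f6f3 above can be modelled with a few lines. This is a sketch of the decision rule as the message describes it, not the runtime's code; `packets_to_submit` and `drain` are hypothetical names, and `drain` assumes the wrapped queue is fully emptied between rounds.

```python
def packets_to_submit(pending, free_slots, queue_size):
    """Return how many pending rewritten packets to submit right now."""
    if pending <= free_slots:
        return pending       # everything fits
    if pending > queue_size:
        return free_slots    # can never fit at once: submit what fits
    return 0                 # will fit once slots free up: keep all-or-none


def drain(pending, queue_size):
    """Count the rounds needed to submit everything (toy model).

    Assumes the hardware consumes the whole queue between rounds.
    """
    rounds = 0
    while pending > 0:
        pending -= packets_to_submit(pending, queue_size, queue_size)
        rounds += 1
    return rounds
```

Under a strict all-or-none rule, a rewrite of 20 packets against an 8-slot queue would never make progress; with the partial-submit branch it drains in three rounds.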
Tony Tye 9cd957942d Make intercept queue submission obstruction free
The intercept queue submit needs to be obstruction free as it can be
invoked by the runtime async handler helper thread. The code had a busy
wait loop waiting for a free slot to be available to add the retry
barrier packet. Blocking that thread prevents it servicing other async
handlers which may need to execute in order to allow packets on the
hardware queue to be processed to free up a slot.

Change the code to always leave one free slot unless there is a retry
barrier packet already on the queue.

Change-Id: If901c865550258b790b995d58037b0f99f1968cc


[ROCm/ROCR-Runtime commit: d16c392338]
2023-10-18 12:54:36 -04:00
Tony Tye d1a017311d Clarify intercept queue retry packet detection
Describe the assumption being made when checking if there is a retry
barrier packet on the queue. Also enforce the consequential requirement
of the minimum queue size.

Change-Id: I0efaffc5a79b9e2fdab3655b8b74270118a5c2ff


[ROCm/ROCR-Runtime commit: ca99795c58]
2023-10-18 12:54:36 -04:00
Tony Tye 5bb0cc60f5 Correct intercept queue handling of the overflow queue
The intercept queue was processing all the packets on the proxy queue.
This could result in the rewrite of more than one packet being put on
the overflow queue. If there are a lot of packets on the intercept
queue this could result in the overflow queue having more packets than
the size of the hardware queue. The code to submit the overflow queue
fails if it is unable to put all the packets of the overflow on the
hardware queue. This resulted in an infinite loop. It also resulted in
an assert being reported that packets are being added to the overflow
queue when it is not empty.

Correct this by checking if the overflow queue is non-empty after
rewriting each packet. If it is non-empty then stop processing
additional packets. The additional packets will be processed when the
barrier packet added to the hardware queue is executed due to its async
handler. This barrier packet is added to the hardware queue whenever
packets are saved on the overflow queue.

Change-Id: I2537911d3c3ba1aac61a0a35f1ab97426a66b5a2


[ROCm/ROCR-Runtime commit: be6b8bb055]
2023-10-18 12:54:36 -04:00
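
The correction in commit 5bb0cc60f5 above, stopping as soon as the overflow queue becomes non-empty, can be sketched with a toy model. The function and its parameters are hypothetical, and the retry barrier packet and its async handler are deliberately left out.

```python
from collections import deque


def process_proxy_queue(proxy, rewrite, hw_free_slots):
    """Move rewritten packets to the hardware queue until overflow occurs.

    proxy: deque of packets waiting on the proxy queue (toy model).
    rewrite: callback returning the list of packets replacing one packet.
    Returns (submitted, overflow); unprocessed packets remain in proxy.
    """
    submitted, overflow = [], deque()
    # Check the overflow queue after each rewritten packet and stop once it
    # is non-empty, so it holds the rewrite of at most one packet.
    while proxy and not overflow:
        for pkt in rewrite(proxy.popleft()):
            if len(submitted) < hw_free_slots:
                submitted.append(pkt)
            else:
                overflow.append(pkt)   # kept until the retry barrier runs
    return submitted, overflow
```

Because processing stops at the first overflow, the overflow queue never grows beyond what a single rewrite produces, and the remaining proxy packets wait for the next pass.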
Jonathan Kim 9a0c51ae92 Use user requested engine ID when forcing SDMA copies
When forcing SDMA copies, the engine ID specified by the requester should
still be used since the requester has a hint of engine availability.

Change-Id: Idefa9494e407e31da510aa4c7c1fa283c85a4f6e


[ROCm/ROCR-Runtime commit: a36856b02a]
2023-10-18 10:45:02 -04:00
David Yat Sin dc0cfa8a54 Fix escape-to-IB packet definition
The Vendor specific header is only 8-bits and this would break the
behavior on big-endian machines. Renaming field to amd_format to match
name in spec sheets.

Change-Id: I65559757657565d3d3ff489d2663a0be42cf8ba5


[ROCm/ROCR-Runtime commit: 22be526230]
2023-10-13 13:37:49 +00:00
David Yat Sin dbdb3af4fc rocrtst: Add tag for extended-scope fine grain
Change-Id: I2a64cf3fb476271b0a5d025fb6989feb40d676bb


[ROCm/ROCR-Runtime commit: d021055ada]
2023-10-03 15:36:20 -04:00
David Yat Sin 9049e53b91 Allow CPU cache info to be empty
Some new CPUs have different cache reporting structure causing thunk to
leave the cache information empty. Allow the cache information for CPU
agents to be empty as they are not used by language-runtimes

Change-Id: Ic5e880171ab20aa114b4b62bdb4479eb54066f7b


[ROCm/ROCR-Runtime commit: 96b3c4a0aa]
2023-10-03 13:44:10 +00:00
Shweta Khatri ecde4153d8 Using new KFD HSA extended coherent memory flag
Using new ExtendedCoherent KFD HSA memory flag to achieve system
scope coherence on atomic instructions. Non-compliant systems may
have the need to perform explicit HDP flushes to achieve system
scope coherence using this flag.

Change-Id: Ic6b47c0e97285086fa1f52bbfa4597b81cadafeb


[ROCm/ROCR-Runtime commit: 4eb6ed7799]
2023-09-25 10:36:04 -04:00
David Yat Sin 08fc87ecba Use scope guards to release ref counts
Some negative tests can trigger C++ exceptions to be thrown, which
causes code to leave the ref counts in inconsistent state.

Change-Id: Ifa6d8be986941efcdf20d7ac8b86eb15a8fe9932


[ROCm/ROCR-Runtime commit: 06eefdeb1b]
2023-09-20 15:08:52 -04:00
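
The scope-guard idea from commit 08fc87ecba above is that a reference count is released on every exit path, including when an exception propagates. ROCr is C++, so the following Python context manager is only an analogy for the pattern; `RefCounted`, `held`, and `risky` are hypothetical names.

```python
from contextlib import contextmanager


class RefCounted:
    """Toy object with a manual reference count (hypothetical)."""

    def __init__(self):
        self.refs = 0

    def retain(self):
        self.refs += 1

    def release(self):
        self.refs -= 1


@contextmanager
def held(obj):
    """Retain obj for the duration of the block and release on any exit."""
    obj.retain()
    try:
        yield obj
    finally:
        obj.release()   # runs on normal exit and when an exception propagates


def risky(obj):
    """Simulate a negative test that throws while holding a reference."""
    with held(obj):
        raise ValueError("simulated failure in a negative test")
```

Without the guard, the `raise` would skip a manually placed `release()` call and leave the count at 1, which is the inconsistent state the commit message describes.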
David Yat Sin b060204498 Fix hsa_amd_vmem_get_access to accept offset pointers
Modify hsa_amd_vmem_get_access to handle pointers that are within VA
range of an existing memory mapping

Change-Id: I9f806ec39f6e9a33da8d86dd65d9a472438fa8ed


[ROCm/ROCR-Runtime commit: dd61f54171]
2023-09-20 14:03:37 -04:00
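
Accepting offset pointers, as in commit b060204498 above, means looking up the mapping that contains an address rather than requiring an exact match on the base address. A minimal sketch of such a lookup, assuming non-overlapping mappings kept sorted by base address; `MappingTable` is a hypothetical model and not the runtime's data structure:

```python
import bisect


class MappingTable:
    """Sorted table of (base, size, access) mappings (hypothetical model)."""

    def __init__(self):
        self._bases = []     # sorted base addresses, used for bisection
        self._entries = []   # (base, size, access), same order as _bases

    def add(self, base, size, access):
        i = bisect.bisect_left(self._bases, base)
        self._bases.insert(i, base)
        self._entries.insert(i, (base, size, access))

    def get_access(self, ptr):
        """Return the access of the mapping containing ptr, else None."""
        # Find the last mapping whose base is <= ptr.
        i = bisect.bisect_right(self._bases, ptr) - 1
        if i < 0:
            return None
        base, size, access = self._entries[i]
        # ptr may be anywhere inside [base, base + size).
        return access if ptr < base + size else None
```

An exact-match lookup would only succeed for `ptr == base`; the range test is what lets a pointer into the middle of a mapping resolve to it.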
David Yat Sin 48cb2f5a9e Add query for Xnack enabled
Add system query for whether Xnack is enabled on a system.

Change-Id: I2832110e4f33f6a951d13acd06636442debf27ae


[ROCm/ROCR-Runtime commit: 22becfb1e8]
2023-09-19 00:25:30 +00:00
Jonathan Kim d04acccc26 Set correct overrides settings for GangLeader functions
Silence warnings on more stringent compile checks for lack of override
declaration.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Iaa54dfc3dd74f5ee55763cafbbcf2db73493bb21


[ROCm/ROCR-Runtime commit: 6b4365ae4c]
2023-09-12 15:56:34 -04:00
David Yat Sin 2052be1d1d Pre-allocate memory for 16K signals
On busy systems, memory allocation can take a long time and
increase the latency of calls to hsa_signal_create/hsa_amd_signal_create.
Pre-allocating the signal memory mitigates this issue.

Change-Id: Ib7640273262ebc3dbf1f07049ce5da10b1d6b158


[ROCm/ROCR-Runtime commit: 9a127193a8]
2023-09-11 13:08:28 -04:00
David Yat Sin 2a2555dd52 Update blit shaders for gfx94x
Change-Id: Ic8def71aa0c6ab9a9a758877a65ca6b5625e8f1e


[ROCm/ROCR-Runtime commit: 6ce1586def]
2023-09-08 09:43:31 -04:00
Shweta Khatri e2c5ecb8dc Use LLVM compiler to build blit shaders
Generates shader bytecode stream in amd_blit_shaders_v2.h at build time

Change-Id: I5228ec5442a78d074fd85ca9cd7f7a156dd84da3


[ROCm/ROCR-Runtime commit: 4e675ce730]
2023-09-08 09:42:29 -04:00
David Yat Sin 590cac0321 Fix clang compile warnings
Change-Id: Iea9afc3d998a6c5db28af6c7b54939960b11ae95


[ROCm/ROCR-Runtime commit: 3ee6c9b0e2]
2023-09-07 12:00:02 -04:00
David Yat Sin 3e286607ca Fix for always returning 64 for cacheline size
Change-Id: I0e31d306a2e051ecb9ac019c4e6f5efa25eabba0


[ROCm/ROCR-Runtime commit: 4770b210f6]
2023-08-31 13:50:49 +00:00
David Yat Sin 5b9dcfd0d8 Update interface version for virtual memory APIs
Change-Id: Ifbf1af08ee7aa4d55387ff9786f6a61b89b56f88


[ROCm/ROCR-Runtime commit: 1e7b078628]
2023-08-30 17:01:13 -04:00
David Yat Sin 4e46eded66 Increment HSA API table stepping on new APIs
Add compile time asserts to force incrementing API table STEP versions
each time a new function is added to each table. This is required for
profiler team to be able to add preprocessor macros to determine which
versions contain the new APIs.

Also incrementing the major versions to 2 to indicate new numbering
scheme.

Change-Id: I148a436a5ceab6be3906f8263b40ea9b07841577


[ROCm/ROCR-Runtime commit: 03f2f69d16]
2023-08-29 21:59:36 +00:00
Jonathan Kim 9e533f6664 Submit a minimum of 64 DWORDs for SDMA submissions for some GFX9 devices
Some GFX9 devices will drop commands if ring buffer submission is less
than 64 DWORDs. Pad the submission with a NOP header and trailing null
DWORDs in this case.

Change-Id: I850af490fb699f7efe8aef96d97c600a8e76516b


[ROCm/ROCR-Runtime commit: cdd0728d9b]
2023-08-23 13:36:29 -04:00
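
The padding rule from commit 9e533f6664 above can be sketched as follows. The layout is an assumption (original commands, then one NOP header, then null DWORDs up to the minimum), and `NOP_HEADER` is a placeholder value chosen for illustration, not the real SDMA NOP encoding, which the log does not give.

```python
MIN_SUBMISSION_DWORDS = 64
NOP_HEADER = 0xDEAD0000   # placeholder only; not the real SDMA NOP encoding


def pad_submission(dwords):
    """Pad a command stream to at least MIN_SUBMISSION_DWORDS entries.

    Assumed layout: original commands, one NOP header, then null DWORDs.
    """
    padded = list(dwords)
    if len(padded) >= MIN_SUBMISSION_DWORDS:
        return padded                      # already long enough
    padded.append(NOP_HEADER)
    padded.extend([0] * (MIN_SUBMISSION_DWORDS - len(padded)))
    return padded
```

A submission that already meets the minimum is returned unchanged, so the padding only costs ring-buffer space for short command streams.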
David Yat Sin 0637810752 Fix memory pool ALLOC_REC_GRANULE query
Also changed enum value to leave gap between enums that only exist in
hsa_region_info_t and enums that exist in both hsa_amd_memory_pool_info_t

Change-Id: I8f9f31200de66648e9328e4203ab283068c993f0


[ROCm/ROCR-Runtime commit: 4317f8dece]
2023-08-22 17:46:48 -04:00
David Yat Sin 777df5c6dc Fix flags passed to thunk for address reserve
Fix flags passed to thunk when reserving address only

Change-Id: Ic91d4c3393cc6a2b98e6bc5ed3575d40fa5e1424


[ROCm/ROCR-Runtime commit: 7be305b83c]
2023-08-22 14:01:49 -04:00
Jonathan Kim ad613e1644 Clean up SDMA ganging
We don't need to keep track of specific blit engines in gang for
submission anymore as ganging early exits on pending bytes.
So tidy up the fluff.

Change-Id: I77e80bf1ad8f561a03fff77bce33aa09d02760c6


[ROCm/ROCR-Runtime commit: 132815bcfb]
2023-08-22 05:57:04 -04:00
Jonathan Kim 704d9c5e19 Fix SDMA ganging circular deadlock in oversubscription
When oversubscribing SDMA gangs, a circular deadlock can occur since
gang enqueue is staggered with respect to SDMA engine leader based
on source to destination.
As a result, an enqueued leader may be waiting on a gang item that is
waiting on another enqueued leader or gang item and so on.

To prevent this, first lock the submission to ensure dma status query
and submissions are atomic.  Once this is in place, be more stringent
with ganging in that all SDMA engines must be available in order to gang.

Finally, re-enable SDMA ganging by default.

Change-Id: I4511e3487db9d26475b5aece4897f10168cc5322


[ROCm/ROCR-Runtime commit: 8f21793a3e]
2023-08-17 08:49:09 -04:00
Jonathan Kim 58d5f7354f Update D2D SDMA ganging for non-SPX modes
xGMI for compute partitioning in non-SPX modes does not have
a reported bandwidth.
Fix it to at most 2 since each partition is either bounded
by the number of xGMI links or the number of available
SDMA contexts.

Change-Id: I09094bd7548d9eee6f039b0efe849838e5de166e


[ROCm/ROCR-Runtime commit: 4c74e47e91]
2023-08-17 07:25:08 -04:00
Jonathan Kim 2994cfa875 Bump the number of SDMA engines for gfx940
GFX940 can support up to 16 SDMA engines so bump it.

Change-Id: I41a95e66383036735712e317a57b239d84fcb78d


[ROCm/ROCR-Runtime commit: 30982ff6aa]
2023-08-17 07:25:08 -04:00