Commit graph

811 commits

Author SHA1 Message Date
Sean Keely cedc3e80a8 Do not bump up total scratch size for large cached allocations.
HW does not ignore low bits of the scratch wave count and will
stride beyond the end of the allocation if the wave count is
ever indivisible by SE count.  Rather than returning the allocation
size for cached large scratch allocations, use the requested
scratch size in scratch setup.  Scratch cache will retain the
cached allocation's size.

Change-Id: I0129ddc99a8940d01d8fbcd0b02d5061f31f456d
2022-03-02 20:48:19 -05:00
Saravanan Solaiyappan a496adafaa Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
in package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic766d8d68b5168e5f1b065d846ca2604d281e5be
2022-02-24 10:26:04 -05:00
Sean Keely b9a0c1d313 Do not discard fragment allocator blocks multiple times.
discardBlock may be called multiple times on the same block.
We must not discard the block multiple times or we will corrupt
in-use memory accounting.

Change-Id: Ife9f3162785965a795dcf81887d4d447cc096e62
2022-02-10 18:39:46 -06:00
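The idempotent discard described in the commit above might look roughly like the following. This is a minimal sketch, not the runtime's actual fragment allocator: `Block`, `FragmentAccounting`, and the `discarded` flag are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block record; the real allocator's bookkeeping is not shown
// in the commit message.
struct Block {
  std::size_t size = 0;
  bool discarded = false;
};

class FragmentAccounting {
 public:
  void Allocate(Block& block, std::size_t size) {
    block.size = size;
    block.discarded = false;
    in_use_ += size;
  }

  // Safe to call more than once on the same block: only the first call
  // adjusts the in-use total.
  void DiscardBlock(Block& block) {
    if (block.discarded) return;
    assert(in_use_ >= block.size);
    block.discarded = true;
    in_use_ -= block.size;
  }

  std::size_t in_use() const { return in_use_; }

 private:
  std::size_t in_use_ = 0;
};
```

Without the early return, a second `DiscardBlock` call would subtract the block size again and the in-use total would drift below the true value.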
Sean Keely 266cd68524 Add fallback case for cache line size.
KFD sometimes returns 0 for cache line sizes.

Change-Id: If82de0068318bbc138f0d1d4692ff908359174ad
2022-02-10 18:39:46 -06:00
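A fallback of the kind this commit describes could be sketched as below. The default of 64 bytes and the function name are assumptions made for illustration; the commit message does not state which value the runtime falls back to.

```cpp
#include <cstdint>

// Assumed default; the commit does not say which value is used.
constexpr std::uint32_t kAssumedCacheLineBytes = 64;

// Returns the reported cache line size, or the assumed default when the
// topology reports 0.
std::uint32_t CacheLineSizeOrDefault(std::uint32_t reported_bytes) {
  return reported_bytes != 0 ? reported_bytes : kAssumedCacheLineBytes;
}
```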
Sean Keely 21291b48c6 Retrieve cache line size from KFD topology.
Change-Id: I16ddd9d9888bb973eccf3c562619894c88c7df15
2022-01-16 08:44:44 -06:00
Sean Keely a6742209f7 Correct queue minimum size enforcement.
Minimum queue size was not enforced at the Agent level.  Minimum
size should be one page to give uniformity across all ASICs.

Change-Id: I26394f79458d09fbceb79fc8aaf495e2c26a8ff3
2022-01-16 08:28:34 -06:00
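One way to read the one-page minimum is as a lower bound on the packet count of the ring buffer. The sketch below assumes a 4 KiB page and clamps the request upward; whether the runtime clamps or rejects undersized requests is not stated in the commit, and the function name is hypothetical. AQL packets are 64 bytes.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint32_t kAqlPacketBytes = 64;  // size of one AQL packet
constexpr std::uint32_t kPageBytes = 4096;     // assumed page size

// Raises a requested queue size (in packets) to at least one page of packets.
std::uint32_t EnforceMinQueuePackets(std::uint32_t requested_packets) {
  return std::max(requested_packets, kPageBytes / kAqlPacketBytes);
}
```

Under these assumptions the minimum works out to 64 packets.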
Sean Keely a65f3f5b71 Improve scratch error detection in debug mode.
Adds asserts for invalid dispatch dims and scratch requests that
don't actually use scratch.

Change-Id: I6e6eef3f17dc38adaf96550fa55bd8625868efa3
2022-01-31 20:53:24 -05:00
Sean Keely 37942c982a Add HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT.
On gfx90a only a reduced number of CUs must be used for cooperative
dispatches due to CWSR and launcher interactions with asymmetric
harvest.  We must use one fewer CU per SE than the lowest count of
CUs on any SE.

Also adds env var HSA_COOP_CU_COUNT which enables the cooperative
CU count computation.  Set to 1 to enable the new computation.
This is an opt-in feature that will become enabled by default (opt-out)
in a future release.

Change-Id: Ifbb75ced3bbc15876eef44922c6a4f6fde8c4c28
2022-01-31 15:22:07 -05:00
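The rule stated in this commit (one fewer CU per SE than the lowest per-SE count) can be illustrated with a short calculation. The sketch assumes the reported total is that per-SE figure multiplied by the number of SEs; the function name and the input layout are hypothetical, not taken from the runtime.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// cus_per_se holds the active CU count of each shader engine (SE).
// Returns (lowest count - 1) * number of SEs, or 0 if nothing is usable.
std::uint32_t CooperativeCuCount(const std::vector<std::uint32_t>& cus_per_se) {
  if (cus_per_se.empty()) return 0;
  const std::uint32_t lowest =
      *std::min_element(cus_per_se.begin(), cus_per_se.end());
  if (lowest == 0) return 0;
  return (lowest - 1) * static_cast<std::uint32_t>(cus_per_se.size());
}
```

For example, eight SEs with counts of 14, 13, 14, 13, 14, 14, 14, 14 would give (13 - 1) * 8 = 96 under this reading.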
Chen Gong dec63b4f15 Correct the gfx version of gfx90c to 90c
Corrections have been made in libhsakmt, and corresponding changes are required here as well.

Signed-off-by: Chen Gong <curry.gong@amd.com>
Change-Id: Ib697ce25278c2c5ac6ef0206930ec285f46c60d1
2022-01-25 19:05:46 +08:00
Jeremy Newton bd1a4adf35 Install license file
See 

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I80e9664b5ade520d9bf9b9a20ac36d67cfe85107
2022-01-17 10:54:54 -05:00
David Yat Sin 86164fbfec Fix for segfault after removing PrefetchRange from map
The start iterator becomes invalid after it is removed from
std::map prefetch_map_. This was causing a segfault when the iterator is
incremented afterwards.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Change-Id: I4b0b763d2cb4ee99c0b8571c2c526b834e74077a
2022-01-10 17:47:02 -05:00
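The failure mode in this commit is the usual erase-then-increment mistake with `std::map`. A minimal sketch of the safe pattern follows; `PrefetchMap` and `EraseRanges` are hypothetical stand-ins, not the runtime's `prefetch_map_` code.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in: maps a range's base address to its length in bytes.
using PrefetchMap = std::map<std::uintptr_t, std::size_t>;

// Removes every entry whose base lies in [begin, end) and returns how many
// entries were removed.
std::size_t EraseRanges(PrefetchMap& map, std::uintptr_t begin,
                        std::uintptr_t end) {
  std::size_t removed = 0;
  auto it = map.lower_bound(begin);
  while (it != map.end() && it->first < end) {
    // erase() invalidates `it` and returns the iterator to the next element.
    // Writing `map.erase(it); ++it;` would increment an invalid iterator.
    it = map.erase(it);
    ++removed;
  }
  return removed;
}
```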
Sean Keely fce6ba052e Correct documentation typo.
ROCM_VISIBLE_DEVICES was used where ROCR_VISIBLE_DEVICES was
intended.

Change-Id: I644a546f3c9dd0b50898ef8a21dbb8f5c3a36926
2021-12-10 16:19:30 -06:00
Sean Keely df55cb0450 Rework memory locks to allow device parallelism in alloc/free.
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel.  Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.

The fragment allocator now requires separate protection and is protected with a
mutex at the device level.  Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
within Allocate.  This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim.  Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache.  So some device
level serialization is required in at least some paths.

Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00
2021-11-24 19:22:05 -06:00
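The locking scheme in this commit can be sketched with `std::shared_mutex` (C++17). The class below is illustrative only: `AllocationTracker`, its methods, and the `driver_alloc` callback are hypothetical, and the real runtime may order these steps differently. The point shown is that the slow driver call runs under the shared (r) lock, while the map update takes the exclusive (w) lock and makes no driver call.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>

class AllocationTracker {
 public:
  // driver_alloc stands in for a long-running call into the driver.
  template <typename DriverAlloc>
  std::uintptr_t Allocate(std::size_t size, DriverAlloc driver_alloc) {
    std::uintptr_t base = 0;
    {
      // Shared mode: other threads may allocate or free concurrently.
      std::shared_lock<std::shared_mutex> shared(mutex_);
      base = driver_alloc(size);
    }
    if (base != 0) {
      // Exclusive mode: modify the map only, with no driver call while held.
      std::unique_lock<std::shared_mutex> exclusive(mutex_);
      map_[base] = size;
    }
    return base;
  }

  // Read-only lookup, also in shared mode.
  bool Lookup(std::uintptr_t base, std::size_t* size) const {
    std::shared_lock<std::shared_mutex> shared(mutex_);
    const auto it = map_.find(base);
    if (it == map_.end()) return false;
    *size = it->second;
    return true;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::uintptr_t, std::size_t> map_;
};
```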
Sean Keely fc75731034 Add comments to GetPcieBlit.
Comments call out the specific operation being selected since the
ternary nest is a bit hard to read.

Change-Id: If033dbaa6cba132e96196ad3fc6d5572042041f4
2021-11-15 19:34:03 -06:00
Sean Keely b198016949 Fix leak in hsa_amd_interop_map_buffer.
Agent temp array could have leaked if one of the given agent
handles was invalid.

Change-Id: I9e638b3a4f6bb917a4e3209ad81a1253bb603365
2021-11-15 19:22:20 -06:00
Sean Keely f48a786662 Correct order of argument check and default assignment in lock APIs.
Argument must be checked for nullptr before being dereferenced and
filled with the default return value.

Change-Id: I9ff366f066a5e18c78129bf59cc3ba00fca3ef18
2021-11-15 19:22:02 -06:00
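The ordering fixed by this commit can be illustrated with a small function. `Status` and `LockSketch` are hypothetical and do not reproduce the signature of the real lock APIs; the sketch only shows that the output pointer is validated before the default value is written through it.

```cpp
#include <cstddef>

enum class Status { kSuccess, kInvalidArgument };

Status LockSketch(void* host_ptr, std::size_t size, void** agent_ptr) {
  // 1. Check the output argument before touching it.
  if (agent_ptr == nullptr) return Status::kInvalidArgument;
  // 2. Only now is it safe to fill in the default return value.
  *agent_ptr = nullptr;
  // 3. Validate the remaining arguments.
  if (host_ptr == nullptr || size == 0) return Status::kInvalidArgument;
  *agent_ptr = host_ptr;  // placeholder for the real mapping work
  return Status::kSuccess;
}
```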
Sean Keely 322588a60e Add missing return in ScopeGuard::operator=.
This omission did not cause problems earlier because the operator had
never been instantiated.

Change-Id: I7a54f82e06c299902f3bf6b4d3737cc5e30961ad
2021-11-15 18:50:46 -06:00
Sean Keely 19454fcf26 Correct node id assertion in pointer info.
Size of the node map was used as the max node id previously.  This
is wrong when RVD is used.

Change-Id: Ic632ec96891b92186e5b68cd53f81414db34f59f
2021-11-10 22:09:24 -06:00
Sean Keely c9eb85e205 Correct size of SVM node array.
Was size of the map.  Needs to be size of the node id range.

Change-Id: I92501ea7adca5c30dbb0fdabd2c421dea58f8d6f
2021-11-10 21:23:42 -06:00
Sean Keely d65e00bcc5 Include event_id in SDMA interrupt payload.
The event id assists KFD in locating the proper event associated
with the interrupt.

Change-Id: I75d58b6be74dd5b1edb0c5fe2b9d01538a649ba1
2021-11-10 20:57:11 -06:00
Jeremy Newton 48e4e2c5ff Set License field for RPM package
This really should be set to conform to distro standards.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I8c3bdcc7eb103cec9db6aa9f9cfec25754784be8
2021-11-10 14:06:17 -05:00
Aaron Liu f2a50c34f9 Fix compiling error with gcc-10.3.0
With gcc-10.3.0, the hsa-runtime build fails with the following log:
compute/hsa/runtime/rocrtst/suites/negative/queue_validation.cc:470:18: error: conversion from ‘unsigned int’ to ‘uint16_t’ {aka ‘short unsigned int’} changes value from ‘4294967295’ to ‘65535’ [-Werror=overflow]
  470 |     aql().header |=  0xFFFFFFFF << HSA_PACKET_HEADER_TYPE;
      |     ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[2]: *** [CMakeFiles/rocrtst64.dir/build.make:339: CMakeFiles/rocrtst64.dir/home/aaliu/work/compute/hsa/runtime/rocrtst/suites/negative/queue_validation.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I95fe72030368abc211b4b97b5a7ba00b5e094730
2021-11-04 10:55:11 +08:00
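The diagnostic in the log above comes from OR-ing a 32-bit constant into a 16-bit field. One way to avoid it, shown here as a sketch and not necessarily the change the commit made, is to narrow the constant explicitly. `kHeaderTypeShift` is a local stand-in for `HSA_PACKET_HEADER_TYPE` so the example compiles without the HSA headers.

```cpp
#include <cstdint>

// Local stand-in for HSA_PACKET_HEADER_TYPE (assumed to be bit 0 here).
constexpr unsigned kHeaderTypeShift = 0;

// Sets every header bit from the type field upward.
std::uint16_t SetAllHeaderBits(std::uint16_t header) {
  // 0xFFFFFFFF << shift has type unsigned int and does not fit in uint16_t.
  // Starting from a 16-bit constant and casting makes the narrowing explicit.
  const auto bits = static_cast<std::uint16_t>(0xFFFFu << kHeaderTypeShift);
  return static_cast<std::uint16_t>(header | bits);
}
```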
Sean Keely 402eae11b6 Correct rocrtst cmake.
Generates symlinks exactly once.
Admits parallelism to code object compilation.
Adds proper dependency tracking.
Adds code object files to the packages.

Change-Id: If471961906f16a2ffdc6bf5f682a4e322fb38f3e
2021-10-18 11:10:50 -04:00
Sean Keely 534dc3f60c Correct rocrtst pool iterator.
GetGlobalMemoryPool had improper return codes for an iterator callback
and did not properly order the APU pool selection path.

Change-Id: I01ab9d23e2352be98d9718bc25889ad4f779d3ca
2021-10-16 05:02:05 -04:00
Sean Keely 4b0c94cfe8 Silence Clang warning.
Clang warns about bitwise operators on bools.  Casting to int silences
the warning without introducing short-circuit logic.

Change-Id: I6e25138e1acf4a5562d3925ea5b2fcef3addb783
2021-10-14 23:56:58 -05:00
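The difference between `&` on casted operands and `&&` matters when both operands have side effects. The sketch below uses a hypothetical `Counter` type to make that visible; it is not code from the runtime.

```cpp
// Hypothetical helper that records how many checks actually ran.
struct Counter {
  int calls = 0;
  bool Check(bool result) {
    ++calls;
    return result;
  }
};

// Both checks always run. With `&&` the second check would be skipped
// whenever the first returned false.
bool RunBoth(Counter& counter, bool first, bool second) {
  return (static_cast<int>(counter.Check(first)) &
          static_cast<int>(counter.Check(second))) != 0;
}
```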
Sean Keely efeee734db Drop -Werror.
Would be nice to get warning count changes highlighted in CI though.

Clang's increasingly suspect diagnostics have caused multiple build
breaks without highlighting any actual issues.
Also: https://embeddedartistry.com/blog/2017/05/22/werror-is-not-your-friend/

Change-Id: I7dc82da58cd86f7b4f1a9fb511c4c039419271d4
2021-10-14 23:54:45 -05:00
Sean Keely 5e8d261352 Skip initial CU mask setup unless HSA_CU_MASK is defined for the GPU.
Limits CU masking application to cases where it is explicitly requested.

Change-Id: Ib65ad0ac98f86d840c0328fa15ce40c05cd4bfae
2021-10-12 20:31:56 -05:00
Freddy Paul ca899ea429 Cleanup symlink to header files and folders
Due to a CPACK bug the package needs to remove header file
symlinks.  Cleanup is required for uninstall and upgrade
since each release installs to a different folder.

Change-Id: I5ec378b21e69235404781c7bce3c0203eb38eed1
2021-10-12 14:56:02 -05:00
Sean Keely 19c1e92b4c Remove io_link workarounds.
KFD topology has been corrected and the defaults used by this
workaround are no longer true for all chips.

Change-Id: I0242d8077e9666ed1cf0dc3985244258ae5c0924
2021-10-11 19:15:07 -05:00
Xiaomeng Hou df59bfd57b Adjust the passing value for the GPU agent in the max single allocation test
For APU ASICs, the default video memory size is relatively small and
part of it is reserved, so the ratio of max alloc size to pool size
may fall below the expected value.  Adjust the passing value accordingly.

Change-Id: I798b44d9532aa6a381a1cc19faa5a46110bf0ad6
2021-10-11 02:32:09 -04:00
Xiaomeng Hou 9597fe3ae5 Add gfx1035 to rocrtst.
Change-Id: I276942b8badfd5ee2914e78c6c140d80d7cf4b2d
2021-10-11 02:31:45 -04:00
Sean Keely a2fb1cbfbc Correct GetSvmAttrib coherency query.
Early exit if the range is found to be fine grain.  Indeterminate
should only apply if the range is neither coarse nor fine.

Change-Id: I54133e14f4e8cfa53e2d612f6112cdcdb5a47dfa
2021-10-03 12:29:12 -04:00
Sean Keely c9440e7b11 Fix queue leak in Enqueue Latency test.
Change-Id: I50d17fb23d772ae8b966207f4af038ca538dcbb8
2021-10-03 12:27:49 -04:00
Sean Keely 234ef77e32 Close KFD when failing due to debugger state.
Change-Id: I6a6890fd9e86d27f87ae96de1c47c89d40a4e010
2021-10-03 12:27:49 -04:00
Sean Keely 280a458d0c Workaround gfx90a SDMA0 quirk.
Because of sharing ports with other engines, the
hardware design team has advised that SDMA0 on gfx90a
should only be used for host-to-device data transfers.
The recommendation is to use SDMA1 for any device-to-device
or device-to-host data transfers.

A driver change will ensure that, for each gfx90a
device, only the first PCIe SDMA queue a process
requests will possibly be from SDMA0. This patch ensures
that the first PCIe queue requested (which may be from
SDMA0) is always set up for host-to-device.

Change-Id: I6793ca95596dedaed9d5be1dbd9469ceef2a5c33
2021-09-30 05:53:49 -04:00
Sean Keely e0224ad89f Correct typo in package dependencies.
Change-Id: I3c378479ceb822e55168517e041a48fa8a2d3d98
2021-09-20 21:00:15 -05:00
Sean Keely 2e9a9f7c7a Correct Clang version detection and support for multiple prefix paths.
Bumps cmake minimum version to 3.7 for version comparison operator.

Previously the Clang cmake project version strings were used.  These
are not defined if the clang cmake project has not been loaded.
We should use CMAKE_CXX_COMPILER_VERSION to check the version when
only the compiler binary is redirected and the project files are
not available.

Also adjust device libs lookup logic to handle multiple paths in
CMAKE_PREFIX_PATH.

Change-Id: I67b6958d8241685cd6c3a0af68507c9fdc6331ef
2021-09-20 19:23:04 -05:00
Sean Keely a8c3ea82a4 Add debug option to skip setting the initial cu mask.
Adds debug variable HSA_CU_MASK_SKIP_INIT.

Change-Id: I5c742d1184a36fdef818bc50c3b780b859b68560
2021-09-16 23:43:49 -05:00
Sean Keely 5535b1f86f Correct fast f16 capability reporting.
Was hard coded to false.  Updated to reflect f16 availability since
gfx8.

Change-Id: I7d5b9792c8e0163199c421a61b5d49b25cd98645
2021-09-16 21:15:52 -05:00
Oak Zeng 80206af91e Add gfx1013 support
Change-Id: I7122caea3ef2254b50bde25ec545116685452116
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2021-09-15 01:10:20 -04:00
Sean Keely 8d789461bf Add gfx1013 to rocrtst.
Change-Id: I49c4aafc661d7c2ba6fd6baa006bf698b0af5274
2021-09-14 22:55:43 -04:00
Sean Keely 5af558f739 Place GPU local resources in nearest NUMA node.
For minimal latency we should place command queues and blit code
in the nearest numa node to each GPU.  Add an allocator matching
the current runtime default allocator interface to each GpuAgent
that allocates on the closest numa node as represented by kfd
topology.  Use this allocator for queue ring buffers and blit
objects.

Change-Id: I181127f9c27bafe68976312963146616e3f58369
2021-09-14 17:49:24 -04:00
Sean Keely 907679c989 Register the default queue error handler for all internal queues.
Also make failure to handle queue errors fatal.

Motivation is to improve detection of queue error conditions
that currently appear as application hangs.

Change-Id: I655643616dc0bd303d7df3ce8aca2c099bec3d46
2021-08-27 20:11:58 -04:00
Sean Keely 9e7d4629ca Add check_required_components to end of cmake package file.
Sets package found and component lists.  ROCr does not have components
so this is mostly cosmetic.  It's part of maintaining a compliant
cmake project config file though.

Change-Id: Ida2ef746375143babd3a6f938727a47135606f01
2021-08-27 20:07:26 -04:00
Sean Keely e06dd39d89 Correct queue exception handler termination.
Set handler state to terminated before exiting.
Also simplify scratch handler exit loop.

Change-Id: I0a80c8a1899e8b60a6e7aa6989ba28de42ba31e7
2021-08-27 20:06:44 -04:00
Sean Keely 76e6ff0411 Minor spelling fix in comments.
Change-Id: Ia99ac3f75444675be48b3d965552fab79da37c92
2021-08-27 20:06:17 -04:00
Sean Keely eec545fd9f Build fix for older clang compilers.
Before clang 13 the option -Wno-error=unused-but-set-variable is not
recognized, nor is the diagnostic emitted.  Set this option
conditionally on the clang compiler version.

Change-Id: I3c0958dffa985d53b641f9eff4e702988dffd033
2021-08-27 20:06:10 -04:00
Sean Keely 02666ec0f1 Allow passing zero bits into hsa_amd_queue_cu_set_mask.
Passing 0 into num_cu_mask_count used to be an implicit error.
This has been repurposed as a short hand for enabling all CUs.
Enabling all CUs when HSA_CU_MASK is set will cause the CU mask to
reset to whatever was set by HSA_CU_MASK which may then be queried.

Change-Id: I1d6bb2034595a78ee48fa72aa05563e8ea6c0fff
2021-08-27 20:05:24 -04:00
Sean Keely 2aa0795b33 Improve HSA_CU_MASK parsing efficiency.
Delay parsing until after GPU discovery.  Use the surfaced
GPU count and maximum physical CU count to limit parsed bit masks.

This prevents pathological input such as
HSA_CU_MASK=0-8000000:0-8000000 from attempting to consume 7TiB.

Change-Id: I3773d2db3740c2023b0f6275d1818b69119b0495
2021-08-27 20:05:18 -04:00
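The idea of bounding the parsed mask by the discovered CU count can be sketched as follows. `BuildCuMask` is a hypothetical helper, not the runtime's parser: it takes an already-parsed bit range and clamps it to the device's CU count before allocating or setting any bits, so a huge range in the input cannot drive a huge allocation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a mask with bits [first, last] set, clamped to max_cu_count bits.
// The mask size depends only on max_cu_count, never on the requested range.
std::vector<std::uint32_t> BuildCuMask(std::uint64_t first, std::uint64_t last,
                                       std::uint32_t max_cu_count) {
  std::vector<std::uint32_t> mask(
      (static_cast<std::size_t>(max_cu_count) + 31) / 32, 0);
  if (max_cu_count == 0 || first > last || first >= max_cu_count) return mask;
  const std::uint64_t end =
      std::min<std::uint64_t>(last, std::uint64_t{max_cu_count} - 1);
  for (std::uint64_t bit = first; bit <= end; ++bit) {
    mask[static_cast<std::size_t>(bit / 32)] |= 1u << (bit % 32);
  }
  return mask;
}
```

With a 40-CU device, a requested range of 0-8000000 yields a two-word mask with only the low 40 bits set.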
Sean Keely 7512c32f69 Add emulator build notification to rocrtst.
Change-Id: I3eb5fd5ec26541f3459aebf289d25c942f09da02
2021-08-13 16:46:26 -05:00