Commit graph

807 commits

Author SHA1 Message Date
Sean Keely 523e6e883a Do not discard fragment allocator blocks multiple times.
discardBlock may be called multiple times on the same block.
We must not discard the block multiple times or we will corrupt
in-use memory accounting.

Change-Id: Ife9f3162785965a795dcf81887d4d447cc096e62


[ROCm/ROCR-Runtime commit: b9a0c1d313]
2022-02-10 18:39:46 -06:00
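The fix in 523e6e883a amounts to making the discard path idempotent. The sketch below is illustrative only: `Block`, `FragmentAccounting`, and the `discarded` flag are hypothetical names, and the real fragment allocator may track this state differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block record; the real allocator's types are not shown in the log.
struct Block {
  std::size_t size = 0;
  bool discarded = false;
};

// Hypothetical in-use accounting kept next to the allocator.
class FragmentAccounting {
 public:
  Block allocate(std::size_t size) {
    in_use_ += size;
    return Block{size, false};
  }

  // Safe to call repeatedly on the same block:
  // only the first call adjusts the in-use total.
  void discardBlock(Block& block) {
    if (block.discarded) return;
    assert(in_use_ >= block.size);
    in_use_ -= block.size;
    block.discarded = true;
  }

  std::size_t in_use() const { return in_use_; }

 private:
  std::size_t in_use_ = 0;
};
```

Calling `discardBlock` twice on the same block leaves `in_use()` unchanged after the first call, which is the property the commit message asks for.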
Sean Keely 305b7394b3 Add fallback case for cache line size.
KFD sometimes returns 0 for cache line sizes.

Change-Id: If82de0068318bbc138f0d1d4692ff908359174ad


[ROCm/ROCR-Runtime commit: 266cd68524]
2022-02-10 18:39:46 -06:00
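A fallback of the kind described in 305b7394b3 could look like the following. The 64-byte default and the function name are assumptions made for illustration; the log does not state which value the runtime falls back to.

```cpp
#include <cstdint>

// Assumed default. 64 bytes is a common line size, but the value the
// runtime actually uses is not stated in the commit message.
constexpr std::uint32_t kAssumedCacheLineBytes = 64;

// Returns the reported size, or the assumed default when the driver reports 0.
constexpr std::uint32_t CacheLineSizeOrDefault(std::uint32_t reported) {
  return reported != 0 ? reported : kAssumedCacheLineBytes;
}
```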
Sean Keely ab97440eba Retrieve cache line size from KFD topology.
Change-Id: I16ddd9d9888bb973eccf3c562619894c88c7df15


[ROCm/ROCR-Runtime commit: 21291b48c6]
2022-01-16 08:44:44 -06:00
Sean Keely 0e96cb895f Correct queue minimum size enforcement.
Minimum queue size was not enforced at the Agent level.  Minimum
size should be one page to give uniformity across all ASICs.

Change-Id: I26394f79458d09fbceb79fc8aaf495e2c26a8ff3


[ROCm/ROCR-Runtime commit: a6742209f7]
2022-01-16 08:28:34 -06:00
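One way to read the one-page minimum in 0e96cb895f: AQL packets are 64 bytes, so with an assumed 4 KiB page the smallest queue holds 64 packets. The sketch rounds small requests up, which is an assumption; the runtime may reject them instead. `EnforceMinQueuePackets` is a hypothetical helper.

```cpp
#include <cstdint>

constexpr std::uint32_t kAqlPacketBytes = 64;      // AQL packets are 64 bytes
constexpr std::uint32_t kAssumedPageBytes = 4096;  // assumption: 4 KiB pages

// Raises a requested queue size (in packets) to at least one page worth of packets.
constexpr std::uint32_t EnforceMinQueuePackets(std::uint32_t requested_packets) {
  return requested_packets < kAssumedPageBytes / kAqlPacketBytes
             ? kAssumedPageBytes / kAqlPacketBytes
             : requested_packets;
}
```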
Sean Keely 92f675889c Improve scratch error detection in debug mode.
Adds asserts for invalid dispatch dims and scratch requests that
don't actually use scratch.

Change-Id: I6e6eef3f17dc38adaf96550fa55bd8625868efa3


[ROCm/ROCR-Runtime commit: a65f3f5b71]
2022-01-31 20:53:24 -05:00
Sean Keely e2e10173d2 Add HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT.
On gfx90a only a reduced number of CUs must be used for cooperative
dispatches due to CWSR and launcher interactions with asymmetric
harvest.  We must use one fewer CUs per SE than the lowest count of
CUs on any SE.

Also adds env var HSA_COOP_CU_COUNT which enables the cooperative
CU count computation.  Set to 1 to enable the new computation.
This is an opt-in feature that will become enabled by default (opt-out)
in a future release.

Change-Id: Ifbb75ced3bbc15876eef44922c6a4f6fde8c4c28


[ROCm/ROCR-Runtime commit: 37942c982a]
2022-01-31 15:22:07 -05:00
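The rule stated in e2e10173d2 (one fewer CU per SE than the lowest CU count on any SE) can be sketched as below. The input layout and the function name are hypothetical, and multiplying the per-SE figure by the number of SEs to get a device total is an assumption about how the count is reported.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// cus_per_se[i] is the number of active CUs in shader engine i
// (hypothetical input layout, not the runtime's own data structure).
std::uint32_t CooperativeCuCount(const std::vector<std::uint32_t>& cus_per_se) {
  assert(!cus_per_se.empty());
  const std::uint32_t lowest =
      *std::min_element(cus_per_se.begin(), cus_per_se.end());
  // One fewer CU per SE than the lowest count found on any SE.
  const std::uint32_t per_se = lowest > 0 ? lowest - 1 : 0;
  return per_se * static_cast<std::uint32_t>(cus_per_se.size());
}
```

For an asymmetrically harvested part with 14, 13, 14, and 13 CUs on its four SEs, the sketch yields (13 - 1) * 4 = 48 cooperative CUs.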
Chen Gong df788f1e49 Correct the gfx version of gfx90c to 90c
Corrections have been made in libhsakmt, and corresponding changes are required here as well.

Signed-off-by: Chen Gong <curry.gong@amd.com>
Change-Id: Ib697ce25278c2c5ac6ef0206930ec285f46c60d1


[ROCm/ROCR-Runtime commit: dec63b4f15]
2022-01-25 19:05:46 +08:00
Jeremy Newton f654b7d852 Install license file
See 

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I80e9664b5ade520d9bf9b9a20ac36d67cfe85107


[ROCm/ROCR-Runtime commit: bd1a4adf35]
2022-01-17 10:54:54 -05:00
David Yat Sin 4fb019555b Fix for segfault after removing PrefetchRange from map
The start iterator becomes invalid after it is removed from
std::map prefetch_map_. This was causing a segfault when the iterator is
incremented afterwards.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Change-Id: I4b0b763d2cb4ee99c0b8571c2c526b834e74077a


[ROCm/ROCR-Runtime commit: 86164fbfec]
2022-01-10 17:47:02 -05:00
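The bug class fixed in 4fb019555b is standard `std::map` behaviour: erasing an element invalidates iterators to it, so the loop must continue from the iterator that `erase` returns. A minimal sketch, with `PrefetchMap` and `RemoveRanges` as hypothetical stand-ins for the runtime's own types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for prefetch_map_: start address -> range size.
using PrefetchMap = std::map<std::uintptr_t, std::size_t>;

// Removes every range whose start lies in [begin, end) and returns the count.
std::size_t RemoveRanges(PrefetchMap& map, std::uintptr_t begin,
                         std::uintptr_t end) {
  assert(begin <= end);
  std::size_t removed = 0;
  auto it = map.lower_bound(begin);
  while (it != map.end() && it->first < end) {
    // erase() returns the iterator following the removed element;
    // incrementing the old iterator after erase would be undefined behaviour.
    it = map.erase(it);
    ++removed;
  }
  return removed;
}
```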
Sean Keely ef1f4724c3 Correct documentation typo.
ROCM_VISIBLE_DEVICES was used where ROCR_VISIBLE_DEVICES was
intended.

Change-Id: I644a546f3c9dd0b50898ef8a21dbb8f5c3a36926


[ROCm/ROCR-Runtime commit: fce6ba052e]
2021-12-10 16:19:30 -06:00
Sean Keely 3227859ff2 Rework memory locks to allow device parallelism in alloc/free.
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel.  Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.

The fragment allocator now requires separate protection and is protected with a
mutex at the device level.  Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
within Allocate.  This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim.  Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache.  So some device
level serialization is required in at least some paths.

Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00


[ROCm/ROCR-Runtime commit: df55cb0450]
2021-11-24 19:22:05 -06:00
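The locking scheme described in 3227859ff2 can be sketched with `std::shared_mutex` (C++17). This is a simplified illustration, not the runtime's code: `MemoryTracker`, `DriverAlloc`, and `DriverFree` are hypothetical, and `std::malloc` stands in for the slow call into KFD. Slow driver calls run under the shared lock so several threads can proceed in parallel; the map is modified only under the exclusive lock, with no driver call while it is held.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>
#include <shared_mutex>

class MemoryTracker {
 public:
  void* Allocate(std::size_t size) {
    assert(size > 0);
    void* ptr = nullptr;
    {
      // Shared mode: other allocations and frees may run concurrently.
      std::shared_lock<std::shared_mutex> shared(lock_);
      ptr = DriverAlloc(size);
    }
    if (ptr == nullptr) return nullptr;
    // Exclusive mode: update the map only, no driver call while held.
    std::unique_lock<std::shared_mutex> exclusive(lock_);
    allocations_[ptr] = size;
    return ptr;
  }

  bool Free(void* ptr) {
    {
      std::unique_lock<std::shared_mutex> exclusive(lock_);
      if (allocations_.erase(ptr) == 0) return false;  // unknown pointer
    }
    std::shared_lock<std::shared_mutex> shared(lock_);
    DriverFree(ptr);
    return true;
  }

  std::size_t Count() const {
    std::shared_lock<std::shared_mutex> shared(lock_);
    return allocations_.size();
  }

 private:
  // Placeholders for the long-running driver calls.
  static void* DriverAlloc(std::size_t size) { return std::malloc(size); }
  static void DriverFree(void* ptr) { std::free(ptr); }

  mutable std::shared_mutex lock_;
  std::map<void*, std::size_t> allocations_;
};
```

The fragment allocator's separate device-level mutex mentioned in the commit is not modelled here.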
Sean Keely e462118b6e Add comments to GetPcieBlit.
Comments call out the specific operation being selected since the
ternary nest is a bit hard to read.

Change-Id: If033dbaa6cba132e96196ad3fc6d5572042041f4


[ROCm/ROCR-Runtime commit: fc75731034]
2021-11-15 19:34:03 -06:00
Sean Keely 01c7c9856c Fix leak in hsa_amd_interop_map_buffer.
Agent temp array could have leaked if one of the given agent
handles was invalid.

Change-Id: I9e638b3a4f6bb917a4e3209ad81a1253bb603365


[ROCm/ROCR-Runtime commit: b198016949]
2021-11-15 19:22:20 -06:00
Sean Keely 289cc7b6b4 Correct order of argument check and default assignment in lock APIs.
Argument must be checked for nullptr before being dereferenced and
filled with the default return value.

Change-Id: I9ff366f066a5e18c78129bf59cc3ba00fca3ef18


[ROCm/ROCR-Runtime commit: f48a786662]
2021-11-15 19:22:02 -06:00
Sean Keely a7dc6d7802 Add missing return in ScopeGuard::operator=.
This omission did not cause problems earlier because the operator had
not been instantiated.

Change-Id: I7a54f82e06c299902f3bf6b4d3737cc5e30961ad


[ROCm/ROCR-Runtime commit: 322588a60e]
2021-11-15 18:50:46 -06:00
Sean Keely c8bb2905d3 Correct node id assertion in pointer info.
Size of the node map was used as the max node id previously.  This
is wrong when RVD is used.

Change-Id: Ic632ec96891b92186e5b68cd53f81414db34f59f


[ROCm/ROCR-Runtime commit: 19454fcf26]
2021-11-10 22:09:24 -06:00
Sean Keely 0ed7eac560 Correct size of SVM node array.
Was size of the map.  Needs to be size of the node id range.

Change-Id: I92501ea7adca5c30dbb0fdabd2c421dea58f8d6f


[ROCm/ROCR-Runtime commit: c9eb85e205]
2021-11-10 21:23:42 -06:00
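Both c8bb2905d3 and 0ed7eac560 concern the same distinction: when node ids are sparse, the number of entries in the node map is smaller than the id range. The sketch below sizes a lookup table by the largest id instead of by the map size. `BuildNodeTable` and the `-1` sentinel are hypothetical, and reading "RVD" as ROCR_VISIBLE_DEVICES is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// node_map: node id -> agent index. Ids may be sparse when some devices
// are hidden from the process.
std::vector<std::int32_t> BuildNodeTable(
    const std::map<std::uint32_t, std::int32_t>& node_map) {
  if (node_map.empty()) return {};
  // std::map keeps keys sorted, so the last key is the largest id.
  const std::uint32_t max_id = node_map.rbegin()->first;
  // Size by the id range, not by node_map.size().
  std::vector<std::int32_t> table(static_cast<std::size_t>(max_id) + 1, -1);
  for (const auto& entry : node_map) {
    assert(entry.first < table.size());
    table[entry.first] = entry.second;
  }
  return table;
}
```

With nodes 0 and 3 visible, the map has two entries but the table needs four slots; indexing a two-element array with node id 3 would go out of bounds.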
Sean Keely 847df17afe Include event_id in SDMA interrupt payload.
The event id assists KFD in locating the proper event associated
with the interrupt.

Change-Id: I75d58b6be74dd5b1edb0c5fe2b9d01538a649ba1


[ROCm/ROCR-Runtime commit: d65e00bcc5]
2021-11-10 20:57:11 -06:00
Jeremy Newton a056bc52dd Set License field for RPM package
This really should be set to conform to distro standards.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I8c3bdcc7eb103cec9db6aa9f9cfec25754784be8


[ROCm/ROCR-Runtime commit: 48e4e2c5ff]
2021-11-10 14:06:17 -05:00
Aaron Liu 8de1148504 Fix compiling error with gcc-10.3.0
With gcc-10.3.0, the hsa-runtime build fails with the following log:
compute/hsa/runtime/rocrtst/suites/negative/queue_validation.cc:470:18: error: conversion from ‘unsigned int’ to ‘uint16_t’ {aka ‘short unsigned int’} changes value from ‘4294967295’ to ‘65535’ [-Werror=overflow]
  470 |     aql().header |=  0xFFFFFFFF << HSA_PACKET_HEADER_TYPE;
      |     ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[2]: *** [CMakeFiles/rocrtst64.dir/build.make:339: CMakeFiles/rocrtst64.dir/home/aaliu/work/compute/hsa/runtime/rocrtst/suites/negative/queue_validation.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I95fe72030368abc211b4b97b5a7ba00b5e094730


[ROCm/ROCR-Runtime commit: f2a50c34f9]
2021-11-04 10:55:11 +08:00
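The error above arises because `0xFFFFFFFF << HSA_PACKET_HEADER_TYPE` is a 32-bit value being stored into a 16-bit header. The commit's actual change is not shown in the log; one way to express the same intent (an all-ones, invalid type field) without the narrowing is to mask to 16 bits and cast explicitly. `InvalidTypeBits` is a hypothetical helper.

```cpp
#include <cstdint>

// Builds an all-ones pattern from bit `shift` upward, truncated to the
// 16-bit width of an AQL packet header.
constexpr std::uint16_t InvalidTypeBits(unsigned shift) {
  return static_cast<std::uint16_t>((0xFFFFu << shift) & 0xFFFFu);
}
```

A test could then write `header |= InvalidTypeBits(HSA_PACKET_HEADER_TYPE);`, which sets the same bits in the 16-bit header as the original expression.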
Sean Keely 3cfaebf4e8 Correct rocrtst cmake.
Generates symlinks exactly once.
Admits parallelism to code object compilation.
Adds proper dependency tracking.
Adds code object files to the packages.

Change-Id: If471961906f16a2ffdc6bf5f682a4e322fb38f3e


[ROCm/ROCR-Runtime commit: 402eae11b6]
2021-10-18 11:10:50 -04:00
Sean Keely eabb4ba4b4 Correct rocrtst pool iterator.
GetGlobalMemoryPool had improper return codes for an iterator callback
and did not properly order the APU pool selection path.

Change-Id: I01ab9d23e2352be98d9718bc25889ad4f779d3ca


[ROCm/ROCR-Runtime commit: 534dc3f60c]
2021-10-16 05:02:05 -04:00
Sean Keely f5fcc610b7 Silence Clang warning.
Clang warns about bitwise operators on bools.  Cast to int silences
the warning without introducing short-circuit logic.

Change-Id: I6e25138e1acf4a5562d3925ea5b2fcef3addb783


[ROCm/ROCR-Runtime commit: 4b0c94cfe8]
2021-10-14 23:56:58 -05:00
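The point of f5fcc610b7 is that `|` on two `bool` operands evaluates both sides, whereas `||` skips the second when the first is true. Casting to `int` keeps that behaviour while making the intent explicit to the compiler. A small sketch with hypothetical names:

```cpp
#include <cassert>

// Counts how many times step() ran, to make the evaluation visible.
struct Counter {
  int calls = 0;
  bool step(bool ok) {
    ++calls;
    return ok;
  }
};

// Both steps always run; with || the second would be skipped
// whenever the first returned true.
bool RunBoth(Counter& counter, bool first_ok, bool second_ok) {
  const bool result =
      (int(counter.step(first_ok)) | int(counter.step(second_ok))) != 0;
  assert(counter.calls >= 2);
  return result;
}
```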
Sean Keely 2a0cdb73f3 Drop -Werror.
Would be nice to get warning count changes highlighted in CI though.

Clang's increasingly suspect diagnostics have caused multiple build
breaks without highlighting any actual issues.
Also: https://embeddedartistry.com/blog/2017/05/22/werror-is-not-your-friend/

Change-Id: I7dc82da58cd86f7b4f1a9fb511c4c039419271d4


[ROCm/ROCR-Runtime commit: efeee734db]
2021-10-14 23:54:45 -05:00
Sean Keely 0116378a99 Skip initial CU mask setup unless HSA_CU_MASK is defined for the GPU.
Limits CU masking application to cases where it is explicitly requested.

Change-Id: Ib65ad0ac98f86d840c0328fa15ce40c05cd4bfae


[ROCm/ROCR-Runtime commit: 5e8d261352]
2021-10-12 20:31:56 -05:00
Freddy Paul 8d550d9cbe Cleanup symlink to header files and folders
Due to a CPACK bug the package needs to remove header file
symlinks.  Cleanup is required for uninstall and upgrade
since each release installs to a different folder.

Change-Id: I5ec378b21e69235404781c7bce3c0203eb38eed1


[ROCm/ROCR-Runtime commit: ca899ea429]
2021-10-12 14:56:02 -05:00
Sean Keely f19e8c43d2 Remove io_link workarounds.
KFD topology has been corrected and the defaults used by this
workaround are no longer true for all chips.

Change-Id: I0242d8077e9666ed1cf0dc3985244258ae5c0924


[ROCm/ROCR-Runtime commit: 19c1e92b4c]
2021-10-11 19:15:07 -05:00
Xiaomeng Hou d64a353ea0 Adjust the passing value for GPU agents in the max single allocation test
On APU ASICs the default video memory size is relatively small. Once
the reserved region is subtracted, the ratio of max alloc size to pool
size may fall below the expected value, so adjust the expected value.

Change-Id: I798b44d9532aa6a381a1cc19faa5a46110bf0ad6


[ROCm/ROCR-Runtime commit: df59bfd57b]
2021-10-11 02:32:09 -04:00
Xiaomeng Hou a29a2e96e5 Add gfx1035 to rocrtst.
Change-Id: I276942b8badfd5ee2914e78c6c140d80d7cf4b2d


[ROCm/ROCR-Runtime commit: 9597fe3ae5]
2021-10-11 02:31:45 -04:00
Sean Keely 1d060cbb9e Correct GetSvmAttrib coherency query.
Early exit if the range is found to be fine grain.  Indeterminate
should only apply if the range is neither coarse nor fine.

Change-Id: I54133e14f4e8cfa53e2d612f6112cdcdb5a47dfa


[ROCm/ROCR-Runtime commit: a2fb1cbfbc]
2021-10-03 12:29:12 -04:00
Sean Keely 32af428b24 Fix queue leak in Enqueue Latency test.
Change-Id: I50d17fb23d772ae8b966207f4af038ca538dcbb8


[ROCm/ROCR-Runtime commit: c9440e7b11]
2021-10-03 12:27:49 -04:00
Sean Keely 94352f3e24 Close KFD when failing due to debugger state.
Change-Id: I6a6890fd9e86d27f87ae96de1c47c89d40a4e010


[ROCm/ROCR-Runtime commit: 234ef77e32]
2021-10-03 12:27:49 -04:00
Sean Keely e0ebcb9cc3 Workaround gfx90a SDMA0 quirk.
Because of sharing ports with other engines, the
hardware design team has advised that SDMA0 on gfx90a
should only be used for host-to-device data transfers.
The recommendation is to use SDMA1 for any device-to-device
or device-to-host data transfers.

A driver change will ensure that, for each gfx90a
device, only the first PCIe SDMA queue a process
requests will possibly be from SDMA0. This patch ensures
that the first PCIe queue requested (which may be from
SDMA0) is always set up for host-to-device.

Change-Id: I6793ca95596dedaed9d5be1dbd9469ceef2a5c33


[ROCm/ROCR-Runtime commit: 280a458d0c]
2021-09-30 05:53:49 -04:00
Sean Keely 3e062a0424 Correct typo in package dependencies.
Change-Id: I3c378479ceb822e55168517e041a48fa8a2d3d98


[ROCm/ROCR-Runtime commit: e0224ad89f]
2021-09-20 21:00:15 -05:00
Sean Keely 7226c26b43 Correct Clang version detection and support for multiple prefix paths.
Bumps cmake minimum version to 3.7 for version comparison operator.

Previously the Clang cmake project version strings were used.  These
are not defined if the clang cmake project has not been loaded.
We should use CMAKE_CXX_COMPILER_VERSION to check the version when
only the compiler binary is redirected and the project files are
not available.

Also adjust device libs lookup logic to handle multiple paths in
CMAKE_PREFIX_PATH.

Change-Id: I67b6958d8241685cd6c3a0af68507c9fdc6331ef


[ROCm/ROCR-Runtime commit: 2e9a9f7c7a]
2021-09-20 19:23:04 -05:00
Sean Keely 10a8c8b556 Add debug option to skip setting the initial cu mask.
Adds debug variable HSA_CU_MASK_SKIP_INIT.

Change-Id: I5c742d1184a36fdef818bc50c3b780b859b68560


[ROCm/ROCR-Runtime commit: a8c3ea82a4]
2021-09-16 23:43:49 -05:00
Sean Keely a02b318a7e Correct fast f16 capability reporting.
Was hard coded to false.  Updated to reflect f16 availability since
gfx8.

Change-Id: I7d5b9792c8e0163199c421a61b5d49b25cd98645


[ROCm/ROCR-Runtime commit: 5535b1f86f]
2021-09-16 21:15:52 -05:00
Oak Zeng f590465aa3 Add gfx1013 support
Change-Id: I7122caea3ef2254b50bde25ec545116685452116
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 80206af91e]
2021-09-15 01:10:20 -04:00
Sean Keely b1fb887b40 Add gfx1013 to rocrtst.
Change-Id: I49c4aafc661d7c2ba6fd6baa006bf698b0af5274


[ROCm/ROCR-Runtime commit: 8d789461bf]
2021-09-14 22:55:43 -04:00
Sean Keely 2537a8b1c2 Place GPU local resources in nearest NUMA node.
For minimal latency we should place command queues and blit code
in the nearest numa node to each GPU.  Add an allocator matching
the current runtime default allocator interface to each GpuAgent
that allocates on the closest numa node as represented by kfd
topology.  Use this allocator for queue ring buffers and blit
objects.

Change-Id: I181127f9c27bafe68976312963146616e3f58369


[ROCm/ROCR-Runtime commit: 5af558f739]
2021-09-14 17:49:24 -04:00
Sean Keely f6dcfa4246 Register the default queue error handler for all internal queues.
Also make failure to handle queue errors fatal.

Motivation is to improve detection of queue error conditions
that currently appear as application hangs.

Change-Id: I655643616dc0bd303d7df3ce8aca2c099bec3d46


[ROCm/ROCR-Runtime commit: 907679c989]
2021-08-27 20:11:58 -04:00
Sean Keely 2692cac562 Add check_required_components to end of cmake package file.
Sets package found and component lists.  ROCr does not have components
so this is mostly cosmetic.  It's part of maintaining a compliant
cmake project config file though.

Change-Id: Ida2ef746375143babd3a6f938727a47135606f01


[ROCm/ROCR-Runtime commit: 9e7d4629ca]
2021-08-27 20:07:26 -04:00
Sean Keely 9870e8b576 Correct queue exception handler termination.
Set handler state to terminated before exiting.
Also simplify scratch handler exit loop.

Change-Id: I0a80c8a1899e8b60a6e7aa6989ba28de42ba31e7


[ROCm/ROCR-Runtime commit: e06dd39d89]
2021-08-27 20:06:44 -04:00
Sean Keely bd71510eaa Minor spelling fix in comments.
Change-Id: Ia99ac3f75444675be48b3d965552fab79da37c92


[ROCm/ROCR-Runtime commit: 76e6ff0411]
2021-08-27 20:06:17 -04:00
Sean Keely 0ad218cbab Build fix for older clang compilers.
Before clang 13 the option -Wno-error=unused-but-set-variable is not
recognized, nor is the diagnostic emitted.  Set this option
conditionally on the clang compiler version.

Change-Id: I3c0958dffa985d53b641f9eff4e702988dffd033


[ROCm/ROCR-Runtime commit: eec545fd9f]
2021-08-27 20:06:10 -04:00
Sean Keely 07103a6b97 Allow passing zero bits into hsa_amd_queue_cu_set_mask.
Passing 0 into num_cu_mask_count used to be an implicit error.
This has been repurposed as a shorthand for enabling all CUs.
Enabling all CUs when HSA_CU_MASK is set will cause the CU mask to
reset to whatever was set by HSA_CU_MASK which may then be queried.

Change-Id: I1d6bb2034595a78ee48fa72aa05563e8ea6c0fff


[ROCm/ROCR-Runtime commit: 02666ec0f1]
2021-08-27 20:05:24 -04:00
Sean Keely dbba14f823 Improve HSA_CU_MASK parsing efficiency.
Delay parsing until after GPU discovery.  Use the surfaced
GPU count and maximum physical CU count to limit parsed bit masks.

This prevents pathological input such as
HSA_CU_MASK=0-8000000:0-8000000 from attempting to consume 7TiB.

Change-Id: I3773d2db3740c2023b0f6275d1818b69119b0495


[ROCm/ROCR-Runtime commit: 2aa0795b33]
2021-08-27 20:05:18 -04:00
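The bound described in dbba14f823 can be sketched as follows: once the physical CU count is known, a parsed range is clamped to it before any mask storage is sized, so an input such as `0-8000000` costs no more than a mask for the real device. `MaskFromRange` is a hypothetical helper and does not reflect the runtime's actual parser.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sets bits [first, last] in a mask of 32-bit words, never allocating or
// touching bits at or beyond max_cu_count.
std::vector<std::uint32_t> MaskFromRange(std::uint64_t first,
                                         std::uint64_t last,
                                         std::uint32_t max_cu_count) {
  assert(first <= last);
  // Storage depends only on the device's CU count, not on the input range.
  std::vector<std::uint32_t> mask((max_cu_count + 31u) / 32u);
  if (max_cu_count == 0 || first >= max_cu_count) return mask;
  const std::uint64_t end =
      std::min<std::uint64_t>(last, max_cu_count - 1u);
  for (std::uint64_t bit = first; bit <= end; ++bit) {
    mask[bit / 32] |= 1u << (bit % 32);
  }
  return mask;
}
```

For a device with 40 CUs, the range `0-8000000` produces two mask words with the low 40 bits set.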
Sean Keely e615e35e0e Add emulator build notification to rocrtst.
Change-Id: I3eb5fd5ec26541f3459aebf289d25c942f09da02


[ROCm/ROCR-Runtime commit: 7512c32f69]
2021-08-13 16:46:26 -05:00
Sean Keely e4b3eb87e2 Minor interface improvement to pointer info.
Take in const void* rather than void*.  This does not break the
abi or existing code.  Existing code would need to cast away any
const which is unnecessary and annoying.

Change-Id: I28787e8fab1b600bf6871ea82835e10a4f475c5b


[ROCm/ROCR-Runtime commit: 270d042ef8]
2021-08-04 16:43:23 -04:00
Sean Keely 95bf93f650 Update README and increment minor version number.
APIs have been added and minimum cmake version number has been
changed.

Change-Id: Ic75849e6937c04faed2d206344df2cf9e9a78016


[ROCm/ROCR-Runtime commit: ed7eec14e1]
2021-07-30 17:13:17 -05:00