Commit graph

780 commits

Author SHA1 Message Date
Sean Keely a2fb1cbfbc Correct GetSvmAttrib coherency query.
Early exit if the range is found to be fine grain.  Indeterminate
should only apply if the range is neither coarse nor fine.

Change-Id: I54133e14f4e8cfa53e2d612f6112cdcdb5a47dfa
2021-10-03 12:29:12 -04:00
Sean Keely c9440e7b11 Fix queue leak in Enqueue Latency test.
Change-Id: I50d17fb23d772ae8b966207f4af038ca538dcbb8
2021-10-03 12:27:49 -04:00
Sean Keely 234ef77e32 Close KFD when failing due to debugger state.
Change-Id: I6a6890fd9e86d27f87ae96de1c47c89d40a4e010
2021-10-03 12:27:49 -04:00
Sean Keely 280a458d0c Workaround gfx90a SDMA0 quirk.
Because of sharing ports with other engines, the
hardware design team has advised that SDMA0 on gfx90a
should only be used for host-to-device data transfers.
The recommendation is to use SDMA1 for any device-to-device
or device-to-host data transfers.

A driver change will ensure that, for each gfx90a
device, only the first PCIe SDMA queue a process
requests will possibly be from SDMA0. This patch ensures
that the first PCIe queue requested (which may be from
SDMA0) is always set up for host-to-device.

Change-Id: I6793ca95596dedaed9d5be1dbd9469ceef2a5c33
2021-09-30 05:53:49 -04:00
Sean Keely e0224ad89f Correct typo in package dependencies.
Change-Id: I3c378479ceb822e55168517e041a48fa8a2d3d98
2021-09-20 21:00:15 -05:00
Sean Keely 2e9a9f7c7a Correct Clang version detection and support for multiple prefix paths.
Bumps cmake minimum version to 3.7 for version comparison operator.

Previously the Clang cmake project version strings were used.  These
are not defined if the clang cmake project has not been loaded.
We should use CMAKE_CXX_COMPILER_VERSION to check the version when
only the compiler binary is redirected and the project files are
not available.

Also adjust device libs lookup logic to handle multiple paths in
CMAKE_PREFIX_PATH.

Change-Id: I67b6958d8241685cd6c3a0af68507c9fdc6331ef
2021-09-20 19:23:04 -05:00
Sean Keely a8c3ea82a4 Add debug option to skip setting the initial cu mask.
Adds debug variable HSA_CU_MASK_SKIP_INIT.

Change-Id: I5c742d1184a36fdef818bc50c3b780b859b68560
2021-09-16 23:43:49 -05:00
Sean Keely 5535b1f86f Correct fast f16 capability reporting.
Was hard coded to false.  Updated to reflect f16 availability since
gfx8.

Change-Id: I7d5b9792c8e0163199c421a61b5d49b25cd98645
2021-09-16 21:15:52 -05:00
Oak Zeng 80206af91e Add gfx1013 support
Change-Id: I7122caea3ef2254b50bde25ec545116685452116
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2021-09-15 01:10:20 -04:00
Sean Keely 8d789461bf Add gfx1013 to rocrtst.
Change-Id: I49c4aafc661d7c2ba6fd6baa006bf698b0af5274
2021-09-14 22:55:43 -04:00
Sean Keely 5af558f739 Place GPU local resources in nearest NUMA node.
For minimal latency we should place command queues and blit code
in the nearest numa node to each GPU.  Add an allocator matching
the current runtime default allocator interface to each GpuAgent
that allocates on the closest numa node as represented by kfd
topology.  Use this allocator for queue ring buffers and blit
objects.

Change-Id: I181127f9c27bafe68976312963146616e3f58369
2021-09-14 17:49:24 -04:00
Sean Keely 907679c989 Register the default queue error handler for all internal queues.
Also make failure to handle queue errors fatal.

Motivation is to improve detection of queue error conditions
that currently appear as application hangs.

Change-Id: I655643616dc0bd303d7df3ce8aca2c099bec3d46
2021-08-27 20:11:58 -04:00
Sean Keely 9e7d4629ca Add check_required_components to end of cmake package file.
Sets package found and component lists.  ROCr does not have components
so this is mostly cosmetic.  It's part of maintaining a compliant
cmake project config file though.

Change-Id: Ida2ef746375143babd3a6f938727a47135606f01
2021-08-27 20:07:26 -04:00
Sean Keely e06dd39d89 Correct queue exception handler termination.
Set handler state to terminated before exiting.
Also simplify scratch handler exit loop.

Change-Id: I0a80c8a1899e8b60a6e7aa6989ba28de42ba31e7
2021-08-27 20:06:44 -04:00
Sean Keely 76e6ff0411 Minor spelling fix in comments.
Change-Id: Ia99ac3f75444675be48b3d965552fab79da37c92
2021-08-27 20:06:17 -04:00
Sean Keely eec545fd9f Build fix for older clang compilers.
Before clang 13, the option -Wno-error=unused-but-set-variable is not
recognized, nor is the diagnostic emitted.  Set this option
conditionally on the clang compiler version.

Change-Id: I3c0958dffa985d53b641f9eff4e702988dffd033
2021-08-27 20:06:10 -04:00
Sean Keely 02666ec0f1 Allow passing zero bits into hsa_amd_queue_cu_set_mask.
Passing 0 into num_cu_mask_count used to be an implicit error.
This has been repurposed as a shorthand for enabling all CUs.
Enabling all CUs when HSA_CU_MASK is set will cause the CU mask to
reset to whatever was set by HSA_CU_MASK which may then be queried.

Change-Id: I1d6bb2034595a78ee48fa72aa05563e8ea6c0fff
2021-08-27 20:05:24 -04:00
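
The CU mask rules in this message, together with the restriction stated in the "Add HSA_CU_MASK" commit, can be modeled roughly as below. This is a minimal sketch, not ROCr code: `effective_cu_mask` and its parameters are hypothetical names, masks are plain Python ints, and dropping bits beyond `num_bits` is an assumption.

```python
# Hypothetical model of the CU mask rules described in the commit messages.
# Masks are Python ints used as bit sets; none of these names are ROCr APIs.

def effective_cu_mask(requested_bits, num_bits, env_mask, physical_cu_count):
    """Return the mask a queue would end up with under the stated rules."""
    all_cus = (1 << physical_cu_count) - 1
    # With no HSA_CU_MASK set (env_mask is None), the limit is every CU.
    limit = all_cus if env_mask is None else (env_mask & all_cus)
    if num_bits == 0:
        # Zero bits is shorthand for "enable all CUs", which resets the
        # queue to the HSA_CU_MASK limit when one is set.
        return limit
    # Assumption: bits beyond num_bits are ignored.
    requested = requested_bits & ((1 << num_bits) - 1)
    # A request cannot escape the environment mask.
    return requested & limit
```

For example, with an environment mask of 0x0F on an 8 CU device, a request for CUs 2 through 5 would be reduced to CUs 2 and 3, and a zero-bit request would yield 0x0F.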
Sean Keely 2aa0795b33 Improve HSA_CU_MASK parsing efficiency.
Delay parsing until after GPU discovery.  Use the surfaced
GPU count and maximum physical CU count to limit parsed bit masks.

This prevents pathological input such as
HSA_CU_MASK=0-8000000:0-8000000 from attempting to consume 7TiB.

Change-Id: I3773d2db3740c2023b0f6275d1818b69119b0495
2021-08-27 20:05:18 -04:00
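
The 7TiB figure can be reproduced with simple arithmetic if one assumes one bit of storage per (GPU, CU) pair named by the input. The sketch below uses a simplified `gpu_range:cu_range` grammar and hypothetical helper names; it is not ROCr's parser, and the limits of 8 GPUs and 128 CUs are illustrative values only.

```python
# Illustrative arithmetic only. The grammar here is a simplified,
# assumed "gpu_range:cu_range" form, not ROCr's real HSA_CU_MASK syntax.

def parse_range(text):
    """Parse 'lo-hi' or a single index into an inclusive (lo, hi) pair."""
    lo, _, hi = text.partition("-")
    return int(lo), int(hi or lo)

def mask_bytes(spec, gpu_count=None, max_cu_count=None):
    """Bytes needed for one bit per (GPU, CU) pair named by spec."""
    gpu_text, cu_text = spec.split(":")
    gpu_lo, gpu_hi = parse_range(gpu_text)
    cu_lo, cu_hi = parse_range(cu_text)
    # Clamp to what GPU discovery reported, when those limits are known.
    if gpu_count is not None:
        gpu_hi = min(gpu_hi, gpu_count - 1)
    if max_cu_count is not None:
        cu_hi = min(cu_hi, max_cu_count - 1)
    gpus = max(0, gpu_hi - gpu_lo + 1)
    cus = max(0, cu_hi - cu_lo + 1)
    return gpus * cus // 8

unbounded = mask_bytes("0-8000000:0-8000000")
bounded = mask_bytes("0-8000000:0-8000000", gpu_count=8, max_cu_count=128)
print(f"unbounded: {unbounded / 2**40:.2f} TiB, bounded: {bounded} bytes")
```

Without limits the request covers 8000001 x 8000001 bits, which is a little over 7 TiB; with the illustrative limits applied it shrinks to 128 bytes.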
Sean Keely 7512c32f69 Add emulator build notification to rocrtst.
Change-Id: I3eb5fd5ec26541f3459aebf289d25c942f09da02
2021-08-13 16:46:26 -05:00
Sean Keely 270d042ef8 Minor interface improvement to pointer info.
Take in const void* rather than void*.  This does not break the
abi or existing code.  Existing code would need to cast away any
const which is unnecessary and annoying.

Change-Id: I28787e8fab1b600bf6871ea82835e10a4f475c5b
2021-08-04 16:43:23 -04:00
Sean Keely ed7eec14e1 Update README and increment minor version number.
APIs have been added and minimum cmake version number has been
changed.

Change-Id: Ic75849e6937c04faed2d206344df2cf9e9a78016
2021-07-30 17:13:17 -05:00
Sean Keely 4907d4577b Add optional dependency on rocm-core.
Part of uninstall sanity changes.

Change-Id: I29f16470deb87e67050339f10bfb7cc1b5f9c1b2
2021-07-30 16:39:37 -04:00
Sean Keely 471c1859ab Simplify RPM package building.
The need to run rpm outside of cpack seems to have passed.

Change-Id: I8d7d992e289a0a88fa11b57bf0401bc6740c266b
2021-07-30 16:39:32 -04:00
Sean Keely e8439cca08 Remove unused branches from DEB packaging scripts.
Branches are unused and emit noise to the console when running
commands for which we have no actions.

Change-Id: I1f8c49a20bd7f529172721f35d29665cfc8dc6a4
2021-07-30 16:39:25 -04:00
Sean Keely 081ab00f8e Correct hsa_status_string strings.
Some strings were missing the human readable form of the error code.
Also unifying source formatting via clang-format.

Change-Id: I0bcc2ab77dda476904c684cc2c584a5c7e8230d4
2021-07-30 16:30:31 -04:00
Sean Keely 62b7c0ed3b Add missing HSA_STATUS_ERROR_INVALID_MEMORY_POOL string.
HSA_STATUS_ERROR_INVALID_MEMORY_POOL was missing from
hsa_status_string.

Change-Id: I9a9121d54a61f966d87081a55638397473bddbe4
2021-07-30 16:30:25 -04:00
Sean Keely a0069904c8 Minor correction to debug messages.
Added missing \n.

Change-Id: I6e17459390c2c18819fc1decd8a6c91b7d7409cf
2021-07-30 16:30:18 -04:00
Sean Keely 0778969e89 Correct single character transcription errors in license text.
Somehow "and/or" was rendered as "and#or".

Change-Id: Ia8219e0241cd1c788e26a92b491523852e9a2f40
2021-07-30 16:28:53 -04:00
Sean Keely bb4dfbba1e Correct data race in GpuAgent::GetXgmiBlit.
Threads may race against xgmi_peer_list_ when dynamically assigning
peers to sdma engines.

Change-Id: I300c10f0cfa0ff7d6a5515364070a0895e2f4644
2021-07-30 16:10:59 -04:00
Sean Keely e3a01690a5 Add global_flags reporting to pointer info.
global_flags reporting allows discovery of an allocation's memory
model (coarse, fine, kernarg).  This is critical on gfx90a and
also allows discovery of the memory model of IPC imports.

Change-Id: Icbc3c243ca20e264af5e1931becd2419f762c7ad
2021-07-29 15:37:47 -05:00
Sean Keely e6e66e8a05 Report SVM range queries with both coarse and fine grain as indeterminate.
Previously ranges were reported as fine if and only if they were
entirely fine.  Coarse and mixed ranges were reported as coarse.
For gfx90a it is critical to know if a range is coarse or fine as
fp atomics targeting fine grain memory do not function.  Range queries
reporting coarse must be trustworthy, so they must only report coarse
if the entire region is coarse.

Change-Id: I29c654a2afcd6943961eb2455e3654dfdb1283b5
2021-07-29 15:34:58 -05:00
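
The reporting rule in this message can be sketched as a three-way classification. This models only what the message above states; the `Coherency` enum, `classify_range`, and the per-page flag representation are hypothetical and do not reflect how the runtime or KFD actually stores range attributes.

```python
from enum import Enum

class Coherency(Enum):
    COARSE = "coarse"
    FINE = "fine"
    INDETERMINATE = "indeterminate"

def classify_range(page_is_fine):
    """Classify a range from per-page flags (True means fine grain)."""
    flags = list(page_is_fine)
    if not flags:
        raise ValueError("range must contain at least one page")
    if all(flags):
        return Coherency.FINE           # entirely fine
    if not any(flags):
        return Coherency.COARSE         # entirely coarse, safe to trust
    return Coherency.INDETERMINATE      # mixed coarse and fine
```

Under the previous behavior described above, the mixed case would have been reported as coarse.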
Sean Keely 4455250be1 Add HSA_CU_MASK
New environment variable HSA_CU_MASK allows users to
specify a cu mask to every queue allocated from any
GPU.  hsa_amd_queue_cu_set_mask is restricted from
escaping this mask.

A new API hsa_amd_queue_cu_get_mask is added to query
the current cu mask.

Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03
2021-07-29 02:23:34 -05:00
Sean Keely 2c35469617 Provide hwloc dependency.
Some distros do not provide the proper hwloc version for rocrtst.
This packages the required version.

Change-Id: Iebc68250c33f309d6b50e850a0553685bac50563
2021-07-26 23:56:14 -04:00
Sean Keely 770a42cb42 Revert "Revert "Split packaging into binary and dev packages.""
Correct deb and rpm package conflict declarations.
hsa-ext-rocr-dev was to be replaced.  Now that two packages
replace this package, remove conflicts so that they do not block
each other.

Change-Id: If25ea6cfd3d6d00398fd0a8d179860d3a92dc907
2021-07-26 20:42:25 -05:00
Sean Keely d2ccf44085 Revert "Split packaging into binary and dev packages."
This reverts commit 2c32cbea00.

Change-Id: I33cbcffe5695c4e45ebce37ce56177006a5e0f62
2021-07-26 19:23:46 -05:00
Sean Keely 2c32cbea00 Split packaging into binary and dev packages.
Conform with normal packaging behavior where a binary
and its development headers are in separate packages.

Change-Id: I91c58ea271a8e1c710c213060bca6d58d69287e6
2021-07-26 17:01:36 -05:00
Sean Keely bea17130f7 Add package splitting names to PROVIDES.
Preparation for splitting the package.  rocm-dev meta package
should be updated after this is merged and before splitting the
packages to avoid build breaks.

Change-Id: Iaad54ee72207285eaaa99e88cf1949bea7f29001
2021-07-23 18:33:09 -05:00
Aaron Liu 4032070c3e Add gfx1035 for yellow carp
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I1e3e44352b5825fc0f249c39aed703d4990995ca
2021-07-22 13:48:31 +08:00
Sean Keely 59ee761f81 Add support for reporting vm faults through the queue error handler.
Under xnack we can now identify the queue which generated a vm fault.
This allows users to identify which queue, and therefore which
dispatch, a vm fault came from.

Change-Id: If72ff3de05800f2b811aa7842a15eedff8b5e45a
2021-07-16 18:03:26 -05:00
Laurent Morichetti ef1955ad42 Fix incorrect packet index in ttmp6
ttmp6.packet_index is reported as 0 for all waves, regardless of the
dispatch packet position in the queue, due to an issue in the clearing
of the previous trap_id and saved status.halt bit.

Fixed TTMP6_SAVED_STATUS_HALT_MASK to only be one bit, 1<<29.

Change-Id: Ia4934e51123a40d71de658efc387a1f3a6344f05
2021-07-16 18:03:26 -05:00
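
A small sketch of why a clear mask wider than one bit can zero a neighboring field. Only the value 1<<29 comes from the message above; the packet index field position and the too-wide mask value are hypothetical, chosen purely to illustrate the failure mode.

```python
SAVED_STATUS_HALT_MASK = 1 << 29      # value given in the commit message

# Hypothetical layout and buggy value, for illustration only.
PACKET_INDEX_MASK = (1 << 25) - 1     # assumed field in the low bits
TOO_WIDE_MASK = (1 << 30) - 1         # covers bit 29 and everything below

def clear_bits(reg, mask):
    """Clear the masked bits of a 32-bit register value."""
    return reg & ~mask & 0xFFFFFFFF

reg = SAVED_STATUS_HALT_MASK | 42     # halt bit set, packet index 42

ok = clear_bits(reg, SAVED_STATUS_HALT_MASK) & PACKET_INDEX_MASK
bad = clear_bits(reg, TOO_WIDE_MASK) & PACKET_INDEX_MASK
print(ok, bad)
```

With the one-bit mask the packet index survives the clear; with the wider mask it reads back as 0, matching the symptom described.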
Jay Cornwall f3d942b67f Report union of wave errors as a bitmask in trap handler
Also fix incorrect PC increment on host trap.

Change-Id: Ic8bbf2b90f9f879ba62b558b909d010a8939a663
2021-07-16 18:03:26 -05:00
Jay Cornwall 8d4608ed0e Clear queue error code when not handling exceptions
If left non-zero the event loop will keep reinvoking the callback,
preventing AqlQueue::ExceptionHandler from running.

Change-Id: If85fbaf62f04ffd327ecf9d649aa23afad4442ce
2021-07-16 18:03:26 -05:00
Jay Cornwall 7e4088309d Add new trap handler, bump debug API version
Also fix hsaKmtRuntimeEnable error handling. Continue if ioctl fails.

Change-Id: I754ccba5910ccfef6f1ada1415593ef89ce33aba
2021-07-16 18:03:26 -05:00
Sean Keely 0159aea4c9 Initialize new exception handler state.
Change-Id: Ibcb699760837b9ec1508d6af948a272a81ddcd02
2021-07-16 18:03:26 -05:00
Sean Keely 206e87d28b Support debugging hw exceptions.
Change-Id: I9780147294af2e9457fa54693580735452ee2ae6
2021-07-16 18:03:26 -05:00
Sean Keely 3d6a18b67c Always execute the first satisfied async signal handler.
Certain special signals do not carry their updates via their signal
value.  These signals are wrappers around special KFD events, of
which the only current instance informs about VM faults.  We either
need to check each signal for this special event type or rely on
the checking done in hsa_amd_signal_wait_any.  Since there will always
be a small number of these signals it doesn't make much sense to
penalize the performance path with this check.  Additionally we know
that the signal indicated by hsa_amd_signal_wait_any is satisfied so
don't need to recheck its conditions.

Change-Id: I9fc6298300ad543d823ecd28ca8fab4ad26c23ef
2021-06-24 02:45:31 -05:00
Sean Keely 26808295f8 Correct clang build error.
Clang now warns about set but unused variables.  It also now
recognizes -Wno-error=unused-but-set-variable so this patch moves
that option back to the general options list.

Change-Id: Id800e87eb688b9441b14380e2246ad586179f31a
2021-06-23 15:04:58 -05:00
Sean Keely 74bcd6ee90 Locate kernel directory from device name.
Search child directories when locating device code.

Change-Id: I51515f002ad60878a2be0b6e9ee6416c67a1d799
2021-06-17 22:57:21 -04:00
Sean Keely 9e53cab613 Add agent info query for HSA_AMD_AGENT_INFO_SVM_DIRECT_HOST_ACCESS.
Allows determining if the host can directly access HMM memory that
is physically resident in vram.

Change-Id: Ie452eedd0e27fe1b511afd416f5a1cd01b3d84e8
2021-06-17 03:45:26 -04:00
Sean Keely 8adbda1c18 Allocate any size vram request through the fragment allocator.
Enables the fragment allocator to handle >2MB allocations, maintaining
good TLB alignment.  Prior code contained a bug that caused the effective
API granule for vram allocations >2MB to be bumped to 2MB.

Also adjusts the block cache's block retention heuristic to not
count discarded blocks as in use.  This will reduce block retention
when a significant amount of large blocks or IPC is in use.

Change-Id: I30bd85eb87951df822211f799d9cfe579ab109c6
2021-06-10 19:30:54 -05:00