Exit early if the range is found to be fine-grained. Indeterminate
should apply only if the range is neither coarse nor fine.
Change-Id: I54133e14f4e8cfa53e2d612f6112cdcdb5a47dfa
Because SDMA0 shares ports with other engines, the
hardware design team has advised that SDMA0 on gfx90a
should be used only for host-to-device data transfers.
The recommendation is to use SDMA1 for any device-to-device
or device-to-host data transfers.
A driver change will ensure that, for each gfx90a
device, only the first PCIe SDMA queue a process
requests will possibly be from SDMA0. This patch ensures
that the first PCIe queue requested (which may be from
SDMA0) is always set up for host-to-device.
Change-Id: I6793ca95596dedaed9d5be1dbd9469ceef2a5c33
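The queue setup rule above can be sketched as follows. This is a minimal illustration, not the ROCr implementation: PcieQueuePlanner, CopyDir and NextQueueDirection are hypothetical names, and the sketch assumes every PCIe queue after the first is set up for device-to-host.

```cpp
#include <cstdint>

// Direction a PCIe SDMA queue is dedicated to (illustrative names).
enum class CopyDir { HostToDevice, DeviceToHost };

// Hypothetical per-device bookkeeping; not an actual ROCr class.
class PcieQueuePlanner {
 public:
  // Returns the direction to dedicate to the next PCIe queue requested.
  // The first request may be served by SDMA0, so it is reserved for
  // host-to-device; later requests are assumed to be device-to-host.
  CopyDir NextQueueDirection() {
    return (requested_++ == 0) ? CopyDir::HostToDevice
                               : CopyDir::DeviceToHost;
  }

 private:
  uint32_t requested_ = 0;  // number of PCIe queues requested so far
};
```

The point of the ordering is that the only queue that can come from SDMA0 never carries device-to-host or device-to-device traffic.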
Bumps cmake minimum version to 3.7 for version comparison operator.
Previously the Clang cmake project version strings were used. These
are not defined if the clang cmake project has not been loaded.
We should use CMAKE_CXX_COMPILER_VERSION to check the version when
only the compiler binary is redirected and the project files are
not available.
Also adjust device libs lookup logic to handle multiple paths in
CMAKE_PREFIX_PATH.
Change-Id: I67b6958d8241685cd6c3a0af68507c9fdc6331ef
For minimal latency we should place command queues and blit code
in the nearest numa node to each GPU. Add an allocator matching
the current runtime default allocator interface to each GpuAgent
that allocates on the closest numa node as represented by kfd
topology. Use this allocator for queue ring buffers and blit
objects.
Change-Id: I181127f9c27bafe68976312963146616e3f58369
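As a rough sketch of the node selection described above, the closest NUMA node could be picked from per-link weights. NumaLink and NearestNumaNode are hypothetical names, and treating a lower weight as "closer" is an assumption about how the topology is represented.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical view of one topology link from a GPU to a NUMA node.
struct NumaLink {
  uint32_t node;    // NUMA node id
  uint32_t weight;  // assumed metric: lower means closer
};

// Returns the id of the closest node, or fallback when no links are
// reported. On a tie the first link listed wins.
uint32_t NearestNumaNode(const std::vector<NumaLink>& links,
                         uint32_t fallback) {
  if (links.empty()) return fallback;
  const NumaLink* best = &links[0];
  for (const NumaLink& link : links) {
    if (link.weight < best->weight) best = &link;
  }
  return best->node;
}
```

An allocator bound to the returned node would then back the queue ring buffers and blit objects for that GPU.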
Also make failure to handle queue errors fatal.
Motivation is to improve detection of queue error conditions
that currently appear as application hangs.
Change-Id: I655643616dc0bd303d7df3ce8aca2c099bec3d46
Sets package found and component lists. ROCr does not have components
so this is mostly cosmetic. It's part of maintaining a compliant
cmake project config file though.
Change-Id: Ida2ef746375143babd3a6f938727a47135606f01
Per clang 13, the option -Wno-error=unused-but-set-variable is not
recognized, nor is the diagnostic emitted. Set this option
conditionally on the clang compiler version.
Change-Id: I3c0958dffa985d53b641f9eff4e702988dffd033
Passing 0 as num_cu_mask_count used to be an implicit error.
It has been repurposed as shorthand for enabling all CUs.
Enabling all CUs when HSA_CU_MASK is set resets the CU mask to
whatever HSA_CU_MASK specified, which may then be queried.
Change-Id: I1d6bb2034595a78ee48fa72aa05563e8ea6c0fff
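The zero-count shorthand can be illustrated with a simplified model. EffectiveCuMask is a hypothetical helper, not the ROCr API; it assumes at most 64 CUs packed into one word, and env_mask stands in for the limit set by HSA_CU_MASK (all ones when the variable is unset).

```cpp
#include <cstdint>

// Hypothetical simplification: one bit per CU, at most 64 CUs.
// env_mask models the HSA_CU_MASK limit; all ones if it is unset.
uint64_t EffectiveCuMask(uint32_t num_cu_mask_count, uint64_t requested,
                         uint64_t env_mask) {
  // A count of 0 is shorthand for "enable all CUs", which resets the
  // mask to whatever the environment permits.
  if (num_cu_mask_count == 0) return env_mask;
  // Otherwise a request cannot escape the environment limit.
  return requested & env_mask;
}
```

With env_mask equal to 0x0F, a zero count yields 0x0F and a request of 0x3C yields 0x0C.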
Delay parsing until after GPU discovery. Use the surfaced
GPU count and maximum physical CU count to limit parsed bit masks.
This prevents pathological input such as
HSA_CU_MASK=0-8000000:0-8000000 from attempting to consume 7TiB.
Change-Id: I3773d2db3740c2023b0f6275d1818b69119b0495
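The bounding idea above can be sketched for a single "lo-hi" range. ParseRangeClamped is a hypothetical helper that assumes well-formed input (std::stoull throws otherwise); it is not the actual HSA_CU_MASK parser. The key property is that storage is sized by the discovered limit, never by the numbers in the input.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: sets the bits named by "lo-hi", ignoring any
// index at or beyond limit. Assumes text is a well-formed "lo-hi" pair.
std::vector<bool> ParseRangeClamped(const std::string& text,
                                    uint32_t limit) {
  std::vector<bool> bits(limit, false);  // bounded by limit, not by input
  const std::size_t dash = text.find('-');
  if (dash == std::string::npos) return bits;
  const unsigned long long lo = std::stoull(text.substr(0, dash));
  const unsigned long long hi = std::stoull(text.substr(dash + 1));
  // Stop at the limit even if the requested range is far larger.
  for (unsigned long long i = lo; i <= hi && i < limit; ++i) {
    bits[static_cast<std::size_t>(i)] = true;
  }
  return bits;
}
```

With a limit of 4, the input "0-8000000" produces a four-bit mask with every bit set instead of an eight-million-bit allocation.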
Take const void* rather than void*. This does not break the
ABI or existing code. Previously callers had to cast away any
const, which is unnecessary and annoying.
Change-Id: I28787e8fab1b600bf6871ea82835e10a4f475c5b
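A small illustration of why the const-qualified parameter is source compatible. SameBytes is a hypothetical function, not a ROCr entry point; it only shows that const and non-const buffers both convert to const void* without a cast.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical function for illustration. Because the parameters are
// const void*, callers may pass const or non-const buffers unchanged.
bool SameBytes(const void* a, const void* b, std::size_t size) {
  return std::memcmp(a, b, size) == 0;
}
```

Existing callers that pass a plain pointer keep compiling, while callers holding a pointer to const data no longer need a const_cast.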
Branches are unused and emit noise to the console when running
commands for which we have no actions.
Change-Id: I1f8c49a20bd7f529172721f35d29665cfc8dc6a4
Some strings were missing the human readable form of the error code.
Also unify source formatting via clang-format.
Change-Id: I0bcc2ab77dda476904c684cc2c584a5c7e8230d4
global_flags reporting allows discovery of an allocation's memory
model (coarse, fine, kernarg). This is critical on gfx90a and
also allows discovery of the memory model of IPC imports.
Change-Id: Icbc3c243ca20e264af5e1931becd2419f762c7ad
Previously ranges were reported as fine if and only if they were
entirely fine. Coarse and mixed ranges were reported as coarse.
For gfx90a it is critical to know whether a range is coarse or fine,
as fp atomics targeting fine-grained memory do not function. A range
query that reports coarse must be trustworthy, so it must report
coarse only if the entire range is coarse.
Change-Id: I29c654a2afcd6943961eb2455e3654dfdb1283b5
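The stricter reporting rule can be sketched as a three-way classification. Grain, RangeKind and ClassifyRange are hypothetical names, and reporting a mixed range as a separate indeterminate state is an assumption; the commit itself only requires that "coarse" means entirely coarse.

```cpp
#include <vector>

enum class Grain { Coarse, Fine };
enum class RangeKind { Coarse, Fine, Indeterminate };

// Classifies a queried range from the grain of each allocation it
// overlaps. Coarse is reported only when every part is coarse.
RangeKind ClassifyRange(const std::vector<Grain>& parts) {
  bool any_coarse = false;
  bool any_fine = false;
  for (Grain g : parts) {
    if (g == Grain::Coarse) {
      any_coarse = true;
    } else {
      any_fine = true;
    }
  }
  if (any_coarse && !any_fine) return RangeKind::Coarse;
  if (any_fine && !any_coarse) return RangeKind::Fine;
  return RangeKind::Indeterminate;  // mixed (assumed) or empty
}
```

Under this rule a caller that sees Coarse can rely on fp atomics working across the whole range.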
New environment variable HSA_CU_MASK allows users to
apply a CU mask to every queue allocated from any
GPU. hsa_amd_queue_cu_set_mask is restricted from
escaping this mask.
A new API hsa_amd_queue_cu_get_mask is added to query
the current cu mask.
Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03
Some distros do not provide the proper hwloc version for rocrtst.
This packages the required version.
Change-Id: Iebc68250c33f309d6b50e850a0553685bac50563
Correct deb and rpm package conflict declarations.
hsa-ext-rocr-dev was to be replaced. Now that two packages
replace this package remove conflicts so that they do not block
each other.
Change-Id: If25ea6cfd3d6d00398fd0a8d179860d3a92dc907
Conform with normal packaging behavior where a binary
and its development headers are in separate packages.
Change-Id: I91c58ea271a8e1c710c213060bca6d58d69287e6
Preparation for splitting the package. rocm-dev meta package
should be updated after this is merged and before splitting the
packages to avoid build breaks.
Change-Id: Iaad54ee72207285eaaa99e88cf1949bea7f29001
Under xnack we can now identify the queue which generated a vm fault.
This allows users to identify which queue, and therefore which
dispatch, a vm fault came from.
Change-Id: If72ff3de05800f2b811aa7842a15eedff8b5e45a
ttmp6.packet_index is reported as 0 for all waves, regardless of the
dispatch packet position in the queue, due to an issue in the clearing
of the previous trap_id and saved status.halt bit.
Fixed TTMP6_SAVED_STATUS_HALT_MASK to only be one bit, 1<<29.
Change-Id: Ia4934e51123a40d71de658efc387a1f3a6344f05
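The effect of the single-bit mask can be shown with plain bit arithmetic. Only the value 1<<29 comes from the commit; the packet index occupying the low 25 bits is a hypothetical layout chosen for illustration, and the names are not taken from the trap handler source.

```cpp
#include <cstdint>

// From the commit: the saved status.halt flag is one bit, bit 29.
constexpr uint32_t kSavedStatusHaltMask = 1u << 29;

// Hypothetical layout for illustration only: packet index in the
// low 25 bits of ttmp6.
constexpr uint32_t kPacketIndexMask = (1u << 25) - 1;

// Clears only the saved halt bit, leaving every other field intact.
constexpr uint32_t ClearSavedHalt(uint32_t ttmp6) {
  return ttmp6 & ~kSavedStatusHaltMask;
}
```

Clearing with a mask wider than one bit would also zero neighbouring fields, which is consistent with the packet index reading back as 0.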
If left non-zero the event loop will keep reinvoking the callback,
preventing AqlQueue::ExceptionHandler from running.
Change-Id: If85fbaf62f04ffd327ecf9d649aa23afad4442ce
Certain special signals do not carry their updates via their signal
value. These signals are wrappers around special KFD events, of
which the only current instance informs about VM faults. We either
need to check each signal for this special event type or rely on
the checking done in hsa_amd_signal_wait_any. Since there will always
be a small number of these signals it doesn't make much sense to
penalize the performance path with this check. Additionally we know
that the signal indicated by hsa_amd_signal_wait_any is satisfied so
don't need to recheck its conditions.
Change-Id: I9fc6298300ad543d823ecd28ca8fab4ad26c23ef
Clang now warns about set but unused variables. It also now
recognizes -Wno-error=unused-but-set-variable so this patch moves
that option back to the general options list.
Change-Id: Id800e87eb688b9441b14380e2246ad586179f31a
Allows determining if the host can directly access HMM memory that
is physically resident in vram.
Change-Id: Ie452eedd0e27fe1b511afd416f5a1cd01b3d84e8
Enables the fragment allocator to handle >2MB allocations, maintaining
good TLB alignment. Prior code contained a bug that caused the effective
API granule for vram allocations >2MB to be bumped to 2MB.
Also adjusts the block cache's block retention heuristic to not
count discarded blocks as in use. This will reduce block retention
when a significant number of large blocks or IPC is in use.
Change-Id: I30bd85eb87951df822211f799d9cfe579ab109c6
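The granule issue can be made concrete with a rounding helper. AlignUp is a hypothetical function, and the 4 KiB granule used in the note below is an assumption rather than a value stated in the commit.

```cpp
#include <cstddef>

// Rounds size up to a multiple of granule. The granule must be a
// power of two for the mask arithmetic to be valid.
constexpr std::size_t AlignUp(std::size_t size, std::size_t granule) {
  return (size + granule - 1) & ~(granule - 1);
}
```

For a request of 2 MiB plus one byte, rounding to an assumed 4 KiB granule gives 2 MiB + 4 KiB, whereas rounding to a 2 MiB granule gives 4 MiB, which is the kind of inflation the fix avoids.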