Branches are unused and emit noise to the console when running
commands for which we have no actions.
Change-Id: I1f8c49a20bd7f529172721f35d29665cfc8dc6a4
Some strings were missing the human readable form of the error code.
Also unifying source formatting via clang-format.
Change-Id: I0bcc2ab77dda476904c684cc2c584a5c7e8230d4
global_flags reporting allows discovery of an allocation's memory
model (coarse, fine, kernarg). This is critical on gfx90a and
also allows discovery of the memory model of IPC imports.
Change-Id: Icbc3c243ca20e264af5e1931becd2419f762c7ad
Previously ranges were reported as fine if and only if they were
entirely fine. Coarse and mixed ranges were reported as coarse.
For gfx90a it is critical to know if a range is coarse or fine as
fp atomics targeting fine-grained memory do not function. A range
query that reports coarse must be trustworthy, so it must report
coarse only if the entire range is coarse.
Change-Id: I29c654a2afcd6943961eb2455e3654dfdb1283b5
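The range reporting rule above (coarse only when the whole range is
coarse) can be sketched as follows. This is a minimal illustration, not
the ROCr implementation: Grain, RangeReport and ClassifyRange are
hypothetical names, and the message does not say what a mixed range is
reported as, so the sketch only distinguishes "coarse" from "not coarse".

```cpp
#include <vector>

// Hypothetical types for illustration; these are not part of the ROCr API.
enum class Grain { kCoarse, kFine };
enum class RangeReport { kCoarse, kNotCoarse };

// Report coarse only if every page in the queried range is coarse.
// Assumption: an empty range is conservatively reported as not coarse.
inline RangeReport ClassifyRange(const std::vector<Grain>& pages) {
  if (pages.empty()) return RangeReport::kNotCoarse;
  for (Grain g : pages) {
    // A single fine page means the range cannot be trusted as coarse.
    if (g != Grain::kCoarse) return RangeReport::kNotCoarse;
  }
  return RangeReport::kCoarse;
}
```

Under this rule a caller that sees kCoarse can rely on fp atomics not
touching fine-grained memory anywhere in the range.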
New environment variable HSA_CU_MASK allows users to
specify a cu mask to every queue allocated from any
GPU. hsa_amd_queue_cu_set_mask is restricted from
escaping this mask.
A new API hsa_amd_queue_cu_get_mask is added to query
the current cu mask.
Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03
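One way to picture the HSA_CU_MASK restriction above is as a bitwise
intersection of the requested mask with the global mask. The sketch below
is an assumption about the semantics, not the runtime's code: ClampCuMask
is a hypothetical helper, and treating words beyond the end of the global
mask as fully disabled is a choice made only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: restrict a requested CU mask so that it cannot
// enable any CU that the global (environment-provided) mask disables.
inline std::vector<std::uint32_t> ClampCuMask(
    const std::vector<std::uint32_t>& requested,
    const std::vector<std::uint32_t>& global) {
  std::vector<std::uint32_t> out(requested.size());
  for (std::size_t i = 0; i < requested.size(); ++i) {
    // Assumption: words not covered by the global mask are disabled.
    const std::uint32_t allowed = i < global.size() ? global[i] : 0u;
    out[i] = requested[i] & allowed;
  }
  return out;
}
```

A query of the current mask would then be expected to return the clamped
value rather than the value originally requested.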
Correct deb and rpm package conflict declarations.
hsa-ext-rocr-dev was to be replaced. Now that two packages
replace this package, remove the conflicts so that they do not
block each other.
Change-Id: If25ea6cfd3d6d00398fd0a8d179860d3a92dc907
Conform with normal packaging behavior where a binary
and its development headers are in separate packages.
Change-Id: I91c58ea271a8e1c710c213060bca6d58d69287e6
Preparation for splitting the package. rocm-dev meta package
should be updated after this is merged and before splitting the
packages to avoid build breaks.
Change-Id: Iaad54ee72207285eaaa99e88cf1949bea7f29001
Under xnack we can now identify the queue which generated a vm fault.
This allows users to identify which queue, and therefore which
dispatch, a vm fault came from.
Change-Id: If72ff3de05800f2b811aa7842a15eedff8b5e45a
ttmp6.packet_index is reported as 0 for all waves, regardless of the
dispatch packet position in the queue, due to an issue in the clearing
of the previous trap_id and saved status.halt bit.
Fixed TTMP6_SAVED_STATUS_HALT_MASK to only be one bit, 1<<29.
Change-Id: Ia4934e51123a40d71de658efc387a1f3a6344f05
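The TTMP6 mask fix above can be illustrated with a small bit-clearing
sketch. Only the value 1<<29 comes from the message. The over-wide mask
and the position of the packet index field are hypothetical, chosen just
to show how clearing with too wide a mask can zero a neighbouring field.

```cpp
#include <cstdint>

// From the commit message: the saved status.halt flag is a single bit.
constexpr std::uint32_t kSavedStatusHaltMask = 1u << 29;

// Hypothetical values for illustration only; the real pre-fix mask and
// the real packet_index bit range are not given in the message.
constexpr std::uint32_t kOverWideMask = 0x3FFFFFFFu;     // bits 0..29
constexpr std::uint32_t kPacketIndexMask = 0x0000FFFFu;  // bits 0..15

// Clear the bits selected by mask, leaving all other bits untouched.
constexpr std::uint32_t ClearBits(std::uint32_t reg, std::uint32_t mask) {
  return reg & ~mask;
}

// Extract the (hypothetically placed) packet index field.
constexpr std::uint32_t PacketIndex(std::uint32_t reg) {
  return reg & kPacketIndexMask;
}
```

With the one-bit mask the packet index survives the clear, while the
over-wide mask in this sketch wipes it to 0 for every wave.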
If left non-zero the event loop will keep reinvoking the callback,
preventing AqlQueue::ExceptionHandler from running.
Change-Id: If85fbaf62f04ffd327ecf9d649aa23afad4442ce
Certain special signals do not carry their updates via their signal
value. These signals are wrappers around special KFD events, of
which the only current instance informs about VM faults. We either
need to check each signal for this special event type or rely on
the checking done in hsa_amd_signal_wait_any. Since there will always
be a small number of these signals, it doesn't make much sense to
penalize the performance path with this check. Additionally, we know
that the signal indicated by hsa_amd_signal_wait_any is satisfied, so
we don't need to recheck its conditions.
Change-Id: I9fc6298300ad543d823ecd28ca8fab4ad26c23ef
Clang now warns about set but unused variables. It also now
recognizes -Wno-error=unused-but-set-variable so this patch moves
that option back to the general options list.
Change-Id: Id800e87eb688b9441b14380e2246ad586179f31a
Allows determining if the host can directly access HMM memory that
is physically resident in vram.
Change-Id: Ie452eedd0e27fe1b511afd416f5a1cd01b3d84e8
Enables the fragment allocator to handle >2MB allocations, maintaining
good TLB alignment. Prior code contained a bug that caused the effective
API granule for vram allocations >2MB to be bumped to 2MB.
Also adjusts the block cache's block retention heuristic to not
count discarded blocks as in use. This will reduce block retention
when a significant number of large blocks or IPC is in use.
Change-Id: I30bd85eb87951df822211f799d9cfe579ab109c6
Under high async handler load, signal retention and event sorting
become bottlenecks. This change processes more handlers in a
single pass to amortize wait_any overheads.
Change-Id: I8b276e102db647e3858e120547aa0c6fca85ab4c
The old memory properties info name was still used after removing branches.
This caused the CPU coarse grain pool to initialize with random
bits.
Change-Id: I397bc5ecf09fab69bdf1d7fafadcf54d71b64070
Prevents poorly written tools which throw in tools interface
callbacks from causing ROCr to catch and return a generic error
code.
Change-Id: I2f5bf7104dc7d4ee688eb48423c7ffdb06bd7702
Old logic did not consider memory held in the scratch cache to be
free when deciding whether or not to reclaim.
Change-Id: I7f7c7549c72d743edbf7c53489fe9a453dc4177a
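The scratch reclaim decision above can be sketched as a comparison of the
old and new accounting. This is a simplified model under stated
assumptions, not the runtime's logic: ScratchState and both functions are
hypothetical, and the model assumes cached <= allocated <= limit.

```cpp
#include <cstddef>

// Hypothetical accounting for illustration. "allocated" includes memory
// that is currently parked in the scratch cache ("cached").
struct ScratchState {
  std::size_t limit;      // total scratch budget
  std::size_t allocated;  // bytes handed out, including cached bytes
  std::size_t cached;     // bytes held in the scratch cache
};

// Old behavior as described: cached memory is not counted as free.
inline bool NeedsReclaimOld(const ScratchState& s, std::size_t request) {
  return request > s.limit - s.allocated;
}

// New behavior as described: cached memory counts as free.
inline bool NeedsReclaim(const ScratchState& s, std::size_t request) {
  return request > s.limit - s.allocated + s.cached;
}
```

In this model a request that fits in the cached memory no longer triggers
a reclaim, which it would have under the old accounting.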
Clarify behavior of hsa_ven_amd_loader_iterate_executables during
concurrent calls of executable creation and destruction.
Change-Id: Idc3e3981d4fcc0d58d9f1b7a7578deed20aa490b