The new environment variable HSA_CU_MASK lets users
apply a CU mask to every queue allocated from any
GPU. hsa_amd_queue_cu_set_mask cannot enable CUs
outside this mask.
A new API, hsa_amd_queue_cu_get_mask, is added to query
the current CU mask.
Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03
[ROCm/ROCR-Runtime commit: 4455250be1]
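The restriction described in the HSA_CU_MASK entry above can be pictured as a
bitwise AND of the requested per-queue mask with the global mask. The following
is a minimal sketch under that assumption, not ROCr's implementation:
RestrictCuMask, the vector-of-words representation, and the treatment of words
missing from the global mask are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: clamps a requested per-queue CU mask to a global mask.
// Bit i of word w stands for compute unit (32 * w + i); a set bit enables it.
std::vector<uint32_t> RestrictCuMask(const std::vector<uint32_t>& requested,
                                     const std::vector<uint32_t>& global_mask) {
  std::vector<uint32_t> effective(requested.size(), 0);
  for (std::size_t w = 0; w < requested.size(); ++w) {
    // ASSUMPTION: words beyond the end of the global mask allow no CUs.
    const uint32_t allowed = (w < global_mask.size()) ? global_mask[w] : 0u;
    effective[w] = requested[w] & allowed;
    // The result never enables a CU that the global mask disables.
    assert((effective[w] & ~allowed) == 0);
  }
  return effective;
}
```

With a global mask of 0x0000FFFF, a request for all 32 CUs in the first word
comes back as 0x0000FFFF, and any bits requested in later words are dropped.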
Some distros do not provide the proper hwloc version for rocrtst.
This packages the required version.
Change-Id: Iebc68250c33f309d6b50e850a0553685bac50563
[ROCm/ROCR-Runtime commit: 2c35469617]
Correct deb and rpm package conflict declarations.
hsa-ext-rocr-dev was to be replaced. Now that two packages
replace this package, remove the conflicts so that they do not block
each other.
Change-Id: If25ea6cfd3d6d00398fd0a8d179860d3a92dc907
[ROCm/ROCR-Runtime commit: 770a42cb42]
Conform with normal packaging behavior where a binary
and its development headers are in separate packages.
Change-Id: I91c58ea271a8e1c710c213060bca6d58d69287e6
[ROCm/ROCR-Runtime commit: 2c32cbea00]
Preparation for splitting the package. rocm-dev meta package
should be updated after this is merged and before splitting the
packages to avoid build breaks.
Change-Id: Iaad54ee72207285eaaa99e88cf1949bea7f29001
[ROCm/ROCR-Runtime commit: bea17130f7]
Under xnack we can now identify the queue which generated a vm fault.
This allows users to identify which queue, and therefore which
dispatch, a vm fault came from.
Change-Id: If72ff3de05800f2b811aa7842a15eedff8b5e45a
[ROCm/ROCR-Runtime commit: 59ee761f81]
ttmp6.packet_index is reported as 0 for all waves, regardless of the
dispatch packet position in the queue, due to an issue in the clearing
of the previous trap_id and saved status.halt bit.
Fixed TTMP6_SAVED_STATUS_HALT_MASK to only be one bit, 1<<29.
Change-Id: Ia4934e51123a40d71de658efc387a1f3a6344f05
[ROCm/ROCR-Runtime commit: ef1955ad42]
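The ttmp6 fix above comes down to clearing one flag with a mask that covers
only that flag. The sketch below illustrates why mask width matters. Only the
value 1<<29 is taken from the commit message. The position of packet_index in
the low 25 bits and all helper names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// From the commit message: the saved halt flag is exactly one bit, bit 29.
constexpr uint32_t kSavedStatusHaltMask = 1u << 29;
static_assert((kSavedStatusHaltMask & (kSavedStatusHaltMask - 1)) == 0,
              "halt mask must cover a single bit");

// ASSUMPTION for illustration only: packet_index occupies the low 25 bits.
constexpr uint32_t kPacketIndexMaskAssumed = (1u << 25) - 1;

// Clears the bits selected by `mask` and leaves every other bit intact.
constexpr uint32_t ClearBits(uint32_t reg, uint32_t mask) { return reg & ~mask; }

// Hypothetical accessor for the assumed packet_index field.
constexpr uint32_t PacketIndex(uint32_t reg) { return reg & kPacketIndexMaskAssumed; }

// Clears the halt flag and checks that the assumed packet_index survives.
uint32_t ClearHalt(uint32_t ttmp6) {
  const uint32_t cleared = ClearBits(ttmp6, kSavedStatusHaltMask);
  assert(PacketIndex(cleared) == PacketIndex(ttmp6));
  return cleared;
}
```

A mask wider than the flag, for example all ones passed to ClearBits, would
zero the assumed packet_index field as well, which matches the reported symptom
of packet_index reading 0 for every wave.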
If left non-zero the event loop will keep reinvoking the callback,
preventing AqlQueue::ExceptionHandler from running.
Change-Id: If85fbaf62f04ffd327ecf9d649aa23afad4442ce
[ROCm/ROCR-Runtime commit: 8d4608ed0e]
Also fix hsaKmtRuntimeEnable error handling. Continue if ioctl fails.
Change-Id: I754ccba5910ccfef6f1ada1415593ef89ce33aba
[ROCm/ROCR-Runtime commit: 7e4088309d]
Certain special signals do not carry their updates via their signal
value. These signals are wrappers around special KFD events, of
which the only current instance informs about VM faults. We either
need to check each signal for this special event type or rely on
the checking done in hsa_amd_signal_wait_any. Since there will always
be a small number of these signals, it doesn't make much sense to
penalize the performance path with this check. Additionally, we know
that the signal indicated by hsa_amd_signal_wait_any is satisfied, so
we don't need to recheck its conditions.
Change-Id: I9fc6298300ad543d823ecd28ca8fab4ad26c23ef
[ROCm/ROCR-Runtime commit: 3d6a18b67c]
Clang now warns about set but unused variables. It also now
recognizes -Wno-error=unused-but-set-variable so this patch moves
that option back to the general options list.
Change-Id: Id800e87eb688b9441b14380e2246ad586179f31a
[ROCm/ROCR-Runtime commit: 26808295f8]
Allows determining if the host can directly access HMM memory that
is physically resident in vram.
Change-Id: Ie452eedd0e27fe1b511afd416f5a1cd01b3d84e8
[ROCm/ROCR-Runtime commit: 9e53cab613]
Enables the fragment allocator to handle >2MB allocations, maintaining
good TLB alignment. Prior code contained a bug that caused the effective
API granule for vram allocations >2MB to be bumped to 2MB.
Also adjusts the block cache's block retention heuristic to not
count discarded blocks as in use. This will reduce block retention
when a significant amount of large blocks or IPC is in use.
Change-Id: I30bd85eb87951df822211f799d9cfe579ab109c6
[ROCm/ROCR-Runtime commit: 8adbda1c18]
Add macro debug_warning_n to stop printing a message after
N instances.
Change-Id: Id5f84b11eb63b3a20bd2bcb2ea8f10a066b457ef
[ROCm/ROCR-Runtime commit: ca8387768e]
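A rate-limited warning of the kind debug_warning_n describes can be built from
a per-call-site counter. The macro below is a hypothetical stand-in named
WARN_FIRST_N, not the actual ROCr macro. It prints to stderr the first n times
a call site is reached and stays silent afterwards. Each expansion creates a
distinct lambda, so each call site gets its own static counter. Under
concurrent use a few extra increments may occur, but no more than n messages
are printed per call site.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical sketch of a "warn at most n times" macro.
// Evaluates to true when the message was printed, false when suppressed.
#define WARN_FIRST_N(msg, n)                                                  \
  ([&]() -> bool {                                                            \
    static std::atomic<unsigned> count{0};                                    \
    const unsigned limit = static_cast<unsigned>(n);                          \
    if (count.load(std::memory_order_relaxed) >= limit) return false;         \
    if (count.fetch_add(1, std::memory_order_relaxed) >= limit) return false; \
    std::fprintf(stderr, "Warning: %s\n", (msg));                             \
    return true;                                                              \
  }())
```

Calling WARN_FIRST_N("message", 2) in a loop of five iterations prints the
message twice. The early load keeps the counter from growing without bound, so
it cannot wrap around and start printing again.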
Under high async handler load, signal retention and event sorting
become bottlenecks. This change processes more handlers in a
single pass to amortize wait_any overheads.
Change-Id: I8b276e102db647e3858e120547aa0c6fca85ab4c
[ROCm/ROCR-Runtime commit: 6b398eb72c]
The old memory properties info name was still used after branches
were removed. This caused the CPU coarse grain pool to initialize
with random bits.
Change-Id: I397bc5ecf09fab69bdf1d7fafadcf54d71b64070
[ROCm/ROCR-Runtime commit: 0439dc90cd]
Prevents poorly written tools which throw in tools interface
callbacks from causing ROCr to catch and return a generic error
code.
Change-Id: I2f5bf7104dc7d4ee688eb48423c7ffdb06bd7702
[ROCm/ROCR-Runtime commit: c9ce27a640]
Old logic did not consider memory held in the scratch cache to be
free when deciding whether or not to reclaim.
Change-Id: I7f7c7549c72d743edbf7c53489fe9a453dc4177a
[ROCm/ROCR-Runtime commit: 0b7d9db964]
Clarify behavior of hsa_ven_amd_loader_iterate_executables during
concurrent calls of executable creation and destruction.
Change-Id: Idc3e3981d4fcc0d58d9f1b7a7578deed20aa490b
[ROCm/ROCR-Runtime commit: 1bdc2f6854]
Add a hard limit on allocation size of 1/2 of available vram
to avoid allocation failure when the allocation size equals the vram size.
Print the block size in each round to report progress for long running
tests.
Add the block size skip info to the result form (if any tests are skipped).
Affected test:
rocrtstPerf.Memory_Async_Copy
Data Size    Avg Time(us)      Avg BW(GB/s)    MinTime(us)       Peak BW(GB/s)
128M         638759.570200     0.195692        637569.991000     0.196057
256M         1270058.822400    0.196841        1268425.758000    0.197095
Notice: Data Size larger than 512M is skipped due to hard limit of 1/2 vram size
Signed-off-by: Mengbing Wang <mengbing.wang@amd.com>
Change-Id: I4c4cea74a608272cc29d222b9399af26b34d7473
[ROCm/ROCR-Runtime commit: cf10c3bc35]
Includes some workarounds and HMM.
Conflicts:
opensrc/hsa-runtime/core/runtime/amd_topology.cpp
opensrc/hsa-runtime/core/util/flag.h
Change-Id: I22976f07964a43dbb228a6231777dbd599112b8d
[ROCm/ROCR-Runtime commit: 7333c77e22]
When no ISAs are available, no callbacks should be invoked. This
is not an error and should return success.
Change-Id: Ie4048aa8cbe5c3fdf5431f6a865021549ecf8a13
[ROCm/ROCR-Runtime commit: 4197461b7f]
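The behavior described for the empty ISA case follows naturally from a plain
iteration loop. The sketch below uses hypothetical stand-ins (Status, Isa,
IterateIsas) rather than the real HSA types and entry points. Its only purpose
is to show that an empty list yields zero callback invocations and a success
status.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical status codes and ISA handle, standing in for the HSA types.
enum class Status { kSuccess, kError };
struct Isa { int id; };

// Invokes `callback` once per ISA and stops at the first non-success status.
// With an empty list the loop body never runs, so no callback is invoked and
// the function reports success rather than an error.
Status IterateIsas(const std::vector<Isa>& isas,
                   const std::function<Status(const Isa&)>& callback) {
  for (const Isa& isa : isas) {
    const Status s = callback(isa);
    if (s != Status::kSuccess) return s;
  }
  return Status::kSuccess;
}
```

An iterator that treats "nothing to visit" as an error would force every caller
to special-case agents without ISAs. Returning success keeps the empty case
uniform with the non-empty one.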
Sramecc is misreported in kfd 4.0 and prior. To prevent possible
corruption due to d16 instructions, deny use of gfx906 with older
kfds and correct misreport for gfx908. Denial of gfx906 may be
overridden by setting HSA_IGNORE_SRAMECC_MISREPORT=1.
Change-Id: I7d5c3a716fad01c348f8b88cd508cedbf914c989
[ROCm/ROCR-Runtime commit: 45fbe5b192]
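The gfx906 gate and its override can be sketched as a small predicate. Only the
variable name HSA_IGNORE_SRAMECC_MISREPORT and the value 1 come from the commit
message. The function names, the boolean describing whether the KFD misreports
sramecc, and the choice to accept only the exact string "1" are assumptions
made for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// ASSUMPTION: the override counts as set only when the value is exactly "1".
bool OverrideRequested(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical gate: gfx906 is usable unless the KFD misreports sramecc and
// the user has not asked to ignore the misreport.
bool AllowGfx906(bool kfd_misreports_sramecc, const char* env_value) {
  return !kfd_misreports_sramecc || OverrideRequested(env_value);
}

// Convenience wrapper that reads the variable named in the commit message.
bool AllowGfx906FromEnv(bool kfd_misreports_sramecc) {
  return AllowGfx906(kfd_misreports_sramecc,
                     std::getenv("HSA_IGNORE_SRAMECC_MISREPORT"));
}
```

Keeping the decision in a function that takes the value as a parameter makes
the gate easy to test without modifying the process environment.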
1. As we cannot guarantee that 100% of APU vram is free to be allocated, limit
the allocation size to no more than 3/4 of vram size.
2. Keep the old 1GB allocation limit for dGPU case.
3. Add the alignment check for alloc_size.
Affected tests:
rocrtstStress.Memory_Concurrent_Allocate_Test
rocrtstStress.Memory_Concurrent_Free_Test
Change-Id: Id0023de132024d02f80980ae4237d9d74d9e27d3
Signed-off-by: Mengbing Wang <mengbing.wang@amd.com>
[ROCm/ROCR-Runtime commit: d5855c1658]
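The three points in the entry above can be combined into one size-selection
helper. This is a sketch under stated assumptions, not the test's actual code.
PickAllocSize and its parameters are hypothetical, the granule is an assumed
allocation granularity, and rounding down to the granule stands in for the
alignment check mentioned in the commit message.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kOneGiB = 1ull << 30;

// Hypothetical helper mirroring the three points above.
// `granule` must be a non-zero power of two.
uint64_t PickAllocSize(uint64_t vram_size, bool is_apu, uint64_t granule) {
  assert(granule != 0 && (granule & (granule - 1)) == 0);
  // 1. APU: at most 3/4 of vram.  2. dGPU: keep the old 1 GiB limit.
  const uint64_t limit = is_apu ? (vram_size / 4) * 3 : kOneGiB;
  // 3. ASSUMPTION: enforce alignment by rounding down to the granule.
  return limit & ~(granule - 1);
}
```

For an APU with 512 MiB of vram and a 4 KiB granule this yields 384 MiB, while
a dGPU stays at 1 GiB regardless of its vram size.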