Add vm_find_object_by_userptr_range so that QueryPointerInfo can also find
the object when the pointer is not the start address but lies inside the
memory range. Also rename the vm_find_object_xxx functions to _by_address
and _by_address_range for consistency.
Change-Id: I5c2b3a05b41493e32b7fd9154665bf078b043606
[ROCm/ROCR-Runtime commit: 4911c91389]
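A minimal sketch of the difference between an exact start-address lookup and the range lookup the commit above adds. The vm_object_t layout, its field names and both function names are hypothetical simplifications, and the linear list walk is chosen only for brevity; the thunk may organize its objects differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified VM object; field names are illustrative. */
typedef struct vm_object {
    void *userptr;            /* start of the registered user memory */
    uint64_t userptr_size;    /* size of the registered range in bytes */
    struct vm_object *next;
} vm_object_t;

/* Exact lookup: succeeds only when ptr is the start address. */
vm_object_t *find_by_userptr(vm_object_t *head, const void *ptr)
{
    for (vm_object_t *obj = head; obj; obj = obj->next)
        if (obj->userptr == ptr)
            return obj;
    return NULL;
}

/* Range lookup: succeeds when ptr lies in [userptr, userptr + size). */
vm_object_t *find_by_userptr_range(vm_object_t *head, const void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;

    for (vm_object_t *obj = head; obj; obj = obj->next) {
        uintptr_t start = (uintptr_t)obj->userptr;
        if (p >= start && p - start < obj->userptr_size)
            return obj;
    }
    return NULL;
}
```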
Add a CPUVM aperture to keep track of memory allocations that are not known
to the GPU driver. Together with GPUVM, this patch adds pointer attribute
support for APUs.
Change-Id: If13f9cf01ff8b9f709b99b66661e7505246adf4c
[ROCm/ROCR-Runtime commit: 19f2676ea7]
Add two pointer attributes APIs:
hsaKmtQueryPointerInfo - allows the user to query memory information
using a pointer. This pointer can point to any address inside a
range known to HSA.
hsaKmtSetMemoryUserData - allows the user to attach data to a pointer to
add memory tracking information. This pointer must match the start
address of a memory allocation or registration.
TODO: This patch implements support on dGPUs only. APU support still
needs to be added.
Change-Id: I4711809274248434901f0794f50ebfa13a7371a8
[ROCm/ROCR-Runtime commit: 51e4d27c37]
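The two matching rules described above (any address inside a range for the query, the exact start address for attaching user data) can be modeled as follows. This is a hypothetical model, not the thunk's implementation: tracked_alloc_t, query_pointer() and set_user_data() are illustrative names, and the real APIs fill in HsaPointerInfo and return HSAKMT_STATUS codes rather than these simplified values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one tracked allocation or registration. */
typedef struct {
    uintptr_t start;
    uint64_t size;
    void *user_data;
} tracked_alloc_t;

/* Query rule: any address inside [start, start + size) matches. */
tracked_alloc_t *query_pointer(tracked_alloc_t *allocs, size_t n, const void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;

    for (size_t i = 0; i < n; i++)
        if (p >= allocs[i].start && p - allocs[i].start < allocs[i].size)
            return &allocs[i];
    return NULL;
}

/* User-data rule: ptr must equal the start address exactly.
 * Returns 0 on success, -1 if no allocation starts at ptr. */
int set_user_data(tracked_alloc_t *allocs, size_t n, const void *ptr, void *data)
{
    for (size_t i = 0; i < n; i++) {
        if (allocs[i].start == (uintptr_t)ptr) {
            allocs[i].user_data = data;
            return 0;
        }
    }
    return -1;
}
```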
For APUs, use /proc/cpuinfo to get the marketing name.
Change-Id: I4a17516d26a092683f36631032be00ad44f7e7fe
Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
[ROCm/ROCR-Runtime commit: df593aa076]
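One way to extract a name from /proc/cpuinfo is to take the value of the first "model name" line. The sketch below assumes that is the field wanted and parses text already read into memory, so it can be tested without the file; parse_model_name() is a hypothetical helper, not the thunk's code.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of the first "model name" line of cpuinfo-formatted text
 * into out (NUL-terminated, truncated if needed).
 * Returns 0 on success, -1 if the field is missing or out_len is 0. */
int parse_model_name(const char *cpuinfo, char *out, size_t out_len)
{
    const char *line = cpuinfo;

    if (out_len == 0)
        return -1;
    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (strncmp(line, "model name", 10) == 0) {
            const char *colon = memchr(line, ':', len);
            if (!colon)
                return -1;
            const char *val = colon + 1;
            /* Skip whitespace between ':' and the value. */
            while (val < line + len && (*val == ' ' || *val == '\t'))
                val++;
            size_t n = (size_t)(line + len - val);
            if (n >= out_len)
                n = out_len - 1;
            memcpy(out, val, n);
            out[n] = '\0';
            return 0;
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

A caller would first read /proc/cpuinfo into a buffer (for example with fopen() and fread()) and pass that buffer in.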
Compiling in 32-bit mode is broken, and we do not intend to restore
compatibility with 32-bit apps.
Change-Id: I5524b5b63fe62e6026aa04d84c4510e290a86106
[ROCm/ROCR-Runtime commit: e0c77a38cb]
The HSA thunk API currently reports the engineering name in MarketingName
and returns NULL when AMDName is queried.
- Report the current (engineering) name in AMDName instead of MarketingName.
- Use libpci to get MarketingName.
Change-Id: I819a6de7b067a2e724a6695e7d800274b83a71f8
Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
[ROCm/ROCR-Runtime commit: 9cbbf30be7]
The thunk spec requires that CUMaskCount be divisible by 32. Check this
and return INVALID_PARAMETER if it is not.
Change-Id: I4e0c8502d996d3da31224b817a5d4ff2c6054e13
[ROCm/ROCR-Runtime commit: 70b1b5b17e]
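A sketch of the check, assuming CUMaskCount counts mask bits stored in an array of 32-bit words. The status enum, cu_mask_words() and check_cu_mask_count() are hypothetical stand-ins for HSAKMT_STATUS_SUCCESS / HSAKMT_STATUS_INVALID_PARAMETER and the validation inside hsaKmtSetQueueCUMask().

```c
#include <stdint.h>

/* Hypothetical stand-ins for the thunk's HSAKMT_STATUS values. */
enum { STATUS_SUCCESS = 0, STATUS_INVALID_PARAMETER = 1 };

/* Number of 32-bit words holding cu_mask_count mask bits. */
uint32_t cu_mask_words(uint32_t cu_mask_count)
{
    return cu_mask_count / 32;
}

/* Reject counts that are not a whole number of 32-bit words. */
int check_cu_mask_count(uint32_t cu_mask_count)
{
    return (cu_mask_count % 32 == 0) ? STATUS_SUCCESS
                                     : STATUS_INVALID_PARAMETER;
}
```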
EventId is needed to call hsaKmtDestroyEvent() when mmap fails, so it
must be set before the mmap call.
Change-Id: I5f4288b953611799a02b0e988d6b2e48104466a0
[ROCm/ROCR-Runtime commit: 9c9bfa30c0]
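The ordering fix can be sketched with stubs that only record what happened. create_event(), event_t and the fake_* functions are hypothetical; the point is that the ID returned by the creation ioctl is stored in the event before the mmap, so the failure path destroys the event that was actually created.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t EventId; } event_t;

/* Records the ID passed to the destroy stub. */
static uint32_t last_destroyed_id = UINT32_MAX;

/* Hypothetical stubs for the create ioctl, the mmap and hsaKmtDestroyEvent(). */
static int fake_create_ioctl(uint32_t *event_id) { *event_id = 42; return 0; }
static int fake_mmap_events_page(void) { return -1; /* simulate mmap failure */ }
static void fake_destroy_event(event_t *ev)
{
    last_destroyed_id = ev->EventId;
    free(ev);
}

int create_event(event_t **out)
{
    event_t *ev = calloc(1, sizeof(*ev));
    uint32_t id;

    if (!ev)
        return -1;
    if (fake_create_ioctl(&id) != 0) {
        free(ev);
        return -1;
    }
    ev->EventId = id;                  /* set BEFORE the mmap call */
    if (fake_mmap_events_page() != 0) {
        fake_destroy_event(ev);        /* relies on ev->EventId */
        return -1;
    }
    *out = ev;
    return 0;
}
```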
Counter IDs in SQ_PERFCOUNTER0_SELECT are identical on gfx803 10 and
gfx803 11.
Change-Id: I5cfefd44b52989efd1d89311cf8c70c84ea2b230
[ROCm/ROCR-Runtime commit: 0b5c65a903]
get_block_properties uses the complete DID to identify the GPU, and this
list grows too long as more devices are added. The 12 most significant
bits of the DID are enough to identify the GPU.
Change-Id: Ieebb05402bbe08af12eb7289dfeb5bbf1f515b0f
[ROCm/ROCR-Runtime commit: 6c4d19a9d2]
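Assuming "the 12 most significant" refers to bits of the 16-bit PCI device ID, the lookup key can be computed by masking off the low 4 bits. did_family() and block_table_index() are hypothetical, and the table entries are placeholders rather than real device IDs.

```c
#include <stdint.h>

/* Keep the 12 most significant bits of a 16-bit PCI device ID. */
uint16_t did_family(uint16_t device_id)
{
    return (uint16_t)(device_id & 0xFFF0);
}

/* Hypothetical table lookup keyed on the masked DID (placeholder IDs). */
int block_table_index(uint16_t device_id)
{
    switch (did_family(device_id)) {
    case 0xABC0: return 0;
    case 0xDEF0: return 1;
    default:     return -1;
    }
}
```

One case now covers all 16 DIDs that share the upper 12 bits, which is what keeps the table short.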
Add performance counters for gfx70x. The reference is the gfx7 register spec.
The register being looked at is SQ_PERFCOUNTER0_SELECT.
Change-Id: I344bfb7452f6148f4dc268163d12c553c6be8424
[ROCm/ROCR-Runtime commit: 6d21c4e753]
Stepping 1 indicates higher double-precision float performance and
potentially other runtime workarounds needed for lack of PCIe atomics
on gfx70x.
Change-Id: I97185c1233e7d24caaf20a1eadea931d5a2bc664
[ROCm/ROCR-Runtime commit: fa102f3b8b]
In a NUMA system, topology should report NumCaches as the number of caches
within the node, but the current code reports the total number of caches in
the system. This patch fixes the error. It also uses cpuid instead of sysfs
files to get cache information. See "Intel Corporation, Intel 64 and IA-32
Architectures Software Developer's Manual Volume 2 (2A, 2B & 2C): Instruction
Set Reference", page 3-179, for the cpuid features used in this patch.
Change-Id: I8ecece6c2b230741822620b44e66ddc201ff5112
[ROCm/ROCR-Runtime commit: 73ad0a1942]
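The cpuid part of the change can be sketched with CPUID leaf 4 (Deterministic Cache Parameters), which the cited Intel manual documents. It uses the GCC/Clang <cpuid.h> helpers __get_cpuid_max() and __cpuid_count(); cache_info_t and enumerate_caches() are hypothetical. AMD processors describe cache topology through leaf 0x8000001D instead, so on those CPUs this sketch may report no caches, and it does not attempt the per-NUMA-node counting.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

typedef struct {
    unsigned level;        /* 1, 2, 3, ... */
    unsigned type;         /* 1 = data, 2 = instruction, 3 = unified */
    uint64_t size_bytes;
} cache_info_t;

/* Fill up to max entries; returns the number of caches found. */
int enumerate_caches(cache_info_t *out, int max)
{
    int n = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 4)
        return 0;
    for (unsigned sub = 0; n < max; sub++) {
        __cpuid_count(4, sub, eax, ebx, ecx, edx);
        (void)edx;
        unsigned type = eax & 0x1F;
        if (type == 0)                 /* no more caches */
            break;
        /* Size = ways * partitions * line size * sets, each field + 1. */
        uint64_t ways = ((ebx >> 22) & 0x3FF) + 1;
        uint64_t partitions = ((ebx >> 12) & 0x3FF) + 1;
        uint64_t line = (ebx & 0xFFF) + 1;
        uint64_t sets = (uint64_t)ecx + 1;
        out[n].level = (eax >> 5) & 0x7;
        out[n].type = type;
        out[n].size_bytes = ways * partitions * line * sets;
        n++;
    }
#else
    (void)out;
    (void)max;
#endif
    return n;
}
```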
Since we now ship headers and not just a library, this should be a -dev
package rather than a lib package.
Change-Id: I220465ea4ffc8d66d8d76e6716e6c6c50cdacea1
[ROCm/ROCR-Runtime commit: 44572965f6]
All files should go into /opt/rocm/$component.
For developer convenience, a single include directory is created through
symlinks from the component include directory to /opt/rocm/include.
Similarly, a unified linked directory is present in /opt/rocm/lib.
The component lib directory should not include linker names (library
names without version numbers).
This commit also fixes 'make rpm' so that it runs correctly without
sourcing build/envsetup.sh.
Change-Id: I95a680f6d3e3bd1ae688d0694934a0577dbd007c
[ROCm/ROCR-Runtime commit: 9f355b78a0]
An intermediate size was stored in a 32-bit variable. As a result, 4GB
allocations failed in KFD because the size was truncated to 0, and larger
allocations allocated the wrong amount of memory.
Change-Id: If19dedf64952f1d2edd813793241e12c0362d220
[ROCm/ROCR-Runtime commit: 82b3fad320]
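The truncation can be reproduced with a hypothetical page-rounding helper; the commit does not show the real variable or computation. A 4 GiB size is 2^32, which wraps to 0 in a uint32_t, and a 5 GiB request wraps to 1 GiB.

```c
#include <stdint.h>

/* Buggy: the intermediate is 32 bits wide, so sizes >= 4 GiB wrap. */
uint64_t aligned_size_buggy(uint64_t size, uint64_t align)
{
    uint32_t tmp = (uint32_t)((size + align - 1) & ~(align - 1));
    return tmp;
}

/* Fixed: keep the intermediate in 64 bits. */
uint64_t aligned_size_fixed(uint64_t size, uint64_t align)
{
    uint64_t tmp = (size + align - 1) & ~(align - 1);
    return tmp;
}
```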
Align with the rest of the driver stack on the new installation path
/opt/rocm/*.
This mechanism for generating packages should be replaced with something
nicer and more standards-compliant in the future.
Change-Id: Ic31409b0d0b8f6ee4b25296d2580982a76aab564
[ROCm/ROCR-Runtime commit: 31861c838e]
The HSA thunk is currently only aware of GPU node model info; CPU names
are NULL.
Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>
Change-Id: I3c2adbb8566a5048b44c39fff4fd8228912468ff
[ROCm/ROCR-Runtime commit: 682776d89a]
This option may help debug synchronization or coherency issues
involving the GPU caches. It works only on dGPUs, by changing the
cache policy of the GPUVM default aperture to "coherent", which is
implemented as non-cached on current dGPU hardware.
Change-Id: I544ac9cc5c0cf1fa5c4e30f67aa42b3b5e44ae67
[ROCm/ROCR-Runtime commit: 06d391c6c9]
Create QPI or HT links among all NUMA nodes. For now, assume all the
NUMA nodes are interconnected with the same Weight (= 1).
Change-Id: Id48ba95b9d75515a186f7dc5006b19bd92743ae3
[ROCm/ROCR-Runtime commit: f1fbacca15]
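A sketch of the all-to-all topology described above: one directed link for every ordered pair of distinct NUMA nodes, each with weight 1. numa_link_t and create_numa_mesh() are hypothetical, and the link type (QPI on Intel, HyperTransport on AMD) is left out.

```c
#include <stdlib.h>

/* Hypothetical link record; field names are illustrative. */
typedef struct {
    unsigned from_node;
    unsigned to_node;
    unsigned weight;
} numa_link_t;

/* Return a malloc'd array of n*(n-1) links, one per ordered pair of
 * distinct nodes, all with weight 1. *count receives the number of
 * links; NULL is returned if n < 2 or allocation fails. */
numa_link_t *create_numa_mesh(unsigned n, size_t *count)
{
    *count = 0;
    if (n < 2)
        return NULL;
    numa_link_t *links = malloc((size_t)n * (n - 1) * sizeof(*links));
    if (!links)
        return NULL;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            if (i != j)
                links[(*count)++] = (numa_link_t){ i, j, 1 };
    return links;
}
```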
KFD may not be able to provide the precise VM fault address and status.
This flag indicates whether the event data contains the fault details.
Change-Id: I15ffd5c25f555003c6450cc0700efb769418f76b
[ROCm/ROCR-Runtime commit: 79077811f5]
The Runtime requested this information so that it can easily tell
whether a pointer is part of the HSA shared address space.
Change-Id: If2041ed34031636677d692bc2dc6625634027ed4
[ROCm/ROCR-Runtime commit: 0ed29f5191]
Connect (peer-to-peer) only GPUs that belong to the same NUMA node. Without
this additional check, GPUs that are not directly connected would also get
connected.
Change-Id: I9a5ed19b8f06cd0527854cbbdb51ede99eade28b
[ROCm/ROCR-Runtime commit: 8ff2bcd48d]
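A hypothetical predicate for the check described above: two distinct GPUs get a peer-to-peer link only if they share the same parent NUMA node. gpu_t, should_link_p2p() and count_p2p_pairs() are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical GPU descriptor holding its parent NUMA node. */
typedef struct {
    int numa_node;
} gpu_t;

/* Link two distinct GPUs only when their parent NUMA node matches. */
bool should_link_p2p(const gpu_t *a, const gpu_t *b)
{
    return a != b && a->numa_node == b->numa_node;
}

/* Count unordered GPU pairs that would receive a P2P link. */
size_t count_p2p_pairs(const gpu_t *gpus, size_t n)
{
    size_t pairs = 0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (should_link_p2p(&gpus[i], &gpus[j]))
                pairs++;
    return pairs;
}
```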
Lstopo doesn't have system memory mappings at low addresses. Make
sure we leave enough GPUVM address space for kernel allocations
(currently only CWSR) before the start of the user-managed SVM
aperture.
Change-Id: Ic197f7bd5a3cfb150a0da2bfdbc848664e7869be
[ROCm/ROCR-Runtime commit: cac0c08496]
Connect (peer-to-peer) GPUs that belong to the same NUMA node.
Connect every [GPU] <--> [non-parent NUMA] node pair.
Change-Id: Ib4b08a6545d28b7dce4c9b1a90378bfc51bed07e
[ROCm/ROCR-Runtime commit: 7042292c60]
To simplify, allocate the maximum memory needed for the node_t->link array,
so no realloc is needed when indirect links are added. Trade-off: for some
nodes, more memory than required will be allocated.
This means the loop to compute the number of direct (reverse) io_links
for a CPU node is not necessary.
Change-Id: I2b2559142cbec3b262d0b4ea5fdebfd8f36c28fc
[ROCm/ROCR-Runtime commit: 1e729510d2]
Non-canonical GPUVM aperture doesn't exist on dGPUs. Remove comments
and code that say otherwise.
Fix alignment of GPUVM aperture for gfx801. Requires the same workaround
as gfx802. It's not used for anything on gfx801 yet, but will be soon.
Change-Id: I88607fe7b340081cc0715b85f28fdbf5f1bb0ad7
[ROCm/ROCR-Runtime commit: b837c3e7b0]
The kernel only creates a one-way direct link:
GPU(PCI_BUS) --> [Parent NUMA Node]
Create the reverse direct io_link here:
[Parent NUMA Node] --> GPU(PCI_BUS)
Change-Id: I829a1b1b7f34bda42871ede3472d60915e88418c
[ROCm/ROCR-Runtime commit: 1d1c30db7c]
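Building the reverse link amounts to swapping the endpoints of the kernel-reported link. io_link_t and make_reverse_link() are hypothetical, and copying the link type and weight unchanged is an assumption.

```c
/* Hypothetical io_link record; field names are illustrative. */
typedef struct {
    unsigned type;         /* link type, e.g. PCIe */
    unsigned node_from;
    unsigned node_to;
    unsigned weight;
} io_link_t;

/* From GPU --> parent NUMA, build parent NUMA --> GPU
 * (assumed: same type and weight as the forward link). */
io_link_t make_reverse_link(const io_link_t *fwd)
{
    io_link_t rev = *fwd;

    rev.node_from = fwd->node_to;
    rev.node_to = fwd->node_from;
    return rev;
}
```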
This is a simple README.md since most of the details should be in the
ROCK project.
Change-Id: I3175e2a5ade0f9ecb913076a4842b528f14947f0
[ROCm/ROCR-Runtime commit: 35e8fc6b15]
events_page is not protected against multiple allocation. The first
event-creation ioctl, which sets args.event_page_offset to set up the page,
is not protected against racing with subsequent invocations, which pass it
as null.
Change-Id: I40ba712a17e9eff257785f90c553a74ad09c661d
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
[ROCm/ROCR-Runtime commit: 3a662ac712]
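One common way to close such a race is to perform the "already allocated?" check and the allocation under a single lock. This sketch uses a pthread mutex and a calloc() placeholder for the real ioctl and mmap setup; events_page_lock, get_events_page() and the allocation counter are hypothetical, and the thunk may use a different locking scheme.

```c
#include <pthread.h>
#include <stdlib.h>

static void *events_page = NULL;
static pthread_mutex_t events_page_lock = PTHREAD_MUTEX_INITIALIZER;
static int events_page_allocs = 0;   /* counts allocations, for illustration */

/* Return the shared events page, allocating it on first use. Holding the
 * lock across check and allocation means concurrent first callers cannot
 * both allocate. */
void *get_events_page(void)
{
    void *page;

    pthread_mutex_lock(&events_page_lock);
    if (!events_page) {
        events_page = calloc(1, 4096);   /* placeholder for ioctl + mmap */
        if (events_page)
            events_page_allocs++;
    }
    page = events_page;
    pthread_mutex_unlock(&events_page_lock);
    return page;
}
```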
Use the object size when freeing address space, instead of the
parameter passed in by the caller. The parameter may be incorrect
due to app or runtime bugs, or when the buffer is an AQL ring
buffer using the double-mapping workaround.
Change-Id: I00bb31d4520ef969a49d6d5ea723e8a33418acc3
[ROCm/ROCR-Runtime commit: 006f3ee41b]
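A sketch of the change: the caller's size argument is ignored and the size recorded in the object is released instead. vm_obj_t, release_va() and free_va() are hypothetical names, and release_va() only records its argument.

```c
#include <stdint.h>

/* Hypothetical VM object recording the size actually reserved. */
typedef struct {
    uint64_t start;
    uint64_t size;
} vm_obj_t;

static uint64_t last_released_size;

/* Hypothetical release hook; just records the size it was given. */
static void release_va(uint64_t start, uint64_t size)
{
    (void)start;
    last_released_size = size;
}

/* Trust the object's own size, not the possibly wrong caller value. */
void free_va(vm_obj_t *obj, uint64_t caller_size)
{
    (void)caller_size;
    release_va(obj->start, obj->size);
}
```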
The alignment performed in vm_find_object_by_address isn't sufficient
because it doesn't take into account the offset from the start of the
page.
This fixes a bug where certain unaligned user pointers and sizes failed
to register correctly.
Change-Id: I17872e264467a619f5e1bedb7e1ed3d994a856bf
[ROCm/ROCR-Runtime commit: 8a0161d6bb]
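The alignment the commit above describes can be sketched as follows, assuming 4 KiB pages. Rounding only the size up ignores the pointer's offset within its first page: a 0x20-byte buffer at 0x1ff0 straddles two pages, but rounding 0x20 up alone covers just one. align_userptr() and PAGE_SIZE_SKETCH are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096ULL   /* assumed page size */

/* Compute the page-aligned [start, start + size) covering the unaligned
 * user range [addr, addr + len). The size includes addr's offset within
 * its first page before rounding up. */
void align_userptr(uint64_t addr, uint64_t len, uint64_t *start, uint64_t *size)
{
    uint64_t page_off = addr & (PAGE_SIZE_SKETCH - 1);

    *start = addr - page_off;
    *size = (len + page_off + PAGE_SIZE_SKETCH - 1) & ~(PAGE_SIZE_SKETCH - 1);
}
```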