Allocate doorbells for dGPUs in the SVM aperture and map them for
GPU access. This is necessary to allow GPU-initiated submissions to
user mode queues.
Depends on new doorbell BO allocation flag in KFD.
Change-Id: I0737bef4a4764bb4a66c43846707ead2108f6601
CPU cache information reported by Thunk topology is obtained from cpuid
instruction. This instruction only applies to X86 systems. It can cause
compile errors on non-X86 platforms. This patch temporarily disables CPU
cache functions in topology for non-X86 platforms in order to compile.
Change-Id: If86671817b0d036cb324eebf3f354682bfb75856
Add vm_find_object_by_userptr_range so QueryPointerInfo can find the
object as well when the pointer is not the starting address but it's
inside the memory range. Also rename vm_find_object_xxx functions to
_by_address and _by_address_range to be consistent.
Change-Id: I5c2b3a05b41493e32b7fd9154665bf078b043606
Add CPUVM aperture to keep track of memory allocation that is not known
to GPU driver. Together with GPUVM, this patch adds the pointer attributes
support to APU.
Change-Id: If13f9cf01ff8b9f709b99b66661e7505246adf4c
Add two pointer attributes APIs:
hsaKmtQueryPointerInfo - allow the user to query the memory information
using a pointer. This pointer can point to any address inside the
range known to HSA.
hsaKmtSetMemoryUserData - allow the user to attach data to a pointer to
add memory tracking information. This pointer must match the start
address of a memory allocation or registration.
TODO: This patch implements support on dGPU. Needs to add APU.
Change-Id: I4711809274248434901f0794f50ebfa13a7371a8
Compiling in 32bit mode is broken, and we don't have an intention on
restarting compatibility with 32bit apps.
Change-Id: I5524b5b63fe62e6026aa04d84c4510e290a86106
HSA thunk API is currently reporting engineering name to MarketingName
and returning NULL when querying for AMDName.
-Change current name reporting from MarketingName to AMDName.
-Use libpci to get MarketingName
Change-Id: I819a6de7b067a2e724a6695e7d800274b83a71f8
Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
The thunk spec requires that CUMaskCount be divisible by 32. Check this
and return INVALID_PARAMETER if it is not.
Change-Id: I4e0c8502d996d3da31224b817a5d4ff2c6054e13
EventId is needed in calling hsaKmtDestroyEvent() when mmap failed,
so we should move it ahead of mmap call.
Change-Id: I5f4288b953611799a02b0e988d6b2e48104466a0
get_block_properties uses the complete DID to identify the GPU. This list
is getting too long when more devices are added. Reading the 12 most
significant digits is good enough to identify the GPU.
Change-Id: Ieebb05402bbe08af12eb7289dfeb5bbf1f515b0f
Add performance counters for gfx70x. The reference is the gfx7 register spec.
The register being looked at is SQ_PERFCOUNTER0_SELECT.
Change-Id: I344bfb7452f6148f4dc268163d12c553c6be8424
Stepping 1 indicates higher double-precision float performance and
potentially other runtime workarounds needed for lack of PCIe atomics
on gfx70x.
Change-Id: I97185c1233e7d24caaf20a1eadea931d5a2bc664
In a NUMA system, topology should report NumCaches as the number of caches
within the node but current code reports the total caches in the system. This
patch fixes the error. This patch also uses cpuid to get cache information
instead of reading from sysfs files. See "Intel Corporation, Intel 64 and IA-32
Architectures Software Developer's Manual Volume 2(2A, 2B & 2C) Instruction
Set Reference" 3-179 for cpuid instruction features used in this patch.
Change-Id: I8ecece6c2b230741822620b44e66ddc201ff5112
Since we include headers and not just a library anymore, we should be
considered a -dev package and not a lib package.
Change-Id: I220465ea4ffc8d66d8d76e6716e6c6c50cdacea1
All files should go into /opt/rocm/$component
For developer convenience, a single include directory is created through
symlinks, from the component include directory to /opt/rocm/include.
Similarly, a unified linked directory is present in /opt/rocm/lib
The component lib directory should not include linker names (library
names without version numbers).
This commit also fixes 'make rpm' running correctly without the need for
sourcing build/envsetup.sh
Change-Id: I95a680f6d3e3bd1ae688d0694934a0577dbd007c
Intermediate size was stored in a 32-bit variable. This resulted in
4GB allocations to fail in KFD due to 0 size. Larger allocations
would allocate the wrong amount of memory.
Change-Id: If19dedf64952f1d2edd813793241e12c0362d220
Align with the rest of the driver stack on the new installation path
/opt/rocm/*
This mechanism for generating packages should be changed for something
nicer and more standards compliant in the future.
Change-Id: Ic31409b0d0b8f6ee4b25296d2580982a76aab564
HSA thunk is currently only aware of GPU node
model info, CPU names are NULL.
Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>
Change-Id: I3c2adbb8566a5048b44c39fff4fd8228912468ff
This option may help debug synchronization or coherency issues
involving the GPU caches. It works only on dGPUs, by changing the
cache policy of the GPUVM default aperture to "cohrent", which is
implemented as non-cached on current dGPU hardware.
Change-Id: I544ac9cc5c0cf1fa5c4e30f67aa42b3b5e44ae67
Create QPI or HT links among all NUMA nodes. For now, assume all the
NUMA nodes are interconnected with same Weight (=1).
Change-Id: Id48ba95b9d75515a186f7dc5006b19bd92743ae3
KFD may not be able to provide the precise VM fault address and status.
This flag will indicate whether the event data has the fault details
Change-Id: I15ffd5c25f555003c6450cc0700efb769418f76b
The Runtime requested this information so they can tell easily
whether a pointer is part of HSA shared address space or not.
Change-Id: If2041ed34031636677d692bc2dc6625634027ed4
Connect only (Peer-to-Peer) GPUs that belong to same NUMA node. Without
this additional check non direct GPUs would also get connected.
Change-Id: I9a5ed19b8f06cd0527854cbbdb51ede99eade28b
Lstopo doesn't have system memory mappings at low addresses. Make
sure we leave enough GPUVM address space for kernel allocations
(currently only CWSR) before the start of the user-managed SVM
aperture.
Change-Id: Ic197f7bd5a3cfb150a0da2bfdbc848664e7869be
Connect (Peer-to-Peer) GPUs that belong to same NUMA node.
Connect all [GPU] <--> [Non Parent NUMA] node
Change-Id: Ib4b08a6545d28b7dce4c9b1a90378bfc51bed07e
To simplify, allocate maximum needed memory for node_t->link array.
No need for realloc when indirect links are added. Trade off - for some
nodes more memory than required will be allocated.
This means the loop to compute the number of direct (reverse) io_links
for a CPU node is not necessary.
Change-Id: I2b2559142cbec3b262d0b4ea5fdebfd8f36c28fc
Non-canonical GPUVM aperture doesn't exist on dGPUs. Remove comments
and code that say otherwise.
Fix alignment of GPUVM aperture for gfx801. Requires the same workaround
as gfx802. It's not used for anything on gfx801 yet, but will be soon.
Change-Id: I88607fe7b340081cc0715b85f28fdbf5f1bb0ad7
The Kernel only creates one way direct link -
GPU(PCI_BUS) --> [Parent NUMA Node]
Create the reverse direct io_link here -
[Parent NUMA Node] --> GPU(PCI_BUS)
Change-Id: I829a1b1b7f34bda42871ede3472d60915e88418c