In a NUMA system, topology should report NumCaches as the number of caches
within the node, but the current code reports the total number of caches in
the system. This patch fixes the error. It also uses cpuid to get cache
information instead of reading from sysfs files. See "Intel Corporation,
Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2 (2A, 2B
& 2C): Instruction Set Reference", page 3-179, for the cpuid instruction
features used in this patch.
Change-Id: I8ecece6c2b230741822620b44e66ddc201ff5112
[ROCm/ROCR-Runtime commit: 73ad0a1942]
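As an illustration of the cpuid-based approach in commit 73ad0a1942, the sketch below decodes one subleaf of CPUID leaf 4 (deterministic cache parameters) into a cache type, level and size. The commit does not say which leaf it uses, so leaf 4 is an assumption here, and the struct and function names are hypothetical rather than taken from the patch. The register values are passed in as plain integers so the decoding can be read independently of how cpuid is executed.

```c
#include <stdint.h>

/* Decoded view of one CPUID leaf 4 subleaf.
 * Hypothetical type, not the structure used by the patch. */
struct cache_info {
    uint32_t type;   /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    uint32_t level;  /* cache level, starting at 1 */
    uint64_t size;   /* total cache size in bytes, 0 if type == 0 */
};

/* Decode the EAX/EBX/ECX values returned by CPUID leaf 4 for one subleaf.
 * Each geometry field is stored as (value - 1), so 1 is added back. */
static struct cache_info decode_cpuid_leaf4(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    struct cache_info ci;
    uint64_t line_size  = (ebx & 0xFFF) + 1;          /* EBX[11:0]  */
    uint64_t partitions = ((ebx >> 12) & 0x3FF) + 1;  /* EBX[21:12] */
    uint64_t ways       = ((ebx >> 22) & 0x3FF) + 1;  /* EBX[31:22] */
    uint64_t sets       = (uint64_t)ecx + 1;          /* ECX[31:0]  */

    ci.type  = eax & 0x1F;        /* EAX[4:0] */
    ci.level = (eax >> 5) & 0x7;  /* EAX[7:5] */
    ci.size  = ci.type ? ways * partitions * line_size * sets : 0;
    return ci;
}
```

A caller would iterate over subleaves 0, 1, 2, ... and stop at the first one whose type is 0; counting the subleaves seen up to that point gives a per-package cache count rather than a system-wide total.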
Since we include headers and not just a library anymore, we should be
considered a -dev package and not a lib package.
Change-Id: I220465ea4ffc8d66d8d76e6716e6c6c50cdacea1
[ROCm/ROCR-Runtime commit: 44572965f6]
All files should go into /opt/rocm/$component
For developer convenience, a single include directory is created through
symlinks, from the component include directory to /opt/rocm/include.
Similarly, a unified linked directory is present in /opt/rocm/lib
The component lib directory should not include linker names (library
names without version numbers).
This commit also fixes 'make rpm' so that it runs correctly without
sourcing build/envsetup.sh.
Change-Id: I95a680f6d3e3bd1ae688d0694934a0577dbd007c
[ROCm/ROCR-Runtime commit: 9f355b78a0]
The intermediate size was stored in a 32-bit variable. This caused
4GB allocations to fail in KFD because the size wrapped to 0. Larger
allocations would allocate the wrong amount of memory.
Change-Id: If19dedf64952f1d2edd813793241e12c0362d220
[ROCm/ROCR-Runtime commit: 82b3fad320]
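The truncation fixed in commit 82b3fad320 can be reproduced with a minimal sketch. The function names and the 4 KiB page size are assumptions made for the illustration; the actual thunk code is not shown in the commit message.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page size for the illustration */

/* Buggy variant: the page-aligned size is narrowed to a 32-bit intermediate,
 * so any bits above bit 31 are silently dropped. */
static uint64_t aligned_size_32bit_bug(uint64_t size)
{
    uint32_t tmp = (uint32_t)((size + PAGE_SIZE_BYTES - 1) &
                              ~(uint64_t)(PAGE_SIZE_BYTES - 1));
    return tmp;
}

/* Fixed variant: the intermediate stays 64 bits wide. */
static uint64_t aligned_size_64bit(uint64_t size)
{
    return (size + PAGE_SIZE_BYTES - 1) & ~(uint64_t)(PAGE_SIZE_BYTES - 1);
}
```

With the buggy variant, a request of exactly 4GB becomes 0 and a 5GB request becomes 1GB, which matches the two symptoms described in the commit message.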
Align with the rest of the driver stack on the new installation path
/opt/rocm/*
This mechanism for generating packages should be changed for something
nicer and more standards compliant in the future.
Change-Id: Ic31409b0d0b8f6ee4b25296d2580982a76aab564
[ROCm/ROCR-Runtime commit: 31861c838e]
HSA thunk is currently only aware of GPU node
model info; CPU names are NULL.
Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>
Change-Id: I3c2adbb8566a5048b44c39fff4fd8228912468ff
[ROCm/ROCR-Runtime commit: 682776d89a]
This option may help debug synchronization or coherency issues
involving the GPU caches. It works only on dGPUs, by changing the
cache policy of the GPUVM default aperture to "coherent", which is
implemented as non-cached on current dGPU hardware.
Change-Id: I544ac9cc5c0cf1fa5c4e30f67aa42b3b5e44ae67
[ROCm/ROCR-Runtime commit: 06d391c6c9]
Create QPI or HT links among all NUMA nodes. For now, assume all the
NUMA nodes are interconnected with the same Weight (=1).
Change-Id: Id48ba95b9d75515a186f7dc5006b19bd92743ae3
[ROCm/ROCR-Runtime commit: f1fbacca15]
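A minimal sketch of the fully connected NUMA topology assumed in commit f1fbacca15: every ordered pair of distinct nodes gets one link with weight 1. The `numa_link` struct and `build_numa_links` function are hypothetical names for this illustration and do not reflect the thunk's real data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link record: a directed link between two NUMA nodes. */
struct numa_link {
    uint32_t from;
    uint32_t to;
    uint32_t weight;
};

/* Write one directed link for every ordered pair of distinct nodes,
 * all with weight 1. Returns the number of links written, or 0 if
 * `capacity` is too small (or there are fewer than two nodes). */
static size_t build_numa_links(uint32_t num_nodes, struct numa_link *links,
                               size_t capacity)
{
    size_t needed, n = 0;
    uint32_t i, j;

    if (num_nodes < 2)
        return 0;
    needed = (size_t)num_nodes * (num_nodes - 1);
    if (capacity < needed)
        return 0;

    for (i = 0; i < num_nodes; i++) {
        for (j = 0; j < num_nodes; j++) {
            if (i == j)
                continue;  /* no self links */
            links[n].from = i;
            links[n].to = j;
            links[n].weight = 1;
            n++;
        }
    }
    return n;
}
```

For N nodes this produces N * (N - 1) directed links, so a 3-node system gets 6.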
KFD may not be able to provide the precise VM fault address and status.
This flag will indicate whether the event data has the fault details.
Change-Id: I15ffd5c25f555003c6450cc0700efb769418f76b
[ROCm/ROCR-Runtime commit: 79077811f5]
The Runtime requested this information so they can tell easily
whether a pointer is part of HSA shared address space or not.
Change-Id: If2041ed34031636677d692bc2dc6625634027ed4
[ROCm/ROCR-Runtime commit: 0ed29f5191]
Connect only (Peer-to-Peer) GPUs that belong to the same NUMA node. Without
this additional check, non-direct GPUs would also get connected.
Change-Id: I9a5ed19b8f06cd0527854cbbdb51ede99eade28b
[ROCm/ROCR-Runtime commit: 8ff2bcd48d]
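The additional check from commit 8ff2bcd48d amounts to comparing the parent NUMA node of the two GPUs. The sketch below uses a hypothetical `gpu_node` struct and `should_link_p2p` helper to show the predicate; how the thunk actually records a GPU's parent node is not stated in the commit message.

```c
#include <stdint.h>

/* Hypothetical view of a GPU node: its own id and the NUMA node it hangs off. */
struct gpu_node {
    uint32_t node_id;
    uint32_t parent_numa;
};

/* Return 1 if the two GPUs should get a direct Peer-to-Peer link:
 * they must be different GPUs that share the same parent NUMA node. */
static int should_link_p2p(const struct gpu_node *a, const struct gpu_node *b)
{
    return a->node_id != b->node_id && a->parent_numa == b->parent_numa;
}
```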
Lstopo doesn't have system memory mappings at low addresses. Make
sure we leave enough GPUVM address space for kernel allocations
(currently only CWSR) before the start of the user-managed SVM
aperture.
Change-Id: Ic197f7bd5a3cfb150a0da2bfdbc848664e7869be
[ROCm/ROCR-Runtime commit: cac0c08496]
Connect (Peer-to-Peer) GPUs that belong to the same NUMA node.
Connect all [GPU] <--> [Non Parent NUMA] node
Change-Id: Ib4b08a6545d28b7dce4c9b1a90378bfc51bed07e
[ROCm/ROCR-Runtime commit: 7042292c60]
To simplify, allocate the maximum needed memory for the node_t->link array.
There is no need for realloc when indirect links are added. The trade-off is
that some nodes get more memory than required.
This means the loop to compute the number of direct (reverse) io_links
for a CPU node is not necessary.
Change-Id: I2b2559142cbec3b262d0b4ea5fdebfd8f36c28fc
[ROCm/ROCR-Runtime commit: 1e729510d2]
Non-canonical GPUVM aperture doesn't exist on dGPUs. Remove comments
and code that say otherwise.
Fix alignment of GPUVM aperture for gfx801. Requires the same workaround
as gfx802. It's not used for anything on gfx801 yet, but will be soon.
Change-Id: I88607fe7b340081cc0715b85f28fdbf5f1bb0ad7
[ROCm/ROCR-Runtime commit: b837c3e7b0]
The kernel only creates a one-way direct link:
GPU(PCI_BUS) --> [Parent NUMA Node]
Create the reverse direct io_link here:
[Parent NUMA Node] --> GPU(PCI_BUS)
Change-Id: I829a1b1b7f34bda42871ede3472d60915e88418c
[ROCm/ROCR-Runtime commit: 1d1c30db7c]
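The reverse-link step in commit 1d1c30db7c can be sketched as a pass over the forward links reported by the kernel, swapping source and destination. The `io_link` struct and `build_reverse_links` function are hypothetical and carry only the two endpoints; the real io_link properties hold more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced io_link: only the two endpoint node ids. */
struct io_link {
    uint32_t from;
    uint32_t to;
};

/* For each GPU --> parent NUMA link in `fwd`, write the matching
 * parent NUMA --> GPU link into `rev`. Both arrays hold `count` entries. */
static void build_reverse_links(const struct io_link *fwd, size_t count,
                                struct io_link *rev)
{
    size_t i;

    for (i = 0; i < count; i++) {
        rev[i].from = fwd[i].to;
        rev[i].to = fwd[i].from;
    }
}
```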
events_page is unprotected from multiple allocation. The first event
creation ioctl is unprotected from a race with args.event_page_offset
being set (for page setup) and null (all subsequent invocations).
Change-Id: I40ba712a17e9eff257785f90c553a74ad09c661d
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
[ROCm/ROCR-Runtime commit: 3a662ac712]
Use the object size when freeing address space, instead of the
parameter passed in by the caller. The parameter may be incorrect
due to app or runtime bugs, or when the buffer is an AQL ring
buffer with the double-mapping workaround.
Change-Id: I00bb31d4520ef969a49d6d5ea723e8a33418acc3
[ROCm/ROCR-Runtime commit: 006f3ee41b]
The alignment performed in vm_find_object_by_address isn't sufficient
because it doesn't take into account the offset from the start of the
page.
This fixes a bug where certain unaligned userpointers and sizes fail
to register correctly.
Change-Id: I17872e264467a619f5e1bedb7e1ed3d994a856bf
[ROCm/ROCR-Runtime commit: 8a0161d6bb]
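The alignment problem addressed in commit 8a0161d6bb can be shown with a small sketch. Rounding only the size up to a page boundary undercounts whenever the user pointer starts part-way into a page; the offset within the first page has to be added before rounding. The function names and the 4 KiB page size are assumptions for this illustration and are not the code in vm_find_object_by_address.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page size for the illustration */

/* Insufficient: rounds up the size alone and ignores where the buffer starts. */
static uint64_t naive_aligned_size(uint64_t size)
{
    return (size + PAGE_SIZE_BYTES - 1) & ~(uint64_t)(PAGE_SIZE_BYTES - 1);
}

/* Accounts for the offset of `addr` from the start of its page, so the
 * result covers every page touched by [addr, addr + size). */
static uint64_t userptr_aligned_size(uint64_t addr, uint64_t size)
{
    uint64_t offset = addr & (PAGE_SIZE_BYTES - 1);

    return (offset + size + PAGE_SIZE_BYTES - 1) &
           ~(uint64_t)(PAGE_SIZE_BYTES - 1);
}
```

For example, a 0x20-byte buffer at address 0x1ff0 straddles two pages: the naive calculation yields one page (0x1000), while the offset-aware one yields two (0x2000).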
Use the aligned size of the buffer objects for CPU unmapping in
__fmm_release instead of relying on the unaligned size passed in by
the caller.
Change-Id: If986ec24e9a05d32981549fddbf143221fc40bac
[ROCm/ROCR-Runtime commit: 7a383f9d88]
Allocate SVM address space for the registered memory and use new
userptr support in KFD to create a system memory BO associated with
the given user pointer. Map this BO at the SVM address for CPU
access.
MapMemoryToGPU can be used with the registered user pointer and
will return the SVM address as alternate GPUVA.
Change-Id: I4886e193c51fb6870a567878870c36bf8b5c3748
[ROCm/ROCR-Runtime commit: 85f9efb1a0]
A few more counters are now available in the GFX8 register specs, so add
them. Also report the correct number of SQ perf counter slots for gfx700 and
gfx801.
Change-Id: I9e6b4b10238230aabeccbfaa5e491a28b5e54f2d
[ROCm/ROCR-Runtime commit: 1a0f915957]
Allocations from GPU nodes will return VRAM, not system memory.
Only non-paged allocation from GPU nodes is supported. System
memory can only be allocated from CPU nodes (usually node 0).
The HostAccess flag is no longer used to distinguish the memory
type. It only indicates whether the memory is mapped for CPU
access.
Maintain compatibility with broken KfdTests by returning system
memory for paged memory requested from GPU nodes.
Change-Id: I514defede735f55e6de436f41944125b6f2c4ccf
[ROCm/ROCR-Runtime commit: 887b32fe86]
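The allocation policy described in commit 887b32fe86 can be summarized as a small decision function. This is a sketch under stated assumptions: the enum and function names are hypothetical, and the real thunk derives these inputs from the node properties and memory flags rather than from two booleans.

```c
#include <stdbool.h>

/* Hypothetical memory types for the illustration. */
enum mem_type {
    MEM_SYSTEM,
    MEM_VRAM
};

/* Pick the memory type from the node kind and the paged/non-paged request.
 * HostAccess is deliberately not an input: per the commit it only says
 * whether the memory is mapped for CPU access. */
static enum mem_type select_mem_type(bool node_is_gpu, bool non_paged)
{
    if (!node_is_gpu)
        return MEM_SYSTEM;  /* CPU nodes always hand out system memory */
    if (non_paged)
        return MEM_VRAM;    /* the supported GPU-node case */
    return MEM_SYSTEM;      /* compatibility fallback for paged requests */
}
```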
This is the thunk part of the CWSR support.
1. SDMA queues don't support CWSR, so there is no need to allocate the context save/restore memory for them
2. Allocate the context save/restore memory in the local frame buffer for dGPU
Change-Id: Ie83506f0cced2a5a537c49d68125796d831c2764
[ROCm/ROCR-Runtime commit: 4e6c25e55b]
All Tonga page size alignment is done in the memory management
functions in fmm.c. All other code only specifies the minimum
alignment it needs and lets fmm.c handle the HW-specific
alignment.
Clean up aligned-exec memory allocation in queue.c to remove
hard-coded TONGA_PAGE_SIZE alignments and remove code duplication.
Make sure alignments are consistent between allocate and free.
Change-Id: Ia8923448173d1cef315af24cebff12adef385cb0
[ROCm/ROCR-Runtime commit: cc9fc386bd]
HSA thunk API returns null when querying for GPU node marketing
names due to empty system topology file.
- Add marketing names to device GFX IP data structs.
- Modify name retrieval to pull from data structs instead of file.
Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>
Change-Id: I30ea04111be7e0df2e93894f801fbeb414ffa790
[ROCm/ROCR-Runtime commit: 4e4a881940]
This prevents the library from being unloaded at runtime, even when
dlclose is called. This preserves global variables, such as state
about the SVM address space and avoids catastrophic leaks on dlclose.
Change-Id: I34f1d19a450835200e9d4815458e8d1b3045053c
[ROCm/ROCR-Runtime commit: cc7491ec71]
Stop using NUM_OF_SUPPORTED_GPUS. For now the definition itself cannot
be removed because the ioctl code is in the upstream kernel.
Change-Id: If846625a8ad5062d5483e762850c793d3c00b9d0
[ROCm/ROCR-Runtime commit: ce83dc623f]
Fix hsaKmtRegisterMemory to be a no-op for now and move the multi-GPU
implementation to hsaKmtRegisterMemoryToNodes. Make GPU memory mappings
of host memory visible to all GPUs by default. Device memory is still
visible to the allocating GPU only by default (but can be overridden
with hsaKmtRegisterMemoryToNodes for experimenting with P2P).
Change-Id: I73408afbe3b10c8dad2ab3a780f58413249692e6
[ROCm/ROCR-Runtime commit: 063ad3ad9e]
This is required when we have a debug session.
Change-Id: If9d6d2d23a9016b6ca9562e02a91fc16e0354ee4
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
[ROCm/ROCR-Runtime commit: 681f4dcecc]