Add performance counters for gfx70x. The reference is the gfx7 register spec.
The register being looked at is SQ_PERFCOUNTER0_SELECT.
Change-Id: I344bfb7452f6148f4dc268163d12c553c6be8424
[ROCm/ROCR-Runtime commit: 6d21c4e753]
Stepping 1 indicates higher double-precision float performance and
potentially other runtime workarounds needed for lack of PCIe atomics
on gfx70x.
Change-Id: I97185c1233e7d24caaf20a1eadea931d5a2bc664
[ROCm/ROCR-Runtime commit: fa102f3b8b]
In a NUMA system, topology should report NumCaches as the number of caches
within the node, but the current code reports the total number of caches in the
system. This patch fixes the error. It also uses cpuid to get cache information
instead of reading from sysfs files. See "Intel Corporation, Intel 64 and IA-32
Architectures Software Developer's Manual Volume 2 (2A, 2B & 2C): Instruction
Set Reference", page 3-179, for the cpuid instruction features used in this patch.
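The difference between per-node and system-wide cache counts can be sketched
as follows. This is a minimal illustration only: the (node_id, cache_level)
tuples and the function name are hypothetical and do not reflect the thunk's
data structures or its cpuid-based enumeration.

```python
from collections import Counter

def caches_per_node(caches):
    """Count caches per NUMA node.

    `caches` is an iterable of (node_id, cache_level) tuples; this layout is
    a hypothetical stand-in for whatever the topology code enumerates.
    """
    return Counter(node_id for node_id, _level in caches)

# Two caches on node 0, one on node 1 (made-up example data).
example = [(0, 1), (0, 2), (1, 1)]
per_node = caches_per_node(example)

# Reporting len(example) for every node mirrors the bug described above;
# per_node[node_id] is the value NumCaches should carry for that node.
system_total = len(example)
```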
Change-Id: I8ecece6c2b230741822620b44e66ddc201ff5112
[ROCm/ROCR-Runtime commit: 73ad0a1942]
max_single_fill_size_ overflowed the packet field size. Reduce by one dword.
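A minimal sketch of this kind of overflow, assuming the old limit was exactly
one past the largest value the field can encode. FIELD_BITS and every name
below are hypothetical and chosen for illustration; they are not the actual
packet layout or runtime code.

```python
# Hypothetical width of the packet's byte-count field (not the real layout).
FIELD_BITS = 22
DWORD = 4  # bytes per dword

def fits_in_field(size_bytes):
    """Return True if size_bytes can be encoded in FIELD_BITS bits."""
    return 0 <= size_bytes < (1 << FIELD_BITS)

# A limit of exactly 2**FIELD_BITS needs one more bit than the field has.
old_max_single_fill_size = 1 << FIELD_BITS
# Reducing by one dword keeps the limit dword-aligned and encodable.
new_max_single_fill_size = old_max_single_fill_size - DWORD

def split_fill(total_bytes, max_chunk):
    """Split a fill of total_bytes into chunks no larger than max_chunk."""
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, max_chunk)
        chunks.append(chunk)
        remaining -= chunk
    return chunks

chunks = split_fill(10_000_000, new_max_single_fill_size)
```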
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1259263]
[ROCm/ROCR-Runtime commit: 1d4a257225]
Since we include headers and not just a library anymore, we should be
considered a -dev package and not a lib package.
Change-Id: I220465ea4ffc8d66d8d76e6716e6c6c50cdacea1
[ROCm/ROCR-Runtime commit: 44572965f6]
Querying HSA_AMD_AGENT_MEMORY_POOL_INFO_LINK_INFO between a gpu agent
and its own local memory pool returns wrong information.
Fix: return link with 0 hop count.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1257544]
[ROCm/ROCR-Runtime commit: 5a584fa1ab]
Fix dirty-tree status. Thanks to Fan for fixing the issue.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1256716]
[ROCm/ROCR-Runtime commit: 0545761aa9]
Remove mutex and just make the thread spin again if the queue is wrapping.
Remove the wait for the queue to finish wrapping, and just check if there is enough space to recycle when reserving queue space.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1256713]
[ROCm/ROCR-Runtime commit: ea67bb8374]
All files should go into /opt/rocm/$component
For developer convenience, a single include directory is created through
symlinks, from the component include directory to /opt/rocm/include.
Similarly, a unified linked directory is present in /opt/rocm/lib
The component lib directory should not include linker names (library
names without version numbers).
This commit also fixes 'make rpm' so that it runs correctly without
sourcing build/envsetup.sh.
Change-Id: I95a680f6d3e3bd1ae688d0694934a0577dbd007c
[ROCm/ROCR-Runtime commit: 9f355b78a0]
Build system/Package maintainer:
- BUILDID is specified at cmake.
- USAGE: cmake -DBUILDID=<ID> ../src
For developer builds, which typically don't provide BUILDID, cmake will:
- Determine the last git commit when this tree was synced
- Determine the build date
- Check if tree is clean when built
The idea of this embedded string is that later, when you get a ROCR build, you can get some idea of the build's origin by using: strings libhsa-runtime.so.1 | grep "ROCR BUILD ID"
For example:
- If it's a Jenkins build 25, it returns: "ROCR BUILD ID: 25"
- If it's a developer build synced @ 06f5f2a with modifications, it returns: "ROCR BUILD ID: 06f5f2a-2016-04-11-0"
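The string composition described above can be sketched as follows. This is
not the cmake logic itself: the function and parameter names are hypothetical,
and reading the trailing field as a clean-tree flag (0 for a modified tree) is
an inference from the single example given here.

```python
def rocr_build_id(build_id=None, commit="", date="", tree_clean=True):
    """Compose the embedded build string (hypothetical helper).

    If an explicit BUILDID is supplied it is used verbatim; otherwise the
    string is assembled from the last commit, the build date, and a flag
    that is assumed to be 1 for a clean tree and 0 for a modified one.
    """
    if build_id is not None:
        value = str(build_id)
    else:
        value = f"{commit}-{date}-{int(tree_clean)}"
    return f"ROCR BUILD ID: {value}"

jenkins = rocr_build_id(build_id=25)
developer = rocr_build_id(commit="06f5f2a", date="2016-04-11", tree_clean=False)
```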
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1256588]
[ROCm/ROCR-Runtime commit: a148fd0b68]
Intermediate size was stored in a 32-bit variable. This caused
4GB allocations to fail in KFD due to a size of 0. Larger allocations
would allocate the wrong amount of memory.
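The truncation can be modeled in a few lines. This is only an illustration of
32-bit wraparound; the helper name is hypothetical and the snippet is not the
runtime's allocation code.

```python
MASK32 = 0xFFFFFFFF
GIB = 1 << 30

def truncate_to_u32(size_bytes):
    """Model storing a size in a 32-bit unsigned variable."""
    return size_bytes & MASK32

# Exactly 4 GiB wraps to 0, which is then rejected as a zero-size request.
four_gib = truncate_to_u32(4 * GIB)
# 5 GiB wraps to 1 GiB, so the wrong amount of memory would be allocated.
five_gib = truncate_to_u32(5 * GIB)
# Sizes below 4 GiB are unaffected.
three_gib = truncate_to_u32(3 * GIB)
```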
Change-Id: If19dedf64952f1d2edd813793241e12c0362d220
[ROCm/ROCR-Runtime commit: 82b3fad320]
Align with the rest of the driver stack on the new installation path
/opt/rocm/*
This mechanism for generating packages should be changed for something
nicer and more standards compliant in the future.
Change-Id: Ic31409b0d0b8f6ee4b25296d2580982a76aab564
[ROCm/ROCR-Runtime commit: 31861c838e]
HSA Finalizer: Add dumping of code object, ISA and executable to loader.
This is controlled by loader options -dump-all, -dump-isa, -dump-code, -dump-exec
The options can now also be set with env variable LOADER_OPTIONS_APPEND.
Added tests to finalizer_offline
Testing: smoke, dumping on hardware
Reviewed by: Konstantin Zhuravlyov
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1255351]
[ROCm/ROCR-Runtime commit: a795909bca]
- Partially remove 'amd_load_map' extension because it is not used and will not be used
- Remove 'hsa_amd_query_kernel_host_address' API
- Add 'hsa_ext_amd_loaded_code_object' extension
- Add 'hsa_ext_amd_loaded_code_object_query_host_address' API
- Most likely to be used by debugger, profiler, and hcc (printf)
- Update affected sources
- 'hsa_system_extension_supported'
- 'hsa_system_get_extension_table'
- SoftCP path
- Integrate CLs 1250699, 1251204, 1251214 from stg sc
ReviewBoardURL: http://ocltc.amd.com/reviews/r/10091/
Testing: smoke (ok), teamcity (ok), samples on fiji (AQL and SoftCP) (ok)
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1251223]
[ROCm/ROCR-Runtime commit: f6565a2f70]
HSA thunk is currently only aware of GPU node
model info; CPU names are NULL.
Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>
Change-Id: I3c2adbb8566a5048b44c39fff4fd8228912468ff
[ROCm/ROCR-Runtime commit: 682776d89a]
This option may help debug synchronization or coherency issues
involving the GPU caches. It works only on dGPUs, by changing the
cache policy of the GPUVM default aperture to "coherent", which is
implemented as non-cached on current dGPU hardware.
Change-Id: I544ac9cc5c0cf1fa5c4e30f67aa42b3b5e44ae67
[ROCm/ROCR-Runtime commit: 06d391c6c9]
Create QPI or HT links among all NUMA nodes. For now, assume all the
NUMA nodes are interconnected with same Weight (=1).
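A minimal sketch of a fully connected set of links with uniform weight. The
dictionary layout and function name are hypothetical, and emitting one
directed link per ordered pair of nodes is an assumption made for
illustration, not a statement about how the thunk stores its links.

```python
from itertools import permutations

def numa_links(node_ids, weight=1):
    """Return one directed link per ordered pair of distinct NUMA nodes."""
    return [
        {"from": src, "to": dst, "weight": weight}
        for src, dst in permutations(node_ids, 2)
    ]

# Three NUMA nodes give 3 * 2 = 6 directed links, all with weight 1.
links = numa_links([0, 1, 2])
```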
Change-Id: Id48ba95b9d75515a186f7dc5006b19bd92743ae3
[ROCm/ROCR-Runtime commit: f1fbacca15]