Update build instructions in README.md to use absolute path on cmake
parameter, CMAKE_MODULE_PATH. Relative path causes build error. Tested
on cmake 3.5.1 and cmake 3.5.2.
Change-Id: I1b8e8deb9f4941580580be8087a94655ae155d02
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Previously, the kfd device was used to map memory for CPU access.
However, this is not compatible with how TTM handles CPU mappings
on eviction: memory won't be unmapped and remapped on restore.
Fix the issue by mmapping memory through the DRM render device.
This patch requires a coordinated kernel driver change to work.
To stay compatible with old kernel drivers, some temporary code
is included. Once the coordinated kernel driver change is checked in,
the temporary code can be removed.
Change-Id: Ie7b304c4a82b7e8d5ab703acb81d66430af4f0bc
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
1. Add hsa ext api hsa_amd_register_vmfault_handler for debugger to register callback in case of VM fault.
2. Extend hsa_ven_amd_loader API to:
(1) iterate loaded code objects in executable:
hsa_ven_amd_loader_executable_iterate_loaded_code_objects
(2) get loaded code object info:
hsa_ven_amd_loader_loaded_code_object_get_info
3. Make the id of hsa_queue the same as the one used in communication with thunk (for amd_aql_queue)
Change-Id: I68910809e59e24297350d262606f00e96c14bcbd
The child process's hsaKmtOpenKFD() call must re-initialize global
variables copied from the parent process. This includes closing all file
handles and freeing dynamically allocated buffers. The double free
happens because destroy_device_debugging_memory() frees the memory in
the parent process's hsaKmtCloseKFD() but doesn't reset the pointer to
NULL. As a result, the child process frees it again. Likewise, kfd_fd is
closed in the parent process but not reset to 0, so the child process
closes it again.
Fix: reset kfd_fd to 0 after close, and reset the is_device_debugged
pointer to NULL after free.
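The reset-after-release pattern described above can be sketched as follows. This is a minimal illustration, not the thunk code: example_fd, example_buf, example_open() and example_close() are hypothetical stand-ins for kfd_fd, the debugging buffer and the open/close entry points. The sketch keeps 0 as the "not open" value because the commit does. Since 0 is itself a valid descriptor, -1 would be the more conventional sentinel.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins for the library's globals (names are illustrative). */
int   example_fd  = 0;      /* 0 means "not open", as in the commit message */
void *example_buf = NULL;

/* Release global state and leave it safe to release a second time. */
void example_close(void)
{
    if (example_fd > 0) {
        close(example_fd);
        example_fd = 0;     /* a later call will not close() again */
    }
    free(example_buf);      /* free(NULL) is a no-op */
    example_buf = NULL;     /* a later call will not free() again */
}

/* Acquire the resources; returns 0 on success, -1 on failure. */
int example_open(void)
{
    /* A forked child inherits the parent's globals, so they must have
     * been reset before this point. */
    assert(example_fd == 0 && example_buf == NULL);

    example_fd = open("/dev/null", O_RDWR);
    if (example_fd < 0) {
        example_fd = 0;
        return -1;
    }
    example_buf = malloc(64);
    if (example_buf == NULL) {
        close(example_fd);
        example_fd = 0;
        return -1;
    }
    return 0;
}
```

Calling example_close() twice in a row, or once in the parent and once more in a forked child, is harmless because each resource is reset immediately after it is released.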
Change-Id: I421b3decbcaa4111298b8e599aa16940d851a58c
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Adds the thunk include and lib paths to the cache, removes paths
to indicator files from the cache, uses the cached path directory
(if any) as a search hint for indicator files.
Change-Id: I0859faa8d229a97abfaacb408d2c831e317aed5f
Because of a HW design change, the GPUVM aperture is no longer needed on
GFX9 APUs. However, some functionality on APUs still depends on the GPUVM
aperture, so use the SVM aperture instead to take over the role of
the previous GPUVM aperture.
Change-Id: Ife7f0d598dd7989f2bcf7cdf3466d5a68703ca60
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Use mbind to specify the NUMA node for system memory allocation. This
only works with HSA_USERPTR_FOR_PAGED_MEM=1.
Change-Id: I88e7815d5a5aefcc4c22358c1a4a1635d7677ef3
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Marking heap memory as executable using mprotect() is not allowed
by SELinux. mprotect() calls that try to do this will fail on systems
with SELinux enabled. This is also a security risk, so it should be
fixed even on systems that allow this.
Any memory we want to mark as executable must be allocated using mmap().
See https://www.akkadia.org/drepper/selinux-mem.html
The two places where we try to mark heap memory as executable both use
posix_memalign() to allocate the heap memory. In both cases, the
alignment value passed into this function is always equal to PAGE_SIZE,
which means that they are safe to replace with mmap(), which guarantees
alignment to PAGE_SIZE. In this case PAGE_SIZE has been set to
sysconf(_SC_PAGESIZE);
v2:
- Use MAP_PRIVATE instead of MAP_SHARED. This matches the behavior
of memory allocated by posix_memalign()
- Ignore alignment hints instead of returning error when we can't
accommodate them.
- Drop alignment parameter of allocate_exec_aligned_memory() since
the only alignment supported is sysconf(_SC_PAGESIZE).
- Remove extra parameter from fmm_release().
- Add error path to fmm_allocate_host_cpu() for when mmap fails.
v3:
- Avoid use after free.
Change-Id: I7d51279790d9700bc3fa761c44bfde1c1936019b
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
src/perfctr.c: In function ‘destroy_shared_region’:
src/perfctr.c:154:10: error: logical ‘and’ of equal expressions [-Werror=logical-op]
if (sem && sem != SEM_FAILED) {
^~
src/perfctr.c: In function ‘update_block_slots’:
src/perfctr.c:323:11: error: logical ‘or’ of equal expressions [-Werror=logical-op]
if (!sem || sem == SEM_FAILED)
^~
v2:
- Initialize and reset sem to SEM_FAILED.
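The warning presumably appears because SEM_FAILED is a null pointer constant on this platform (glibc defines it as ((sem_t *)0)), so "sem" and "sem != SEM_FAILED" test the same condition. A minimal sketch of the v2 approach follows, using hypothetical names (example_sem, example_sem_open(), example_sem_close()) rather than the ones in src/perfctr.c:

```c
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical module-level handle; SEM_FAILED doubles as "not open". */
sem_t *example_sem = SEM_FAILED;

/* Open (or create) a named semaphore; returns 0 on success, -1 on failure. */
int example_sem_open(const char *name)
{
    assert(example_sem == SEM_FAILED);  /* must not already be open */
    example_sem = sem_open(name, O_CREAT, 0600, 1);
    return example_sem == SEM_FAILED ? -1 : 0;
}

/* Close the semaphore if it is open and reset the handle. */
void example_sem_close(void)
{
    /* A single comparison against SEM_FAILED replaces the redundant
     * "sem && sem != SEM_FAILED" pair. */
    if (example_sem != SEM_FAILED) {
        sem_close(example_sem);
        example_sem = SEM_FAILED;
    }
}
```

Because the handle always starts as SEM_FAILED and is reset to it on close, one comparison is enough everywhere and does not rely on how the platform defines SEM_FAILED.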
Change-Id: Id70361079b715c4946b13e4460e4fd85d9542c46
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
TensorFlow was running out of VRAM due to padding up allocations
from legacy memory APIs. These allocations have been added to
the fragment allocator to improve VRAM utilization.
Change-Id: Ic680fff576a0434b3b17a4c91746da44e09957fa
Fix a while loop that can loop forever when the cpuid instruction
doesn't work properly.
Change-Id: Iefa49d23b40c994eb4369621974a7d3c4067e47a
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
The kernel file has changed recently, so update the copy in thunk.
Change-Id: I359a389fa9d91641114c7fb75f420ee6b16f467a
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
one or both directions. Users can enumerate the pools reported
by the system to specify which pools serve as source / destination.
Change-Id: I8e6d0adb3743b3328dd3ce9152762ca840ea613b
Since access may only be manipulated on whole pages, suballocator fragments must cooperate to set the page's access.
Since the KFD does not migrate memory on access changes, this implementation makes agent access sticky across the requests in a fragmented page.
Change-Id: I88479ed45fb40e9782b704526a7b8ffb22e7bd76
GCC can't reasonably be told that the lock ptr isn't null. Adding a private bool
allows the branch to be eliminated, along with the bool.
Change-Id: I0605d69474d6a6e6951be93c0af1d8caf3f77124