Conserves VMIDs when multiple processes are in use and memory operations
are not GPU specific. For instance, the HIP API hipHostMalloc does not accept
a target GPU, so when used with one process per GPU (i.e. GPU == MPI rank) we
can quickly exceed the available VMID slots if every process consumes a VMID
on every GPU.
Change-Id: Ib6fa051290089f71581029c09f9a44b9992237d1
Simple test of mapping a large amount of system memory to the GPU.
before
[ RUN ] KFDMemoryTest.MMap
[ ] Using ISA for GFXIP 9.0
[ ] successfully register/map 32GB system memory to gpu
[ OK ] KFDMemoryTest.MMap (36932 ms)
after
[ RUN ] KFDMemoryTest.MMap
[ ] Using ISA for GFXIP 9.0
[ ] successfully register/map 32GB system memory to gpu
[ OK ] KFDMemoryTest.MMap (11441 ms)
So the test now takes 11s instead of 36s.
Looks like we can do something similar with vm_area too.
Change-Id: I0349aacdeddec3534016d28176f0fabf632c61fc
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Compare with gpu_mem_count instead of deprecated NUM_OF_SUPPORTED_GPUS
to prevent overflows in case no dGPUs are present.
Change-Id: I71fcb7503ba4c20bffadbdb04cefc4e4027a7df7
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
There were two doorbells, one embedded in the other, which was very
confusing. Rename the member variable to mapping to differentiate them.
Also, rename doorbells_mutex to just mutex for brevity.
Change-Id: Iaa14a1a3ee09449a9089fc1fb39c916fdf32fb44
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
If opening the DRM render device fails (usually because the user is not a
member of the video group), fmm_init_process_apertures() still returns
success, resulting in a hard-to-diagnose segfault at a later stage.
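
A minimal sketch of the kind of error propagation this implies. The function
name, the path argument and the -1 return value are hypothetical and not
taken from the patch:

```c
#include <fcntl.h>

/* Hypothetical init step: returns 0 on success and -1 if the render
 * node cannot be opened, so the caller can stop early instead of
 * crashing later on an invalid file descriptor. */
static int init_render_node(const char *path, int *fd_out)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;	/* report the failure instead of ignoring it */

	*fd_out = fd;
	return 0;
}
```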
Change-Id: Ifbde4481629988944ad7f384d59753c88e287fa9
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Avoid warnings of the type
error: 'strncpy' specified bound 64 equals destination size
With the destination being 0-initialized, subtracting 1 from the
destination buffer size will ensure that the destination will be a
0-terminated string, even when it's truncated.
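
A minimal sketch of the pattern described above. The helper name and buffer
size are illustrative only and do not come from the patch:

```c
#include <string.h>

#define NAME_SIZE 64	/* illustrative; matches the bound in the warning */

/* Copy src into a NAME_SIZE-byte buffer, truncating if necessary.
 * The buffer is zeroed first, so limiting the copy to NAME_SIZE - 1
 * bytes leaves the last byte as the terminating 0. */
static void copy_name(char *dst, const char *src)
{
	memset(dst, 0, NAME_SIZE);
	strncpy(dst, src, NAME_SIZE - 1);
}
```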
Change-Id: I7c3a90482065ce4d020db215e3e41348de51a083
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Add back missing pthread_mutex_lock.
Handle all error cases in fmm_release.
Change-Id: I8efa561ddadfd769cede5bf86300215ba3fb3dd1
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
__fmm_release fails to find the object if the address is not page-size
aligned, and the caller did not notice because __fmm_release returns no
error code.
To fix this, move the object lookup into the caller and pass the VM object
to __fmm_release instead. fmm_release now also passes the error code up.
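
A rough sketch of the resulting call structure. All types and function names
below are simplified, hypothetical stand-ins and not the library's actual
definitions:

```c
#include <stddef.h>

/* Hypothetical, simplified object record. */
struct vm_object {
	void *start;
	int in_use;
};

static struct vm_object objects[4];

/* Look up the object that starts at the given address. */
static struct vm_object *find_object(void *address)
{
	for (size_t i = 0; i < sizeof(objects) / sizeof(objects[0]); i++)
		if (objects[i].in_use && objects[i].start == address)
			return &objects[i];
	return NULL;
}

/* The internal helper receives the object itself, so it cannot miss it. */
static void release_object(struct vm_object *obj)
{
	obj->in_use = 0;
}

/* The caller does the lookup and reports a failed lookup upwards. */
static int release(void *address)
{
	struct vm_object *obj = find_object(address);

	if (!obj)
		return -1;
	release_object(obj);
	return 0;
}
```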
Change-Id: Ib8ea1ea5ae844844fd20e8e01f0fdb841d218f2c
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
* Use GNUInstallDirs
* Install headers in $prefix/include directly, drop symlink
* Install libraries in $prefix/lib directly, drop symlink
* Move LICENSE.md from hsakmt-roct-dev to hsakmt-roct
Change-Id: I43562f15cc03029be53e9ec18c337824d8116659
Signed-off-by: Slava Grigorev <slava.grigorev@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
SDMA will use atomic completion fences if KFD reports 64bit atomic support.
Otherwise it will fall back to store completion fences.
Change-Id: I12b76f8a74ec3ee96372c250f9824d846051536e
When KFD is already opened, opening it again should return
HSAKMT_STATUS_KERNEL_ALREADY_OPENED to align with the specification.
Change-Id: Ib10a2d2c48781600bea7d072557d03ccb1a2bc19
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
These fixes are needed to find the hsakmt headers and libraries with
an upcoming hsakmt build system cleanup. It should continue to work
with the original hsakmt build system.
Change-Id: I6b3fcea8f2588698c130c9ec50952c66712afa6c
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Disable some tests that rely on features not typically available
in the emulator, and use smaller data and iteration sets.
Change-Id: I587bf83162b114719e0361109ed44c6bf2adf34c
Upstream KFD doesn't support mapping doorbells to GPUVM yet. Fall
back to the old method.
Change-Id: I452a6fc59b88329b833844e3914c480c2f13c82d
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
- Clean up and renumber scratch memory ioctl
- Renumber get_tile_config ioctl
- Renumber set_trap_handler ioctl
- Update KFD_IOC_ALLOC_MEM_FLAGS
- Renumber GPUVM memory management ioctls
- Remove unused SET_PROCESS_DGPU_APERTURE ioctl
- Update memory management ioctls
  Replace device_ids_array_size (in bytes) with n_devices. Fix error
  handling and use n_success to update device_id arrays in objects.
This commit breaks the ABI and requires a corresponding KFD change.
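
A small sketch of the difference between the old byte size and the new
element count, and of updating an object's device ID array from n_success.
The helper names are hypothetical, and 32-bit device IDs are assumed:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit device IDs held in an array of the given byte size. */
static uint32_t n_devices_from_bytes(size_t array_size_bytes)
{
	return (uint32_t)(array_size_bytes / sizeof(uint32_t));
}

/* Record only the first n_success requested devices as mapped and
 * return how many entries were written. */
static uint32_t record_mapped(uint32_t *mapped, const uint32_t *requested,
			      uint32_t n_success)
{
	for (uint32_t i = 0; i < n_success; i++)
		mapped[i] = requested[i];
	return n_success;
}
```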
Change-Id: Ibf0af5a5188e817c886eab388d1533130fc18293
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Avoid using non-atomic SDMA fences by default, since that path can duplicate
fences. If HSA_ENABLE_SDMA is set, it overrides copy path selection and
non-atomic fences may be used.
Change-Id: I4747e9a766f7f649d21ddf6bfded047ac26fd60e
The main point is to move update_ctx_save_restore_size() out of the if()
condition.
Change-Id: I58a1a4f3edca2d1c510fdd0e31e59b5c41e92a14
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
llvm.debugtrap and other trap IDs are reserved and should not place
the queue into an error state.
Change-Id: I98193a35ac7da94c4a42ee75d87754ee552ebea0