Commit Graph

305 Commits

Author SHA1 Message Date
Yong Zhao 110e754f64 Differentiate gfx700 and improve the logic by introducing is_gfx700()
Because gfx700 has local memory but other APUs don't, the code should
reflect that. Meanwhile, fix a bug where, on gfx902, the SVM aperture is
not added when calling hsaKmtGetNodeMemoryProperties().

Change-Id: Id840f2db0b14fda9ee713b219a9474c15f8a9771
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-09 21:39:37 -04:00
xinhui pan 8fbf4a26ec thunk: fix a vm area release issue
On some ASICs, such as Tonga, the memory alignment size is as large as 0x8000.

fmm_allocate* allocates the VM area with the size passed in, which is usually not aligned,
but __fmm_release frees the VM area using vm_object_t->size, which is aligned.

As a result, aperture_release_area may fail to free the VM area, because the
size may be larger than the zone itself, or it may unexpectedly free a
neighbouring VM area.

This patch will allocate somewhat more space than needed on Tonga.
gfx900+ is not affected.

Change-Id: I5a88c92b08c4e6f6bc05881798f769b55d6debe9
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-09 06:08:15 -04:00
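A minimal sketch of the invariant this fix restores: the size used when reserving a VM area and the size used when releasing it must be rounded the same way. The helper names `align_up()` and `vm_area_size()` and the constant `HYPOTHETICAL_ALIGN` are assumptions for illustration, not code from fmm.c.

```c
#include <stdint.h>

/* Hypothetical alignment, matching the 0x8000 Tonga example above. */
#define HYPOTHETICAL_ALIGN 0x8000ULL

/* Round size up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: compute the VM area size once, and use the same
 * value for both reservation and release so the two always agree. */
static uint64_t vm_area_size(uint64_t requested)
{
	return align_up(requested, HYPOTHETICAL_ALIGN);
}
```

Rounding up at allocation time is what causes the extra space usage on Tonga noted in the commit message.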
Yong Zhao fe04dd6890 Calculate and store the first gpu mem during initialization
Previously we used the first dGPU memory, but on closer examination we
found it only needs to be a GPU (not necessarily a dGPU), so the code is
modified to reflect that.

Change-Id: I069d9b8e247aed55c1f885b79f743ea8e03ddf93
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-08 13:54:09 -04:00
Yong Zhao 4bb90d048c Remove the use of IS_DGPU()
The information can be obtained directly from the node ID. Also restructure
the logic for future compatibility.

Change-Id: I130733be4e7930d5953d5e81409905e60c2ec35e
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-07 18:07:04 -04:00
Felix Kuehling c21927f425 libhsakmt: Fix problems in init_svm_apertures
Unset ret_addr when unmapping the address space reservation. Otherwise
it may try to unmap it again later.

Remember the actual map_size and use it instead of len outside the
reservation loops.

Change-Id: I1a6b3fecfb59e22a713e5ed49c3ed37914cb6fb5
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-03 22:09:52 -04:00
Yong Zhao 08b6685dd5 Change the confusing type and name in topology
The name "node" is used repeatedly and excessively, which causes unnecessary confusion.

Change-Id: I4ae4171887df5e5b85209a5af8a636e6d72e5e82
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-03 12:00:17 -04:00
xinhui pan ab9017715f use rbtree instead of vm_objects list
Simple test: mapping a large amount of system memory to the GPU.
Before:
[ RUN      ] KFDMemoryTest.MMap
[          ] Using ISA for GFXIP 9.0
[          ] successfully register/map 32GB system memory to gpu
[       OK ] KFDMemoryTest.MMap (36932 ms)

After:
[ RUN      ] KFDMemoryTest.MMap
[          ] Using ISA for GFXIP 9.0
[          ] successfully register/map 32GB system memory to gpu
[       OK ] KFDMemoryTest.MMap (11441 ms)

That is an improvement from about 36 s to about 11 s.

It looks like something similar could be done for vm_area too.

Change-Id: I0349aacdeddec3534016d28176f0fabf632c61fc
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-07-08 22:38:22 -04:00
Felix Kuehling d3228f363e Fix wrong loop termination condition
Compare with gpu_mem_count instead of deprecated NUM_OF_SUPPORTED_GPUS
to prevent overflows in case no dGPUs are present.

Change-Id: I71fcb7503ba4c20bffadbdb04cefc4e4027a7df7
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-07-05 17:04:40 -04:00
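A minimal sketch of the corrected loop bound, assuming a reduced, hypothetical `gpu_mem_sketch_t` record and a `gpu_mem` array that holds exactly `gpu_mem_count` entries (possibly zero). The lookup function `gpu_mem_find()` is illustrative only.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the per-GPU memory record. */
typedef struct {
	uint32_t gpu_id;
} gpu_mem_sketch_t;

/* gpu_mem points to exactly gpu_mem_count entries (possibly zero). */
static gpu_mem_sketch_t *gpu_mem;
static uint32_t gpu_mem_count;

/* Return the index of gpu_id, or -1 if it is not present. The loop is
 * bounded by gpu_mem_count, the number of valid entries. Bounding it by
 * a fixed maximum such as NUM_OF_SUPPORTED_GPUS would read past the end
 * of the array whenever fewer GPUs are present. */
static int32_t gpu_mem_find(uint32_t gpu_id)
{
	for (uint32_t i = 0; i < gpu_mem_count; i++)
		if (gpu_mem[i].gpu_id == gpu_id)
			return (int32_t)i;
	return -1;
}
```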
Yong Zhao 4839882fc8 Set the write permission according to the flag when allocating host cpu mem
Change-Id: I758c2b5b1799e968fa852646e1494fabb68c782d
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2018-07-03 20:39:01 -04:00
Slava Grigorev 89e35574e3 Fix 'strncpy' truncating warnings when compiling with gcc 8
Change-Id: Ib145bab9450281da05f70dea34433b83438a756b
Signed-off-by: Slava Grigorev <slava.grigorev@amd.com>
2018-06-29 17:06:08 -04:00
Yong Zhao 4eaaf9694d Simplify if else logic for hsaKmtAllocMemory()
The new logic is easier to follow.

Change-Id: I69759a45c5dedaefeff831a2367253d3a4486bd3
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2018-06-29 14:39:52 -04:00
Yong Zhao 5972fac417 Rename two variable names in doorbells structure
There were two variables named doorbells, one embedded in the other, which
was very confusing. Rename the member variable to mapping to differentiate
them. Also rename doorbells_mutex to just mutex for brevity.

Change-Id: Iaa14a1a3ee09449a9089fc1fb39c916fdf32fb44
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2018-06-28 16:04:35 -04:00
Yong Zhao 77ec699460 Fix a bug that fmm_init_process_apertures() returns incorrect value
If opening the DRM render device fails (usually because the user is not a
member of the video group), fmm_init_process_apertures() still returns
success, resulting in an obscure segfault at a later stage.

Change-Id: Ifbde4481629988944ad7f384d59753c88e287fa9
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2018-06-28 16:03:07 -04:00
Felix Kuehling fb551a44af Fix compiler warning on Fedora 28
Avoid warnings of the type
    error: 'strncpy' specified bound 64 equals destination size

With the destination being 0-initialized, subtracting 1 from the
destination buffer size will ensure that the destination will be a
0-terminated string, even when it's truncated.

Change-Id: I7c3a90482065ce4d020db215e3e41348de51a083
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-06-25 14:36:49 -04:00
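A minimal sketch of the pattern described above. It assumes a hypothetical 64-byte destination, `NAME_SIZE`, and a hypothetical `copy_name()` wrapper. Because the destination is zero-initialized and the bound is one less than its size, the last byte is never written and stays `'\0'`, even when the source is truncated.

```c
#include <string.h>

#define NAME_SIZE 64 /* hypothetical destination size */

/* Copy src into dst, which the caller must have zero-initialized.
 * Bounding strncpy at NAME_SIZE - 1 leaves dst[NAME_SIZE - 1] untouched,
 * so dst is always NUL-terminated, even when src is truncated. */
static void copy_name(char dst[NAME_SIZE], const char *src)
{
	strncpy(dst, src, NAME_SIZE - 1);
}
```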
Felix Kuehling 4e766615d7 Fixup previous commit
Add back missing pthread_mutex_lock.

Handle all error cases in fmm_release.

Change-Id: I8efa561ddadfd769cede5bf86300215ba3fb3dd1
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-06-25 14:24:23 -04:00
xinhui pan 8ee5647814 THUNK: fix deregister memory issues
__fmm_release fails to find the object if the address is not page-size
aligned, and the caller does not notice, because __fmm_release returns
no error code.

To fix this, move the object lookup into the caller and pass the
vm_object instead. fmm_release now also passes the error code up.

Change-Id: Ib8ea1ea5ae844844fd20e8e01f0fdb841d218f2c
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-06-25 14:12:26 -04:00
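One way to illustrate the two points of this fix is a minimal sketch, assuming a hypothetical reduced `vm_object_sketch` type. First, look up the object whose range contains the address, rather than requiring an exact start match, so an address that is not page-aligned still resolves. Second, return an error code when no object is found. This is not the fmm.c lookup, which after the later rbtree change is tree-based.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for vm_object_t. */
typedef struct {
	uint64_t start;
	uint64_t size;
} vm_object_sketch;

/* Return the object whose [start, start + size) range contains addr,
 * or NULL if none does. */
static const vm_object_sketch *find_object(const vm_object_sketch *objs,
					   size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= objs[i].start && addr - objs[i].start < objs[i].size)
			return &objs[i];
	return NULL;
}

/* Hypothetical release: return 0 on success or -1 if no object covers
 * addr, so the caller can see the failure instead of ignoring it. */
static int release_sketch(const vm_object_sketch *objs, size_t n, uint64_t addr)
{
	const vm_object_sketch *obj = find_object(objs, n, addr);

	if (!obj)
		return -1;
	/* ... unmapping and freeing of obj would happen here ... */
	return 0;
}
```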
Yong Zhao 7a8566dc03 Improve the return value for hsaKmtOpenKFD()
When KFD is already open, opening it again should return
HSAKMT_STATUS_KERNEL_ALREADY_OPENED to align with the specification.

Change-Id: Ib10a2d2c48781600bea7d072557d03ccb1a2bc19
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2018-06-11 14:08:57 -04:00
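A minimal sketch of the described behaviour, assuming a hypothetical open counter `kfd_open_count` and local stand-in status codes. The real `HSAKMT_STATUS_*` values are defined by the library's headers and are not reproduced here. The sketch also assumes that a repeated open still takes a reference, so that open and close stay balanced.

```c
/* Local stand-ins for HSAKMT_STATUS_SUCCESS and
 * HSAKMT_STATUS_KERNEL_ALREADY_OPENED; not the library's definitions. */
typedef enum {
	SKETCH_STATUS_SUCCESS,
	SKETCH_STATUS_KERNEL_ALREADY_OPENED,
} sketch_status;

static unsigned int kfd_open_count; /* hypothetical open counter */

static sketch_status open_kfd_sketch(void)
{
	if (kfd_open_count > 0) {
		/* Already open: take another reference (assumption), but
		 * tell the caller this was not the first open. */
		kfd_open_count++;
		return SKETCH_STATUS_KERNEL_ALREADY_OPENED;
	}
	/* ... first open: open the KFD device, initialize state, etc. ... */
	kfd_open_count = 1;
	return SKETCH_STATUS_SUCCESS;
}
```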
Felix Kuehling 0462744965 Add fallback for GPUVM doorbell mapping
Upstream KFD doesn't support mapping doorbells to GPUVM yet. Fall
back to the old method.

Change-Id: I452a6fc59b88329b833844e3914c480c2f13c82d
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-05-31 13:17:27 -04:00
Felix Kuehling 7495e74257 Cosmetic changes to kfd_ioctl.h
Make it more similar to upstream.

Change-Id: I982ccfd4045d96e3c30bc84d38d0e03db8de9b08
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-05-31 13:17:27 -04:00
Felix Kuehling 571e2cf7e4 Update KFD-Thunk ioctl ABI to match upstream
- Clean up and renumber scratch memory ioctl
- Renumber get_tile_config ioctl
- Renumber set_trap_handler ioctl
- Update KFD_IOC_ALLOC_MEM_FLAGS
- Renumber GPUVM memory management ioctls
- Remove unused SEP_PROCESS_DGPU_APERTURE ioctl
- Update memory management ioctls
    Replace device_ids_array_size (in bytes) with n_devices. Fix error
    handling and use n_success to update device_id arrays in objects.

This commit breaks the ABI and requires a corresponding KFD change.

Change-Id: Ibf0af5a5188e817c886eab388d1533130fc18293
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-05-31 13:17:27 -04:00
Shaoyun Liu 93d07cf916 Thunk: Add gfx906 support on thunk
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>

Conflicts:
	src/topology.c

Change-Id: I692d9295a954d4eda08eba301312014f7b3969cb
2018-05-29 15:38:26 -04:00
Yong Zhao ec440fb428 Stop allocating eop buffer for SDMA queues
Change-Id: I9a4eaee05588292a797eb424503dd7b793c1408c
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2018-05-16 15:30:23 -04:00
Yong Zhao 43f119bcbc Improve the code readability
The main point is to move update_ctx_save_restore_size() out of if()
condition.

Change-Id: I58a1a4f3edca2d1c510fdd0e31e59b5c41e92a14
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2018-05-16 14:55:55 -04:00
Oak Zeng dc1bbccc39 Use SVM aperture for device memory allocation on gfx902 and later APUs
Change-Id: Ib1d822adde30138a016e010bf581220465a087b9
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2018-05-03 12:03:22 -04:00
Shaoyun Liu aa28484583 Thunk: Add gfx904 support on libthunk
Change-Id: I78bc623f6b86293e2bf9fbe00a646d152faafdc4
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
2018-03-29 18:21:02 -04:00
Felix Kuehling 8ac2150e81 Let KFD use VM from DRM render node
Move opening of DRM render nodes from topology to FMM aperture
initialization. Keep the same FDs open for the lifetime of the
process to match how KFD uses the VMs in the FDs. Call acquire_vm
ioctl during aperture initialization to let KFD use the VMs from
the render nodes.

Change-Id: Ie07d57788cbe685b1841cccc00820c12894a0356
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-03-20 15:42:45 -04:00
Philip Yang 1bf93d4e89 Export microcode version of sDMA
Change-Id: I86fa5da5e72af13a2e76e6e3be4667a7220923d5
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2018-03-19 08:38:50 -04:00
Felix Kuehling 85e1a9bf5e Rework SVM aperture initialization
Query GPUVM aperture limits of all dGPUs to determine SVM aperture
base and limit. This depends on a recent KFD change that reports
the GPUVM aperture limits for dGPUs in the
AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl (drm/amdkfd: Simplify
dGPU SVM aperture handling).

Only initialize SVM aperture once, instead of once per GPU.

Don't call AMDKFD_IOC_SET_PROCESS_DGPU_APERTURE. It's not needed any
more and will not be upstreamed.

Change-Id: Ib3389e8ba18505ba15fc33f45fe8a57e690a565d
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-03-09 16:36:49 -05:00
Felix Kuehling c5cfb7e25b Move dGPU memory aperture initialization
Define dgpu_mem_init before it's used and keep the code close to the
rest of the aperture initialization code.

Change-Id: I14ad11a364524a15affee9186b1298ba7d56d2c9
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-03-09 15:00:12 -05:00
Philip Yang 105291849f Close shmem file handle, to fix file handle leak
kfdtest's hsaKmtOpenKFD failed after 1019 iterations when run with --gtest_loop=-1,
because the default limit on open file handles is 1024. lsof output showed
that the shmem file handle was never closed.

Change-Id: I474de2bae6c03e879a219dedf5f18639118b73e5
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2018-02-23 10:50:52 -05:00
Jay Cornwall e2c353dc0d Allocate EOP queue local to GPU
On discrete GPUs place the EOP queue in VRAM. The reader/writer of this
queue is the CP and the size is small. Dispatch latency improves
through lower read latency in AQL completion phase.

Change-Id: Id8351dcddbd21fd7c7d699803c96434c9132db71
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
2018-02-22 18:14:05 -05:00
Oak Zeng 25170c3c57 Support ptrace access invisible vram
Invisible device memory is mmapped as PROT_NONE.
Normal CPU access to the memory is still not allowed but
struct vm_area_struct will be created for the memory address
so ptrace can access the memory via the vma.

Change-Id: I07c69208716c920ccce33e6b494b610b61a0a7c1
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2018-02-20 14:13:00 -05:00
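A minimal sketch of what a PROT_NONE mapping provides. The kernel creates a VMA for the range, but any ordinary CPU load or store faults. The sketch uses an anonymous mapping as a stand-in for mapping invisible VRAM through the device file descriptor. `reserve_no_access()` is a hypothetical helper, and the sketch does not demonstrate the ptrace access path, which depends on the driver.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes with PROT_NONE. A VMA exists for the range (it shows up
 * in /proc/self/maps), but normal CPU access to it faults. Returns NULL
 * on failure. */
static void *reserve_no_access(size_t len)
{
	void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```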
Harish Kasiviswanathan 7de0199e99 CMA: Initialize SizeCopied return parameter
UCX test cases report uninitialized values when CMA fails. The
application should ideally ignore SizeCopied when the function fails, but
it doesn't. This leads to a wrong diagnosis.

v2: Fill in partial SizeCopied in case of failure

Change-Id: I6b7e1c19a8b702ec91ca64201a3dda27bd897877
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-02-08 12:46:40 -05:00
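A minimal sketch of the "initialize, then report partial progress" pattern. The chunked copy routine `copy_chunks()` is hypothetical, and a NULL source pointer stands in for a failing chunk. `*size_copied` is written on every path: it is set to 0 up front and advanced per chunk, so on failure it holds the partial byte count, as the v2 note describes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n chunks into dst, stopping at the first failing chunk (simulated
 * here by a NULL source pointer). Returns 0 on success, -1 on failure.
 * *size_copied is always initialized and holds the bytes copied so far. */
static int copy_chunks(void *dst, const void *const *srcs, const size_t *lens,
		       size_t n, uint64_t *size_copied)
{
	unsigned char *out = dst;

	*size_copied = 0;
	for (size_t i = 0; i < n; i++) {
		if (!srcs[i])
			return -1; /* partial count already recorded */
		memcpy(out, srcs[i], lens[i]);
		out += lens[i];
		*size_copied += lens[i];
	}
	return 0;
}
```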
Yong Zhao 55bb61ff9c Revert "Workaround: make mmap memory resident for gfx902"
This reverts commit 716755b1de.

Change-Id: I9f4f0b6b426aeae4cb652b33cf0d4c0f57270ca5
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2018-02-02 12:31:06 -05:00
Laurent Morichetti 056ddbbc82 Silence Valgrind warnings
Change-Id: I8803f3d310fccd69d0d04b2464b00dccc40270e3
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-01-25 16:48:17 -05:00
Yong Zhao 716755b1de Workaround: make mmap memory resident for gfx902
Change-Id: I5f90f316740f7995d54cb083a6d7e05bc4e2966e
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2017-12-14 15:11:01 -05:00
Yong Zhao 0f83774635 Report gfx902 as GFX 9.0.2
This change is needed to match other higher level components.

Change-Id: I45114d23f2ed428dfbbb836061b3020c5ab166ec
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
2017-12-07 16:08:10 -05:00
Oak Zeng c2dc301792 Revert "Revert "More cleanup of fmm.c""
This reverts commit 52f6a61970.

Change-Id: I31afe4889794df8cf1e96f5f18771bed75a213d9
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-12-04 15:48:11 -05:00
Oak Zeng 786e470241 Revert "Revert "Cleanup fmm.c""
This reverts commit f7689d4fef, plus a bug fix to the patch
"Cleanup fmm.c":
Call id_in_array with the correct parameter. The third parameter
of id_in_array is the size of the array in bytes, not the number
of array items.

Change-Id: I72d8e2fcc0df32af76c72967386e92c1be18c159
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-12-04 15:48:11 -05:00
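A minimal sketch of why the unit matters. It uses a hypothetical `id_in_array_sketch()` whose third parameter is a size in bytes, as the commit describes. Passing an item count instead silently shrinks the search to a fraction of the array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: size_bytes is the array size in BYTES, so the
 * number of items is derived from it. */
static bool id_in_array_sketch(uint32_t id, const uint32_t *ids, size_t size_bytes)
{
	size_t count = size_bytes / sizeof(*ids);

	for (size_t i = 0; i < count; i++)
		if (ids[i] == id)
			return true;
	return false;
}
```

With `uint32_t ids[] = {1, 2, 3, 4}`, the correct call passes `sizeof(ids)` (16). Passing the item count 4 is treated as 4 bytes, so only `ids[0]` is checked and `4` is not found.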
Felix Kuehling 587d4f4bdf Rename fmm_allocate_memory_in_device
to fmm_allocate_memory_object. This function name was confusingly
similar to fmm_allocate_device and __fmm_allocate_device. The new name
reflects its function better: allocate the VM object and the kernel
mode buffer object.

Change-Id: I6604d228004b4d41e871d4de784786823608b5d6
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2017-12-04 10:23:55 -05:00
Oak Zeng f7689d4fef Revert "Cleanup fmm.c"
This reverts commit b4c89c1ea7.
This change caused a regression ()
Revert temporarily

Change-Id: Ic3829264151e37d1f8c6927c6f464006234ba17f
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-29 09:43:11 -05:00
Oak Zeng 52f6a61970 Revert "More cleanup of fmm.c"
This reverts commit 019f7cbd20.
This change caused a regression ()
Revert temporarily

Change-Id: I5af59d319afeb7f0b03e5a09e8397e3853b8b37b
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-29 09:42:19 -05:00
Oak Zeng cce57cec26 Cosmetic changes in events.c
Change-Id: Idecb8eede8811020b3af51cbc71da74849029c82
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-28 15:20:51 -05:00
Oak Zeng 019f7cbd20 More cleanup of fmm.c
1. Renamed _fmm_map_to_gpu to _fmm_map_to_apu_local
   to reflect the real semantics of this function
2. Renamed _fmm_map_to_gpu_gtt to _fmm_map_to_gpu
   because this function is used to map both gtt
   and local memory
3. Call _fmm_map_to_gpu in _fmm_map_to_apu_local
   to get rid of duplicated code

Change-Id: Id8e3ebfffe0a3c27ebdcac8a8f4dc3738d67d10a
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-27 18:47:35 -05:00
Oak Zeng b4c89c1ea7 Cleanup fmm.c
1. Initialize pointers to NULL in vm_create_and_init_object
2. Added helper function to add/remove device ids to/from mapped array
3. Only map nodes that are not already mapped
4. Remove unnecessary condition check on object frees

Change-Id: I7aed6d40c7464be0d168d5796229af55451e0f34
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-27 18:47:23 -05:00
Amber Lin 6f7b55f2d8 Add debug message in PMC trace
Print data in PMC trace when the debug level is set to 7 (pr_debug).

Change-Id: I9abbb8f6c3f7962fb637528578c1a58b7784042d
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2017-11-22 10:09:49 -05:00
Oak Zeng 061db45fe2 Fix unconditional unmap in fmm_map_to_gpu_nodes
_fmm_unmap_from_gpu is called in fmm_map_to_gpu_nodes
to unmap the buffer from nodes it is already mapped to
but that are not in the new list of nodes. Previously,
the unmap was called unconditionally, even when the size
of the array to unmap was 0. This fixes the issue by
calling the unmap function only when the unmap array
size is not 0.

Also release fmm_mutex on error returns.

Change-Id: Iadd8383caf7ebb92f02618798c5efd138a352aaa
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-21 15:16:39 -05:00
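A minimal sketch of both points of this fix, assuming a hypothetical stand-in `unmap_from_nodes()` for _fmm_unmap_from_gpu, a hypothetical mutex, and a fixed-capacity scratch array. The sketch collects the nodes that are mapped but absent from the new list, calls unmap only when that set is non-empty, and routes every return through a single unlock.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_UNMAP_SKETCH 16 /* hypothetical capacity */

static pthread_mutex_t fmm_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;
static int unmap_calls; /* counts calls to the stand-in below */

/* Hypothetical stand-in for _fmm_unmap_from_gpu. */
static int unmap_from_nodes(const uint32_t *ids, size_t n)
{
	(void)ids;
	(void)n;
	unmap_calls++;
	return 0;
}

/* Unmap from every node in old_ids that is absent from new_ids. */
static int remap_sketch(const uint32_t *old_ids, size_t n_old,
			const uint32_t *new_ids, size_t n_new)
{
	uint32_t to_unmap[MAX_UNMAP_SKETCH];
	size_t n_unmap = 0;
	int ret = 0;

	pthread_mutex_lock(&fmm_mutex_sketch);
	for (size_t i = 0; i < n_old; i++) {
		size_t j = 0;

		while (j < n_new && new_ids[j] != old_ids[i])
			j++;
		if (j < n_new)
			continue; /* still in the new list: keep it mapped */
		if (n_unmap == MAX_UNMAP_SKETCH) {
			ret = -1; /* error path still releases the mutex */
			goto out;
		}
		to_unmap[n_unmap++] = old_ids[i];
	}
	/* Only call the unmap routine when there is something to unmap. */
	if (n_unmap > 0)
		ret = unmap_from_nodes(to_unmap, n_unmap);
	/* ... mapping to the new nodes would follow here ... */
out:
	pthread_mutex_unlock(&fmm_mutex_sketch);
	return ret;
}
```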
Oak Zeng f06e887725 Properly control lifecycle of ptr info objects
Buffer mappings and buffer registrations to devices can
change between two pointer info queries. Thus update the
buffer mapping and registration info only when the mapping
or registration has changed. This is done by freeing
mapped_node_id_array when mapping to a new device and
freeing registered_node_id_array on registration, then
re-allocating them on the next pointer info query.

Also use fmm_mutex to avoid race conditions when
hsaKmtQueryPointerInfo is called concurrently with
buffer mapping or registration.

Change-Id: Ibc2e20be1fc0147066f873dfa44b21f5015104b7
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-21 15:16:29 -05:00
Oak Zeng 07110fbd38 Correctly handle max_map_count limit after failed memory allocation
Also split out a separate function for removing the CPU
mapping and reserving the address, as a code refactoring.

Change-Id: I1feb85b0b2ec942487f899ec3192c7c47dd7c7d5
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-08 10:05:04 -05:00
Oak Zeng 68a2d286ca Use drm render device to map kfd BOs
Previously the KFD device was used to map memory for CPU access.
However, this is not compatible with how TTM handles CPU mappings
on eviction: memory won't be unmapped and remapped on restore.
This fixes the issue by mmapping memory using the DRM render device.

This patch requires a coordinated kernel driver change to work.
To keep it compatible with old kernel drivers, some temporary code
is included. Once the coordinated kernel driver change is checked in,
the temporary code can be removed.

Change-Id: Ie7b304c4a82b7e8d5ab703acb81d66430af4f0bc
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2017-11-02 09:06:26 -04:00