Add 5 different test scenarios to cover the new event age tracking features.
Change-Id: Icab43240fd127208b18abbd7542d6444127ef0c7
Signed-off-by: James Zhu <James.Zhu@amd.com>
Keep the last signaled event age to avoid race conditions for
HSA_EVENTTYPE_SIGNAL when the initial event age value is non-zero.
Change-Id: Ifb9a11a6868e5762a9f92f579e45a0a2c8fa1017
Signed-off-by: James Zhu <James.Zhu@amd.com>
status.priv may be read after returning from the trap handler, which
causes sq_interrupt_word_wave.priv to be 0 even though the s_sendmsg
instruction was initiated when status.priv was 1.
To work around this, add an s_waitcnt lgkmcnt(0) after s_sendmsg
to make sure the message is sent before continuing.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Laurent Morichetti <Laurent.Morichetti@amd.com>
Change-Id: Ieb75005ca1559ef03d0efac80e966f521e41fcb7
The purpose of this patch is to fix a minor typo in KFDSVMRangeTest.
Before:
"Skipping test: no enough system memory."
After:
"Skipping test: Not enough system memory."
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I247cb558a177a1d25c393bf16c7386f4d79d0fba
KFDQMTest.MultipleCpQueuesStressDispatch is fixed as of MES SCHQ version
0x3c ().
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I437f3eb5f12dc159339a9b7c7cff2e2b8214ad7c
The behavior of a shift operation is undefined if the right operand is
negative, or greater than or equal to the width of the promoted left
operand. For release builds with address sanitizer enabled, compiler
optimization of this undefined behavior leads to an unsupported queue
size value, since the current method shifts up to 128 bits on a 64-bit
value.
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Change-Id: Iafdc82d0dfb7f79e3012fb7bb70eda80e4b7a7a6
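To illustrate the shift hazard described above, here is a minimal sketch of rounding a 64-bit value up to a power of two without ever shifting by 64 or more bits. The helper name NextPow2 and the clamping policy are assumptions for illustration only; this is not the code changed by the patch.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): round value up to the next
// power of two using only shift counts in the range [1, 32], which are
// always well-defined for a 64-bit operand.
inline uint64_t NextPow2(uint64_t value) {
  if (value <= 1) return 1;
  // Assumed policy: values above 2^63 have no 64-bit power-of-two
  // ceiling, so clamp instead of overflowing.
  if (value > (UINT64_C(1) << 63)) return UINT64_C(1) << 63;
  value -= 1;
  // Smear the highest set bit into all lower bits. The loop stops before
  // the shift count reaches 64, the width of the operand.
  for (unsigned shift = 1; shift < 64; shift <<= 1) {
    value |= value >> shift;
  }
  return value + 1;
}
```

The key point is the loop bound: the shift count is checked against the operand width before each shift, so an optimizing compiler has no undefined behavior to exploit.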
Remove this definition, as it should already be defined by the compiler.
It causes compile errors on newer versions of LLVM because the macro is
being redefined.
Change-Id: Ica6a06f46a14e16d3f52e83b9b5ee8cfd7359510
A patch in the gfx940 npi branch moved the kernel object file loading
outside the rocrtstNeg.Queue_Validation_* main queue creation and
submission loops, and added a clear_code_object() after the loop.
Another patch in the non-npi branch added a clear_code_object() inside
the loop. When the npi branch patch was merged, the code object was
cleared at the end of the first loop iteration. Remove these
clear_code_object() calls.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id4188e78411e81c5071bf715c1f02491f571ab79
If we end up in the first if clause, aperture_base is not set, unlike
the other 2 clauses. Initialize it to NULL at declaration time, and only
change its value in the final else clause, where we set it to
aperture->base.
Change-Id: I2bf44dc93cae8a03e66f41cedd85d57be2115bba
Signed-off-by: Kent Russell <kent.russell@amd.com>
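The initialization pattern described above can be sketched as follows. The Aperture struct, the function name, and the two boolean conditions are hypothetical stand-ins; the real branch conditions are not given in this message.

```cpp
#include <cassert>

// Hypothetical stand-in for the aperture type; only the base field matters.
struct Aperture {
  void* base;
};

// Sketch of the three-clause selection: aperture_base starts as nullptr
// and is only overwritten in the final else clause.
inline void* SelectApertureBase(bool first_case, bool second_case,
                                const Aperture* aperture) {
  void* aperture_base = nullptr;  // defined on every path from the start
  if (first_case) {
    // Previously this path left aperture_base unset.
  } else if (second_case) {
    // No base address is needed on this path either.
  } else {
    assert(aperture != nullptr);
    aperture_base = aperture->base;
  }
  return aperture_base;
}
```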
Allow the hsaKmtRegisterGraphicsHandleToNodes parameters NodeArray to be
null and NumberOfNodes to be zero at the same time. This covers the case
where we do not want the imported buffer to be registered by kfd. Set
gpu_id_array = NULL explicitly to avoid freeing an uninitialized gpuid
array.
Report: Yat Sin, David<David.YatSin@amd.com>
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I3babc1160c9573e38dd11d81965c8de2b70cae2e
Have hsaKmtMapMemoryToGPU return the same value as fmm_map_to_gpu for
consistency.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: Ifabb72301e1d5a6c1310973bb1321714e12a1fa6
Query the render node fds that libdrm uses for the current process and
use them in Thunk if available.
v2: avoid naming conflict with amdgpu_device_get_fd from amdgpu.h
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: Id7288c03730f4a4c9c3644e37ca4725fec71a471
Return the NodeId of the GPU that exported the DMA buffer from the
amdgpu graphics driver in fmm_register_graphics_handle.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: Iaeccce6e6d0b7e27f10b15ed89d1b5310d03d44b
When GPU map info is not provided, import the DMABuf without a VA
assigned.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I996ab4eb46977af5064126529c28a8bf20a67292
Allocate VRAM through kfd, then use the GEM API to map it to the GPU VM,
and map it to the CPU VM.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: Ib5b2f35662cd5473f622f6ffc9b62925fe57ae42
This new manageable_aperture_t is used for VRAM-only allocation and
VA-only allocation.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I3866ef9d35386d6aef7b6934ac8d4a89ef843b50
This reverts commit fd48f14ceb.
amdgpu currently exposes one render node per GPU node/partition, so
revert to the previous way of opening render nodes in Thunk.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I436be74f8e872a7ab5c4a1420b4ea884f5a00e57
Add parameterization for KFDSVM tests so that we test with both XNACK
enabled and XNACK disabled. This will be overridden by HSA_XNACK, if set.
Change-Id: Ie96eb61c03115f947e08cfa076ac459f7440f5d8
Throw a runtime error instead of returning an empty string when open()
fails in LocateKernelFile().
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Iafa360fbc2d3c9b01b9fe7ea4c11d70bd254ccce
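A minimal sketch of the fail-fast behavior described above, assuming a POSIX environment. The function name LocateKernelFileOrThrow, its signature, and the error text are hypothetical; the actual LocateKernelFile() may search several locations and differ in detail.

```cpp
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <unistd.h>

// Hypothetical helper: verify that a kernel file can be opened and throw
// instead of silently returning an empty string on failure.
inline std::string LocateKernelFileOrThrow(const std::string& path) {
  const int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0) {
    // Include the errno description so the failure is easy to diagnose.
    throw std::runtime_error("Failed to open kernel file '" + path +
                             "': " + std::strerror(errno));
  }
  close(fd);
  return path;
}
```

Throwing here surfaces a missing kernel file at the point of failure, rather than letting an empty path propagate into later loading code.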
Add support for gfx941 and gfx942 ISAs.
gfx940 ISA will use sc0:1 sc1:1 on load/store operations.
gfx942 ISA will use default load/store operations.
Change-Id: If1efbef86f59e2cf2d48fe359cd4166405a0a579
When compiling in ASAN mode, remap the first page of device allocations
to system memory. ASAN's memory allocator uses a small amount of extra
memory to store data for housekeeping purposes. But because this memory
comes from the GPU memory pool, it might have a memory type that is
uncommon for host access. Mapping this section of memory to the host
makes it accessible to ASAN.
Change-Id: I36f659d616a4d15558372592439a8723c5c84a69
Signed-off-by: Bing Ma <Bing.Ma@amd.com>
Add support for HSA_ENABLE_PEER_SDMA env variable that can be used to
disable use of SDMA engines for device-to-device transfers. Note that
setting HSA_ENABLE_SDMA=0 will disable all SDMA transfers and override
HSA_ENABLE_PEER_SDMA values.
Change-Id: I737b3c2b2efcf3ff237f98bc748f49b8252ed24a
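The precedence between the two variables can be sketched as below. The helper names, the parsing rule ("0" disables, anything else enables), and the default of enabled when a variable is unset are assumptions for illustration, not ROCr's actual flag handling.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical parser: an unset or empty variable yields the default;
// the literal "0" disables, any other value enables (assumed rule).
inline bool FlagEnabled(const char* value, bool default_value) {
  if (value == nullptr || *value == '\0') return default_value;
  return std::strcmp(value, "0") != 0;
}

// HSA_ENABLE_SDMA=0 disables all SDMA transfers, so it overrides any
// HSA_ENABLE_PEER_SDMA value.
inline bool PeerSdmaEnabled(const char* enable_sdma,
                            const char* enable_peer_sdma) {
  if (!FlagEnabled(enable_sdma, true)) return false;
  return FlagEnabled(enable_peer_sdma, true);
}

// Convenience wrapper that reads the real environment.
inline bool PeerSdmaEnabledFromEnv() {
  return PeerSdmaEnabled(std::getenv("HSA_ENABLE_SDMA"),
                         std::getenv("HSA_ENABLE_PEER_SDMA"));
}
```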
For aqua_vanjaram in APU mode, KFDEvictTest and KFDSVMEvictTest are
skipped. These tests passed in dGPU mode with memory reporting partition
support on GFX 9.4.3.
Change-Id: I56357843c6743b01b807359dbb37b32391fd9a25
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Add support functions to remap the first page of device memory (GPU/GTT)
to share host ASAN logic.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: I4c27d5417ba80a172dccb0a079a597c5dc1c8f85
Update documentation for hsa_amd_pointer_info to clarify which fields
are invalid when the allocation type is HSA_EXT_POINTER_TYPE_UNKNOWN.
Change-Id: Idaed985962c4a98d281ebe01bef8ec2459da3985
Some multi-GPU workloads create one process per GPU. Each process then
creates a GPU agent on every GPU but only creates queues on one GPU,
which causes unnecessary scratch reservation.
Change-Id: I50a216f0bcc0b5f707f3943147390b0ecec1ac22
If the required scratch allocation is too large, ROCr will attempt to
reduce it by lowering the dispatch's targeted occupancy. The reduction
loop however was prone to overflow if waves_per_cu was not a multiple of
waves_per_group. Ensure no overflow by aligning waves_per_cu to
waves_per_group.
On GC 9.4.3 dGPU, dispatches with a large grid size and a
waves_per_group of e.g. 16 may require reducing occupancy such that
waves_per_cu is less than waves_per_group to ensure the allocation size
is small enough. Allow this while also ensuring the tmpring scratch wave
count is kept divisible by the number of SEs per XCC.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480
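The alignment step in the first paragraph above can be sketched as follows. The function name, parameters, and size formula are hypothetical simplifications; the sketch does not cover the GC 9.4.3 case where waves_per_cu drops below waves_per_group, nor the SEs-per-XCC divisibility requirement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduction loop: lower waves_per_cu in steps of
// waves_per_group until the scratch size fits or the floor is reached.
inline uint32_t ReduceWavesPerCu(uint32_t waves_per_cu,
                                 uint32_t waves_per_group,
                                 uint64_t bytes_per_wave, uint32_t num_cus,
                                 uint64_t max_bytes) {
  assert(waves_per_group > 0 && waves_per_cu >= waves_per_group);
  // Align down to a multiple of waves_per_group. Without this, repeated
  // unsigned subtraction could step past zero and wrap around.
  waves_per_cu -= waves_per_cu % waves_per_group;
  while (waves_per_cu > waves_per_group &&
         bytes_per_wave * waves_per_cu * num_cus > max_bytes) {
    waves_per_cu -= waves_per_group;
  }
  return waves_per_cu;
}
```

For example, with waves_per_cu = 40 and waves_per_group = 16, the value is first aligned to 32, so the loop can only visit 32 and 16 and never wraps.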
When we merge thunk into ROCr, kfdtest will be in a different folder
structure. Add the new location to ensure that we can build now and in
the future with no disruptions.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I6517e061cb0da7137d903abbc380bfc7126f40d4
Scratch cache reserved memory is only available for scratch memory use,
so do not report this memory as available to the user via the
HSA_AMD_AGENT_INFO_MEMORY_AVAIL API.
Change-Id: I52f96e62536458bcaa52b9f4be5de856d5680dc4
Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
on unmapped queues for usermode queue oversubscription (no aggregated doorbell).
Change-Id: I10e30fdc2bec587cef9427faa4874957988c34b3
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
If MES is enabled, wptr has to be in non-paged memory.
Add an API to check this condition.
Change-Id: I53af1f6687d5332d102e7062c3d760e33b96e722
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>