Impacts GPU_ONLY signal type latency when waiting for small operations.
Using this type improves total SDMA small copy performance by ~40% if
the signal is allowed to spin freely.
Change-Id: I27aa128c63a1bacb3f51fb08f166e4e1d6fef651
[ROCm/ROCR-Runtime commit: 5adb73fffd]
Remove agent lookup in time stamp translation for IPC signals. The copy
agent handle is not shared so does not need to be checked for cross
process use. Cross process copy-timestamp read is illegal and continues
to deliver garbage.
Store the copy agent properly when doing CPU-CPU copies.
Change-Id: Ib4008f66ff866922047749dd556c84a32021c1fd
[ROCm/ROCR-Runtime commit: ea8c99f452]
ucode versions are per asic so not valid for feature enablement outside
of bringup/dev. Feature is older than the latest ioctl change that
the thunk depends on so use of this patch with kernel packages that
don't contain the feature is not possible in a supported environment.
Change-Id: I36b14176a7d642017ef1518aeade454b0f3dc749
[ROCm/ROCR-Runtime commit: 8133563a93]
Also removed an unnecessary cache flush in dependency barrier packet.
Change-Id: I573df3bdf0a10df0bcd78025672c44038f8091ff
[ROCm/ROCR-Runtime commit: 4647a5454d]
snprintf throws a warning from -Wformat-truncation where the string
could be truncated. We address this by referencing the maximum size that
can be returned from a file according to MAXNAMLEN . This should safely
guard us from truncating the path value.
Change-Id: If1d208990d8775e9494835b0deb890d2616fd15b
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: ccac07cb14]
Replace cpuid call with sysfs data to get CPU cache information. With this
change, x86 check is also removed since sysfs applies to other platforms.
CPU cache information can be retrieved from
/sys/devices/system/node/nodeX/cpuY/cache where Y is processor number
represented in /proc/cpuinfo at "processor" entry.
Change-Id: Ic47df6d5dafaf1aae5b46b1fdee42691c697e49e
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 4fa930af5a]
This is to allow allocations in system memory that exceed sizes
reported by a CPU device
Change-Id: I3d10d192aafcefbe4107f69b7c5e30bf7f836619
[ROCm/ROCR-Runtime commit: 3201f68f72]
PCI domain has moved to 32-bits to accommodate virtualization,
so a 32-bit integer is exposed for domain to reflect this change.
Change-Id: I0d767acadcdc8e4277db203b5865dd67dd001cef
Signed-off-by: Ori Messinger <ori.messinger@amd.com>
[ROCm/ROCR-Runtime commit: f2173254e4]
enable thunk query api to report if queue is newly created
test new queue bit test on clear events.
also fixup cleanup to disable debug trap.
Change-Id: I3ebe2d85da66f28b8c82f0e68461ee7d32ec0b0d
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Philip Cox <Philip.Cox@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 1ff5cb33b2]
Originally reserved 100 SGPRs per wave. Pre-gfx10 needs 102 SGPRs
and gfx10 needs 128 SGPRs. Reserve 128 SGPRs per wave for all ASICs
to simplify calculation.
Also double VGPR register size for gfx908 family
Change-Id: I98b741cbfa051f49ed37ff25d99f851f124be7b6
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
[ROCm/ROCR-Runtime commit: 814e0f0bdc]
This saves us from maintaining device ID to Asic mapping in the scripts.
Moreover, stop using abbrevation asic names to avoid confusion.
Change-Id: I7ce583b26b09b627c142aae41932483b28c545d8
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
[ROCm/ROCR-Runtime commit: f1c0bc8e35]
By using user-allocated pages instead of kernel-allocated pages
from TTM, we're not subject to TTM's self imposed limits on kernel
memory usage. This also paves the way for for more wide-spread use
of HMM on upstream kernels.
Change-Id: Iac82964c98a441e29b7f1986d1be1bb5ccb1e569
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 626957a263]
child process clone vm objects from svm->apertures if parent process
doesn't free memory before fork. fmm_clear_all_mem suppose to clear the
apertures in forked child process but this only works if gpu_vm is not
NULL. parent process call hsaKmtCloseKFD reset gpu_vm to NULL and then
fork, then child process will not clear svm->apertures.
As a result, the child process will allocate vm object with same address
and add to aperture, there are duplicate vm objects with same address
in aperture. Then mapping to GPU will find the wrong vm object and
create incorrect GPU mapping cause rocrtst IPC test VM fault. The issue
happened with HSA_USERPTR_FOR_PAGED_MEM=1.
The fix is to clear vm objects in all apertures in clear_after_fork.
Change-Id: I92e42a967075a634a3f475b915c8242d82077ecb
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: a11cb2a633]
This reverts commit bb4291dcac.
This change causes KFDTest failed on gfx803. The first hsaKmtCreateEvent
call allocate system memory for events_page because global variable
events_page is NULL. And this events page vm address should not be freed
until the process exit.
The change to destrory objects in hsaKmtCloseKFD removes events page. As
a result, KFDTest call hsaKmtOpenKFD again and then allocate memory will
get same events_page vm address on gfx803, and map this vm failed because
the vm conflict with events_page mapping.
KFDTest passed on VG10, gfx906 because allocate memory get different vm
address. hsaKmtCreateEvent still works fine as the driver keeps the
events page mapping of the process.
We should only destroy objects in fork cloned child process regardless
if gpu_vm is NULL or not.
Change-Id: I174ef65321cbd6074c855c2021318fe961c8c72c
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 7fc6a9f7c2]
/proc/cpuinfo are opened, read, and closed multiple times. Once for vendor
name and multiple times for model name -- each node opens once. For example
in a 2 CPUs + 4 GPUs system, it'll be opened 7 times. This patch reads it
one time and stores it in a cpuinfo buffer. This cpuinfo buffer is freed
when the snapshot is done.
Also replace returns with gotos inside the snapshot to avoid possible
memory leak.
Change-Id: Iaf26a6c7e7323a8651d137c3706179449b9e3c80
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 1fddfd316a]
adding test cases to query debugger pending events
Change-Id: I089754c508e476ce7b19e1cbd84235e4474b30c4
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Kent Russell <Kent.Russell@amd.com>
[ROCm/ROCR-Runtime commit: c04ebb56b9]
If M0[23] is set then the driver will interpret the interrupt as a
debug event, rather than a signal event.
Clear M0 before sending the interrupt. All paths here are terminal so
it's not necessary to save/restore M0.
Change-Id: Ibd85b8cc6f8556941f2308a2c3fa3c68702cd606
[ROCm/ROCR-Runtime commit: ad717d2e98]
Reserve half of dma32 zone for non-NUMA system.
Change-Id: Id7aea7b6ff6cc1cc7983ecd95f8078b7f1be630c
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
[ROCm/ROCR-Runtime commit: cdc10991a9]
Decimal is better than hex in this case.
Change-Id: Ic15a9373e99160880b98d3dcd6827d551c87b77a
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
[ROCm/ROCR-Runtime commit: 3e4c42ef13]
Add data out for enable trap to return poll fd to user space.
Add query debug events interface.
Change-Id: Ia4afde1cf167e6aa61d502380a8b329ee89d5f44
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
[ROCm/ROCR-Runtime commit: 836dfd0752]
Otherwise the parent call hsaKmtCloseKFD and then fork child process,
child process will duplicate the vm_objects from the parent.
Change-Id: Ia6ffc51cbae983b6a7cdc58ccf3b11ebe4087d97
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 632ad3a749]
The NodeId parameter is redundant and can be retrieved
from QueueId parameter.
Change-Id: I12853849b868b304bd27633fa7653ba644d69026
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
[ROCm/ROCR-Runtime commit: f40f166e20]
Only run the P2P test over the specified source and destination nodes if user already specify them
Change-Id: Ia3c0195cead7f46e3e28507f3255d8c59a287ab8
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
[ROCm/ROCR-Runtime commit: e65685df19]
On GFX10, the wave size is determined by the COMPUTE_DISPATCH_INITIATOR value passed to
DISPATCH_DIRECT.CS_W32_EN, default 0 value was giving 64 lane waves
Change-Id: Ie8c407a24bd2825757ec481be62247b35047e5ca
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
[ROCm/ROCR-Runtime commit: 65ab296840]
We should use the SE number reflected by NumShaderBanks of the node
rather than hardcoding it.
Change-Id: I945fb001f81ce506249cf485a7ce25aee8219bc7
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
[ROCm/ROCR-Runtime commit: 5ae5854302]
Ignore requests to bind to invalid NUMA nodes. This affects only
legacy applications (such as KFDTest) that allocate system memory
as paged memory with a GPU node ID.
Change-Id: I81e514af6d0c1ab2ed5229adeeca1fa0ab2a0685
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 4d7b0990e4]
Addressed by:
ae92e8f kfdtest: increase BigBufStressTest timeout and avoid VM fault
3edf77b kfdtest: avoid BigBufStressTest run on NUMA node 0
Change-Id: If21c6e42b4cf6aada1f74e77f0d8d1a2fdebcdb8
Signed-off-by: Cole Nelson <cole.nelson@amd.com>
[ROCm/ROCR-Runtime commit: 20cd954fe8]
Remove the CHIP name from the shader ISA and add wave_size(32) to make the same
shader can be used for both GFX9 and GFX10
Change-Id: I16ea72f87980c3d9c11298e20c06a0a073fe9a28
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
[ROCm/ROCR-Runtime commit: 78e754ca5b]
KFDGraphicsInterop.RegisterForeignDeviceMem looks like it is running
now. Re-enable it for kfdtest for all platforms.
Change-Id: I6f6ee9cd11da793c5d525d8676bfc6d5bd8007bb
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
[ROCm/ROCR-Runtime commit: b6f6d9da1c]
v_add_u32 was removed from gfx10, use carry-out explicit instruction
v_add_co_u32 instead on both gfx9 and gfx10
Change-Id: I1fcd5956844457a676757ad13bdce7f5304bb34b
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
[ROCm/ROCR-Runtime commit: c0663be7e8]
Modified shaders in KFD memory test to support gfx10.
There is no gprs register for flat_scratch on gfx10.
Use s_setreg_b32 instruction to set flat scratch base
address register
Change-Id: I505156a046056b61ce2d873343feb50ce635274a
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
[ROCm/ROCR-Runtime commit: 85a9821519]
The test is viable still on VG10/20. Phil is investigating why it takes
so long on gfx803
Change-Id: I61669b29dc0e8407858a5c73cfa69c5ea923846f
[ROCm/ROCR-Runtime commit: 79a3995816]
This functionality doesn't work on GFX9+, and was disabled for gfx802.
Remove the test altogether for now, especially since some kernel changes
broke it on gfx803, and the functionality is deprecated now anyways. Leave
the code for reference, but "#if 0" it to prevent it from compiling or
being in the kfdtest binary
Change-Id: I848b4f23201f18612cbdc122a5b46e4010c4af2a
[ROCm/ROCR-Runtime commit: 1ca1825b84]
Includes
-"Report SRAM ECC errors through the system event handler."
(bf6af52892b1f677697309a7946f651bfe8e9061)
-"Fix async memory test; temporarily disable NUMA memory test"
(8f1a9494cd)
-Temporary disable of rocrtst
Change-Id: Ide9fc999b01ab810f00a56fc3733d07be45117c7
[ROCm/ROCR-Runtime commit: 04da198a31]