LinkInfo is already initialized to zero in its default constructor.
Change-Id: Ifa4fb886cce9b474c6879c9c82744044ab394082
[ROCm/ROCR-Runtime commit: 2843988dd7]
Remove fence pool and use two signals. Two signals allow overlapped
submission and copy while reducing thread busy polling.
Change-Id: Idb5f8e4c7f482a596ffce9e7799191fdd785a216
[ROCm/ROCR-Runtime commit: 56ed5c8904]
Fix pitch overflow due to small element detection.
Add wide pitch 2D copy handling.
Clean up code duplication.
Change-Id: I93b1584aba8e5964957eb7ab3544df806ca3e2f9
[ROCm/ROCR-Runtime commit: e0839ab27e]
Can only check that the signal has a timestamp; cannot check whether
the translating agent matches the last used agent.
Change-Id: I62943a864318808059c617280bb65a269dfadd1b
[ROCm/ROCR-Runtime commit: aca00b7238]
Adds HSA_AMD_SYSTEM_INFO_BUILD_VERSION=0x200 to hsa_system_info_t.
This returns a const char* pointing at the build string (git describe).
Change-Id: I73e6612482bf6ffc4037fd365808eb9211a650ad
[ROCm/ROCR-Runtime commit: cd8e5c1da8]
Adds env flag HSA_REV_COPY_DIR. If set to 1, async copy will
copy from dst device to src device rather than from src to dst.
Change-Id: I3095642066fa026dc112c2eac06db9393341cd7e
[ROCm/ROCR-Runtime commit: 6c47780620]
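A minimal sketch of how a flag like HSA_REV_COPY_DIR might be consumed, taking the description above at face value. Only the variable name comes from the commit message. CopyEndpoints, RevCopyDirEnabled, SelectEndpoints and SelectEndpointsFromEnv are hypothetical names for illustration and are not ROCR symbols, and treating only the exact value "1" as enabled is an assumption.

```cpp
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical endpoint pair used only for this sketch; not a ROCR type.
struct CopyEndpoints {
  int from_device;
  int to_device;
};

// Assumption: the flag counts as enabled only when its value is exactly "1".
inline bool RevCopyDirEnabled(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Literal reading of the commit message: swap the endpoints when the flag is on.
inline CopyEndpoints SelectEndpoints(int src_device, int dst_device, bool reverse) {
  CopyEndpoints endpoints{src_device, dst_device};
  if (reverse) std::swap(endpoints.from_device, endpoints.to_device);
  return endpoints;
}

// Convenience wrapper that reads the flag from the process environment.
inline CopyEndpoints SelectEndpointsFromEnv(int src_device, int dst_device) {
  return SelectEndpoints(src_device, dst_device,
                         RevCopyDirEnabled(std::getenv("HSA_REV_COPY_DIR")));
}
```

With the flag unset, SelectEndpointsFromEnv(0, 1) leaves the endpoints in src-to-dst order. Keeping the parsing separate from the selection lets the swap logic be exercised without touching the environment.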
Conserves VMIDs when multiple processes are in use and memory operations
are not GPU specific. For instance, the HIP API hipHostMalloc does not accept
a target GPU, so when used with one process per GPU (i.e. GPU == MPI rank) we
can quickly exceed the available VMID slots if every process consumes a VMID
on every GPU.
Change-Id: Ib6fa051290089f71581029c09f9a44b9992237d1
[ROCm/ROCR-Runtime commit: 35a270ef7e]
SDMA will use atomic completion fences if KFD reports 64-bit atomic support.
Otherwise it will fall back to store completion fences.
Change-Id: I12b76f8a74ec3ee96372c250f9824d846051536e
[ROCm/ROCR-Runtime commit: 3e3aa37750]
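The fence selection described above can be sketched as follows. This is a CPU-side illustration under stated assumptions, not the SDMA implementation: FenceKind, SelectFenceKind and SignalCompletion are hypothetical names, and modelling the atomic fence as a decrement and the store fence as an overwrite with zero is an assumption made only to show why the two differ.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical enumeration of the two completion styles named in the commit.
enum class FenceKind { kAtomic, kStore };

// Prefer atomic fences when 64-bit atomics are reported, else fall back to stores.
inline FenceKind SelectFenceKind(bool has_64bit_atomics) {
  return has_64bit_atomics ? FenceKind::kAtomic : FenceKind::kStore;
}

// Assumed semantics for illustration: an atomic fence is a read-modify-write
// on the signal value, while a store fence overwrites it unconditionally.
inline void SignalCompletion(std::atomic<std::int64_t>& signal, FenceKind kind) {
  if (kind == FenceKind::kAtomic) {
    signal.fetch_sub(1, std::memory_order_release);
  } else {
    signal.store(0, std::memory_order_release);
  }
}
```

Under this model, an overwrite loses information whenever the signal counts more than one outstanding operation, which is one reason a runtime might prefer the atomic path when the hardware offers it.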
These fixes are needed to find the hsakmt headers and libraries with
an upcoming hsakmt build system cleanup. It should continue to work
with the original hsakmt build system.
Change-Id: I6b3fcea8f2588698c130c9ec50952c66712afa6c
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 5f25d024a8]
Avoids using non-atomic SDMA fences by default since that path can duplicate fences.
If HSA_ENABLE_SDMA is set, it overrides copy path selection and may use
non-atomic fences.
Change-Id: I4747e9a766f7f649d21ddf6bfded047ac26fd60e
[ROCm/ROCR-Runtime commit: c593dfc6bf]
llvm.debugtrap and other trap IDs are reserved and should not place
the queue into an error state.
Change-Id: I98193a35ac7da94c4a42ee75d87754ee552ebea0
[ROCm/ROCR-Runtime commit: 536823482b]
Ensure system release fence is set on GFX8 packets that use large scratch.
Change-Id: I13cfdcd35969482ea6e95e0b352f5cb3a0454b86
[ROCm/ROCR-Runtime commit: 5f25619bb7]
Use the async signal handler to satisfy dependencies for SDMA blits.
Change-Id: Ifa8d3ee6810509f400a568ca2387ac6ab3ab7c36
[ROCm/ROCR-Runtime commit: 7cd6e366ed]
1/ Revised debug event handler to handle different events.
2/ Added a queue error handler using the callback in queue create, which prints wave info when the queue is in an error state.
3/ Preempt the queue instead of destroying it when the queue is in an error state.
Change-Id: Ib727d208de9caf1c72c76d42268483b24aaebde8
[ROCm/ROCR-Runtime commit: 49d2175c74]
Also improve small_heap used for scratch region allocation.
Change-Id: Ib7311b663b38968d88ebc355b81e12c0863dc541
[ROCm/ROCR-Runtime commit: 7caf9633f6]
Spec requires GPU release fences and CPU acquire fences at queue destroy.
Also update the recognized status codes.
Change-Id: If9166f5149f65417c7057ff7c0f69f6ac094d6ab
[ROCm/ROCR-Runtime commit: b6f0248f53]
Remove unused function (FenceRelease), add comments to barrier packet settings,
correct profiling controls to work with queue wrappers.
Change-Id: I45bb26227bcc2b78edb8ad5dc497603c33234e18
[ROCm/ROCR-Runtime commit: cd46954cc4]
This includes the changes provided by Konstantin, "Add xnack from elf header" (Change 136389).
Change-Id: I95e51141caa0d7c21903b09212c02e4906ec54a3
[ROCm/ROCR-Runtime commit: 8e3d26c617]