Version is now a fixed string that matches previous internal builds.
This also matches released DEB/RPM builds (but not GitHub versions).
Change-Id: Id4819b9de8c855250aadf1a1cebb187b5c031721
HW has limited bits for wave scratch base address stride. Enforcement
prevents programs with larger than supported scratch allocations from
running and clobbering neighboring scratch space.
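A minimal sketch of such an enforcement check, assuming a made-up stride field width and granule. kStrideBits, kStrideGranule and ScratchFitsStride are illustrative names and values, not the runtime's or the hardware's actual ones:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical values for illustration only; real limits depend on the ASIC.
constexpr uint32_t kStrideBits = 13;       // assumed width of the stride field
constexpr uint64_t kStrideGranule = 1024;  // assumed bytes per stride unit

// Largest per-wave scratch size the assumed stride field can encode.
constexpr uint64_t kMaxWaveScratchBytes =
    ((uint64_t{1} << kStrideBits) - 1) * kStrideGranule;

// Returns true if the per-wave scratch size can be encoded in the stride
// field. A request that fails this check should be rejected instead of being
// truncated, since a truncated stride would make waves overlap in scratch.
constexpr bool ScratchFitsStride(uint64_t wave_scratch_bytes) {
  return wave_scratch_bytes <= kMaxWaveScratchBytes;
}
```

With these assumed values the limit is 8191 KiB per wave; anything larger is refused before dispatch.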
Change-Id: I574da888e9d1d5e290a9c0025ba13b5ef9f1e5c0
Also rename blit_agent to region_gpu and add comments to clarify that
its role is to support the deprecated region API rather than to perform blits.
Change-Id: I80b1043db2e1c5d40a58fc801eef70a688ea9169
During registration we must not call any function that depends on registered
data, as the lists are not yet complete. This includes signal allocation, since
allocating shared GPU mapped memory depends on the list of GPUs.
Change-Id: I94d59e847802c546c2a5a0d9f55fe5ac3fd1d878
The debug agent requires handles to internal queues for single-step debugging.
Adds the tools-only API hsa_amd_runtime_queue_create_register for reporting them.
hsa_amd_runtime_queue_create_register sets a callback which is invoked
when internal queues are created.
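The registration pattern can be sketched as follows. This is a standalone model with stand-in types (Queue, QueueCreateNotifier, Runtime, RecordQueue are all hypothetical); it does not use the HSA headers and does not reproduce the real signature of hsa_amd_runtime_queue_create_register:

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Stand-in types for illustration; these are NOT the HSA types.
struct Queue { int id; };
using QueueCreateNotifier = void (*)(const Queue* queue, void* user_data);

class Runtime {
 public:
  // A tool calls this once to be told about internal queues created later.
  void RegisterQueueCreateNotifier(QueueCreateNotifier callback, void* user_data) {
    notifier_ = callback;
    user_data_ = user_data;
  }

  // Runtime-internal path: create a queue, then report it if a tool registered.
  const Queue& CreateInternalQueue() {
    queues_.push_back(Queue{static_cast<int>(queues_.size())});
    const Queue& queue = queues_.back();
    if (notifier_ != nullptr) notifier_(&queue, user_data_);
    return queue;
  }

 private:
  QueueCreateNotifier notifier_ = nullptr;
  void* user_data_ = nullptr;
  std::deque<Queue> queues_;  // deque keeps element addresses stable on push_back
};

// Example tool-side callback: records the ids of the queues it is told about.
void RecordQueue(const Queue* queue, void* user_data) {
  static_cast<std::vector<int>*>(user_data)->push_back(queue->id);
}
```

In this model, queues created before registration are not reported, so a tool has to register before the runtime creates its internal queues.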
Change-Id: Ia5190ae724fadba686c15f25b2cd085350eeff0e
Required because the debug agent needs the copy API and trap handler to be
initialized prior to loading. Existing tools do not make use of the internal
queue or scratch memory intercepts, which is what PostToolsInit allows.
PostToolsInit() will be removed in a following cleanup change.
Change-Id: If43377843808e3eff0defd9204910a67a852902f
Apertures now overlap with the change to 48-bit addressing, which
precludes using aperture checks to discover buffer ownership.
Switches to ptrinfo to decide which device owns a buffer.
This corrects faults in the legacy hsa_memory_copy API.
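A rough sketch of why a per-allocation lookup still works when address ranges no longer identify a device. It models the idea with a sorted table of allocations; AllocationInfo, AllocationTable and OwnerOf are hypothetical names, and the code does not call the runtime's pointer-info query:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical record of one allocation; illustrative only.
struct AllocationInfo {
  size_t size;       // bytes in the allocation
  int owner_device;  // index of the owning device
};

class AllocationTable {
 public:
  void Register(uintptr_t base, size_t size, int owner_device) {
    allocations_[base] = AllocationInfo{size, owner_device};
  }

  // Returns the owner of the allocation containing addr, or -1 if addr is
  // not inside any registered allocation.
  int OwnerOf(uintptr_t addr) const {
    auto it = allocations_.upper_bound(addr);  // first base strictly above addr
    if (it == allocations_.begin()) return -1;
    --it;  // candidate: greatest base <= addr
    return (addr - it->first < it->second.size) ? it->second.owner_device : -1;
  }

 private:
  std::map<uintptr_t, AllocationInfo> allocations_;
};
```

Ownership here comes from the allocation record, so buffers of different devices can be interleaved in the same address range without confusing the lookup.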
Change-Id: I5c7ce0216e1cdc96f836fc6fec9c3defdf4b9d90
On update, the removal will occur AFTER the new package is installed,
due to some stupidity with how yum/rpm does things. Only remove it if
we're doing a pure uninstall.
Change-Id: I4982610828d8bc1f2d8691b1e4ee1718c89413cc
Remove fence pool and use two signals. Using two signals allows overlapped
submission and copy while reducing thread busy polling.
Change-Id: Idb5f8e4c7f482a596ffce9e7799191fdd785a216
Fix pitch overflow due to small element detection.
Add wide pitch 2D copy handling.
Clean up code duplication.
Change-Id: I93b1584aba8e5964957eb7ab3544df806ca3e2f9
We can only check that the signal has some timestamp; we cannot check whether
the translating agent matches the last used agent.
Change-Id: I62943a864318808059c617280bb65a269dfadd1b
Adds HSA_AMD_SYSTEM_INFO_BUILD_VERSION=0x200 to hsa_system_info_t.
This returns a const char* pointing at the build string (git describe).
Change-Id: I73e6612482bf6ffc4037fd365808eb9211a650ad
Adds env flag HSA_REV_COPY_DIR. If set to 1, async copy will
copy from dst device to src device rather than from src to dst.
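A minimal sketch of how such a flag could be read and applied, following the wording above literally. ReverseCopyDirEnabled, CopyEndpoints and OrderEndpoints are hypothetical helpers, and treating only the exact string "1" as enabled is an assumption about parsing:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Assumption: only the exact value "1" enables the reversal; an unset
// variable or any other value leaves the default direction in place.
bool ReverseCopyDirEnabled(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads the flag from the process environment.
bool ReverseCopyDirFromEnv() {
  return ReverseCopyDirEnabled(std::getenv("HSA_REV_COPY_DIR"));
}

// Hypothetical pair of device indices describing one async copy.
struct CopyEndpoints {
  int from_device;
  int to_device;
};

// Default order is src -> dst; with the flag set the two are swapped.
CopyEndpoints OrderEndpoints(int src_device, int dst_device, bool reverse) {
  return reverse ? CopyEndpoints{dst_device, src_device}
                 : CopyEndpoints{src_device, dst_device};
}
```

Splitting the string check from the getenv call keeps the parsing rule testable without touching the environment.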
Change-Id: I3095642066fa026dc112c2eac06db9393341cd7e
Conserves VMIDs when multiple processes are in use and memory operations
are not GPU specific. For instance, the HIP API hipHostMalloc does not accept
a target GPU, so when used with one process per GPU (i.e. GPU == MPI rank) we can
quickly exceed the available VMID slots if every process consumes a VMID on
every GPU.
Change-Id: Ib6fa051290089f71581029c09f9a44b9992237d1
SDMA will use atomic completion fences if KFD reports 64-bit atomic support.
Otherwise it will fall back to store completion fences.
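The selection can be sketched as below. FenceKind, SelectFenceKind and SignalCompletion are hypothetical names; modelling the atomic fence as a decrement of the completion word and the store fence as a plain write of a final value is an assumption, not something the change states:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

enum class FenceKind { kAtomic, kStore };

// Picks the fence type from the reported capability.
FenceKind SelectFenceKind(bool has_64bit_atomics) {
  return has_64bit_atomics ? FenceKind::kAtomic : FenceKind::kStore;
}

// Host-side model of the fence's effect on a 64-bit completion word.
// Assumption: the atomic fence decrements the word, while the store fence
// overwrites it with a value computed at submission time.
void SignalCompletion(std::atomic<int64_t>& word, FenceKind kind,
                      int64_t store_value) {
  if (kind == FenceKind::kAtomic) {
    word.fetch_sub(1);
  } else {
    word.store(store_value);
  }
}
```

In this model both paths leave the same final value for a single outstanding copy; they differ only in whether the update is a read-modify-write.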
Change-Id: I12b76f8a74ec3ee96372c250f9824d846051536e