Part 1 of 2.
Enables fine-grained VRAM over PCIe, controlled by an environment flag.
Part 2 will extend to XGMI.
Change-Id: I8ad506e004b398d56d462b0200274eae2293a461
[ROCm/ROCR-Runtime commit: c56d86100b]
hsa_exceptions with empty what() strings will not be reported in debug builds.
Change-Id: I0d424d3b1d3044808ece1720a460a57d68bf878e
[ROCm/ROCR-Runtime commit: 344d964f9f]
Version is now a fixed string that matches previous internal builds.
This also matches released DEB/RPM builds (but not GitHub versions).
Change-Id: Id4819b9de8c855250aadf1a1cebb187b5c031721
[ROCm/ROCR-Runtime commit: 400304aa10]
Both support dynamic scratch allocation, so there is no reason
to preemptively allocate on APUs.
Change-Id: I22eaec01a83a091ee9dc1f594a1a9106e8dd81fc
[ROCm/ROCR-Runtime commit: 65d39cc476]
- Skip symbols that are STB_LOCAL and not STT_AMDGPU_HSA_KERNEL
Change-Id: I68567f58de9bf3f07dbd8020ef63f47667c86367
[ROCm/ROCR-Runtime commit: 8bee6e4976]
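A minimal sketch of the symbol filter from the change above, using the glibc `<elf.h>` definitions. The helper name `skip_symbol` is hypothetical, and the value 10 for `STT_AMDGPU_HSA_KERNEL` is an assumption taken from LLVM's AMDGPU ELF definitions; the loader's actual code may differ.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>

/* Assumption: STT_AMDGPU_HSA_KERNEL is the AMDGPU-specific symbol type
   (10 in LLVM's ELF definitions). Defined here so the sketch builds
   without AMD headers. */
#ifndef STT_AMDGPU_HSA_KERNEL
#define STT_AMDGPU_HSA_KERNEL 10
#endif

/* Hypothetical helper: returns true when the loader should skip the
   symbol, i.e. it has local binding and is not a kernel symbol. */
static bool skip_symbol(const Elf64_Sym *sym) {
  unsigned char bind = ELF64_ST_BIND(sym->st_info);
  unsigned char type = ELF64_ST_TYPE(sym->st_info);
  return bind == STB_LOCAL && type != STT_AMDGPU_HSA_KERNEL;
}
```

Local kernel symbols are still processed, since the loader needs them to locate kernel code even though they are not exported.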
Decrease the number of iterations and array sizes in some cases.
Change-Id: I1a0a43faa907b28662ff3a44c172950ed7b1500e
[ROCm/ROCR-Runtime commit: 6bca866e6c]
HW has limited bits for the wave scratch base address stride. Enforcing
this limit prevents programs with larger-than-supported scratch allocations
from running and clobbering neighboring scratch space.
Change-Id: I574da888e9d1d5e290a9c0025ba13b5ef9f1e5c0
[ROCm/ROCR-Runtime commit: 8e4177382a]
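Enforcing the stride limit amounts to rejecting any per-wave scratch size that the stride field cannot encode. The sketch below assumes a hypothetical 13-bit field counted in 1 KiB units; the real field width and granularity depend on the GPU and are not stated in the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: the per-wave stride is stored in a
   STRIDE_FIELD_BITS-wide field, in units of STRIDE_GRANULE_BYTES. */
#define STRIDE_FIELD_BITS 13u
#define STRIDE_GRANULE_BYTES 1024u

/* Largest stride value the field can hold, in granules. */
static uint64_t max_stride_granules(void) {
  return (UINT64_C(1) << STRIDE_FIELD_BITS) - 1;
}

/* Largest per-wave scratch size the field can represent, in bytes. */
static uint64_t max_wave_scratch_bytes(void) {
  return max_stride_granules() * STRIDE_GRANULE_BYTES;
}

/* Returns false for requests the stride cannot encode; launching those
   would let waves overrun into neighboring scratch space. */
static bool scratch_request_fits(uint64_t per_lane_bytes, uint32_t lanes_per_wave) {
  if (lanes_per_wave != 0 && per_lane_bytes > UINT64_MAX / lanes_per_wave)
    return false;  /* the product itself would overflow */
  uint64_t wave_bytes = per_lane_bytes * lanes_per_wave;
  /* Round up to whole granules without risking overflow. */
  uint64_t granules = wave_bytes / STRIDE_GRANULE_BYTES +
                      (wave_bytes % STRIDE_GRANULE_BYTES != 0);
  return granules <= max_stride_granules();
}
```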
- Process dynamic relocations even if no symbol is
associated with them.
Change-Id: Iaefee682ee52f5acda8280e5764e6d5fd992774a
[ROCm/ROCR-Runtime commit: a447d79430]
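For a RELA entry whose symbol index is `STN_UNDEF` (0), the ELF convention is to treat the symbol value as 0, so the relocation can still be applied using only the addend. This sketch assumes a simple `base + S + A` computation with every symbol defined in the loaded object; `reloc_value` is a hypothetical helper, not the loader's API.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value to write for one RELA entry, assuming all
   symbols are defined in the object loaded at load_base. */
static uint64_t reloc_value(const Elf64_Rela *rela, const Elf64_Sym *symtab,
                            size_t nsyms, uint64_t load_base) {
  uint64_t sym_index = ELF64_R_SYM(rela->r_info);
  uint64_t s = 0;  /* STN_UNDEF: no symbol, so its value is taken as 0 */
  if (sym_index != STN_UNDEF) {
    assert(sym_index < nsyms);
    s = symtab[sym_index].st_value;
  }
  /* Instead of skipping symbol-less entries, fall back to base + addend. */
  return load_base + s + (uint64_t)rela->r_addend;
}
```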
Also rename blit_agent to region_gpu and add comments clarifying that
its role is to support the deprecated region API, not to perform blits.
Change-Id: I80b1043db2e1c5d40a58fc801eef70a688ea9169
[ROCm/ROCR-Runtime commit: 936ecd1885]
During registration we must not call any function that depends on registered
data as the lists are not yet complete. This includes signal allocation since
allocating shared GPU mapped memory depends on the list of GPUs.
Change-Id: I94d59e847802c546c2a5a0d9f55fe5ac3fd1d878
[ROCm/ROCR-Runtime commit: dda9c17b45]
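The ordering rule can be sketched as two phases: registration only records agents, and anything that needs the complete GPU list (such as mapping shared signal memory to every GPU) is refused until registration is finished. All types and functions here are hypothetical stand-ins for the runtime's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GPUS 8

/* Hypothetical runtime state: GPUs are registered one at a time. */
typedef struct {
  int gpu_ids[MAX_GPUS];
  size_t gpu_count;
  bool registration_done;
  size_t signal_mapped_gpus;  /* GPUs the shared signal memory is mapped to */
} registry_t;

/* Phase 1: only record the GPU. Nothing here may depend on the list
   being complete. */
static void register_gpu(registry_t *reg, int id) {
  assert(!reg->registration_done && reg->gpu_count < MAX_GPUS);
  reg->gpu_ids[reg->gpu_count++] = id;
}

static void finish_registration(registry_t *reg) {
  reg->registration_done = true;
}

/* Phase 2: shared GPU-mapped memory must be visible to every GPU, so a
   signal can only be allocated once the GPU list is final. */
static bool allocate_signal(registry_t *reg) {
  if (!reg->registration_done)
    return false;  /* mapping now would miss GPUs registered later */
  reg->signal_mapped_gpus = reg->gpu_count;
  return true;
}
```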
Delete the runtime object when the last hsa_shut_down occurs.
Change-Id: I2005d52d06702eaef166714fd5e471cc277924db
[ROCm/ROCR-Runtime commit: 9ec37b5103]
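A minimal sketch of reference-counted teardown, assuming each successful init increments a counter and each shutdown decrements it, with the runtime object deleted only when the count reaches zero. The names are hypothetical, and a real runtime would also need a lock around these updates; this version is single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int placeholder; } runtime_object_t;  /* hypothetical */

static runtime_object_t *g_runtime = NULL;
static unsigned g_ref_count = 0;

/* Called by each init: create the object on first use, then count. */
static int runtime_acquire(void) {
  if (g_ref_count == 0) {
    g_runtime = calloc(1, sizeof *g_runtime);
    if (g_runtime == NULL) return -1;
  }
  ++g_ref_count;
  return 0;
}

/* Called by each shutdown: delete the object when the last reference goes. */
static int runtime_release(void) {
  if (g_ref_count == 0) return -1;  /* shutdown without a matching init */
  if (--g_ref_count == 0) {
    free(g_runtime);
    g_runtime = NULL;
  }
  return 0;
}
```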
The debug agent requires handles to internal queues for single-step debugging.
Added the tools-only API hsa_amd_runtime_queue_create_register for reporting them.
hsa_amd_runtime_queue_create_register sets a callback that is invoked
when internal queues are created.
Change-Id: Ia5190ae724fadba686c15f25b2cd085350eeff0e
[ROCm/ROCR-Runtime commit: 757502ccd6]
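The registration pattern can be sketched as a stored callback plus user data that the runtime invokes whenever it creates an internal queue. The types and functions below are hypothetical stand-ins and do not reproduce the exact signature of `hsa_amd_runtime_queue_create_register`.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } queue_t;  /* stand-in for an internal queue */

/* Hypothetical notifier type: called with each newly created queue. */
typedef void (*queue_create_notifier_t)(const queue_t *queue, void *user_data);

static queue_create_notifier_t g_notifier = NULL;
static void *g_notifier_data = NULL;

/* Tools-only registration: store the callback and its user data. */
static void register_queue_create_notifier(queue_create_notifier_t cb,
                                           void *user_data) {
  g_notifier = cb;
  g_notifier_data = user_data;
}

/* Runtime side: every internal queue creation is reported to the tool. */
static void create_internal_queue(queue_t *q, int id) {
  q->id = id;
  if (g_notifier != NULL) g_notifier(q, g_notifier_data);
}

/* Example tool callback, e.g. a debug agent recording the queue. */
static void record_queue_id(const queue_t *queue, void *user_data) {
  int *last_id = user_data;
  *last_id = queue->id;
}
```

Queues created before registration are not reported in this sketch, so a tool would need to register early, before the runtime creates its internal queues.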
Required for the debug agent, which needs the copy API and trap handler to be
initialized prior to loading. Existing tools do not make use of the internal
queue or scratch memory intercepts, which are what PostToolsInit allows.
PostToolsInit() will be removed in a following cleanup change.
Change-Id: If43377843808e3eff0defd9204910a67a852902f
[ROCm/ROCR-Runtime commit: 5975c465ad]
With the change to 48-bit addressing, apertures now overlap, which
precludes using aperture checks to discover buffer ownership.
Switches to ptrinfo to decide which device owns a buffer.
This corrects faults in the legacy hsa_memory_copy API.
Change-Id: I5c7ce0216e1cdc96f836fc6fec9c3defdf4b9d90
[ROCm/ROCR-Runtime commit: 1e0d690948]
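The sketch below contrasts the two approaches under hypothetical address ranges. Once two apertures overlap, a pointer can fall inside both, so an aperture check cannot name an owner, while a per-allocation lookup in the spirit of ptrinfo can. `range_t`, `apertures_containing`, and `owner_of` are illustrative names, not runtime or driver APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range owned by a device. */
typedef struct { uintptr_t base; size_t size; int device; } range_t;

static int contains(const range_t *r, uintptr_t p) {
  return p >= r->base && p - r->base < r->size;
}

/* Aperture check: with overlapping apertures, more than one can match. */
static size_t apertures_containing(const range_t *ap, size_t n, uintptr_t p) {
  size_t hits = 0;
  for (size_t i = 0; i < n; ++i) hits += contains(&ap[i], p);
  return hits;
}

/* ptrinfo-style lookup: find the allocation that holds p and return its
   owning device, or -1 if p is not in any known allocation. */
static int owner_of(const range_t *allocs, size_t n, uintptr_t p) {
  for (size_t i = 0; i < n; ++i)
    if (contains(&allocs[i], p)) return allocs[i].device;
  return -1;
}
```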
On update, the removal occurs AFTER the new package is installed,
due to how yum/rpm orders package operations. Only remove it if
we're doing a pure uninstall.
Change-Id: I4982610828d8bc1f2d8691b1e4ee1718c89413cc
[ROCm/ROCR-Runtime commit: ed9baefd75]
LinkInfo is already initialized to zero in its default constructor.
Change-Id: Ifa4fb886cce9b474c6879c9c82744044ab394082
[ROCm/ROCR-Runtime commit: 2843988dd7]
Remove the fence pool and use two signals. Two signals allow overlapped
submission and copy while reducing busy polling by the thread.
Change-Id: Idb5f8e4c7f482a596ffce9e7799191fdd785a216
[ROCm/ROCR-Runtime commit: 56ed5c8904]
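Overlapping submission and copy with two signals is essentially double buffering. This single-threaded simulation assumes two staging slots, each with a completion flag standing in for a signal; the host only waits on a slot when it is about to reuse it, rather than polling after every chunk. All names are hypothetical, and in the runtime the transfer is asynchronous rather than simulated by `complete()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* hypothetical staging-buffer size in bytes */

/* Staging slot; `busy` stands in for a signal that is still pending. */
typedef struct {
  char buf[CHUNK];
  size_t len;
  char *dst;
  bool busy;
} slot_t;

/* Stand-in for the copy engine finishing a slot's transfer. */
static void complete(slot_t *s) {
  memcpy(s->dst, s->buf, s->len);
  s->busy = false;  /* signal fires */
}

/* Copies n bytes through two alternating slots. Chunk i+1 is staged while
   chunk i is in flight; the host waits only when reusing a busy slot.
   Returns the number of such waits. */
static size_t staged_copy(char *dst, const char *src, size_t n) {
  slot_t slots[2] = {0};
  size_t waits = 0;
  for (size_t off = 0, i = 0; off < n; off += CHUNK, ++i) {
    slot_t *s = &slots[i % 2];
    if (s->busy) {  /* wait on this slot's signal before reusing it */
      complete(s);
      ++waits;
    }
    s->len = (n - off < CHUNK) ? n - off : CHUNK;
    memcpy(s->buf, src + off, s->len);  /* host fills the staging buffer */
    s->dst = dst + off;
    s->busy = true;                     /* submit: signal now pending */
  }
  for (size_t k = 0; k < 2; ++k)        /* drain remaining transfers */
    if (slots[k].busy) complete(&slots[k]);
  return waits;
}
```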
Fix pitch overflow due to small element detection.
Add wide pitch 2D copy handling.
Clean up code duplication.
Change-Id: I93b1584aba8e5964957eb7ab3544df806ca3e2f9
[ROCm/ROCR-Runtime commit: e0839ab27e]
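A host-side sketch of a pitched 2D copy, assuming pitches and widths in bytes. Computing row offsets in `size_t` avoids the 32-bit overflow that a large row index times a wide pitch can cause. `copy_rect` is a hypothetical helper; it does not reproduce the runtime's blit kernels or its small element detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies a width_bytes x height rectangle between buffers whose rows are
   src_pitch and dst_pitch bytes apart. Row offsets are computed in size_t,
   so row * pitch does not wrap the way a 32-bit product would for wide
   pitches (e.g. 70000 rows * 70000 bytes). */
static void copy_rect(uint8_t *dst, size_t dst_pitch,
                      const uint8_t *src, size_t src_pitch,
                      size_t width_bytes, size_t height) {
  assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
  if (dst_pitch == width_bytes && src_pitch == width_bytes) {
    memcpy(dst, src, width_bytes * height);  /* rows are contiguous */
    return;
  }
  for (size_t row = 0; row < height; ++row)
    memcpy(dst + row * dst_pitch, src + row * src_pitch, width_bytes);
}
```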
Can only check that the signal has some timestamp; cannot check whether
the translating agent matches the last-used agent.
Change-Id: I62943a864318808059c617280bb65a269dfadd1b
[ROCm/ROCR-Runtime commit: aca00b7238]
Adds HSA_AMD_SYSTEM_INFO_BUILD_VERSION=0x200 to hsa_system_info_t.
This returns a const char* pointing at the build string (git describe).
Change-Id: I73e6612482bf6ffc4037fd365808eb9211a650ad
[ROCm/ROCR-Runtime commit: cd8e5c1da8]
Adds the env flag HSA_REV_COPY_DIR. If set to 1, async copy will
copy from the dst device to the src device rather than from src to dst.
Change-Id: I3095642066fa026dc112c2eac06db9393341cd7e
[ROCm/ROCR-Runtime commit: 6c47780620]
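A minimal sketch of reading the flag, assuming (per the description above) that only the value `1` enables the reversed direction and that any other value, or an unset variable, keeps the default. Both helper names are hypothetical; the parsing is split out so it can be checked without touching the environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: only the exact value "1" enables the reversed direction. */
static bool rev_copy_dir_from_value(const char *value) {
  return value != NULL && strcmp(value, "1") == 0;
}

/* Reads HSA_REV_COPY_DIR from the environment; unset means default direction. */
static bool rev_copy_dir_enabled(void) {
  return rev_copy_dir_from_value(getenv("HSA_REV_COPY_DIR"));
}
```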
Conserves VMIDs when multiple processes are in use and memory operations
are not GPU-specific. For instance, the HIP API hipHostMalloc does not accept
a target GPU, so when used with one process per GPU (i.e. GPU == MPI rank), we
can quickly exceed the available VMID slots if every process consumes a VMID
on every GPU.
Change-Id: Ib6fa051290089f71581029c09f9a44b9992237d1
[ROCm/ROCR-Runtime commit: 35a270ef7e]
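The VMID pressure is simple arithmetic. Under the hypothetical model below, if every process touches every GPU, each GPU needs one VMID per process; if each rank binds only its own GPU, each GPU needs roughly one VMID per rank assigned to it. The helper name and model are illustrative, and the actual number of VMIDs available per GPU is not assumed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: VMIDs each GPU must provide for `processes` ranks
   spread evenly over `gpus` GPUs. */
static unsigned vmids_needed_per_gpu(unsigned processes, unsigned gpus,
                                     bool every_process_uses_every_gpu) {
  assert(gpus > 0);
  if (every_process_uses_every_gpu)
    return processes;                    /* every rank holds a VMID everywhere */
  return (processes + gpus - 1) / gpus;  /* only ranks bound to this GPU */
}
```

For example, 16 ranks on 16 GPUs need 16 VMIDs on each GPU in the first case but only 1 in the second.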