Although unpinned copies require synchronizations
in HIP, runtime can avoid syncs for H2D copies with
a staging buffer
Change-Id: If2203c6bc0cbd89742823688dc8e89e9acd873b2
Fixes the memory leak with hipExtStreamCreateWithCUMask API.
hsa queues with cumask set are not being reused and created
everytime the API is called, But these queues were not being
destroyed during hipStreamDestroy causing memory leak.
Change-Id: Ibfbe019bbd73604e98eca80461efe53fa64bb701
1.Move global amd::monitor listenerLock before global
class runtime_tear_down as it will be referenced in
~RuntimeTearDown() after main(). It should be freed
later than runtime_tear_down.
2.Update Device::~Device() to SVM free coopHostcallBuffer_
before context_ is released and freed.
Change-Id: I1d21378ff463477d3238d71e5e2a1a7d6b9147ad
If the graph has kernels that does device side allocation, during packet capture, heap is
allocated because heap pointer has to be added to the AQL packet, and initialized during
graph launch.
Handle race with wait when 2 kernels with device heap are enqueued on multiple streams.
Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688
- Update the intra socket weight for partitions within single socket as
it is changed to 13 by the driver.
- Use the PCIe function to distinguish the partitions of the same device
such as TPX mode in gfx942.
Change-Id: I8e64023d44e37c2dbb105cbb343441a48021ba7b
* When no GPUs are available, hsa_init fails with HSA_STATUS_ERROR_OUT_OF_RESOURCES, and device and runtime initialization fails. In order for NoGpu tests to pass, true needs to be returned which will cause HIP_INIT_API to return proper error hipErrorNoDevice instead of hipErrorInvalidDevice.
Change-Id: I982d4416c92ed1b36893354d8b10d73df34f2478
- UUID is Ascii string with a maximum of 21 chars which uniquely identifies a GPU
- Convert set UUID in HIP_VISIBLE_DEVICES to device index internally
- Then use existing device index logic for HIP_VISIBLE_DEVICES
Change-Id: I8cab4fe42459f8209b97f909300789e6e687b9ac
If we are using the mask returned by getLastUsedSdmaEngine() then we
need to apply the SDMA Read/Write mask to it before using with HSA
copy_on_engine API.
Change-Id: I6e5dc6c187eeb3c61ee159e9d2a0fa7b4737c06e
Original logic didn't use pitch because, abstraction layer had
a sysmem copy without pitch. Since extra sysmem copy was
disabled, the code has to accept pitch values from the app.
Change-Id: Ia9fba7b33ddff4e9109b4e63d0d6afa52f501c8f
- Add the new fillBuffer kernel, which allows to launch a limited
number of workgroups for memory fill operation
- Switch fill memory to 16 bytes write by default
- Allow to limit the workgroups with DEBUG_CLR_LIMIT_BLIT_WG
Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
Move context allocation into Device::init() method to simplify the logic and handle
HIP_VISIBLE_DEVICES properly
Change-Id: I0fc6f37c7ae39bedbdad0290295d6794c66d6c54