1. Move the global amd::monitor listenerLock before the global
runtime_tear_down object, because the lock is referenced in
~RuntimeTearDown() after main() and must therefore be destroyed
later than runtime_tear_down.
2. Update Device::~Device() to SVM-free coopHostcallBuffer_
before context_ is released and freed.
Change-Id: I1d21378ff463477d3238d71e5e2a1a7d6b9147ad
[ROCm/clr commit: 544c45364f]
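The ordering rule behind the first change above can be sketched as follows. This is a minimal illustration, not the runtime's code: `Lock` and `TearDown` are hypothetical stand-ins for amd::Monitor and RuntimeTearDown, and block scope is used only so the order can be observed. Namespace-scope objects in a single translation unit follow the same rule: they are destroyed in the reverse order of their construction.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for amd::Monitor: logs when it is used and destroyed.
struct Lock {
  std::vector<std::string>& log;
  explicit Lock(std::vector<std::string>& l) : log(l) {}
  ~Lock() { log.push_back("lock destroyed"); }
  void acquire() { log.push_back("lock used"); }
};

// Hypothetical stand-in for RuntimeTearDown: its destructor needs the lock.
struct TearDown {
  Lock& lock;
  explicit TearDown(Lock& l) : lock(l) {}
  ~TearDown() { lock.acquire(); }
};

// Declares the lock first, so it is destroyed last and is still alive
// while the tear-down destructor runs.
std::vector<std::string> destructionOrder() {
  std::vector<std::string> log;
  {
    Lock listener_lock(log);            // constructed first, destroyed last
    TearDown tear_down(listener_lock);  // constructed second, destroyed first
  }
  return log;
}
```

Swapping the two declarations would make the tear-down destructor touch an already destroyed lock, which is the situation the reordering avoids.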
If the graph has kernels that perform device-side allocation, the heap is
allocated during packet capture, because the heap pointer has to be added to
the AQL packet, and it is initialized during graph launch.
Handle race with wait when 2 kernels with device heap are enqueued on multiple streams.
Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688
[ROCm/clr commit: 57156c524d]
- Update the intra-socket weight for partitions within a single socket,
since the driver changed it to 13.
- Use the PCIe function to distinguish the partitions of the same device,
such as in TPX mode on gfx942.
Change-Id: I8e64023d44e37c2dbb105cbb343441a48021ba7b
[ROCm/clr commit: 1815fc808d]
These were missed for gfx1150 and gfx1151.
Change-Id: I03d997e451d15a01a961e6597f805f634e5c3ae7
Signed-off-by: Lang Yu <lang.yu@amd.com>
[ROCm/clr commit: a0127c9eea]
* When no GPUs are available, hsa_init fails with HSA_STATUS_ERROR_OUT_OF_RESOURCES, so device and runtime initialization fail. For the NoGpu tests to pass, true must be returned, which causes HIP_INIT_API to return the proper error hipErrorNoDevice instead of hipErrorInvalidDevice.
Change-Id: I982d4416c92ed1b36893354d8b10d73df34f2478
[ROCm/clr commit: fdaa7141af]
- The UUID is an ASCII string of at most 21 characters that uniquely identifies a GPU
- Convert UUIDs set in HIP_VISIBLE_DEVICES to device indices internally
- Then use the existing device index logic for HIP_VISIBLE_DEVICES
Change-Id: I8cab4fe42459f8209b97f909300789e6e687b9ac
[ROCm/clr commit: 52db98edd9]
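The UUID-to-index conversion described above could look roughly like the sketch below. Everything here is an assumption for illustration: the helper name `resolveVisibleDevices`, the "GPU-" prefix used to recognize a UUID entry, and the choice to stop at the first entry that cannot be resolved. It is not the runtime's actual parsing code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns a HIP_VISIBLE_DEVICES-style list into indices.
// uuids[i] holds the UUID of device i. Entries that start with "GPU-"
// (assumed prefix) are looked up by UUID, all others are parsed as indices.
// Parsing stops at the first entry that cannot be resolved (assumed policy).
std::vector<int> resolveVisibleDevices(const std::string& env,
                                       const std::vector<std::string>& uuids) {
  std::vector<int> indices;
  std::istringstream stream(env);
  std::string token;
  while (std::getline(stream, token, ',')) {
    int index = -1;
    if (token.rfind("GPU-", 0) == 0) {
      // UUID entry: map it to the index of the matching device.
      for (std::size_t i = 0; i < uuids.size(); ++i) {
        if (uuids[i] == token) {
          index = static_cast<int>(i);
          break;
        }
      }
    } else {
      // Numeric entry: accept only a complete, in-range decimal number.
      char* end = nullptr;
      long value = std::strtol(token.c_str(), &end, 10);
      if (end != token.c_str() && *end == '\0' && value >= 0 &&
          value < static_cast<long>(uuids.size())) {
        index = static_cast<int>(value);
      }
    }
    if (index < 0) break;
    indices.push_back(index);
  }
  return indices;
}
```

Once every entry is an index, the rest of the visibility logic does not need to know whether the user wrote a UUID or a number.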
If we use the mask returned by getLastUsedSdmaEngine(), we need to
apply the SDMA Read/Write mask to it before passing it to the HSA
copy_on_engine API.
Change-Id: I6e5dc6c187eeb3c61ee159e9d2a0fa7b4737c06e
[ROCm/clr commit: 3f0bcf7834]
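As a rough illustration of the masking step above, the sketch below intersects a last-used engine mask with the mask of engines allowed for the copy direction. The helper name `selectSdmaEngine` is hypothetical, and reducing the result to a single bit is an assumption made here so that exactly one engine is requested; the runtime may pick differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keeps only engines that are both last-used and
// allowed for this copy direction, then picks the lowest remaining one.
// Returns 0 when no engine qualifies, so the caller can fall back.
std::uint32_t selectSdmaEngine(std::uint32_t last_used_mask,
                               std::uint32_t allowed_mask) {
  std::uint32_t candidates = last_used_mask & allowed_mask;
  // Isolate the lowest set bit (assumption: one engine per request).
  return candidates & (~candidates + 1u);
}
```

A zero result signals that the last-used engine is not valid for this direction and must not be passed on unchanged.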
- Print the SWq for AQL packets. This helps correlate a stream with the
HWq it is mapped to
Change-Id: I610430c0872a1abc6636027c00163ec46983cd65
[ROCm/clr commit: 984c86f407]
Display the queue base pointer in the log. This can be correlated with
AQL packets
Change-Id: I544f9b6db6ae01c85e57e4b3f0b3fffefcd7c2ed
[ROCm/clr commit: 0567c3b720]
The original logic didn't use pitch because the abstraction layer had
a sysmem copy without pitch. Since the extra sysmem copy was
disabled, the code has to accept pitch values from the app.
Change-Id: Ia9fba7b33ddff4e9109b4e63d0d6afa52f501c8f
[ROCm/clr commit: fb3dfcf889]
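To make the role of pitch concrete, here is a minimal host-side sketch of a pitched 2D copy. The function `copyPitched` is hypothetical and is not the runtime's copy path; it only shows that each row starts `pitch` bytes after the previous one, while only `width_bytes` of each row are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `height` rows of `width_bytes` bytes each. Consecutive rows are
// `src_pitch` bytes apart in the source and `dst_pitch` bytes apart in the
// destination, so padding at the end of a row is skipped, not copied.
void copyPitched(void* dst, std::size_t dst_pitch, const void* src,
                 std::size_t src_pitch, std::size_t width_bytes,
                 std::size_t height) {
  assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
  auto* d = static_cast<unsigned char*>(dst);
  const auto* s = static_cast<const unsigned char*>(src);
  for (std::size_t row = 0; row < height; ++row) {
    std::memcpy(d + row * dst_pitch, s + row * src_pitch, width_bytes);
  }
}
```

Ignoring the pitch and copying `width_bytes * height` contiguous bytes would be correct only when the pitch equals the row width.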
- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for a memory fill operation
- Switch fill memory to 16-byte writes by default
- Allow limiting the workgroups with DEBUG_CLR_LIMIT_BLIT_WG
Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
[ROCm/clr commit: f1dc81f427]
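The idea of filling a buffer with a limited number of workers can be emulated on the host as below. This is only a sketch of a strided fill loop and an assumption about how such a kernel may be structured, not the fillBuffer kernel itself; `Chunk` and `stridedFill` are hypothetical names, and `num_workers` stands in for the total number of work-items across the limited workgroups.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 16-byte element, matching the default write width named above.
struct Chunk {
  std::uint32_t v[4];
};

// Host-side emulation: every worker starts at its own id and advances by
// the total worker count, so a fixed number of workers covers any size.
// Returns the number of writes performed by the busiest worker.
std::size_t stridedFill(std::vector<Chunk>& buffer, const Chunk& pattern,
                        std::size_t num_workers) {
  std::size_t max_iterations = 0;
  for (std::size_t worker = 0; worker < num_workers; ++worker) {
    std::size_t iterations = 0;
    for (std::size_t i = worker; i < buffer.size(); i += num_workers) {
      buffer[i] = pattern;
      ++iterations;
    }
    max_iterations = std::max(max_iterations, iterations);
  }
  return max_iterations;
}
```

With fewer workers each one loops more times, which is the trade-off that a workgroup limit exposes.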
Move context allocation into Device::init() method to simplify the logic and handle
HIP_VISIBLE_DEVICES properly
Change-Id: I0fc6f37c7ae39bedbdad0290295d6794c66d6c54
[ROCm/clr commit: a49d633883]
- Address corner cases that can arise with the new
hipMemcpyDeviceToDeviceNoCU enum
- Improve logging
Change-Id: I6035b901f8d616741054b7a5ff4f67956329ac57
[ROCm/clr commit: 5662d4037c]
- alias hipGetDeviceProperties to hipGetDevicePropertiesR0600
- alias hipDeviceProp_t to hipDeviceProp_tR0600
- remove gcnArch from new device property struct
- add new requested struct members
Change-Id: If3f5dbef3d608487d9f6f419285f4bf577ea9bf0
[ROCm/clr commit: 2989840511]
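The aliasing scheme above can be illustrated with a small sketch. All names here (`DeviceProp_R0000`, `DeviceProp_R0600`, `archMajor`, `newMember`) are hypothetical, and using preprocessor defines for the alias is an assumption about the mechanism; the point is only that newly compiled code binds to the revised struct and function while the old symbol stays available.

```cpp
#include <cassert>

// Hypothetical original layout, kept so existing binaries keep working.
struct DeviceProp_R0000 {
  int archMajor;
  int gcnArch;
};

// Hypothetical revised layout: gcnArch removed, a new member added.
struct DeviceProp_R0600 {
  int archMajor;
  int newMember;
};

// Old entry point, still exported for code built against the old struct.
int getDeviceProperties_R0000(DeviceProp_R0000* p) {
  p->archMajor = 9;
  p->gcnArch = 942;
  return 0;
}

// New entry point that fills the revised struct.
int getDeviceProperties_R0600(DeviceProp_R0600* p) {
  p->archMajor = 9;
  p->newMember = 1;
  return 0;
}

// Newly compiled code sees the unversioned names as the R0600 variants.
#define DeviceProp DeviceProp_R0600
#define getDeviceProperties getDeviceProperties_R0600
```

Source code keeps using the unversioned names, so a rebuild picks up the new layout without edits, while code that reads a removed member such as gcnArch fails to compile.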
If the cl_khr_fp16 extension is enabled, the OCL runtime should report CL_DEVICE_HALF_FP_CONFIG.
Change-Id: I7c4ac48387f80bc704a475c57e5b52a462090d1b
[ROCm/clr commit: ad2c1e899a]