This reverts commit e5b6537315ce9b2688ee0269ba0828a703c3e2c9.
The regressions (SWDEV-459556 and SWDEV-460260) caused by the original patch
have been resolved.
Change-Id: I32344492b4ff88bd7e91ea47983ac15636dc77c1
[ROCm/clr commit: b0930263e5]
* When no GPUs are available, hsa_init fails with
HSA_STATUS_ERROR_OUT_OF_RESOURCES, and device and runtime initialization
fails. For the NoGpu tests to pass, true must be returned so that
HIP_INIT_API returns the proper error, hipErrorNoDevice, instead of
hipErrorInvalidDevice.
Change-Id: I982d4416c92ed1b36893354d8b10d73df34f2478
[ROCm/clr commit: fdaa7141af]
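The error mapping described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: the enums, `InitResult`, `initRuntime`, and `hipInitApi` are hypothetical stand-ins, not the real HSA or HIP declarations, and the commit does not say which function returns true.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real HSA and HIP headers define their own enums.
enum class HsaStatus { Success, ErrorOutOfResources, Error };
enum class HipError { Success, NoDevice, InvalidDevice };

struct InitResult {
  bool ok;          // whether initialization is reported as successful
  int deviceCount;  // number of usable GPUs
};

// Assumption: "out of resources" from hsa_init means "no GPUs present",
// so report success with zero devices instead of a hard failure.
InitResult initRuntime(HsaStatus hsaInitStatus, int gpusFound) {
  assert(gpusFound >= 0);
  if (hsaInitStatus == HsaStatus::ErrorOutOfResources) return {true, 0};
  if (hsaInitStatus != HsaStatus::Success) return {false, 0};
  return {true, gpusFound};
}

// Simplified model of the API entry check: a failed init maps to
// InvalidDevice, a successful init with zero devices maps to NoDevice.
HipError hipInitApi(const InitResult& r) {
  if (!r.ok) return HipError::InvalidDevice;
  if (r.deviceCount == 0) return HipError::NoDevice;
  return HipError::Success;
}
```

With this shape, a machine without GPUs takes the `NoDevice` branch rather than the `InvalidDevice` branch.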
- Print the kernel name for graph launches; it's hard to correlate packets
otherwise
- Print correlation_id if any
Change-Id: Ib8db7a00e4e7c98f570e71029e61d86f5dccc2ed
[ROCm/clr commit: 72d23a02c5]
Generate a static package by combining the binary and dev components.
Binary and dev component dependencies are added to the static package dependencies.
The package name will have the suffix static-dev/devel.
Change-Id: I7eb187ceaf2af7dfaf6ff9f56de20dac72881a12
[ROCm/clr commit: 2ce57184d3]
- Gfx12 TCC cacheline size is 256B. Increase the alignment to be
compatible. Eventually this needs to be replaced with what the query
returns.
Change-Id: I545929446c4faa3f26872a6290b3a89657888596
[ROCm/clr commit: bb01b4c3b4]
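As a rough illustration of cacheline-compatible alignment, a size can be rounded up to a 256-byte boundary as below. The constant name and the `alignUp` helper are hypothetical; only the 256B figure comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// 256B is the Gfx12 TCC cacheline size stated in the commit message.
// The constant name is made up for this sketch.
constexpr std::size_t kGfx12TccCachelineBytes = 256;

// Round value up to the next multiple of alignment.
// alignment must be a nonzero power of two for the mask trick to work.
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}
```

Hard-coding the constant mirrors the commit's interim approach; the commit itself notes that the value should eventually come from a device query.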
This reverts commit c0ee0ffa1c.
Reason for revert: The new comgr unbundling action leads to a perf drop for
uncompressed code objects. A new patch will use the old path for
uncompressed code objects and the new unbundling API for compressed ones.
Change-Id: I41ef53b71fc9f7aaa8cf231d4d70945f1117db52
[ROCm/clr commit: a1350fe8c1]
- Remove the last graph node optimization and instead always submit a
barrier NOP packet. This simplifies the code.
Change-Id: Ied443173ba47a08b6df148ac7e3ead712acda11c
[ROCm/clr commit: badf2b0880]
Handle the following GraphExec lifetime cases:
- GraphExec instance is destroyed before async launch completes
- GraphExec instance is destroyed after async launch completes
- GraphExec instance is destroyed without a launch
Change-Id: I45a7c82295fea916c7559bd8f796df710513aea1
[ROCm/clr commit: bf4d10ff61]
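One common way to cover these three lifetime cases is to let an in-flight launch hold its own reference to the executable graph. The sketch below uses `std::shared_ptr` for that purpose. It is an assumption made for illustration, not the mechanism used in the actual patch, and `GraphExec`, `Stream`, `launchAsync`, and `complete` are hypothetical names.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical executable graph; sets a flag when it is really destroyed.
struct GraphExec {
  explicit GraphExec(bool* destroyedFlag) : destroyed(destroyedFlag) {}
  ~GraphExec() { *destroyed = true; }
  bool* destroyed;
  int launchesCompleted = 0;
};

// Hypothetical stream that stores completion callbacks for async launches.
struct Stream {
  std::vector<std::function<void()>> pending;

  void launchAsync(std::shared_ptr<GraphExec> exec) {
    // The callback owns a reference, so the graph outlives the launch
    // even if the caller drops its handle first.
    pending.push_back([exec]() { ++exec->launchesCompleted; });
  }

  // Simulates the launches finishing: run callbacks, then drop references.
  void complete() {
    for (auto& callback : pending) callback();
    pending.clear();
  }
};
```

If the caller releases its handle before `complete()`, destruction is deferred until the callback is dropped. If it releases afterwards, or never launches, the object is destroyed immediately.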
- No CPU wait is needed when a profiler is attached. Waiting on the CPU
changes the application profile when roctracer is attached.
Change-Id: I2b9cfc48d697cf5ed54bb6a240d8c12bdb079171
[ROCm/clr commit: 51e4368723]
- awaitCompletion would wait for host-side command completion (aka
cpuWait). The correct way is to check the completion signal and, if not,
dispatch a marker that has a signal.
Change-Id: I0f4f23c7ea68c329bf1d5f05e9735f631e5e3808
[ROCm/clr commit: 2d7912dc01]
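The decision described above might look roughly like this. The sketch rests on two assumptions: "if not" is read as "if the command has no completion signal", and a signal value of 0 means the work is complete. `Signal`, `Command`, `Queue`, and `signalToWaitOn` are hypothetical names, not ROCclr types.

```cpp
#include <cstdint>

// Assumption: a signal value of 0 means the associated work is complete.
struct Signal {
  std::int64_t value;
};

struct Command {
  Signal* completionSignal;  // nullptr when no signal was attached
};

// Hypothetical queue that can dispatch a marker carrying its own signal.
struct Queue {
  int markersDispatched = 0;
  Signal markerSignal{1};

  Signal* dispatchMarker() {
    ++markersDispatched;
    return &markerSignal;
  }
};

// Returns the signal the caller should wait on, or nullptr when the
// command has already completed and no wait is needed.
Signal* signalToWaitOn(const Command& cmd, Queue& queue) {
  if (cmd.completionSignal != nullptr) {
    return cmd.completionSignal->value == 0 ? nullptr : cmd.completionSignal;
  }
  // No signal to check: dispatch a marker that has one.
  return queue.dispatchMarker();
}
```

The point of the sketch is that completion is derived from a device signal rather than from host-side command state.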
On gfx8, on gfx9 devices before MI100, and on gfx10.0 or gfx10.1,
none of the memory ordering workarounds for device kernel arguments
can be applied. Use host kernel arguments on these devices.
Change-Id: I9be6fbfe4b3986eb7d9f83998334df5f03fd4124
[ROCm/clr commit: 2b746de6de]
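A device check along these lines could be sketched as below. `GfxIp` and `useHostKernelArgs` are hypothetical names, and treating "gfx9 before MI100" as "gfx9.0 with stepping below 8" (MI100 being gfx908) is an assumption of this sketch, not something the commit states.

```cpp
// Hypothetical description of a device's gfx IP version.
struct GfxIp {
  int major;
  int minor;
  int stepping;
};

// Returns true when kernel arguments should be placed in host memory.
bool useHostKernelArgs(const GfxIp& ip) {
  if (ip.major == 8) return true;
  if (ip.major == 9) {
    // Assumption: "before MI100" means gfx90x with stepping < 8 (gfx908).
    return ip.minor == 0 && ip.stepping < 8;
  }
  if (ip.major == 10) return ip.minor == 0 || ip.minor == 1;
  return false;
}
```

A real implementation would more likely key off the device's reported capabilities or an explicit device list; the numeric comparison here is only for illustration.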
hsa_amd_ipc_memory_detach was called with an invalid mapped pointer.
Changed to pass the SVM pointer of the owner memory instead.
Change-Id: I8203c6e2d718efb8ca3b028309bc78caff8d4c7d
[ROCm/clr commit: 5bb30d7718]
Switch command creation to the new suballocator to avoid
frequent expensive OS calls
Change-Id: I3597c811820e577c15708bad8b8a41aa53acc400
[ROCm/clr commit: 5b0bfdcbad]
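The general idea of a suballocator is to request memory from the OS in large chunks and hand out small pieces from them. The minimal fixed-size pool below illustrates that idea only. It is not the ROCclr suballocator, and the `SubAllocator` class and its interface are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Fixed-size slot pool: one heap allocation per chunk, many slots per chunk.
// slotSize should be a multiple of the alignment the stored objects need.
class SubAllocator {
 public:
  SubAllocator(std::size_t slotSize, std::size_t slotsPerChunk)
      : slotSize_(slotSize), slotsPerChunk_(slotsPerChunk) {
    assert(slotSize_ > 0 && slotsPerChunk_ > 0);
  }

  void* allocate() {
    if (freeList_.empty()) grow();  // the only path that hits the heap
    void* slot = freeList_.back();
    freeList_.pop_back();
    return slot;
  }

  // Returns a slot to the pool; the chunk itself stays allocated.
  void release(void* slot) { freeList_.push_back(slot); }

  std::size_t chunkCount() const { return chunks_.size(); }

 private:
  void grow() {
    std::unique_ptr<unsigned char[]> chunk(
        new unsigned char[slotSize_ * slotsPerChunk_]);
    unsigned char* base = chunk.get();
    chunks_.push_back(std::move(chunk));
    for (std::size_t i = 0; i < slotsPerChunk_; ++i) {
      freeList_.push_back(base + i * slotSize_);
    }
  }

  std::size_t slotSize_;
  std::size_t slotsPerChunk_;
  std::vector<std::unique_ptr<unsigned char[]>> chunks_;
  std::vector<void*> freeList_;
};
```

Released slots are reused before a new chunk is requested, so steady-state command creation does not reach the system allocator at all in this model.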
The Readback and Avoid HDP Flush memory ordering workaround is
used as a fallback solution only when the HDP flush register is invalid.
Change-Id: Ic284eba1f95ed22b0270d3abeb904fb902015b1a
[ROCm/clr commit: 6cb7b6ec6b]
- Add LOG_TS mask for printing signal times
- Read raw ticks from signals
Change-Id: Ibdd0bf06c790729f6c65083a4784c97a3c3219e0
[ROCm/clr commit: 948ca5a931]
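Mask-gated timestamp logging of this kind can be sketched as follows. Only the name LOG_TS comes from the commit. The bit positions, the `SignalTicks` struct, and `formatSignalTimes` are hypothetical and chosen purely for illustration.

```cpp
#include <cstdint>
#include <string>

// Bit positions are hypothetical; only the name LOG_TS is from the commit.
enum LogBits : std::uint32_t {
  LOG_OTHER_HYPOTHETICAL = 1u << 0,
  LOG_TS = 1u << 1,
};

// Raw start and end tick counts as read from a signal (hypothetical layout).
struct SignalTicks {
  std::uint64_t start;
  std::uint64_t end;
};

// Returns a log line with the raw ticks when LOG_TS is enabled in the
// active mask, or an empty string otherwise.
std::string formatSignalTimes(std::uint32_t activeMask, const SignalTicks& t) {
  if ((activeMask & LOG_TS) == 0) return std::string();
  return "start=" + std::to_string(t.start) + " end=" + std::to_string(t.end) +
         " elapsed=" + std::to_string(t.end - t.start);
}
```

Keeping the values as raw ticks, rather than converting them to a time unit, matches the second bullet of the commit.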