- enforce incrementing the table version number when a table size changes, outside of the ifdef for ROCPROFILER_REGISTER
- add new HIP_ENFORCE_ABI entries
- update the HipDispatchTable size and bump HIP_RUNTIME_API_TABLE_STEP_VERSION to 1
- re-enable rocprofiler-register
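One way to enforce a version bump on a size change is a compile-time check that ties the table size to the step version. The sketch below is illustrative only: DemoDispatchTable, kDemoTableStepVersion, and demo_expected_size are hypothetical names, not the HIP_ENFORCE_ABI macros or the real HipDispatchTable layout, and it assumes size_t and function pointers have the same size and alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical step version; bump it whenever the table below grows.
constexpr std::uint32_t kDemoTableStepVersion = 1;

using DemoFn = void (*)();

// Hypothetical dispatch table: a size field followed by function pointers.
struct DemoDispatchTable {
  std::size_t size;
  DemoFn alloc_fn;
  DemoFn free_fn;
  DemoFn launch_fn;  // entry added together with step version 1
};

// Expected table size for each known step version (0 means "unknown").
// Assumes size_t and function pointers share size and alignment (no padding).
constexpr std::size_t demo_expected_size(std::uint32_t step) {
  return step == 0 ? sizeof(std::size_t) + 2 * sizeof(DemoFn)
       : step == 1 ? sizeof(std::size_t) + 3 * sizeof(DemoFn)
                   : 0;
}

// Adding an entry without bumping the version fails to compile here.
static_assert(sizeof(DemoDispatchTable) ==
                  demo_expected_size(kDemoTableStepVersion),
              "table size changed: bump the step version and record the new size");

// Pin the offset of an entry so existing entries cannot be reordered.
static_assert(offsetof(DemoDispatchTable, launch_fn) ==
                  sizeof(std::size_t) + 2 * sizeof(DemoFn),
              "existing entries must not move");

// Builds a table whose size field matches the compiled layout.
inline DemoDispatchTable demo_make_table() {
  DemoDispatchTable t{sizeof(DemoDispatchTable), nullptr, nullptr, nullptr};
  assert(t.size == demo_expected_size(kDemoTableStepVersion));
  return t;
}
```

Keeping such checks outside any feature ifdef means they run in every build configuration, not only when the profiler registration path is compiled in.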
Change-Id: Ie0cc1d8491c5640056e5dd393ea243e4dce4e8a9
When a kernel performs device-side malloc, the initial heap is allocated with __amd_rocclr_initHeap.
During graph launch, the __amd_rocclr_initHeap kernel is enqueued, followed by the actual kernel, so the kernel executes after the initHeap kernel.
But with graph optimizations, initHeap is enqueued during capture on the device null stream and the actual kernel on the graph launch stream,
so there is no proper synchronization. Switch to command creation and enqueue during launch for kernel nodes with a hidden heap.
Change-Id: Iaf600251faef9a448853f19429023c118aa760b9
The __amd_streamOpsWrite blit kernel in device-libs has only 3 args,
so get rid of the unused 4th arg (sizeBytes).
Change-Id: I81cc1107f8b424bf58558c93a2495a1b878aef91
With multiple HIP streams, a race condition is possible when
one thread stops the traces while another still performs submissions.
That may cause a crash in the barrier callback.
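The message does not say how the race is closed. One common approach, sketched below under that assumption, is to guard the "tracing active" state and the trace records with the same lock, so a submission either completes before the stop or sees that tracing has ended. DemoTracer and its methods are hypothetical and are not ROCclr APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical tracer shared by several submitting threads.
class DemoTracer {
 public:
  // Called on submission; records only while tracing is active.
  bool submit(int id) {
    std::lock_guard<std::mutex> lock(mu_);
    if (!active_) return false;  // tracing stopped: skip the trace work
    records_.push_back(id);
    return true;
  }

  // Called by the thread that stops the traces.
  void stop() {
    std::lock_guard<std::mutex> lock(mu_);
    active_ = false;
  }

  std::size_t count() const {
    std::lock_guard<std::mutex> lock(mu_);
    return records_.size();
  }

 private:
  mutable std::mutex mu_;
  bool active_ = true;
  std::vector<int> records_;
};

// Single-threaded walk-through of the intended behavior.
inline void demo_tracer_usage() {
  DemoTracer tracer;
  assert(tracer.submit(1));   // accepted while tracing is active
  tracer.stop();
  assert(!tracer.submit(2));  // rejected after stop()
  assert(tracer.count() == 1);
}
```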
Change-Id: Ic56f8277fcfd2c2142a4821d927b938b9f313add
Check whether the pointer is present in the arrayset before trying to dereference
it, as dereferencing can cause an access violation if the pointer was allocated using malloc.
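The check-before-dereference pattern can be sketched as follows. DemoArraySet is a hypothetical stand-in for the runtime's set of tracked arrays; the real container and element types are not given in the message.

```cpp
#include <cassert>
#include <set>

// Hypothetical registry of pointers the runtime knows how to interpret.
class DemoArraySet {
 public:
  void insert(int* p) { tracked_.insert(p); }

  // Reads through p only if it is tracked; untracked pointers (for example
  // plain malloc allocations) are rejected without being dereferenced.
  bool try_read(const int* p, int* out) const {
    if (tracked_.find(p) == tracked_.end()) return false;
    *out = *p;
    return true;
  }

 private:
  std::set<const int*> tracked_;
};

// Walk-through: a tracked pointer is read, an untracked one is refused.
inline void demo_arrayset_usage() {
  int tracked = 7;
  int untracked = 9;
  DemoArraySet set;
  set.insert(&tracked);

  int out = 0;
  assert(set.try_read(&tracked, &out));
  assert(out == 7);
  assert(!set.try_read(&untracked, &out));
  assert(out == 7);  // left unchanged by the refused read
}
```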
Change-Id: Ida72b9015dc22269fc1fbe0728e66e3de29fda3d
- Implement workaround to ensure HDP writes are done by writing and
reading the HDP MMIO register.
- Implement the same workaround for graphs; we no longer need the sentinel
write/readback.
Change-Id: I0d3027b46a1f61131ec62e3c8c669ff5184fa6b2
When a graph is instantiated on device 0 and launched on device 1, switch to command creation and enqueue during launch.
Change-Id: Ied34dc99b2a776130d1354ed3830c6ccab9068e4
Add @gargrahul, @rakesroy, and @mangupta as CODEOWNERS.
This is for GitHub upstream.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: I563a7cf860cca56ae5fb5e05dcfaf1751b3692e4
The Windows path still uses the multithreaded implementation. Hence, in graphs,
all nodes are executed in a queue thread, which requires managing the
mempool in the queue thread. However, the spec allows memory allocated
in a graph to be destroyed outside of the graph's execution. That may cause
mempool management to go out of sync.
Change-Id: I0ffb2244b3cb720455ed44d1b3e2487fa8959a77
The latest VidMM can report the free memory available on the system.
Use the PAL interface to report free memory for the whole system instead
of per process.
Change-Id: I0e78b9d340299c16829177a8c5182d21cc353384
Read and write int bytes sentinel value to dev_ptr or PCIe-connected devices at the tail end of the kernarg surface.
Change-Id: I993d552ac872b3cd56aef4746c4d1d92c58d38b4
Update the CMake minimum requirement from the deprecated version (2.8.11) to a non-deprecated version (3.5).
Change-Id: Ib76d241babf475a26464e8b12b91d67e48f72b60
The logic for fetching the null stream was changed earlier from amd::HostQueue
to hip::Stream. This seems to cause a timing difference between
checking for the null stream and creating it, which leads to issues
in multithreaded applications that use the default stream.
Change-Id: Ie02365dec537275d23a1d225de9811e2fd3a9c55
If a system has LLVM installed, `find_package` could choose that one
even if we set `HIP_LLVM_ROOT`. `LLVM_ROOT` is ignored because the
corresponding CMake policy is set to `OLD` by default.
Change-Id: I18fa0453afe170c229e92d6ddc386b43eb0c44f6