- enforce incrementing the table versioning number when a table size changes outside of ifdef for ROCPROFILER_REGISTER
- add new HIP_ENFORCE_ABI entries
- update the HipDispatchTable size and bump HIP_RUNTIME_API_TABLE_STEP_VERSION to 1
- re-enable rocprofiler-register
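The kind of compile-time check described above can be sketched as follows. This is only an illustration: `DemoDispatchTable`, `DEMO_ENFORCE_ABI`, `DEMO_TABLE_STEP_VERSION` and `demo_offset` are hypothetical stand-ins, not the actual HIP definitions, and the layout assumes `sizeof(std::size_t) == sizeof(void*)`.

```cpp
#include <cstddef>

// Hypothetical miniature dispatch table; the real HipDispatchTable is far larger.
struct DemoDispatchTable {
  std::size_t size;                        // set by the runtime to sizeof(table)
  int (*malloc_fn)(void**, std::size_t);
  int (*free_fn)(void*);
  int (*launch_fn)(const void*);           // newest entry, appended at the end
};

// Hypothetical step version; bump it whenever an entry is appended.
#define DEMO_TABLE_STEP_VERSION 1

// Expected byte offset of the Nth function pointer after the size field.
// Assumes size_t and function pointers have the same size (typical platforms).
constexpr std::size_t demo_offset(std::size_t index) {
  return sizeof(std::size_t) + index * sizeof(void*);
}

// Pin a member to a fixed slot so that reordering or inserting entries
// in the middle of the table fails to compile.
#define DEMO_ENFORCE_ABI(TABLE, MEMBER, INDEX)                \
  static_assert(offsetof(TABLE, MEMBER) == demo_offset(INDEX), \
                "ABI break: " #MEMBER " moved");

DEMO_ENFORCE_ABI(DemoDispatchTable, malloc_fn, 0)
DEMO_ENFORCE_ABI(DemoDispatchTable, free_fn, 1)
DEMO_ENFORCE_ABI(DemoDispatchTable, launch_fn, 2)

// Tie the table size to the step version: growing the table without
// bumping the version (or the reverse) is a compile error.
#if DEMO_TABLE_STEP_VERSION == 0
static_assert(sizeof(DemoDispatchTable) == demo_offset(2), "size changed: bump the step version");
#elif DEMO_TABLE_STEP_VERSION == 1
static_assert(sizeof(DemoDispatchTable) == demo_offset(3), "size changed: bump the step version");
#else
#error "unknown step version: add a size check for it"
#endif
```

Keeping the size check outside any feature ifdef means it fires regardless of whether the profiler registration path is compiled in.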
Change-Id: Ie0cc1d8491c5640056e5dd393ea243e4dce4e8a9
When a kernel performs device-side malloc, the initial heap is allocated with __amd_rocclr_initHeap.
During graph launch, the __amd_rocclr_initHeap kernel is enqueued, followed by the actual kernel, so the kernel executes after the initHeap kernel.
But with graph optimizations during capture, initHeap gets enqueued on the device null stream and the actual kernel on the graph launch stream,
so there is no proper synchronization between them. Switch to command creation and enqueue during launch for kernel nodes with a hidden heap.
Change-Id: Iaf600251faef9a448853f19429023c118aa760b9
Check whether the pointer is present in the arrayset before trying to dereference
it, as dereferencing can cause an access violation if the pointer was allocated using malloc.
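A minimal sketch of the lookup-before-dereference idea, assuming a set of tracked array handles. `DemoArray`, `DemoArraySet` and `widthOrInvalid` are hypothetical names for illustration and do not reflect the runtime's actual types.

```cpp
#include <unordered_set>

// Hypothetical array handle; stands in for the runtime's array object.
struct DemoArray {
  int width;
};

// Hypothetical registry of every array handle the runtime has handed out.
class DemoArraySet {
 public:
  void insert(const DemoArray* a) { known_.insert(a); }
  void erase(const DemoArray* a) { known_.erase(a); }

  // Returns the array width if ptr is a tracked array, or -1 otherwise.
  // An untracked pointer (for example plain malloc memory) is only compared
  // by address and is never dereferenced.
  int widthOrInvalid(const void* ptr) const {
    if (known_.find(ptr) == known_.end()) {
      return -1;
    }
    return static_cast<const DemoArray*>(ptr)->width;
  }

 private:
  std::unordered_set<const void*> known_;
};
```

The lookup only compares addresses, so it is safe for any pointer value the caller passes in.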
Change-Id: Ida72b9015dc22269fc1fbe0728e66e3de29fda3d
- Implement a workaround to ensure HDP writes are done by writing and
reading the HDP MMIO register.
- Implement the same workaround for graphs; we no longer need the sentinel
write/readback.
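The write-then-read pattern can be sketched generically as below. This is an assumption-laden illustration: `demoFlushViaMmio` is a hypothetical helper, the value written is arbitrary, and the real register address and semantics come from the driver, not from this sketch.

```cpp
#include <cstdint>

// Hypothetical helper: write to a memory-mapped register, then read it back.
// The pointer is volatile so the compiler emits both accesses in order.
// On a real device the read-back is what makes the host wait until the
// preceding write has reached the device; here the register is just memory.
inline std::uint32_t demoFlushViaMmio(volatile std::uint32_t* reg) {
  *reg = 1u;    // the write requests the flush (value is an assumption)
  return *reg;  // the read-back forces the write to complete first
}
```

In a unit test the "register" can be an ordinary variable, which exercises the access order but not the hardware behaviour.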
Change-Id: I0d3027b46a1f61131ec62e3c8c669ff5184fa6b2
When a graph is instantiated on device 0 and launched on device 1, switch to command creation and enqueue during launch.
Change-Id: Ied34dc99b2a776130d1354ed3830c6ccab9068e4
The Windows path still uses the multithreaded implementation. Hence, in graphs
all nodes are executed in a queue thread, which requires managing the
mempool in the queue thread. However, the spec allows destroying memory
allocated in a graph outside of the graph's execution. That may cause
mempool management to go out of sync.
Change-Id: I0ffb2244b3cb720455ed44d1b3e2487fa8959a77
Read and write an int-sized sentinel value to dev_ptr or PCIe-connected devices at the tail end of the kernarg surface.
Change-Id: I993d552ac872b3cd56aef4746c4d1d92c58d38b4
The logic for fetching the null stream was changed earlier from amd::HostQueue
to hip::Stream. This seems to cause a timing difference between
checking for the null stream and creating it, due to which issues are
observed in multithreaded applications using the default stream.
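The message above describes the hazard rather than a specific fix. Purely as a general illustration of a check-then-create window, the sketch below performs the check and the creation under one lock; `DemoStream` and `DemoDevice` are hypothetical and are not the hip::Stream or amd::HostQueue code.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stream object.
struct DemoStream {
  int id;
};

// Hypothetical device that lazily creates its null (default) stream.
class DemoDevice {
 public:
  // The check and the creation happen under the same lock, so two threads
  // cannot both observe "no stream yet" and both create one.
  DemoStream* nullStream() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!null_stream_) {
      null_stream_.reset(new DemoStream{0});
      ++creations_;
    }
    return null_stream_.get();
  }

  // Number of times the stream was created; intended for tests only.
  int creations() const { return creations_; }

 private:
  std::mutex mutex_;
  std::unique_ptr<DemoStream> null_stream_;
  int creations_ = 0;
};
```

Without the lock, a thread could pass the null check, get preempted, and then overwrite a stream another thread had already created and handed out.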
Change-Id: Ie02365dec537275d23a1d225de9811e2fd3a9c55
If a system has LLVM installed, `find_package` could choose that one
even if we set `HIP_LLVM_ROOT`. `LLVM_ROOT` is ignored because this
CMake policy is set to `OLD` by default.
Change-Id: I18fa0453afe170c229e92d6ddc386b43eb0c44f6
During hipGraphExecKernelNodeSetParams the kernel function can also be updated.
Hence the size required for kernel parameters can differ from what was allocated during graph instantiation.
So, create a new 128KB kernel pool and allocate kernel args from the pool.
If the pool is full, create a new 128KB pool. Release the kernel pools when the graph exec object is destroyed.
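The pooling scheme above can be sketched as a chain of fixed-size bump allocators. This is a host-only illustration: `DemoKernArgPools` is a hypothetical class, ordinary heap memory stands in for the kernarg memory, and the 16-byte default alignment is an assumption.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical pool set owned by one graph exec object.
class DemoKernArgPools {
 public:
  static constexpr std::size_t kPoolSize = 128 * 1024;  // 128KB per pool

  // Bump-allocate size bytes from the current pool. align must be a power
  // of two. When the current pool cannot fit the request, a new 128KB pool
  // is created. Returns nullptr for empty or oversized requests.
  void* allocate(std::size_t size, std::size_t align = 16) {
    if (size == 0 || size > kPoolSize) {
      return nullptr;
    }
    std::size_t start = (offset_ + align - 1) & ~(align - 1);
    if (pools_.empty() || start + size > kPoolSize) {
      pools_.push_back(std::unique_ptr<unsigned char[]>(new unsigned char[kPoolSize]));
      start = 0;
    }
    offset_ = start + size;
    return pools_.back().get() + start;
  }

  std::size_t poolCount() const { return pools_.size(); }

  // All pools are released when this object is destroyed, mirroring the
  // release on graph exec destruction.

 private:
  std::vector<std::unique_ptr<unsigned char[]>> pools_;
  std::size_t offset_ = 0;  // next free byte in the newest pool
};
```

Because earlier pools are never freed individually, pointers handed out before a new pool is created stay valid for the lifetime of the owner.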
Change-Id: I9567946d63400c79cbfd4c5439c654c92557ceae
Use the AMD_COMGR_ACTION_COMPILE_SOURCE_TO_RELOCATABLE action
to compile source to reloc. Currently we have source->bc,
link->bc and bc->reloc. This new action replaces the
three steps with one.
Change-Id: I8089cbef681e079702fefc2d2085a23bc3578d02
The precompiled header files have hard-coded paths in comments. Using the disable-linemarker option (-P) will skip the generation of these comments.
Change-Id: Ifb134052996c343f5405e954784b4b2c286c36b1
Use the AMD_COMGR_ACTION_COMPILE_SOURCE_TO_RELOCATABLE action
to compile source to reloc. Currently we have source->bc,
link->bc and bc->reloc. This new action replaces the
three steps with one.
Change-Id: I6ba551b8d04c7e06f41c4324026e4dcd2db1970f