Remove alignment to granularity for IPC handles as ROCr has a patch that
will internally validate pointer sizes against requested size during
allocation instead of size aligned to page size. This patch is needed
together with this patch from ROCr:
f8a42a3a:Use user requested size for memory fragments
Change-Id: I28b25558ea03c836b44fafdb34b7330cf6887424
Heap initialization used device queue, but it shoudl be used for
cooperative launches only. Heap initialization must use the same queue
as the current dispatch.
Change-Id: I856621bf82bbdeb1c2d0fbc4970e90d09af805cb
This reverts commit 4afca0647e.
Reason for revert: ROCr query should now be usable in upcoming release.
Change-Id: I2207761ca6af5d585d090bae1af09eb9a8e9bad6
The ROCclr assigns zero-based IDs to GPUs in the order they are
discovered. That zero-based ID is what is used to identify the GPU
on which the HIP_OPS activity took place.
When multiple ranks are used, each rank's first logical device always
has GPU ID 0, regardless of which physical device is selected with
CUDA_VISIBLE_DEVICES. Because of this, when merging trace files from
multiple ranks, GPU IDs from different processes may overlap.
The long term solution is to use the KFD's gpu_id which is stable
across APIs and processes. Unfortunately the gpu_id is not yet exposed
by the ROCr, so for now use the driver's node id.
Change-Id: Ib78854527d600d175bb76e2df0747c33f898c615
- Store last fence scopes and use the last value to determine if we need a cache flush again. This helps cases where hipExtLaunchKernel API is
used.
- Purge code for ROC_EVENT_NO_FLUSH
Change-Id: I531cf9c9c60d5e2b3a9e265d0f52f79ed2fa8a8c
- rocr attribute needs to be updated after each iteration
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I3afb2d7954ef3de37f5f5f9d3cc7757fdacffcec
Maintain status of handler callback. For event records we no longer
submit callbacks to reduce the load on the async handler thread. However
without a callback we leak command memory/decrement refcounts. Indicate
status of the handler which we can use to queue a callback when
finish is called.
Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928
Move hidden heap creation to the kernel launch to make sure it's
allocated on the actual first usage.
Change-Id: I1b65a82fc06d9129ed45a69765bf14ea3d945b04
The heap must be cleared once per device, but ROCclr doesn't
create a queue per device in HIP. Hence, the clear operation will
be performed during the first queue creation.
Change-Id: I52ceb06d67d11cde6d019c5ab510059f426a9bfb
- check pcie atomci support for printf functionality
- if not enabled printf wont work
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: Ib366e8e71772b02210c4a830bca4bd8cc7a11664
Adding virtual memory management APIs to rocclr.
The HIP layer will handle virtual allocs on devices.
Change-Id: Ia978f105c2c3fed3959c77580ba228e845105754
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should ensure if caches are flushed, if not queue a
marker
Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
- Queue handler for hipEventRecord(aka marker_ts_) only if there is a
callback associated with it.
Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a