- rocr attribute needs to be updated after each iteration
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I3afb2d7954ef3de37f5f5f9d3cc7757fdacffcec
[ROCm/clr commit: 50e0ddb055]
Maintain the status of the handler callback. For event records we no longer
submit callbacks, to reduce the load on the async handler thread. However,
without a callback we leak command memory and never decrement refcounts.
Track the status of the handler so that a callback can be queued when
finish is called.
Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928
[ROCm/clr commit: 5df34a2f7a]
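The handler-status idea from the commit above can be sketched roughly as follows. This is a minimal illustration, not the ROCclr implementation: `Command`, `Queue`, `handlerQueued` and `runCallback` are hypothetical names, and the callback runs inline instead of on a handler thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a runtime command; not the real ROCclr type.
struct Command {
  int refCount = 1;            // reference normally dropped by the callback
  bool handlerQueued = false;  // status of the handler callback
};

class Queue {
 public:
  // Event records are submitted without a callback to keep the async
  // handler thread lightly loaded.
  void submitEventRecord(Command& cmd) { inFlight_.push_back(&cmd); }

  // finish() queues the callback for every command that never got one,
  // so the command's reference is released instead of leaking.
  void finish() {
    for (Command* cmd : inFlight_) {
      if (!cmd->handlerQueued) {
        cmd->handlerQueued = true;
        runCallback(*cmd);
      }
    }
    inFlight_.clear();
  }

 private:
  // Runs inline here; a real runtime would hand this to the handler thread.
  static void runCallback(Command& cmd) {
    assert(cmd.refCount > 0);
    --cmd.refCount;
  }

  std::vector<Command*> inFlight_;
};
```

In this sketch the recorded status is what lets finish() tell apart commands that still hold a reference from those already cleaned up.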
Move hidden heap creation to the kernel launch to make sure it's
allocated on the actual first usage.
Change-Id: I1b65a82fc06d9129ed45a69765bf14ea3d945b04
[ROCm/clr commit: 4975f69337]
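A rough sketch of deferring the hidden heap allocation to the first kernel launch that needs it. `LazyHeapDevice`, `kHeapBytes` and the `usesHiddenHeap` flag are hypothetical; the real runtime decides this from kernel metadata that is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Illustrative size only; the real heap size is not stated in the commit.
constexpr std::size_t kHeapBytes = 4096;

// Hypothetical device model, not the actual runtime class.
class LazyHeapDevice {
 public:
  // The heap is created on the first launch that needs it, not at init.
  void launchKernel(bool usesHiddenHeap) {
    if (usesHiddenHeap && !hiddenHeap_) {
      hiddenHeap_ = std::make_unique<std::vector<unsigned char>>(kHeapBytes);
      ++heapAllocations_;
    }
    assert(!usesHiddenHeap || hiddenHeap_ != nullptr);
  }

  bool hasHiddenHeap() const { return hiddenHeap_ != nullptr; }
  int heapAllocations() const { return heapAllocations_; }

 private:
  std::unique_ptr<std::vector<unsigned char>> hiddenHeap_;
  int heapAllocations_ = 0;
};
```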
The heap must be cleared once per device, but ROCclr doesn't
create a queue per device in HIP. Hence, the clear operation will
be performed during the first queue creation.
Change-Id: I52ceb06d67d11cde6d019c5ab510059f426a9bfb
[ROCm/clr commit: 04bfd93569]
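The "clear once per device, on first queue creation" rule could look roughly like this. `HeapClearDevice` and its members are hypothetical names. The atomic flag only picks a single winner; the counters are not thread-safe and exist for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical device model, not the actual ROCclr class.
class HeapClearDevice {
 public:
  // Every queue creation calls this; only the first one clears the heap.
  void createQueue() {
    if (!heapCleared_.exchange(true)) {
      clearHeap();
    }
    ++queuesCreated_;
  }

  int heapClears() const { return heapClears_; }
  int queuesCreated() const { return queuesCreated_; }

 private:
  void clearHeap() {
    assert(heapClears_ == 0);  // must run at most once per device
    ++heapClears_;
  }

  std::atomic<bool> heapCleared_{false};
  int heapClears_ = 0;
  int queuesCreated_ = 0;
};
```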
Implement map/unmap for PAL backend
Create commands since PAL uses the IQueue to map/unmap
Change-Id: I97e26a7d28ae5e10774c9ca65307153100945621
[ROCm/clr commit: 67657d6099]
- check PCIe atomic support for printf functionality
- if it is not enabled, printf won't work
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: Ib366e8e71772b02210c4a830bca4bd8cc7a11664
[ROCm/clr commit: 15f1632dfa]
Adding virtual memory management APIs to rocclr.
The HIP layer will handle virtual allocs on devices.
Change-Id: Ia978f105c2c3fed3959c77580ba228e845105754
[ROCm/clr commit: b5f555f9ec]
- Add a global cache state for a device to indicate the scopes of submitted
AQL packets.
- Remove scopes for the TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use a NOP AQL packet for event records.
By default an event record flushes caches with a system scope release.
- Calling finish() should check whether caches are flushed and, if not,
queue a marker.
Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
[ROCm/clr commit: 8eeaa998c0]
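A minimal sketch of the cache-state tracking described in the commit above, assuming that a system scope release is what counts as "flushed". `ReleaseScope`, `DeviceCacheState` and `eventRecordScope` are hypothetical names, and the ROC_EVENT_NO_FLUSH path is not modeled.

```cpp
#include <cassert>

enum class ReleaseScope { None, Agent, System };  // hypothetical names

// Scope used for an event record (TS marker). `releaseToDevice` mirrors
// the hipEventReleaseToDevice flag, which drops the release scope.
inline ReleaseScope eventRecordScope(bool releaseToDevice) {
  return releaseToDevice ? ReleaseScope::None : ReleaseScope::System;
}

class DeviceCacheState {
 public:
  // Record the release scope of each submitted packet.
  void onPacketSubmitted(ReleaseScope release) {
    flushed_ = (release == ReleaseScope::System);
  }

  // finish() queues one extra system-scope marker only if caches are dirty.
  void finish() {
    if (!flushed_) {
      ++markersQueued_;
      onPacketSubmitted(ReleaseScope::System);
    }
    assert(flushed_);
  }

  int markersQueued() const { return markersQueued_; }

 private:
  bool flushed_ = true;  // nothing submitted yet, so nothing to flush
  int markersQueued_ = 0;
};
```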
Add ROCR memory detection and enable arena mem object for possible
access in HIP
Change-Id: Icf86ac789176bfee4ea8d36b0970a817d4c6a2f7
[ROCm/clr commit: 28597ec5b5]
- Queue the handler for hipEventRecord (aka marker_ts_) only if there is a
callback associated with it.
Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a
[ROCm/clr commit: 3c3c0ca4c5]
It can be too early to allocate memory at the beginning of
Device::create() under PAL.
Change-Id: I4bd76db7be3f6fb246243ea68022d8b0f860471d
[ROCm/clr commit: 3af3fe10de]
Use HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT to get compute
units. This is needed to work around an asymmetric CU harvesting bug on
gfx90a. Add a new device property to get the max available CUs on the
device.
Change-Id: I878f38f14f16c1af01fc0a77157aea1e816a63b8
[ROCm/clr commit: 33aca5a4a6]
Set the CPU affinity to the node closest to the current GPU. This reduces
the latency of fetching kernel args, since the device queries the CPU cache
of the core that did the dispatch. This behavior is controlled with the
AMD_CPU_AFFINITY env var (disabled by default).
Change-Id: I65afba62cb818ea25a311b88d1c0dd5c51330292
[ROCm/clr commit: b192beea52]
A signal check alone still submits the marker; the runtime times out
later, but the barrier packet has already been generated. Hence an early
timeout allows the marker to be skipped.
Change-Id: Ieb7d89becbcff43a4f4c46715354ca65ab4a80b9
[ROCm/clr commit: bbb635bc32]
The original logic was left over from initial testing, when HMM
couldn't handle xnack properly.
Change-Id: I0abf01805704171e931dfba8b6d95bfe87d5fab1
[ROCm/clr commit: d17108e8d0]
info_.extensions_ and settings_ are deleted in amd::Device::~Device().
Change-Id: I06f240a42e5c131dbd4e61a759f905bcdf84b45a
[ROCm/clr commit: f212fc91ca]
The queue may already be destroyed by the time the app requests
the event status. Hence just get the active state from the device.
Change-Id: I887ecb0cfe414c2119247228b0d1255b8308da1e
[ROCm/clr commit: f116959b54]
When unsetting, the runtime should use HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE
for the agent and not HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE.
Change-Id: I3814802d1fb3b72c54e7566defafafed6b0d5cee
[ROCm/clr commit: d8a86e4870]
Add an env var ROC_USE_FGS_KERNARG to toggle kernel arg placement.
By default it's in the Fine Grain Kernel arg segment for supported ASICs.
Change-Id: I3d57ed69a1a4db2b392b0438ead499f3ddca4716
[ROCm/clr commit: e29b9c00ee]
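How such a toggle might be read is sketched below. The helper names (`parseFlag`, `envFlag`, `useFineGrainKernarg`) are hypothetical, and treating "0" as off and any other non-empty value as on is an assumption; the commit does not say how the runtime parses the variable.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Interprets a raw environment value; unset or empty keeps the default.
// Treating "0" as off and anything else as on is an assumption.
inline bool parseFlag(const char* raw, bool defaultValue) {
  if (raw == nullptr || *raw == '\0') return defaultValue;
  return std::strcmp(raw, "0") != 0;
}

// Looks up `name` in the environment and parses it as a flag.
inline bool envFlag(const char* name, bool defaultValue) {
  assert(name != nullptr);
  return parseFlag(std::getenv(name), defaultValue);
}

// Fine-grain kernarg placement is the default on ASICs that support it;
// setting ROC_USE_FGS_KERNARG=0 switches it off in this sketch.
inline bool useFineGrainKernarg(bool asicSupportsFgs) {
  if (!asicSupportsFgs) return false;
  return envFlag("ROC_USE_FGS_KERNARG", true);
}
```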
Fix a segfault caused when the attribute hipMemRangeAttributeAccessedBy
is queried using hipMemRangeGetAttribute.
Change-Id: I2ceb2267d89bfc31a55d9eae2685610c7ad89b1f
[ROCm/clr commit: 48c1b895c0]
The new query MemRangeAttribute::CoherencyMode returns the current
coherency mode for the provided memory region. The coherency mode is
one of the following types: FineGrain, CoarseGrain or
Indeterminate.
Change-Id: Ib66feeeb14f57a8b1cc731c65bb3d0276d297ff7
[ROCm/clr commit: 992830bab7]
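One way to picture the three coherency results is shown below. This sketch assumes that Indeterminate means the queried range mixes fine-grain and coarse-grain subregions; `rangeCoherency` is a hypothetical helper, not the ROCclr query itself, and returning Indeterminate for an empty range is also an assumption.

```cpp
#include <vector>

enum class CoherencyMode { FineGrain, CoarseGrain, Indeterminate };

// Combines per-subregion modes into one answer for the whole range.
inline CoherencyMode rangeCoherency(
    const std::vector<CoherencyMode>& subregions) {
  if (subregions.empty()) return CoherencyMode::Indeterminate;  // assumption
  const CoherencyMode first = subregions.front();
  for (CoherencyMode mode : subregions) {
    // Any disagreement between subregions makes the range indeterminate.
    if (mode != first) return CoherencyMode::Indeterminate;
  }
  return first;
}
```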