* When a kernel is launched with hipExtLaunchKernelGGL and a stop event is used, hipGraphInstantiate leaks: because a stop event is used, profiling is enabled and a Timestamp (ReferenceCountedObject) is created but never released.
* The fix is to disable profiling when the command is captured, so that no timestamp is created. Because capture information isn't available when the kernel command is created, the packet capturing state is used to decide whether to create a timestamp.
Change-Id: Ia23adac4592ded4fb5e236acf99e12e729f63692
[ROCm/clr commit: da5f1a6146]
Gfx12 has only 16 bits for grid dim Y/Z. Detect gfxIp and return an error if grid dim Y or Z does not fit in 16 bits.
Change-Id: I43dd14affc9e4073d0b1232e7523967f0180fa31
[ROCm/clr commit: 0a918c8f96]
Although unpinned copies require synchronization in HIP,
the runtime can avoid syncs for H2D copies by using
a staging buffer.
Change-Id: If2203c6bc0cbd89742823688dc8e89e9acd873b2
[ROCm/clr commit: 29cc678d8d]
This reverts commit 2e7581a69a.
Changing the error code is considered a breaking change,
so it should be done in major releases only.
The other reason for reverting the commit is that the change itself
is incorrect: CUDA behaves in the same way as HIP when
pResDesc or pTexDesc is nullptr.
Change-Id: I3abee6b79279b81ab01c7f8466c7f8e3776c4109
[ROCm/clr commit: cfdc9dfc36]
1) Child graph nodes need to have the parent graph's dependencies in their waitlist.
2) A marker is placed on the base stream with the parent graph's waitlist.
Change-Id: Iec65a0171ea387be05b0733abcc708fb630e4be4
[ROCm/clr commit: 4d1ded9eaf]
This PR adds UberTrace-based tracing support to ROCclr's PAL device class.
Legacy RGP-based tracing is still available and is the default.
If UberTrace support is enabled tool-side, this new code path will activate.
Change-Id: I268b2dcef70e850a50e2caef8355f38bf51d4641
[ROCm/clr commit: e550032d25]
This shows up in some valgrind runs. Make sure the resources are
released.
Change-Id: I34c25c00370a221585895655744831215136d5f4
[ROCm/clr commit: 4b03017e8a]
The new set tracks only the queues that have a command
submitted to them. This allows for fast iteration
in waitActiveStreams.
Change-Id: I2c832eefa01280d9a87a5f57874d36d2e9441de7
[ROCm/clr commit: bcc545e6b8]
The variable is already defined as a cache variable so that the user can
override it, but the hard-coded setting prevented the override. Remove the
hard-coded setting.
[ROCm/clr commit: 4d0b815d06]
1) Currently cpu wait is set to true, which makes the host wait for the last
command in the queue to finish even if kernel execution has already
finished, delaying the device sync call.
2) Device sync only needs to wait for completion when the hw event
is not ready.
Change-Id: I91e3e89d39a1193ae06abac822cea8ae651493a5
[ROCm/clr commit: eb1089593e]
Set the env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to the HDP flush
workaround. The default is 0.
Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c
[ROCm/clr commit: 9de6d4d46c]
Ensure the member functions Alloc() and Free() of command_pool_ are not
called after command_pool_ is destructed.
Signed-off-by: Chong Li <chongli2@amd.com>
Change-Id: Ic2d36423302518a030bd61fa399290ebe2ed8194
[ROCm/clr commit: e6a5c81221]
1) Since g_devices is not initialized when the stream_per_thread constructor
is called on Windows, m_streams is empty when hipDeviceReset is called.
2) clear_spt tries to access the empty vector, causing a segfault in the
hipDeviceReset call.
3) On Linux, ROCCLR_INIT_PRIORITY makes sure that g_devices is initialized
before the tls constructor creates the stream_per_thread object.
Change-Id: Ib2ba643d1278d820287ea3b242ed0878d7529165
[ROCm/clr commit: 450eca293b]