This PR adds UberTrace-based tracing support to ROCclr's PAL device class.
Legacy RGP-based tracing is still available and is the default.
If UberTrace support is enabled tool-side, this new code path will activate.
Change-Id: I268b2dcef70e850a50e2caef8355f38bf51d4641
[ROCm/clr commit: e550032d25]
This shows up in some Valgrind runs. Make sure the resources are
released.
Change-Id: I34c25c00370a221585895655744831215136d5f4
[ROCm/clr commit: 4b03017e8a]
The new set tracks only the queues that have a command
submitted to them. This allows for fast iteration
in waitActiveStreams.
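The idea of tracking only queues that actually received work can be sketched as follows. This is a minimal illustration, not the ROCclr code: `Queue`, `ActiveQueueTracker`, `submit()` and `waitActive()` are hypothetical names, and the "wait" is simulated by setting a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a device queue; not the ROCclr queue class.
struct Queue {
  int id = 0;
  bool finished = true;
};

// Hypothetical tracker: a queue is recorded only when a command is
// submitted to it, so the wait loop visits just those queues instead of
// every queue the device owns.
class ActiveQueueTracker {
 public:
  void submit(Queue& q) {
    q.finished = false;  // simulate outstanding work
    active_.insert(&q);  // inserting an already-tracked queue is a no-op
  }

  // Waits on the tracked queues only; returns how many were visited.
  std::size_t waitActive() {
    std::size_t visited = 0;
    for (Queue* q : active_) {
      q->finished = true;  // stands in for a blocking wait on the queue
      ++visited;
    }
    active_.clear();  // nothing is outstanding after the wait
    return visited;
  }

  std::size_t activeCount() const { return active_.size(); }

 private:
  std::unordered_set<Queue*> active_;
};
```

With 64 queues and submissions to only two of them, the wait loop touches two entries rather than 64.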
Change-Id: I2c832eefa01280d9a87a5f57874d36d2e9441de7
[ROCm/clr commit: bcc545e6b8]
The variable is already declared as a cache variable so that users can override it,
but the hard-coded setting prevented the override. Remove the hard-coded setting.
Change-Id: I2aecc18ce4f1d1b523ba267ef1c8ef4ea1168d9c
[ROCm/clr commit: 4d0b815d06]
1) CPU wait is currently set to true, which makes the host wait for the last
command in the queue to finish even if the kernel execution has already
finished, delaying the device sync call.
2) Device sync only needs to await completion when the HW event
is not ready.
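A minimal sketch of "wait only when the event is not ready", assuming a hypothetical `HwEvent` type whose `wait()` simulates a blocking host wait; the real ROCclr event and sync APIs differ.

```cpp
#include <cassert>

// Hypothetical hardware-event stand-in. wait() simulates a blocking host
// wait and counts how often it is called.
struct HwEvent {
  bool ready = false;
  int blockingWaits = 0;

  bool isReady() const { return ready; }
  void wait() {
    ++blockingWaits;
    ready = true;  // pretend the GPU signaled completion
  }
};

// Device sync that blocks only when the last event is still pending.
// Returns true if the host had to block.
bool deviceSync(HwEvent& last) {
  if (last.isReady()) {
    return false;  // kernel already finished: skip the CPU wait
  }
  last.wait();
  return true;
}
```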
Change-Id: I91e3e89d39a1193ae06abac822cea8ae651493a5
[ROCm/clr commit: eb1089593e]
Set the environment variable DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to
the HDP flush workaround. The default is 0.
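A sketch of how such an opt-in switch behaves, assuming the value is treated as enabled only when it equals "1". ROCclr reads its DEBUG_CLR_* settings through its own flag machinery; plain `std::getenv` and the helper name `useKernargHdpFlushWorkaround()` are hypothetical here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true only when the variable is set to "1".
// An unset variable (or any other value) keeps the default of 0.
bool useKernargHdpFlushWorkaround() {
  const char* v = std::getenv("DEBUG_CLR_KERNARG_HDP_FLUSH_WA");
  return v != nullptr && std::strcmp(v, "1") == 0;
}
```

From a shell, the workaround would then be requested per run, e.g. `DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 ./app`.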
Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c
[ROCm/clr commit: 9de6d4d46c]
Ensure that the member functions Alloc() and Free() of command_pool_ are not
called after command_pool_ has been destroyed.
Signed-off-by: Chong Li <chongli2@amd.com>
Change-Id: Ic2d36423302518a030bd61fa399290ebe2ed8194
[ROCm/clr commit: e6a5c81221]
1) Since g_devices is not yet initialized when the stream_per_thread constructor
is called on Windows, m_streams is empty when hipDeviceReset is called.
2) clear_spt then tries to access the empty vector, causing a segfault in the
hipDeviceReset call.
3) On Linux, ROCCLR_INIT_PRIORITY ensures that g_devices is initialized
before the TLS constructor creates the stream_per_thread object.
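One way to make a clear operation safe when the per-thread table was never populated is a bounds check before indexing. This is a hypothetical sketch that only reuses the names `clear_spt` and `m_streams` from the message; the class layout and signature are assumptions, and stream ownership is assumed to live elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Stream { int id; };

// Hypothetical per-thread stream table indexed by device id. If it is
// constructed before the device list exists, it ends up empty.
class StreamPerThread {
 public:
  explicit StreamPerThread(std::size_t numDevices)
      : m_streams(numDevices, nullptr) {}

  void set(std::size_t dev, Stream* s) {
    if (dev < m_streams.size()) m_streams[dev] = s;
  }

  // Clears the cached stream for 'dev' (does not delete it).
  // Returns false when there is nothing to clear.
  bool clear_spt(std::size_t dev) {
    if (dev >= m_streams.size() || m_streams[dev] == nullptr) {
      return false;  // empty or never populated: skip instead of indexing
    }
    m_streams[dev] = nullptr;
    return true;
  }

 private:
  std::vector<Stream*> m_streams;
};
```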
Change-Id: Ib2ba643d1278d820287ea3b242ed0878d7529165
[ROCm/clr commit: 450eca293b]
The amdgpu-arch tool is not supported for static builds.
This commit detects the build type during CMake configuration
and uses rocm_agent_enumerator for static builds.
Change-Id: I8a295e01f54075507390ef540f16b28bb20237a9
[ROCm/clr commit: a02888af58]
This change adds a new HIP API `hipExtHostAlloc` which preserves
the functionality of `hipHostMalloc`.
Change-Id: I13504c6fc13465ddd7aed329795bb4f2fef1baff
[ROCm/clr commit: 2c84211b58]
Integration into PyTorch exposed value-narrowing issues; to fix them, we now
use unions. Also removed the check for the -munsafe* compiler flag. The check
now relies only on builtin detection.
Change-Id: I49364503fa429bd862952f9b29879072afa6d553
[ROCm/clr commit: bb52d9ed62]
The hiprtc and HIP APIs use the same file, so append to the file
instead of writing from the start of the file.
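The difference between appending and rewriting can be shown with standard streams: opening with `std::ios::app` positions every write at the end of the file, while a plain `std::ios::out` open truncates it. The function `appendLog()` and the file name are hypothetical; this is not the hiprtc/HIP logging code.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Appends one line to a shared file. With std::ios::app each write goes to
// the end, so output from one API does not overwrite output from the other.
void appendLog(const std::string& path, const std::string& line) {
  std::ofstream out(path, std::ios::out | std::ios::app);
  out << line << '\n';
}
```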
Change-Id: I2703f9bb67f0c51b557a058daab129679a0b5dd9
[ROCm/clr commit: e07172ff57]
This patch fixes a potential issue where the AQL packet header is filled before
the AQL packet body. The HSA specification states: "Packet processors may
process AQL packets after the packet format field is updated, but
before the doorbell is signaled."
However, hipGraph AQL packets were written with a valid header before their
body was filled, so the CP could potentially receive an invalid AQL body.
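The required ordering (body first, header last) can be modeled with C++ atomics between a producer and a reader thread. This is a simplified sketch: `AqlSlot`, the header constants and both functions are hypothetical; a real `hsa_kernel_dispatch_packet_t` has a fixed 64-byte layout, its header is built from packet-type and fence-scope fields defined in hsa.h, and the CPU/CP handshake relies on the platform's memory model rather than `std::atomic` alone.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Simplified packet slot: a header word plus a few body fields.
struct AqlSlot {
  std::atomic<std::uint16_t> header{0};
  std::uint64_t kernel_object = 0;
  std::uint64_t kernarg_address = 0;
};

// Illustrative header values only (not real HSA header encodings).
constexpr std::uint16_t kHeaderInvalid = 0x0001;
constexpr std::uint16_t kHeaderDispatch = 0x0002;

// Producer: fill the body while the header still marks the slot invalid,
// then publish the header last with release ordering.
void publishPacket(AqlSlot& slot, std::uint64_t kernel, std::uint64_t kernarg) {
  slot.kernel_object = kernel;
  slot.kernarg_address = kernarg;
  slot.header.store(kHeaderDispatch, std::memory_order_release);
}

// Reader: an acquire load that observes kHeaderDispatch is guaranteed to
// also observe the body written before the release store.
bool tryReadPacket(const AqlSlot& slot, std::uint64_t& kernel,
                   std::uint64_t& kernarg) {
  if (slot.header.load(std::memory_order_acquire) != kHeaderDispatch) {
    return false;  // not published yet
  }
  kernel = slot.kernel_object;
  kernarg = slot.kernarg_address;
  return true;
}
```

Writing the header first would let a reader see a valid header while the body fields still hold stale data, which is the failure mode described above.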
Change-Id: I84af798c19ee2b8805ba19732b0eabdea2958a96
[ROCm/clr commit: 3959b5be1e]
SPT is destroyed by hipDeviceReset(). If a
stream is created right after the reset, the same
object ID can be reused. Later, the SPT destructor
incorrectly treats the stream as valid because it
checks the reused object ID, which causes
corruption.
Change-Id: I3b1f7ffdf8bab874dca7b8fde22318162997b8f6
[ROCm/clr commit: f6a68b3c2e]
This change fixes the optimized multistream path for childGraph use cases.
1) For childgraph nodes, rely only on runNodes() to process
the childgraph and skip the calls to createCommand and enqueueCommands.
This ensures that the start/end markers are enqueued correctly
with respect to the childGraph commands.
In addition, runNodes() for the childgraph should be called after
the dependency walk to make sure that the subgraph is executed only once.
2) Nodes with no outgoing edges should be marked
as leaves regardless of which stream they are assigned to.
This ensures that marker dependencies from nodes
that run on a non-zero stream to subgraph leaves that run on stream zero
are still set up correctly.
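The leaf rule in point 2 reduces to "no outgoing edges", with the stream assignment deliberately ignored. A minimal sketch over a hypothetical adjacency-list `Node` type (not the hipGraph node class):

```cpp
#include <cassert>
#include <vector>

// Hypothetical graph node: assigned stream plus successor indices.
struct Node {
  int stream;
  std::vector<int> outEdges;
};

// A node is a leaf iff it has no outgoing edges; the stream it runs on is
// intentionally not consulted.
std::vector<int> findLeaves(const std::vector<Node>& nodes) {
  std::vector<int> leaves;
  for (int i = 0; i < static_cast<int>(nodes.size()); ++i) {
    if (nodes[i].outEdges.empty()) {
      leaves.push_back(i);
    }
  }
  return leaves;
}
```

In a graph where node 0 (stream 0) fans out to node 1 (stream 1) and node 2 (stream 0), both 1 and 2 are leaves, so markers from the stream-1 leaf are still included.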
Change-Id: I4a5f4f3b0e0d01e515cdcb045b46c2798f291255
[ROCm/clr commit: 464b99373b]
Fix a random language string that caused compilation failure
of the trap handler and a TDR in hipMemset() on VMs in hip-rt
release mode.
Change-Id: Ie1d874742b804f62ceda68064fa54f5d39c092b8
[ROCm/clr commit: 857d0d60b9]