Use hsa header files from /opt/rocm-ver/include rather than using wrapper files from /opt/rocm-ver/hsa/include/hsa
Change-Id: Id7a9bde19447cd2a0fd6e03b11c08471f09c2a46
[ROCm/rocprofiler commit: e7eb195924]
Enable the new ROCP_STATS_OPT method of collecting HIP activities while the application is running
Change-Id: I19d09e2f2abe16e6528240bf55d6048645aaa09f
[ROCm/rocprofiler commit: 6f97e15d55]
This issue happens when slices have overlapping
timestamp values (i.e. start_ts + dur = start_ts of the next slice)
and the flow event references that shared timestamp value.
The Google Chrome trace parser sorts all the events
and just takes the most recent one.
Rocprof outputs flow attachment points as start+dur,
the end timestamp of the slice, which can coincide with the
start of the next slice if there is no gap between them.
Solution proposal by Michael Steffen (Michael.Steffen@amd.com)
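The ambiguity described above can be reproduced with a minimal sketch. The slice tuples and helper names below are hypothetical and do not reflect rocprof's data model; the sketch only shows that an attachment point at start+dur can fall on two slices at once, and it does not implement the fix in this commit.

```python
# Hypothetical slice records: (name, start_ts, dur). Not rocprof's real data model.
SLICES = [("kernel_a", 100, 50), ("kernel_b", 150, 30), ("kernel_c", 200, 10)]


def slices_touching(ts, slices):
    """Return names of slices whose closed interval [start, start+dur] contains ts."""
    return [name for name, start, dur in slices if start <= ts <= start + dur]


def ambiguous_attach_points(slices):
    """Return end timestamps (start+dur) that equal the start of another slice."""
    starts = {start for _, start, _ in slices}
    return sorted({start + dur for _, start, dur in slices if start + dur in starts})


if __name__ == "__main__":
    # kernel_a ends at 150, exactly where kernel_b starts.
    print(slices_touching(150, SLICES))
    print(ambiguous_attach_points(SLICES))
```

A flow event attached at any timestamp reported by ambiguous_attach_points cannot be tied to a single slice by timestamp alone.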
Change-Id: Ifcde142144033c9012b01a78c95f047384f972dd
[ROCm/rocprofiler commit: 93a79eb99a]
Adding commits found in mainline and not in staging: Add dependency on rocm-core
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: Icb935e9230888fd005d9ca3617e28f6173173cc8
[ROCm/rocprofiler commit: 22f2e593c2]
Using libroctracer_tool.so instead of libtracer_tool.so and fixing the paths for the tracer tool
Change-Id: I9b3ca885f3ca5385b106d5376894b1b4054f9c1d
[ROCm/rocprofiler commit: 0be6306cae]
Use GNUInstallDirs variables to determine the location of BINDIR,
LIBDIR, INCLUDEDIR, DOCDIR, LIBEXECDIR and SYSCONFDIR.
Note that CMAKE_INSTALL_LIBDIR is overridden, since the default for RHEL
is lib64, but ROCm packaging wants it to be lib always. Distros or users
can easily override this.
Project name changed from rocprofiler64 to rocprofiler, since CMAKE_INSTALL_DOCDIR uses the project name
Change-Id: Iff2622b4bfc38ce5caea270e6e44ba74485cb9e4
[ROCm/rocprofiler commit: b24e05e138]
Added approved HW counters for MI200. Also added derived metrics for the same
Change-Id: I1c6abfdfde4e4fd4ba8bd5eec0557ad08fd71c77
[ROCm/rocprofiler commit: 6d233c65d7]
In a future change, the tracer API library (libroctracer64.so) will be
automatically registered as a tool library. Until then, explicitly
register it by adding it to the HSA_TOOLS_LIB environment variable.
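A minimal sketch of the explicit registration, written as a Python launcher helper. The helper name with_tool_lib is hypothetical, and the space separator between entries in HSA_TOOLS_LIB is an assumption that should be checked against the runtime in use.

```python
def with_tool_lib(env, lib="libroctracer64.so", sep=" "):
    """Return a copy of env with lib appended to HSA_TOOLS_LIB if it is absent.

    Assumption: HSA_TOOLS_LIB holds library names separated by `sep`.
    """
    new_env = dict(env)
    libs = [p for p in new_env.get("HSA_TOOLS_LIB", "").split(sep) if p]
    if lib not in libs:
        libs.append(lib)
    new_env["HSA_TOOLS_LIB"] = sep.join(libs)
    return new_env


if __name__ == "__main__":
    import os

    # The resulting mapping could be passed as env= to subprocess.run
    # when launching the traced application.
    print(with_tool_lib(os.environ)["HSA_TOOLS_LIB"])
```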
Change-Id: I44d78ac38608e6da5edf04b498a73485f5609d06
[ROCm/rocprofiler commit: 0123aa61fa]
Fix the RPATH skip and remove the export line from build.sh, since find_library is now given a path to /opt/rocm; this also makes build.sh easier to use
Change-Id: I1ac5b51eafb54ef0359bf6fb55f2fe2d39a6cafa
[ROCm/rocprofiler commit: 0faaa83de7]
prof_protocol.h is now located in /opt/rocm/include/roctracer/ext instead of /opt/rocm/roctracer/include/ext
Change-Id: I98623dcf3c2e6bcef128c1ef35959ef0a4a1d63f
[ROCm/rocprofiler commit: 1f5b02f9c8]
HIP/HSA tracing tried to access the range_data list, but the list was not initialized because roctx tracing was not enabled. Moved the list initialization before the roctx check
Change-Id: I9942876445cb1b2f69c6bb0d8986d6d9234f1441
[ROCm/rocprofiler commit: 1f925b3f16]
To enable this feature use the --roctx-rename rocprof option. This
implementation records all messages received in roctxRangePush calls and
uses them to replace the corresponding kernel names.
Tested with the following HIP program:
#include <hip/hip_runtime.h>
#include <roctracer/roctx.h>

__global__ void
ThisIsALongKernelName ()
{
}

int
main (int argc, char* argv[])
{
  hipSetDevice (0);

  // Not in a roctx range.
  ThisIsALongKernelName<<<1, 1>>> ();

  roctxRangePush ("A");
  // In a simple first level roctx range.
  ThisIsALongKernelName<<<1, 1>>> ();
  roctxRangePop ();

  roctxRangePush ("B");
  roctxRangePush ("C");
  // In a nested roctx range.
  ThisIsALongKernelName<<<1, 1>>> ();
  roctxRangePop ();
  roctxRangePop ();

  roctxRangePush ("D");
  roctxRangePush ("E");
  roctxRangePop ();
  // In a first level roctx range, but after a nested range.
  ThisIsALongKernelName<<<1, 1>>> ();
  roctxRangePop ();

  hipDeviceSynchronize ();
  return 0;
}
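The renaming idea can be modelled with a small stack simulation. This is only a sketch: the event tuples are hypothetical, and the rule that the innermost open range supplies the replacement name is an assumption, since the description above does not say how nested ranges are resolved.

```python
def rename_kernels(events):
    """Replace kernel names with the message of the innermost open range.

    events is a list of ("push", msg), ("pop",) or ("kernel", name) tuples.
    Assumption: the innermost range wins; kernels outside any range keep their name.
    """
    stack = []
    names = []
    for event in events:
        kind = event[0]
        if kind == "push":
            stack.append(event[1])
        elif kind == "pop":
            stack.pop()
        elif kind == "kernel":
            names.append(stack[-1] if stack else event[1])
    return names


KERNEL = "ThisIsALongKernelName"

# Mirrors the push/pop/launch order of the HIP test program.
EVENTS = [
    ("kernel", KERNEL),
    ("push", "A"), ("kernel", KERNEL), ("pop",),
    ("push", "B"), ("push", "C"), ("kernel", KERNEL), ("pop",), ("pop",),
    ("push", "D"), ("push", "E"), ("pop",), ("kernel", KERNEL), ("pop",),
]

if __name__ == "__main__":
    print(rename_kernels(EVENTS))
```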
Change-Id: I629312234468daff8b017caa5cb0773707d98cce
[ROCm/rocprofiler commit: 1078a088e9]
A previous change switched the key of var_table in the tblextr.py script from a single value to a tuple without updating the uses of var_table in the rest of the script
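The class of bug can be illustrated with a short sketch. The key layout (record index, variable name) and the helper names are hypothetical and are not taken from tblextr.py; the point is only that a lookup by the old single-value key silently misses once keys are tuples.

```python
def record(table, index, name, value):
    """Store a value under a tuple key (hypothetical layout: index, name)."""
    table[(index, name)] = value


def lookup_single_key(table, name):
    """Old-style lookup by one value; it never matches a tuple key."""
    return table.get(name)


def lookup_tuple_key(table, index, name):
    """Lookup that matches the tuple key layout."""
    return table.get((index, name))


if __name__ == "__main__":
    var_table = {}
    record(var_table, 0, "GRBM_COUNT", 42)
    print(lookup_single_key(var_table, "GRBM_COUNT"))
    print(lookup_tuple_key(var_table, 0, "GRBM_COUNT"))
```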
Change-Id: I38964f61afad5323d1ca9b64d538cec426298842
[ROCm/rocprofiler commit: 46c4e5045a]
The post-processing script depended on the HSA API calls for async memory copies to correlate them with the HSA async memcpy activity. Now, if the user supplies an input file that filters HSA API calls without including the HSA memcpy calls, all the correlation data is dropped and the async activity is reported with the information given in the HSA async activity result file
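The fallback can be sketched as follows. The record fields (corr_id, begin, end, name) and the function name are hypothetical and do not describe the actual result file format; the sketch only shows an activity being reported on its own when no matching API call was traced.

```python
def report_async_copies(activities, api_calls):
    """Build one row per async copy activity.

    activities: list of dicts with hypothetical keys "corr_id", "begin", "end".
    api_calls: dict mapping corr_id to a dict with a hypothetical "name" key.
    When no API record exists, the row carries only the activity's own data.
    """
    rows = []
    for act in activities:
        row = dict(act)
        api = api_calls.get(act["corr_id"])
        if api is not None:
            row["api_name"] = api["name"]
        rows.append(row)
    return rows


if __name__ == "__main__":
    acts = [{"corr_id": 1, "begin": 10, "end": 20}, {"corr_id": 2, "begin": 30, "end": 45}]
    apis = {1: {"name": "hsa_amd_memory_async_copy"}}
    for row in report_async_copies(acts, apis):
        print(row)
```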
Change-Id: I5123a5acab9b35a4c25793e7953fdfb74929c999
[ROCm/rocprofiler commit: fd4767d954]
Include the upgrade operation check in the prerm script
in the package.
Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ia2bf70bc3c8ce4ddb099ac58f32e165a0fe58824
[ROCm/rocprofiler commit: 39ca27d923]
The change 'merge_traces script from rocprof fails to include GPU / HSA / ROCTX activity in merged trace' (change no. 628475) was missing the tuple addition in the second for loop, causing issues on gfx908 and gfx906
Change-Id: Ic0b6140d4372eb109fdf7bdc8d58c0d84239196d
[ROCm/rocprofiler commit: 7a9692766d]
Add the new license file and add support for it in CMakeLists.txt
Change-Id: I785035a780fbfc59951fc27d45f9c1869ffb4fb3
[ROCm/rocprofiler commit: c19cfbfffd]
If we are packaging debug information then we need to edit the created
libraries and executables to extract the debug information. Due to a
bug in the tooling this requires write access to the created files.
Allow generation of only rpm or only deb files if specified on the
command line.
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: I9a9df81102770ba681b1e7e0b5f704990f5435bb
[ROCm/rocprofiler commit: 6676d60516]
Remove the hard-coded generation of both deb and rpm; allow the user to
specify what is desired and cache that choice.
Create executable with owner write permission to work around binutils
bug.
Change-Id: I67655e5d351b227d1a8db4645228300d2bb83f9a
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
[ROCm/rocprofiler commit: f4c7309592]
Fixed exception thrown when ROCP_HSA_INTERCEPT is not set or is set to 0;
Fixed ROCm hsa_init() failing with error 4096 when trying to read hardware performance counters;
Fixed LD_LIBRARY_PATH to include the necessary library.
Change-Id: Idcb7ff807a79f4267374c34041d3bca33d85f532
[ROCm/rocprofiler commit: a8b5d6cf33]
Changed var_pattern in tblextr.py to include pattern like "name[0]"
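As an illustration of matching indexed names, here is a minimal sketch. The regular expression and helper name are hypothetical and are not the var_pattern used in tblextr.py; they only show one way to accept both "name" and "name[0]".

```python
import re

# Hypothetical pattern: an identifier with an optional [index] suffix.
VAR_NAME = re.compile(r"^([A-Za-z_]\w*)(?:\[(\d+)\])?$")


def parse_var_name(token):
    """Split a token into (base name, index); index is None when absent."""
    match = VAR_NAME.match(token)
    if match is None:
        return None
    base, index = match.groups()
    return (base, int(index) if index is not None else None)


if __name__ == "__main__":
    for token in ("name", "name[0]", "name["):
        print(token, parse_var_name(token))
```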
Change-Id: Ibe1c512595cfbdcaca8fa5bddceb3f6a570caf43
[ROCm/rocprofiler commit: ff43ca1542]
Changed derived metrics to double from int64.
Fixed standalone test due to int64 to float change
Fixed intercept test due to int64 to float change.
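One effect of the type change can be shown with a small sketch. The metric below is hypothetical and is not one of the derived metrics in this repository; it only illustrates how an integer result drops the fractional part of a ratio that a double keeps, which is offered as an illustration rather than the stated reason for the change.

```python
def busy_percent_int(busy, total):
    """Hypothetical ratio metric computed with integer arithmetic (truncates)."""
    return (100 * busy) // total


def busy_percent_double(busy, total):
    """The same hypothetical metric computed in double precision."""
    return 100.0 * busy / total


if __name__ == "__main__":
    print(busy_percent_int(1, 3))
    print(busy_percent_double(1, 3))
```

Tests that compare metric values need a matching change, for example comparing with a tolerance instead of exact integer equality.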
Change-Id: I49631c187406ae9dd94a869b3bb13772012e8cdf
[ROCm/rocprofiler commit: f9017cbdc5]
Instead of detecting files (header/library), use cmake's find_package to
locate the required dependencies (hsa-runtime64 and hsakmt).
Adding hsa-runtime64::hsa-runtime64 and hsakmt::hsakmt to the
target_link_libraries also takes care of adding the interfaces include
directories to the search path.
Change-Id: I64eb77c97dac7982ac96d3158ad57df776cc0b53
[ROCm/rocprofiler commit: acb246f788]
L2 flush is triggered by an explicit cache flush PM4 packet in the aqlprofile
packets sent to the GPU. This cache flush is used to sync up the CPU and GPU to
make sure performance counters copied to the profile output buffer are visible
to the CPU. To get rid of this cache flush, the following is done:
1. The explicit cache flush packet is removed from the aqlprofile code
(another commit to the aqlprofile code).
2. This commit changes the profile output buffer to use kernarg
memory, since it is uncached for the GPU.
After these changes, profile counter values copied by the GPU to the output
buffer are guaranteed to be visible to the CPU.
Change-Id: Ie953949c85fbee2f4369f1de966bcfb33daec084
[ROCm/rocprofiler commit: 2b79931631]