Use GNUInstallDirs variables to determine the location of BINDIR,
LIBDIR, INCLUDEDIR, DOCDIR, LIBEXECDIR and SYSCONFDIR.
Note that CMAKE_INSTALL_LIBDIR is overridden: the default for RHEL is
lib64, but ROCm packaging always wants lib. Distros or users can easily
override this.
Project name changed from rocprofiler64 to rocprofiler, since
CMAKE_INSTALL_DOCDIR uses the project name.
Change-Id: Iff2622b4bfc38ce5caea270e6e44ba74485cb9e4
In a future change, the tracer API library (libroctracer64.so) will be
automatically registered as a tool library. Until then, explicitly
register it by adding it to the HSA_TOOLS_LIB environment variable.
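
A minimal sketch of the interim registration step, written as a small
Python launcher helper. It assumes that entries in HSA_TOOLS_LIB are
separated by spaces and that the loader can find the library by its bare
name; with_tracer_registered is a hypothetical helper, not part of
roctracer.

```python
import os

TRACER_LIB = "libroctracer64.so"  # library name taken from the commit message


def with_tracer_registered(env=None, sep=" "):
    """Return a copy of env with TRACER_LIB present in HSA_TOOLS_LIB.

    The separator is an assumption; change it if the runtime expects another.
    """
    env = dict(os.environ if env is None else env)
    entries = env.get("HSA_TOOLS_LIB", "").split(sep)
    entries = [e for e in entries if e]  # drop empty entries
    if TRACER_LIB not in entries:
        entries.append(TRACER_LIB)
    env["HSA_TOOLS_LIB"] = sep.join(entries)
    return env
```

The returned mapping can be passed as the env argument of
subprocess.run when launching the traced application.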
Change-Id: I44d78ac38608e6da5edf04b498a73485f5609d06
Fix the RPATH skip. Remove the export line from build.sh: find_library
is now given /opt/rocm as a search path, which also makes build.sh
easier to use.
Change-Id: I1ac5b51eafb54ef0359bf6fb55f2fe2d39a6cafa
prof_protocol.h is now located in /opt/rocm/include/roctracer/ext
instead of /opt/rocm/roctracer/include/ext.
Change-Id: I98623dcf3c2e6bcef128c1ef35959ef0a4a1d63f
HIP/HSA tracing accessed the range_data list, but the list was not
initialized when roctx tracing was disabled. Move the list
initialization before the roctx check.
Change-Id: I9942876445cb1b2f69c6bb0d8986d6d9234f1441
To enable this feature use the --roctx-rename rocprof option. This
implementation records all messages received in roctxPush calls and
uses them to replace the corresponding kernel names.
Tested with the following HIP program:
#include <hip/hip_runtime.h>
#include <roctracer/roctx.h>
__global__ void
ThisIsALongKernelName ()
{
}
int
main (int argc, char* argv[])
{
hipSetDevice (0);
// Not in a roctx range.
ThisIsALongKernelName<<<1, 1>>> ();
roctxRangePush ("A");
// In a simple first level roctx range.
ThisIsALongKernelName<<<1, 1>>> ();
roctxRangePop ();
roctxRangePush ("B");
roctxRangePush ("C");
// In a nested roctx range.
ThisIsALongKernelName<<<1, 1>>> ();
roctxRangePop ();
roctxRangePop ();
roctxRangePush ("D");
roctxRangePush ("E");
roctxRangePop ();
// In a first level roctx range, but after a nested range.
ThisIsALongKernelName<<<1, 1>>> ();
roctxRangePop ();
hipDeviceSynchronize ();
return 0;
}
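
The renaming can be modeled as a stack of range messages. The sketch
below replays the push, pop and kernel launch sequence of the program
above. It assumes that a kernel takes the message of the innermost open
range and keeps its own name outside any range; the commit message does
not state which message is used for nested ranges, and rename_kernels is
a hypothetical helper, not rocprof code.

```python
def rename_kernels(events):
    """Replay push/pop/kernel events and return the reported kernel names.

    events: list of ("push", message), ("pop", None) or ("kernel", name).
    Assumption: a kernel takes the message of the innermost open range.
    """
    stack = []  # messages of the currently open roctx ranges
    names = []
    for kind, value in events:
        if kind == "push":
            stack.append(value)
        elif kind == "pop":
            stack.pop()
        elif kind == "kernel":
            # Outside any range the original kernel name is kept.
            names.append(stack[-1] if stack else value)
    return names


K = "ThisIsALongKernelName"
events = [
    ("kernel", K),
    ("push", "A"), ("kernel", K), ("pop", None),
    ("push", "B"), ("push", "C"), ("kernel", K), ("pop", None), ("pop", None),
    ("push", "D"), ("push", "E"), ("pop", None), ("kernel", K), ("pop", None),
]
print(rename_kernels(events))
```

Under that assumption the four launches are reported as
ThisIsALongKernelName, A, C and D.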
Change-Id: I629312234468daff8b017caa5cb0773707d98cce
A previous change switched the key of var_table in the tblextr.py
script from a single value to a tuple, but did not update the uses of
var_table in the rest of the script.
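
A short illustration of this kind of mismatch. The key fields (pid,
name) and the helpers set_var and get_var are hypothetical; the real
key layout in tblextr.py may differ.

```python
# Hypothetical shape: the real key fields in tblextr.py may differ.
var_table = {}


def set_var(pid, name, value):
    # Writer side: the key is a tuple.
    var_table[(pid, name)] = value


def get_var(pid, name, default=None):
    # Reader side must build the same tuple; var_table.get(name) would
    # always miss once the key became a tuple.
    return var_table.get((pid, name), default)


set_var(1234, "BeginNs", 100)
```

A lookup with the bare name finds nothing, while the tuple lookup
returns the stored value.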
Change-Id: I38964f61afad5323d1ca9b64d538cec426298842
The post-processing script depended on the HSA API calls for async
memory copies to correlate them with the HSA async memcpy activity.
Now, if the user supplies an input file that filters HSA API calls
without including the HSA memcpy calls, the correlation data is dropped
and the async activity is reported using only the information from the
HSA async activity result file.
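
A minimal sketch of that fallback, assuming each activity record carries
a correlation id. The record keys (corr_id, begin_ns, end_ns) and the
function memcpy_rows are hypothetical and only illustrate the behavior
described here.

```python
def memcpy_rows(activity_records, api_calls_by_id):
    """Build one output row per async-copy activity record.

    activity_records: list of dicts with hypothetical keys
        "corr_id", "begin_ns", "end_ns".
    api_calls_by_id: dict mapping corr_id to API-call details; empty when
        the HSA memcpy API calls were filtered out of the trace.
    """
    rows = []
    for rec in activity_records:
        row = dict(rec)  # always keep the activity data
        api = api_calls_by_id.get(rec["corr_id"])
        if api is not None:
            row.update(api)  # enrich only when the correlation exists
        rows.append(row)
    return rows
```

With an empty api_calls_by_id, every activity record is still reported,
just without the API-side details.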
Change-Id: I5123a5acab9b35a4c25793e7953fdfb74929c999
Include the upgrade operation check in the package's prerm script.
Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ia2bf70bc3c8ce4ddb099ac58f32e165a0fe58824
The change 'merge_traces script from rocprof fails to include GPU / HSA
/ ROCTX activity in merged trace' (change no. 628475) did not add the
tuple to the second for loop, which caused issues on gfx908 and gfx906.
Change-Id: Ic0b6140d4372eb109fdf7bdc8d58c0d84239196d
If we are packaging debug information then we need to edit the created
libraries and executables to extract the debug information. Due to a
bug in the tooling this requires write access to the created files.
Allow generation of only rpm or only deb files if specified on the
command line.
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: I9a9df81102770ba681b1e7e0b5f704990f5435bb
Remove hard code of generating both deb and rpm, allow the user to
specify what is desired and cache that choice.
Create executable with owner write permission to work around binutils
bug.
Change-Id: I67655e5d351b227d1a8db4645228300d2bb83f9a
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Fixed exception thrown when ROCP_HSA_INTERCEPT is not set or is set to 0.
Fixed ROCm hsa_init() failing with error 4096 when trying to read
hardware performance counters.
Fixed LD_LIBRARY_PATH to include the necessary library.
Change-Id: Idcb7ff807a79f4267374c34041d3bca33d85f532
Changed derived metrics from int64 to double.
Fixed standalone test after the int64 to double change.
Fixed intercept test after the int64 to double change.
Change-Id: I49631c187406ae9dd94a869b3bb13772012e8cdf
Instead of detecting files (header/library), use cmake's find_package to
locate the required dependencies (hsa-runtime64 and hsakmt).
Adding hsa-runtime64::hsa-runtime64 and hsakmt::hsakmt to
target_link_libraries also takes care of adding their interface include
directories to the search path.
Change-Id: I64eb77c97dac7982ac96d3158ad57df776cc0b53
An L2 flush is triggered by an explicit cache flush PM4 packet in the
aqlprofile packets sent to the GPU. This cache flush syncs the CPU and
GPU so that performance counters copied to the profile output buffer
are visible to the CPU. To get rid of this cache flush, the following
is done:
1. The explicit cache flush packet is removed from the aqlprofile code
(a separate commit to aqlprofile).
2. This commit changes the profile output buffer to use kernarg
memory, since it is uncached for the GPU.
After these changes, profile counter values copied by the GPU to the
output buffer are guaranteed to be visible to the CPU.
Change-Id: Ie953949c85fbee2f4369f1de966bcfb33daec084
Removed the old code that tried to locate libhsakmt.so.1, as it is
replaced by the libhsakmt.a static library.
Change-Id: Icc5a0f6ead285e2406e6e83614e536184e3a2663
Add roctx_trace to the list of files that need to be merged when
aggregating results from multiple runs.
Change-Id: I5810be9e9220765ed8e8a84eca854131e97e61b1
Added support for launch kernel functions to fill_api_db
Added support for hipMemcpyToSymbol in add_memcpy
Added support for hsa_amd_memory_pool_allocate to be counted as source of allocations
Change-Id: I68806106324b19ca6f09d413df37c27582be2f51