If we are packaging debug information then we need to edit the created
libraries and executables to extract the debug information. Due to a
bug in the tooling this requires write access to the created files.
Allow generation of only rpm or only deb files if specified on the
command line.
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: I9a9df81102770ba681b1e7e0b5f704990f5435bb
Remove the hard-coded generation of both deb and rpm packages; allow the
user to specify what is desired and cache that choice.
Create executables with owner write permission to work around a
binutils bug.
Change-Id: I67655e5d351b227d1a8db4645228300d2bb83f9a
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Fixed exception thrown when ROCP_HSA_INTERCEPT is not set or is set to 0.
Fixed ROCm hsa_init() failure with error 4096 when trying to read hardware performance counters.
Fixed LD_LIBRARY_PATH to include the necessary library.
Change-Id: Idcb7ff807a79f4267374c34041d3bca33d85f532
Changed derived metrics to double from int64.
Fixed standalone test due to int64 to float change.
Fixed intercept test due to int64 to float change.
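The commit does not state why derived metrics moved to double. One plausible reason, offered here only as an assumption, is that ratio-style metrics lose their fractional part under integer division. A minimal sketch with hypothetical helper names (neither function exists in the profiler sources):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: evaluates a ratio metric with integer arithmetic.
// The fractional part of the quotient is discarded.
inline std::int64_t RatioAsInt64(std::int64_t numerator, std::int64_t denominator) {
  assert(denominator != 0);  // precondition for this sketch
  return numerator / denominator;
}

// Hypothetical helper: evaluates the same ratio in floating point,
// so the fractional part survives.
inline double RatioAsDouble(std::int64_t numerator, std::int64_t denominator) {
  assert(denominator != 0);  // precondition for this sketch
  return static_cast<double>(numerator) / static_cast<double>(denominator);
}
```

Tests that compared results against integer expectations would need matching updates after such a change, which is consistent with the test fixes listed above.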
Change-Id: I49631c187406ae9dd94a869b3bb13772012e8cdf
Instead of detecting files (header/library), use CMake's find_package to
locate the required dependencies (hsa-runtime64 and hsakmt).
Adding hsa-runtime64::hsa-runtime64 and hsakmt::hsakmt to
target_link_libraries also takes care of adding the interface include
directories to the search path.
Change-Id: I64eb77c97dac7982ac96d3158ad57df776cc0b53
An L2 flush is triggered by an explicit cache flush PM4 packet in the
aqlprofile packets sent to the GPU. This cache flush is used to sync up
CPU and GPU to make sure performance counters copied to the profile
output buffer are visible to the CPU. To get rid of this cache flush,
the following is done:
1. The explicit cache flush packet is removed from the aqlprofile code
(another commit to the aqlprofile code).
2. This commit changes the profile output buffer to use kernarg
memory, since it is uncached for the GPU.
After these changes, profile counter values copied by the GPU to the
output buffer are guaranteed to be visible to the CPU.
Change-Id: Ie953949c85fbee2f4369f1de966bcfb33daec084
Removed the old code for trying to locate libhsakmt.so.1, as it is replaced by the libhsakmt.a static library.
Change-Id: Icc5a0f6ead285e2406e6e83614e536184e3a2663
Add roctx_trace to the list of files that need to be merged when
aggregating results from multiple runs.
Change-Id: I5810be9e9220765ed8e8a84eca854131e97e61b1
Added support for launch kernel functions to fill_api_db
Added support for hipMemcpyToSymbol in add_memcpy
Added support for hsa_amd_memory_pool_allocate to be counted as source of allocations
Change-Id: I68806106324b19ca6f09d413df37c27582be2f51
Added support for launch kernel functions to fill_api_db
Added support for hipMemcpyToSymbol in add_memcpy
Added support for hsa_amd_memory_pool_allocate to be counted as source of allocations
Change-Id: I456a242ca1bc0c1bd39ae687a455b02c588de466
When building the json data flow, from_us_list holds (timestamp, stream_id, thread_id).
Previously stream_id was interpreted as from_tid and thread_id as to_tid, which is not correct:
stream_id is always the destination and thread_id is the initiator (source).
Change-Id: I2f5bb86a387b4003b17271c90bdf9de4b59a79bf
Marker events inside hcc_ops_trace.txt come from barriers, so they are not meant to be stored in the ops_patch_data map.
Added support for hipMemset events which are a kind of memory copy.
Change-Id: I213fe959bcd35ff0371613ba5bffd95bc53e06b5
recordid cannot be just a counter. The removed code was doing
just that, i.e. incrementing a counter. recordid has to come
from the recvals data structure. That code was left over from
when Evgeny and Rachida were prototyping this feature.
I am not sure why it was not spotted before.
Change-Id: Ia867066dcfca083fcd4111f2aefc2fec88c26314
1st issue: one of the ostream ops failed to print the
content of the struct.
2nd issue: get_ptr_type was called with the src/dest pointers
as args, while it should be called with the agent pointers for src/dest.
3rd issue: the memcopies map used (recordid, procid, is_async)
as a key, but this is not enough as some copies share the same key,
so I added begin/end timestamps as a way to distinguish between them.
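The third fix can be illustrated with a small sketch. The type names, field widths, and the AddCopy helper below are hypothetical and are not taken from the actual code; only the idea of extending the key with begin/end timestamps comes from this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Hypothetical key layout: (recordid, procid, is_async, begin_ts, end_ts).
// The first three fields alone can collide when several copies share them.
using CopyKey = std::tuple<std::uint64_t, std::uint32_t, bool,
                           std::uint64_t, std::uint64_t>;
using CopyMap = std::map<CopyKey, std::string>;

// Inserts a copy record; returns false if an entry with the same
// five-field key is already present.
inline bool AddCopy(CopyMap& copies, std::uint64_t recordid,
                    std::uint32_t procid, bool is_async,
                    std::uint64_t begin_ts, std::uint64_t end_ts,
                    const std::string& description) {
  assert(begin_ts <= end_ts);  // a copy cannot end before it begins
  return copies
      .emplace(CopyKey(recordid, procid, is_async, begin_ts, end_ts),
               description)
      .second;
}
```

With this key, two copies that share recordid, procid and is_async but run over different time intervals are stored as separate entries instead of one overwriting or shadowing the other.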
Change-Id: I7c6e80e74e30ea572f21612aaf0cf7efec6e91e6
Add numa lib, as this will be required with a static thunk.
Look for the static thunk if the shared thunk cannot be found.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Idcaa0c785a0502c9f5fe42e2dfb9e0c1780f9d66
On Ubuntu 20.04, in Release mode, gcc fails with this error:
In file included from /usr/include/string.h:495,
from /opt/rocm/include/hsa/hsa_api_trace.h:57,
from ../rocprofiler/src/util/hsa_rsrc_factory.h:29,
from ../rocprofiler/src/util/hsa_rsrc_factory.cpp:25:
In function ‘char* strncpy(char*, const char*, size_t)’,
inlined from ‘const util::AgentInfo* util::HsaRsrcFactory::AddAgentInfo(hsa_agent_t)’ at ../rocprofiler/src/util/hsa_rsrc_factory.cpp:323:12:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:34: error: ‘char* __builtin___strncpy_chk(char*, const char*, long unsigned int, long unsigned int)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../rocprofiler/src/util/hsa_rsrc_factory.cpp: In member function ‘const util::AgentInfo* util::HsaRsrcFactory::AddAgentInfo(hsa_agent_t)’:
../rocprofiler/src/util/hsa_rsrc_factory.cpp:322:39: note: length computed here
322 | const int gfxip_label_len = strlen(agent_info->name) - 2;
| ~~~~~~^~~~~~~~~~~~~~~~~~
The error is caused by the following 2 lines:
const int gfxip_label_len = strlen(agent_info->name) - 2;
strncpy(agent_info->gfxip, agent_info->name, gfxip_label_len);
The size argument to strncpy should not depend on the input string.
Since the terminating character is not considered (the copy is at
most len - 2 bytes), using memcpy is preferable. Also, make sure
the destination does not overflow by clamping the size.
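A minimal sketch of the clamped memcpy approach described above. AgentInfoSketch is a hypothetical stand-in for util::AgentInfo, the buffer sizes are assumptions, and the function name is illustrative; the actual change in hsa_rsrc_factory.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the relevant fields of util::AgentInfo.
// The buffer sizes are assumptions made for this sketch.
struct AgentInfoSketch {
  char name[64];
  char gfxip[8];
};

// Copies the agent name minus its last two characters into gfxip.
// The length is clamped to the destination capacity, and the result is
// always NUL-terminated. Assumes info->name is NUL-terminated.
inline void CopyGfxipLabel(AgentInfoSketch* info) {
  assert(info != nullptr);
  const std::size_t name_len = std::strlen(info->name);
  // Guard against names shorter than two characters.
  const std::size_t wanted = (name_len > 2) ? name_len - 2 : 0;
  // Leave room for the terminating NUL in the destination.
  const std::size_t len = std::min(wanted, sizeof(info->gfxip) - 1);
  std::memcpy(info->gfxip, info->name, len);
  info->gfxip[len] = '\0';
}
```

Because the copy length is bounded by the destination size rather than only by the source string, the destination cannot overflow even when the name is longer than expected.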
Change-Id: I0c5cf7e0daf4cd6fcf7092efb1d9fd4c02a6c639