Previously, GPU IDs were counted starting from zero. Now CPU IDs are counted from zero, and GPU IDs continue from the last CPU ID + 1.
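The numbering scheme can be sketched as follows. This is a minimal illustration, not the code from this change: AgentKind, AgentIds and AssignAgentIds are hypothetical names, and the real implementation obtains the agent list from the HSA runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical agent kinds; a real implementation would query the device
// type of each agent from the HSA runtime.
enum class AgentKind { Cpu, Gpu };

// Hypothetical container for the IDs handed out to each kind of agent.
struct AgentIds {
  std::vector<uint32_t> cpu_ids;
  std::vector<uint32_t> gpu_ids;
};

// CPUs take IDs 0..N-1 and GPUs continue at N, regardless of the order in
// which the agents were enumerated.
AgentIds AssignAgentIds(const std::vector<AgentKind>& agents) {
  AgentIds ids;
  uint32_t next_id = 0;
  for (AgentKind kind : agents) {
    if (kind == AgentKind::Cpu) ids.cpu_ids.push_back(next_id++);
  }
  for (AgentKind kind : agents) {
    if (kind == AgentKind::Gpu) ids.gpu_ids.push_back(next_id++);
  }
  // Every enumerated agent received exactly one ID.
  assert(ids.cpu_ids.size() + ids.gpu_ids.size() == agents.size());
  return ids;
}
```

With two CPUs and two GPUs, the CPUs get IDs 0 and 1 and the GPUs get IDs 2 and 3 in this sketch.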
Change-Id: I3f815195ad97933e02f249841e53b64b674370d9
This is an attempt to support basic and derived counters for navi21. The code will not work correctly unless the navi counters are added to metrics.xml and gfx_metrics.xml.
Change-Id: Ied06a81345a6fbb02fa0fde1889d94bbe64e9a03
Use the HSA header files from /opt/rocm-ver/include rather than the wrapper files from /opt/rocm-ver/hsa/include/hsa.
Change-Id: Id7a9bde19447cd2a0fd6e03b11c08471f09c2a46
Fixed the exception thrown when ROCP_HSA_INTERCEPT is not set or is set to 0.
Fixed hsa_init() failing with error 4096 when trying to read hardware performance counters.
Fixed LD_LIBRARY_PATH to include the necessary library.
Change-Id: Idcb7ff807a79f4267374c34041d3bca33d85f532
Changed derived metrics from int64 to double.
Fixed the standalone test for the int64 to floating-point change.
Fixed the intercept test for the int64 to floating-point change.
Change-Id: I49631c187406ae9dd94a869b3bb13772012e8cdf
Instead of detecting files (header/library), use cmake's find_package to
locate the required dependencies (hsa-runtime64 and hsakmt).
Adding hsa-runtime64::hsa-runtime64 and hsakmt::hsakmt to
target_link_libraries also takes care of adding their interface include
directories to the search path.
Change-Id: I64eb77c97dac7982ac96d3158ad57df776cc0b53
The L2 flush is triggered by an explicit cache flush PM4 packet in the
aqlprofile packets sent to the GPU. This cache flush synchronizes the CPU
and GPU so that performance counters copied to the profile output buffer
are visible to the CPU. To get rid of this cache flush, the following
changes are made:
1. The explicit cache flush packet is removed from the aqlprofile code
(in a separate commit to aqlprofile).
2. This commit changes the profile output buffer to use kernarg
memory, since it is uncached for the GPU.
After these changes, profile counter values copied by the GPU to the
output buffer are guaranteed to be visible to the CPU.
Change-Id: Ie953949c85fbee2f4369f1de966bcfb33daec084
On Ubuntu 20.04, in Release mode, gcc fails with this error:
In file included from /usr/include/string.h:495,
from /opt/rocm/include/hsa/hsa_api_trace.h:57,
from ../rocprofiler/src/util/hsa_rsrc_factory.h:29,
from ../rocprofiler/src/util/hsa_rsrc_factory.cpp:25:
In function ‘char* strncpy(char*, const char*, size_t)’,
inlined from ‘const util::AgentInfo* util::HsaRsrcFactory::AddAgentInfo(hsa_agent_t)’ at ../rocprofiler/src/util/hsa_rsrc_factory.cpp:323:12:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:34: error: ‘char* __builtin___strncpy_chk(char*, const char*, long unsigned int, long unsigned int)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../rocprofiler/src/util/hsa_rsrc_factory.cpp: In member function ‘const util::AgentInfo* util::HsaRsrcFactory::AddAgentInfo(hsa_agent_t)’:
../rocprofiler/src/util/hsa_rsrc_factory.cpp:322:39: note: length computed here
322 | const int gfxip_label_len = strlen(agent_info->name) - 2;
| ~~~~~~^~~~~~~~~~~~~~~~~~
The error is caused by the following 2 lines:
const int gfxip_label_len = strlen(agent_info->name) - 2;
strncpy(agent_info->gfxip, agent_info->name, gfxip_label_len);
The size argument to strncpy should not depend on the length of the
source string. Since the terminating null character is never copied
(the copy is at most len - 2 bytes), memcpy is preferable. Also make
sure the destination does not overflow by clamping the size.
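The approach described above can be sketched as follows. This is an illustration only, not the actual patch: AgentInfoSketch and CopyGfxipLabel are hypothetical names, the 64-byte buffer sizes are assumptions, and the explicit termination of the destination is a choice made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for util::AgentInfo; the buffer sizes are assumed.
struct AgentInfoSketch {
  char name[64];
  char gfxip[64];
};

// Copy the agent name without its last two characters into gfxip.
// Assumes info->name is null-terminated.
void CopyGfxipLabel(AgentInfoSketch* info) {
  assert(info != nullptr);
  const std::size_t name_len = std::strlen(info->name);
  // Guard against names shorter than two bytes, where len - 2 would wrap.
  const std::size_t wanted = (name_len > 2) ? name_len - 2 : 0;
  // Clamp to the destination capacity, leaving room for the terminator.
  const std::size_t copy_len = std::min(wanted, sizeof(info->gfxip) - 1);
  std::memcpy(info->gfxip, info->name, copy_len);
  info->gfxip[copy_len] = '\0';
}
```

Because the copy length is bounded by the size of the destination rather than only by the length of the source, the fortified strncpy check that triggered -Werror=stringop-overflow is no longer involved.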
Change-Id: I0c5cf7e0daf4cd6fcf7092efb1d9fd4c02a6c639
Concurrent profiling relies on the aqlprofile read_api
and tracker. This patch sets those options to enable
concurrent profiling.
Change-Id: Ib97d4d8facfbc11f2684d83109397cd13f117d5e
This patch adds barrier packets, together with extra signals,
to enforce the completion order of read packets with respect to
the dispatch. PmcStopper is also added to stop profiling at the end.
Change-Id: I8e8d3a41d86e42be1d9e5afd44c247be876cf1a5
Profiling was previously enabled only in serial mode, i.e., kernels
are serialized in execution, and counters are reset at each
kernel start and read at kernel completion. This patch adds
a concurrent mode, which issues a process-level start
packet to reset the counters and then reads them twice, at kernel
start and at kernel end, to obtain the counter value difference.
The new concurrent profiling mode requires integration
with the corresponding changes on the aqlprofile side.
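The difference computation for concurrent mode can be sketched as follows. This is a minimal illustration under the assumption that each read yields one 64-bit value per counter; CounterDelta is a hypothetical helper, not a function from rocprofiler or aqlprofile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-kernel counter values computed as the difference between the read
// taken at kernel start and the read taken at kernel end. The counters are
// not reset between kernels in this mode, so only the difference is
// attributed to the kernel.
std::vector<uint64_t> CounterDelta(const std::vector<uint64_t>& at_start,
                                   const std::vector<uint64_t>& at_end) {
  assert(at_start.size() == at_end.size());
  std::vector<uint64_t> delta(at_start.size());
  for (std::size_t i = 0; i < at_start.size(); ++i) {
    delta[i] = at_end[i] - at_start[i];  // unsigned subtraction
  }
  return delta;
}
```

For example, start values of 100 and 5 with end values of 160 and 5 give differences of 60 and 0.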
Change-Id: I94b4442eadc8c64b8fba51b1e4916fc8b895ad21