Commit Graph

171 Commits

Author SHA1 Message Date
gobhardw 602ac83ce7 Fixed Output Path and recv_0 for ATT
Change-Id: I94248e217d5af14152be82cbe6095de90a489387
2023-03-09 13:20:36 +00:00
gobhardw 03c305dbd4 Making ATT work with Profilerv2
Change-Id: Ic9334aa80e40faaaf5c1a79ba37dbe52e8d31253
2023-03-09 13:20:35 +00:00
Ammar ELWazir 6dda141e4b GPU ID issue
Previously, GPU IDs were counted starting from zero. Now CPU IDs are counted from zero, and GPU IDs continue from the last CPU_ID+1.

Change-Id: I3f815195ad97933e02f249841e53b64b674370d9
2023-03-09 13:20:35 +00:00
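The numbering described in commit 6dda141e4b can be illustrated with a minimal sketch. The `Agent` struct and the `AssignIds` function are hypothetical names chosen for illustration; they are not taken from the rocprofiler sources, and the real change may enumerate agents differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical agent record (not the rocprofiler type).
struct Agent {
  bool is_gpu;
  int id;
};

// CPUs are numbered from zero; GPUs continue from the last CPU ID + 1.
inline void AssignIds(std::vector<Agent>& agents) {
  int next_id = 0;
  for (Agent& agent : agents) {
    if (!agent.is_gpu) agent.id = next_id++;
  }
  for (Agent& agent : agents) {
    if (agent.is_gpu) agent.id = next_id++;
  }
  // Every agent received exactly one ID.
  assert(next_id == static_cast<int>(agents.size()));
}
```

With two CPUs and two GPUs, the CPUs get IDs 0 and 1 and the GPUs get IDs 2 and 3, regardless of the order in which the agents were discovered.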
Ammar ELWazir 8032adb64f Adding rocprofilerv2
Change-Id: Ic0cc280ba207d2b8f6ccae1cd4ac3184152fc1ad
2023-03-09 13:20:33 +00:00
Ammar ELWazir 553a4c7ee7 GPU Index to use HSA AMD Agent Driver Node ID
Change-Id: Ia814f64419615f1d77fc09fc88f11bbaf75afd45
2022-11-21 14:05:33 -05:00
Kiumars Sabeti b53fd84ade SWDEV-302380: [ROCm QA][Mainline][Navi21] 6 tests are failing in rocprofiler-stg2
This is an attempt to support basic and derived counters for navi21. This code will not work correctly unless we add navi counters to metrics.xml and gfx_metrics.xml.

Change-Id: Ied06a81345a6fbb02fa0fde1889d94bbe64e9a03
2022-08-05 17:31:37 -04:00
Ranjith Ramakrishnan e7eb195924 SWDEV-345870 - Correct include paths for new directory layout
Use hsa header files from /opt/rocm-ver/include rather than using wrapper files from /opt/rocm-ver/hsa/include/hsa

Change-Id: Id7a9bde19447cd2a0fd6e03b11c08471f09c2a46
2022-07-14 16:08:41 -07:00
Mark Laws a11ac0a632 SWDEV-283957 : Emit correct group name for rocprof split metrics
Change-Id: Id096edd03bb5be9c8082296fdb659845f2b9c7a6
2022-03-01 11:16:00 -06:00
Chun Yang a8b5d6cf33 SWDEV-283942 SWDEV-292075
Fixed exception thrown when ROCP_HSA_INTERCEPT not set or set to 0;
Fixed ROCM hsa_init() failed with error 4096 when trying to read hardware performance counters;
Fixed LD_LIBRARY_PATH to include necessary library;

Change-Id: Idcb7ff807a79f4267374c34041d3bca33d85f532
2021-10-05 10:44:26 -04:00
Chun Yang 2519d00c17 Fixed corrupted multithread map handling
Change-Id: Ib7d33a4b7f3306b7195ff89c28b021fb1fa6bc88
2021-10-04 20:06:32 -04:00
Chun Yang f9017cbdc5 SWDEV-296922 : Incorrect rounding due to integer division in rocprofiler metrics
Changed derived metrics to double from int64.
Fixed standalone test due to int64 to float change
Fixed intercept test due to int64 to float change.

Change-Id: I49631c187406ae9dd94a869b3bb13772012e8cdf
2021-09-23 14:52:35 -07:00
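The rounding problem named in commit f9017cbdc5 is the usual truncation of integer division. The sketch below assumes a hypothetical percentage metric; the function names are illustrative and do not come from rocprofiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical derived metric computed with integer arithmetic:
// the fractional part of the result is discarded.
inline std::int64_t RatioPercentInt(std::int64_t part, std::int64_t total) {
  assert(total != 0);
  return 100 * part / total;
}

// The same metric computed in double keeps the fractional part.
inline double RatioPercentDouble(std::int64_t part, std::int64_t total) {
  assert(total != 0);
  return 100.0 * static_cast<double>(part) / static_cast<double>(total);
}
```

For `part = 1` and `total = 3`, the integer version yields 33 while the double version yields roughly 33.33, which is the kind of difference the standalone and intercept tests had to be adjusted for.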
Laurent Morichetti acb246f788 Get ROCr and ROCt dependencies using cmake targets
Instead of detecting files (header/library), use cmake's find_package to
locate the required dependencies (hsa-runtime64 and hsakmt).

Adding hsa-runtime64::hsa-runtime64 and hsakmt::hsakmt to the
target_link_libraries also takes care of adding the interfaces include
directories to the search path.

Change-Id: I64eb77c97dac7982ac96d3158ad57df776cc0b53
2021-09-14 14:49:32 -07:00
Chun Yang 2b79931631 SWDEV-301543 SWDEV-276146 : Fix profile output buff allocation
An L2 flush is triggered by an explicit cache-flush PM4 packet in the
aqlprofile packets sent to the GPU. This cache flush synchronizes the CPU
and GPU so that performance counters copied to the profile output buffer
are visible to the CPU. To get rid of this cache flush, two changes are made:
  1. The explicit cache-flush packet is removed from the aqlprofile code
     (in a separate commit to aqlprofile).
  2. This commit changes the profile output buffer to use kernarg
     memory, since it is uncached for the GPU.
After these changes, profile counter values copied by the GPU to the output
buffer are guaranteed to be visible to the CPU.

Change-Id: Ie953949c85fbee2f4369f1de966bcfb33daec084
2021-09-02 17:30:57 -07:00
Chun Yang 55fdd451f3 Change obj_map_ from pointer to object
Change-Id: Ibc2fb8812c34b44d7b59275f2850bb127b9def7c
2021-07-28 11:52:44 -07:00
AMD 4df3e0bd9a Add support for gfx90a
Merge gfx90a support from the 'amd-npi' branch.

Change-Id: I9b51711ed4a1d2f1ed42ba9b83cb12136be228b8
2021-06-16 16:35:42 -07:00
Chun Yang 6da2b19562 SWDEV-283942 : Fixed false error report from rocprofiler
Change-Id: Ifc6eb0cb26f60a5596e1b626a578135ae9080f26
2021-05-17 14:16:50 -07:00
Rachida Kebichi a2d89f22a7 Fixed order of code obj and symbols processing
Change-Id: Icb3341e54f3e0c7cf3da06811712f001e213d83d
2021-04-22 20:27:26 -04:00
Laurent Morichetti 304d3366af Fix a compilation error with gcc-9.3.0
On Ubuntu 20.04, in Release mode, gcc fails with this error:

In file included from /usr/include/string.h:495,
                 from /opt/rocm/include/hsa/hsa_api_trace.h:57,
                 from ../rocprofiler/src/util/hsa_rsrc_factory.h:29,
                 from ../rocprofiler/src/util/hsa_rsrc_factory.cpp:25:
In function ‘char* strncpy(char*, const char*, size_t)’,
    inlined from ‘const util::AgentInfo* util::HsaRsrcFactory::AddAgentInfo(hsa_agent_t)’ at ../rocprofiler/src/util/hsa_rsrc_factory.cpp:323:12:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:34: error: ‘char* __builtin___strncpy_chk(char*, const char*, long unsigned int, long unsigned int)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
  106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
      |          ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../rocprofiler/src/util/hsa_rsrc_factory.cpp: In member function ‘const util::AgentInfo* util::HsaRsrcFactory::AddAgentInfo(hsa_agent_t)’:
../rocprofiler/src/util/hsa_rsrc_factory.cpp:322:39: note: length computed here
  322 |     const int gfxip_label_len = strlen(agent_info->name) - 2;
      |                                 ~~~~~~^~~~~~~~~~~~~~~~~~

The error is caused by the following 2 lines:

    const int gfxip_label_len = strlen(agent_info->name) - 2;
    strncpy(agent_info->gfxip, agent_info->name, gfxip_label_len);

The size argument to strncpy should not depend on the input string.

Since the terminating character is not considered (the copy is at
most len - 2 bytes), using memcpy is preferable. Also, make sure
the destination does not overflow by clamping the size.

Change-Id: I0c5cf7e0daf4cd6fcf7092efb1d9fd4c02a6c639
2021-04-22 11:12:53 -07:00
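The approach described in commit 304d3366af (memcpy with a clamped size instead of strncpy) might look roughly like the following. This is a sketch, not the actual patch: the helper name `CopyGfxipLabel`, its parameters, and the explicit null terminator are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `name` without its last two characters into
// `dst`, never writing more than `dst_size` bytes.
inline void CopyGfxipLabel(char* dst, std::size_t dst_size, const char* name) {
  assert(dst != nullptr && name != nullptr && dst_size > 0);
  const std::size_t name_len = std::strlen(name);
  // Guard against names shorter than two characters.
  const std::size_t label_len = name_len > 2 ? name_len - 2 : 0;
  // Clamp to the destination capacity, leaving room for the terminator.
  const std::size_t copy_len = std::min(label_len, dst_size - 1);
  std::memcpy(dst, name, copy_len);
  dst[copy_len] = '\0';
}
```

Because the copy length is bounded by the destination size rather than derived only from the source string, the pattern that triggers `-Werror=stringop-overflow` in the log above is avoided.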
Evgeny c4828a9de0 concurrent: enable PmcStopper to end perf counting.
Change-Id: I89785277678141e29349e162df10203787050643
2021-04-09 08:06:42 +00:00
Evgeny 64bdcaddc7 fixing gfx10 gfxip name
Change-Id: Ie58768d64117a616b1896489b505790cfa993054
2021-03-24 00:48:21 -05:00
Evgeny e2c9d13e5b SWDEV-274821 SPM initialization fix
Change-Id: I5e27928a60083eff328bab3e79937ce11bce11bd
2021-03-22 09:18:36 +00:00
Evgeny fb82ddee81 adding GPU command functions module
Change-Id: Id2c2d82ea6fee42695309ad3bb296effa77a2f33
2021-02-03 04:45:59 -05:00
Evgeny 96ff7582ce porting of AQL packet submit to new atomic HSA queue API
Change-Id: I654448a7a8627978395d426118a5cb3ba2a92058
2020-10-12 09:26:27 -05:00
Evgeny 97caab40da SWDEV-255459 : to get rid of c++ libatomic
Change-Id: I311db0e456dd6e6c87692898640574dc8f669086
2020-10-09 07:39:43 -04:00
Evgeny 169e36f379 SWDEV-252747 : testing using v3 object
Change-Id: I427df765d1be55bd2851ce441238b3eaa46cca4f
2020-10-09 06:38:46 -04:00
Evgeny 0d164ba672 enable contexts wait
Change-Id: Ie2adf04662fddc8051fb5418904c9c659e264d78
2020-09-21 21:06:03 -04:00
Evgeny 8850e46071 kernel objects dumping
Change-Id: I5a16e05b7df438efa903948701b65a9ced99e5f3

initial codeobj event implementation

Change-Id: Ia7fac3c2b9897a004cfe88c4de82ba8c18284196

update - codeobj event implementation

Change-Id: I2b91b6e689875af03f0086f5a0872a97a629fd83

update2 - codeobj event implementation

Change-Id: Icff75f14fd21963e40db95373fa74880957a9e32

fix - codeobj event implementation

Change-Id: I76c33c875cb429fb12a974bb408b217f187b4536

URI buffer fix - codeobj event implementation

Change-Id: I7ce1a758e021455da3fe5b8a6e4ae3ab46e9760e

HSA events exposing

Change-Id: I3664ab4e5111c4ccedaf068dcb19f48055f0ef9b

HSA events data struct normalizing

Change-Id: I365ef0db45e0a9314bd2a1a4d29dd4eb4e91297d
2020-09-11 10:01:54 -05:00
Evgeny Shcherbakov 8263eceef9 Merge "concurrent: enable/fix the related settings" into amd-staging
2020-09-01 17:43:08 -04:00
Evgeny 858e0b0f8a HSA trace kernel name demangling
Change-Id: I6d8b674137405a93939c38d7e615af5a114f04ca
2020-08-28 05:40:17 -05:00
Xianwei Zhang b445610cd1 concurrent: enable/fix the related settings
Concurrent profiling relies on the aqlprofile read_api
and tracker. This patch sets those options to enable
concurrent profiling.

Change-Id: Ib97d4d8facfbc11f2684d83109397cd13f117d5e
2020-08-26 16:04:57 -04:00
Xianwei Zhang e26210d9d9 concurrent: improve concurrent profiling
This patch adds barrier packets, together with extra signals,
to enforce the completion order of read packets with respect to the dispatch.
In addition, PmcStopper is added to stop profiling at the end.

Change-Id: I8e8d3a41d86e42be1d9e5afd44c247be876cf1a5
2020-08-05 18:20:14 -04:00
Evgeny 80747de208 optimization mechanism fix: correct tracker handler; kernel name query on completion;
Change-Id: I14da152b4ac3c7d8fd1af2f54e9d71f834071622
2020-08-03 23:34:49 -05:00
Evgeny 8bb860f841 return value fix
Change-Id: Id23dc2cf7f25efbf778a853403e43dd1176d5e33
2020-07-21 01:00:41 -05:00
Xianwei Zhang 61c9df4631 pmc: add support of concurrent kernel profiling
Profiling was previously enabled only in serial mode: kernels
are serialized in execution, and counters are reset at each
kernel start and read at kernel completion. This patch adds
a concurrent mode, which issues a process-level start
packet to reset the counters and then reads them twice, at kernel
start and at kernel end, to obtain the counter value difference.
The new concurrent profiling mode requires the corresponding
changes on the aqlprofile side.

Change-Id: I94b4442eadc8c64b8fba51b1e4916fc8b895ad21
2020-07-16 14:39:46 -05:00
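The concurrent mode described in commit 61c9df4631 derives a per-kernel value from two reads of a counter that keeps running. A minimal sketch of that subtraction follows; the `CounterSamples` struct and `KernelCounterDelta` function are hypothetical and do not reflect the rocprofiler or aqlprofile data structures.

```cpp
#include <cstdint>

// Hypothetical pair of readings for one counter around one kernel.
struct CounterSamples {
  std::uint64_t at_start;  // value read when the kernel starts
  std::uint64_t at_end;    // value read when the kernel completes
};

// In concurrent mode the counter is not reset per kernel, so the value
// attributed to the kernel is the difference between the two reads.
constexpr std::uint64_t KernelCounterDelta(const CounterSamples& samples) {
  return samples.at_end - samples.at_start;
}
```

In serial mode, by contrast, the counter is reset at kernel start, so the single read at completion is already the per-kernel value.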
Evgeny 2a7f77b290 counters dumping optimization
Change-Id: I8c694e5380e15179453148dd9ab3a3e51b6db861
2020-07-15 09:57:41 -05:00
Evgeny 9f7e936d70 concurrent sqtt support
Change-Id: I91391fafabc93aefa5d244d870ef82b96a59dc52
2020-06-23 20:00:49 -04:00
Evgeny 30db99e758 setting code-obj tracking by default
Change-Id: Id6a97a7dc77faa3b7eb0e2b81b75c13ca7fc5818
2020-05-28 03:43:12 -05:00
Evgeny Shcherbakov 48c8076e9c Merge "clang10 porting" into amd-master
2020-05-12 22:57:32 -04:00
Evgeny 04aea8c3df clang10 porting
Change-Id: I071833f9d1f46df105f7ef1749c5d17d989bbb05
2020-05-12 18:26:47 -05:00
Evgeny 9950b97567 disabling destruction
Change-Id: I2a7d05a8f597b3bc8bd07bffe7181f9dcace1cbf
2020-05-07 03:34:30 -05:00
Evgeny 3af87a7423 adding pid for kernel results to support multi-process profiling
Change-Id: I283228a4b4145599c5e637dd6faa771b9f4b6345
2020-05-05 05:35:32 -05:00
Evgeny 3ce98d33d4 get_time API: made public; extended with more time IDs: coarse and raw; added time error return value
Change-Id: I1641eb2c38915222204617e07fc0bfb388bb8346
2020-04-30 02:38:18 -05:00
Evgeny fe70682184 tool destruction fix
Change-Id: If069c820526e21a0a4b80ac516f9669a81f34cab
2020-04-28 03:16:15 -05:00
Evgeny f819e1c463 eliminating the need for the AMD_INTERNAL_BUILD macro to be defined
Change-Id: Ie97aef943793b1e4f40b7c7397af313520b35beb
2020-04-09 23:41:51 -05:00
Evgeny fdb8f55e02 adding standalone intercept test; queue_start/stop API fixed as public
Change-Id: I5489a5ff69454985b955c9e4027f812168de1ecb
2020-04-08 04:31:52 -05:00
Evgeny 7be9a42ab3 fixing hsa intercept test
Change-Id: I2671dfc6a9bd3e01a0c926aa3ea367b8c7a0279e
2020-03-28 17:24:16 -05:00
Evgeny 9df9fddcfb PC sampling bringup
Change-Id: I0d041c4c8c3778f2c328cde38432bc72223706a3

pc sampling integration fix

Change-Id: Ia66ff876d2d99ec4d561daf8320b65d75f5cd2fe
2020-03-28 13:07:45 -05:00
Evgeny 2dacdd041d clang compilation fix
Change-Id: I4fb4625407faade8ee72c9fe7d0176991e772dde
2020-03-24 15:40:10 -05:00
Evgeny a5f52b40f5 JSON kernel name propagation and stats
Change-Id: I60cf4c7608272941e2499bd251850416ac254f32
2020-02-26 19:45:49 -06:00
Evgeny 40730e34e4 adding AgentInfo::lds_block_size
Change-Id: I186893add96dc92570e710ae78b475897ebfe531
2020-02-18 14:00:19 -06:00