Commit graph

208 Commits

Author SHA1 Message Date
Ammar ELWazir 1f9927abf7 Fixing CMakeLists.txt
Change-Id: Iea14571f23fef5e683a59afa4d41ee3f63f6df46
2023-06-28 23:50:42 -04:00
Ammar ELWazir d8834eb370 Fixing Test Packages
Change-Id: I1c017732ce1dedfe8b74d680add101ca574d295c
2023-06-28 08:34:01 -04:00
Saurabh Verma b7d045c672 SWDEV-400688: Correction for block instance count referenced in xml for MI300 metrics
Change-Id: I8b84f5d018d64104ed3d1bedeff272fd5e7437ca
2023-06-21 16:26:59 -04:00
Ammar ELWazir 472624e3bd SWDEV-374256: GPU Kernel Dispatch Trace Period Support
Change-Id: Idaabe82a30013e3aba4bcb65bd0a89ce2d14ad97
2023-06-21 12:46:33 -04:00
Giovanni LB a1508035dc SWDEV-298742: Added occupancy metrics
Change-Id: I67e375ad06535bbb8cc864b78840ce3962bcc58e
2023-06-19 12:10:22 -04:00
Giovanni LB e1285e3fd4 SWDEV-405575: Added gfx941 and gfx942
Change-Id: I45a49cd64a76d3ae32c209497c70fe27b5be212b
2023-06-19 11:11:37 -04:00
Ammar ELWazir 6c61aa5311 Fixes for Spack
Change-Id: Ib2ea41b8140589fbc74aa297379588cc720e0183
2023-06-13 12:02:58 -04:00
Ammar ELWazir 19e3253049 Fixing ROCProfiler V1 Tests & V1 Tests Packaging
Change-Id: I741e29b8fbce9d4643c4f13afafd7d4fd648094b
2023-06-08 18:25:57 +00:00
Saurabh Verma 8f82ff6a46 MI300 counters support for rocprof and rocprofv2 (accumulation from all XCCs)
1. Xml files updated for gfx940 counters
2. File plugin changes to allow rocprofv2 backward compatibility for results.csv
3. Changes in rocprofv2 script to use tblextr.py, to generate results.csv just like rocprof

Change-Id: I7798f4411ce01f6fbfffb126de654ed806ca7045
(cherry picked from commit 86cbaf38c436be876f0426fa27803b1e64d90378)
2023-05-30 21:41:54 -05:00
gobhardw 70a6c26704 fixing ci test issues for v1
Change-Id: I6be62c83a04b6a1a9f7b128086762dcf5ad79fb4
2023-05-17 21:32:12 -04:00
Kiumars Sabeti 997c771723 SWDEV-380635: adding gfx11 architecture to rocprofiler which includes navi31 and navi32 for now
Change-Id: Ib2a93a34688471c82b5db0dc10e8da58452dba21
2023-05-05 15:39:18 -04:00
Ammar ELWazir 9e62e066fe V1/V2 API Library Separation
V1 library will be supported as librocprofiler64.so and V2 will be supported as librocprofiler64v2.so and headers will be rocprofiler.h for V1 and v2/rocprofiler.h for v2

Change-Id: Ibe5bdbf2f79f0175342c648e917ae77918186604
2023-05-02 22:44:43 -04:00
gobhardw 14977e4dc1 SWDEV-374072 : rocprof gpu selector fix
Change-Id: I155e63a5dc1ecbacd76d80b0df76da99b645ed9f
2023-03-29 15:55:06 +00:00
Kiumars Sabeti a9f1237c53 SWDEV-387039: Modified gfx90a section to inherit from gfx9 base and removed derived counters that are defined in the gfx9 base from gfx90a section to avoid duplication
Change-Id: I653e116bc47fe11b57e663c2827d177149b00c5b
2023-03-29 15:55:06 +00:00
Ammar ELWazir ceefad27d0 Solving failed tests for rocprofiler v1
Change-Id: I61ffc4380b077db3a23c9dbb3e680324cf7f1a4a
2023-03-09 13:21:08 +00:00
Ammar ELWazir 8032adb64f Adding rocprofilerv2
Change-Id: Ic0cc280ba207d2b8f6ccae1cd4ac3184152fc1ad
2023-03-09 13:20:33 +00:00
Saurabh Verma 225bddf148 Adding missing MI200 metrics
Change-Id: I410f50e03d38bb03cf43e743318eb1242e7d6518
2023-01-11 18:00:46 +00:00
Kiumars Sabeti a9a82ee107 SWDEV-369023: Added two new counters SQ_INSTS_TEX_LOAD and SQ_INSTS_TEX_STORE for gfx10. These two counters replace SQ_INSTS_VMEM_RD and SQ_INSTS_VMEM_WR, which are not supported on the gfx10 architecture
Change-Id: I4c4101eea27f9073492ae42c70a30a002f4d8834
2022-12-09 20:41:45 -05:00
Ammar ELWazir 553a4c7ee7 GPU Index to use HSA AMD Agent Driver Node ID
Change-Id: Ia814f64419615f1d77fc09fc88f11bbaf75afd45
2022-11-21 14:05:33 -05:00
Kiumars Sabeti b53fd84ade SWDEV-302380: [ROCm QA][Mainline][Navi21] 6 tests are failing in rocprofiler-stg2
This is an attempt to support basic and derived counters for navi21.  This code will not work correctly unless we add navi counters to metrics.xml and gfx_metrics.xml

Change-Id: Ied06a81345a6fbb02fa0fde1889d94bbe64e9a03
2022-08-05 17:31:37 -04:00
Laurent Morichetti 5fd1c7e8e3 Fix vgpr count calculation for gfx90a and gfx940
Read accum_offset from compute_pgm_rsrc3 to report both the arch vgprs
and the accum vgprs

Change-Id: I99e746d54a6a1671e343da5658cc6ce970f79939
2022-08-03 14:02:36 -07:00
Saurabh Verma 18dedbaee8 SWDEV-297195: Corrected units for some counters. Units changed to quad-cycles units where required.
Change-Id: Ia6b0387ac6ec4210bb9482d85ae5635fc7c3c9d0
2022-07-21 17:22:17 -05:00
Ranjith Ramakrishnan e7eb195924 SWDEV-345870 - Correct include paths for new directory layout
Use hsa header files from /opt/rocm-ver/include rather than using wrapper files from /opt/rocm-ver/hsa/include/hsa

Change-Id: Id7a9bde19447cd2a0fd6e03b11c08471f09c2a46
2022-07-14 16:08:41 -07:00
Saurabh Verma 6d233c65d7 SWDEV-298750: Approval to make internal profile counters public
Added approved HW counters for MI200. Also added derived metrics for the same

Change-Id: I1c6abfdfde4e4fd4ba8bd5eec0557ad08fd71c77
2022-05-17 16:44:16 -05:00
Chun Yang 26c479c72a SWDEV-324379 : Expose FP64 and FP32 performance counters on AMD profilers for MI200
Change-Id: I2c38ccc297872dfc1896314ceadbed98dc761766
2022-03-17 14:06:24 -07:00
Ranjith Ramakrishnan 015697db74 File Reorganization with backward compatibility
Package files installed in /opt/rocm
Wrapper header files and library soft links installed in /opt/rocm/rocprofiler
Test tools library and binaries renamed
Internal binaries installed in /opt/rocm/libexec/rocprofiler
run.sh updated with file reorg changes

Change-Id: I927d1a0dcd814764ebf0f473d0a64883906d5457
2022-03-05 14:49:41 -08:00
Chun Yang a8b5d6cf33 SWDEV-283942 SWDEV-292075
Fixed exception thrown when ROCP_HSA_INTERCEPT not set or set to 0;
Fixed ROCM hsa_init() failed with error 4096 when trying to read hardware performance counters;
Fixed LD_LIBRARY_PATH to include necessary library;

Change-Id: Idcb7ff807a79f4267374c34041d3bca33d85f532
2021-10-05 10:44:26 -04:00
Chun Yang d024c48c56 Changed function param passing from ref to value
Change-Id: I4e5feec09705e4e4bab5f9dcf320fc25a59c0762
2021-09-29 18:46:42 -07:00
Chun Yang f9017cbdc5 SWDEV-296922 : Incorrect rounding due to integer division in rocprofiler metrics
Changed derived metrics to double from int64.
Fixed standalone test due to int64 to float change
Fixed intercept test due to int64 to float change.

Change-Id: I49631c187406ae9dd94a869b3bb13772012e8cdf
2021-09-23 14:52:35 -07:00
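
The rounding problem described in commit f9017cbdc5 can be illustrated with a minimal sketch. The function names `RatioInt` and `RatioDouble` are hypothetical and do not come from the rocprofiler sources; they only contrast integer division, which truncates, with the same ratio computed in `double`.

```cpp
#include <cstdint>

// Hypothetical derived metric computed with integer division.
// The fractional part is discarded, so 7 / 2 yields 3 and 1 / 4 yields 0.
inline int64_t RatioInt(int64_t num, int64_t den) {
  return den == 0 ? 0 : num / den;
}

// The same hypothetical metric computed in double, which keeps the fraction.
// Both operands are converted before dividing so no truncation takes place.
inline double RatioDouble(int64_t num, int64_t den) {
  return den == 0 ? 0.0 : static_cast<double>(num) / static_cast<double>(den);
}
```

Returning 0 for a zero denominator is an assumption made for this sketch, not behavior taken from the commit.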
Laurent Morichetti acb246f788 Get ROCr and ROCt dependencies using cmake targets
Instead of detecting files (header/library), use cmake's find_package to
locate the required dependencies (hsa-runtime64 and hsakmt).

Adding hsa-runtime64::hsa-runtime64 and hsakmt::hsakmt to the
target_link_libraries also takes care of adding the interfaces include
directories to the search path.

Change-Id: I64eb77c97dac7982ac96d3158ad57df776cc0b53
2021-09-14 14:49:32 -07:00
rachida 312048b38d SWDEV-296154 rocprofiler test suite is failing
Change-Id: Id2b0ade0a475e38ea54671802e16b25d5beabed8
2021-08-11 10:41:26 -04:00
AMD 4df3e0bd9a Add support for gfx90a
Merge gfx90a support from the 'amd-npi' branch.

Change-Id: I9b51711ed4a1d2f1ed42ba9b83cb12136be228b8
2021-06-16 16:35:42 -07:00
Kent Russell 97c9efce38 Cmake: Support static hsakmt
Add numa lib as this will be required with a static thunk
Look for static thunk if shared thunk cannot be found

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Idcaa0c785a0502c9f5fe42e2dfb9e0c1780f9d66
2021-04-27 12:18:02 -04:00
Laurent Morichetti 304d3366af Fix a compilation error with gcc-9.3.0
On Ubuntu 20.04, in Release mode, gcc fails with this error:

In file included from /usr/include/string.h:495,
                 from /opt/rocm/include/hsa/hsa_api_trace.h:57,
                 from ../rocprofiler/src/util/hsa_rsrc_factory.h:29,
                 from ../rocprofiler/src/util/hsa_rsrc_factory.cpp:25:
In function ‘char* strncpy(char*, const char*, size_t)’,
    inlined from ‘const util::AgentInfo* util::HsaRsrcFactory::AddAgentInfo(hsa_agent_t)’ at ../rocprofiler/src/util/hsa_rsrc_factory.cpp:323:12:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:34: error: ‘char* __builtin___strncpy_chk(char*, const char*, long unsigned int, long unsigned int)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
  106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
      |          ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../rocprofiler/src/util/hsa_rsrc_factory.cpp: In member function ‘const util::AgentInfo* util::HsaRsrcFactory::AddAgentInfo(hsa_agent_t)’:
../rocprofiler/src/util/hsa_rsrc_factory.cpp:322:39: note: length computed here
  322 |     const int gfxip_label_len = strlen(agent_info->name) - 2;
      |                                 ~~~~~~^~~~~~~~~~~~~~~~~~

The error is caused by the following 2 lines:

    const int gfxip_label_len = strlen(agent_info->name) - 2;
    strncpy(agent_info->gfxip, agent_info->name, gfxip_label_len);

The size argument to strncpy should not depend on the input string.

Since the terminating character is not considered (the copy is at
most len - 2 bytes), using memcpy is preferable. Also, make sure
the destination does not overflow by clamping the size.

Change-Id: I0c5cf7e0daf4cd6fcf7092efb1d9fd4c02a6c639
2021-04-22 11:12:53 -07:00
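
The approach described in commit 304d3366af (copy with memcpy and clamp the size to the destination) might look roughly like the sketch below. The helper `CopyGfxipLabel` and its parameters are hypothetical; the actual change in `hsa_rsrc_factory.cpp` is not shown in this log. The explicit NUL termination is an assumption added here so the result is always a valid C string.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `name` without its last two characters into
// `dst`, never writing more than `dst_size` bytes including the terminator.
// Returns the number of characters copied (excluding the terminator).
inline std::size_t CopyGfxipLabel(char* dst, std::size_t dst_size,
                                  const char* name) {
  if (dst == nullptr || dst_size == 0) return 0;
  const std::size_t name_len = (name != nullptr) ? std::strlen(name) : 0;
  // Guard against underflow when the name has fewer than three characters.
  const std::size_t wanted = (name_len > 2) ? name_len - 2 : 0;
  // Clamp to the destination capacity, leaving room for the terminator.
  const std::size_t len = std::min(wanted, dst_size - 1);
  if (len > 0) std::memcpy(dst, name, len);
  dst[len] = '\0';
  return len;
}
```

Because the copy length is bounded by the destination size rather than only by the source length, the pattern that triggered `-Werror=stringop-overflow` no longer appears.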
Evgeny 780dfa37d4 cleanup after separating for staging and npi branches
Change-Id: Iadd624df21b85f1590e901a8125680743e3281a3
2021-04-08 20:37:47 +00:00
Evgeny 82d7bb2145 SWDEV-265287 : integration spmltgen.py script
Change-Id: Ief3e93225fb6660e72a04e4bd4b379262b73c914
2021-04-08 10:04:39 -04:00
Evgeny 64bdcaddc7 fixing gfx10 gfxip name
Change-Id: Ie58768d64117a616b1896489b505790cfa993054
2021-03-24 00:48:21 -05:00
Evgeny e2c9d13e5b SWDEV-274821 SPM initialization fix
Change-Id: I5e27928a60083eff328bab3e79937ce11bce11bd
2021-03-22 09:18:36 +00:00
Evgeny 7e60bf163e SWDEV-255662 : spm kfd mode support
Change-Id: I840c7e92d3d5a59d8e5402c4d8ef86bc123dd07c
2020-12-02 13:02:45 -06:00
Evgeny f2c9980647 fixing sqtt trace for zero size case
Change-Id: I75712485f518725af46a3b419339a212d1e762a0
2020-12-01 18:19:51 -05:00
Evgeny ccc6005c25 fixing c_str() as strdup
Change-Id: Ib5cb68d16ce66fd2ae072168de4c16895f32b57f
2020-10-27 14:45:51 -05:00
Evgeny 96ff7582ce porting of AQL packet submit to new atomic HSA queue API
Change-Id: I654448a7a8627978395d426118a5cb3ba2a92058
2020-10-12 09:26:27 -05:00
Evgeny 169e36f379 SWDEV-252747 : testing using v3 object
Change-Id: I427df765d1be55bd2851ce441238b3eaa46cca4f
2020-10-09 06:38:46 -04:00
Evgeny 0d164ba672 enable contexts wait
Change-Id: Ie2adf04662fddc8051fb5418904c9c659e264d78
2020-09-21 21:06:03 -04:00
Evgeny 8850e46071 kernel objects dumping
Change-Id: I5a16e05b7df438efa903948701b65a9ced99e5f3

initial codeobj event implementation

Change-Id: Ia7fac3c2b9897a004cfe88c4de82ba8c18284196

update - codeobj event implementation

Change-Id: I2b91b6e689875af03f0086f5a0872a97a629fd83

update2 - codeobj event implementation

Change-Id: Icff75f14fd21963e40db95373fa74880957a9e32

fix - codeobj event implementation

Change-Id: I76c33c875cb429fb12a974bb408b217f187b4536

URI buffer fix - codeobj event implementation

Change-Id: I7ce1a758e021455da3fe5b8a6e4ae3ab46e9760e

HSA events exposing

Change-Id: I3664ab4e5111c4ccedaf068dcb19f48055f0ef9b

HSA events data struct normalizing

Change-Id: I365ef0db45e0a9314bd2a1a4d29dd4eb4e91297d
2020-09-11 10:01:54 -05:00
Xianwei Zhang b445610cd1 concurrent: enable/fix the related settings
Concurrent profiling relies on the aqlprofile read_api
and tracker. This patch sets those options to enable
the concurrent profiling.

Change-Id: Ib97d4d8facfbc11f2684d83109397cd13f117d5e
2020-08-26 16:04:57 -04:00
Evgeny 80747de208 optimization mechanism fix: correct tracker handler; kernel name query on completion;
Change-Id: I14da152b4ac3c7d8fd1af2f54e9d71f834071622
2020-08-03 23:34:49 -05:00
Evgeny 7364edcc5b kernel name filtering fix - handling [] brackets
Change-Id: I46a62d991a52045694640837393df229cf7a3133
2020-07-29 18:47:31 -05:00
Xianwei Zhang 61c9df4631 pmc: add support of concurrent kernel profiling
The profiling was only enabled in serial mode, i.e., kernels
are serialized in execution, and counters are reset at each
kernel start and read at kernel completion. This patch adds
the concurrent mode, by issuing the process-level start
packet to reset counters, and then reading twice at kernel
start and end time to obtain the counter value difference.
The new concurrent profiling usage needs the integration
with the corresponding augment at aqlprofile side.

Change-Id: I94b4442eadc8c64b8fba51b1e4916fc8b895ad21
2020-07-16 14:39:46 -05:00
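
The concurrent mode described in commit 61c9df4631 reads the counters once at kernel start and once at kernel end, then takes the difference. The subtraction step alone can be sketched as follows; `CounterSnapshot` and `CounterDelta` are hypothetical names, and the packet submission and aqlprofile integration mentioned in the commit are not modeled here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical snapshot type: one raw value per hardware counter.
using CounterSnapshot = std::vector<uint64_t>;

// Returns end - start for each counter present in both snapshots.
// Unsigned subtraction wraps modulo 2^64, so a single wrap of a
// free-running 64-bit counter between the two reads still gives
// the correct difference.
inline std::vector<uint64_t> CounterDelta(const CounterSnapshot& at_start,
                                          const CounterSnapshot& at_end) {
  const std::size_t n = std::min(at_start.size(), at_end.size());
  std::vector<uint64_t> delta(n);
  for (std::size_t i = 0; i < n; ++i) {
    delta[i] = at_end[i] - at_start[i];
  }
  return delta;
}
```

In serial mode the counters are reset at each kernel start, so the end reading is already the per-kernel value; the difference is only needed when the counters keep running across kernels.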
Evgeny 2a7f77b290 counters dumping optimization
Change-Id: I8c694e5380e15179453148dd9ab3a3e51b6db861
2020-07-15 09:57:41 -05:00