Commit graph

390 commits

Author SHA1 Message Date
Adam Pryor 9425a2f687 [SWDEV-569427] Fix segfault calling bad page info (#2547) 2026-01-13 09:44:49 -06:00
Yazen AL Musaffar d8a914d8cc comment update for wrong units associated with RDC (#2299)
* comment update for wrong units associated with RDC_FI_GPU_MEMORY_CUR_BANDWIDTH

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* Update rdc.h

---------

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
2026-01-08 12:14:51 -06:00
Adam Pryor 5bf6e366dd [SWDEV-548460] Add RDC Policy Reset Message (#2180)
* [SWDEV-548460] Add RDC Policy Reset Message

* [rdc] Bump version to 1.3.0

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

* chore: [rdc] Format CMakeLists.txt

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-12-29 08:31:13 -08:00
Adam Pryor bd6c6852fc [SWDEV-566924] Update KFD_ID metric to use amd-smi instead of rocprof (#2355) 2025-12-18 08:39:19 -06:00
Benjamin Welton e3c051d9b8 [RDC] Optimize RDC counter sampling with greedy packing algorithm (#1590)
* Optimize RDC counter sampling with greedy packing algorithm

This change significantly reduces the number of rocprofiler-sdk sample calls
by implementing a greedy packing algorithm that groups multiple counters into
the minimal number of hardware profiles.

Key improvements:
- Implement greedy packing algorithm to combine counters into minimal profiles
- Add ProfileSet structure to manage packed counter configurations
- Cache packed profile sets for reuse across queries
- Group telemetry field requests by GPU for bulk processing
- Reduce sample calls by ~35% (from 100 to 65 for typical workloads)

Performance impact:
- 13 counters now packed into 3 profiles (77% compression)
- Reduces overhead from profile creation and context switching
- More efficient utilization of hardware counter resources

Implementation details:
- Added create_profiles_for_counters() using greedy algorithm
- Added sample_counters_with_packing() for bulk sampling
- Modified telemetry layer to use rocp_lookup_bulk()
- Preserves all field transformations and special handling

Testing shows successful packing with expected performance gains.
No functional changes to external APIs or behavior.

Co-Authored-By: Ben Welton <bwelton@amd.com>
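
The greedy packing described above can be illustrated with a minimal first-fit sketch. This is not the code from the commit: `Counter`, `Profile`, `pack_counters`, and the per-block slot limits are hypothetical names and assumptions chosen for illustration, and the real `create_profiles_for_counters()` presumably relies on rocprofiler-sdk to decide which counters can share a profile.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical counter description: a name plus the hardware block it uses.
struct Counter {
  std::string name;
  std::string block;
};

// Hypothetical packed profile: counters that are sampled together in one call.
struct Profile {
  std::vector<std::string> counters;
  std::map<std::string, std::size_t> used_slots;  // block -> slots in use
};

// First-fit greedy packing (assumed model): each hardware block offers a fixed
// number of counter slots per profile. A counter goes into the first profile
// that still has a free slot for its block; otherwise a new profile is opened.
std::vector<Profile> pack_counters(
    const std::vector<Counter>& counters,
    const std::map<std::string, std::size_t>& slots_per_block) {
  std::vector<Profile> profiles;
  for (const Counter& c : counters) {
    // Unknown blocks (or a limit of 0) are treated as having a single slot.
    std::size_t limit = 1;
    auto it = slots_per_block.find(c.block);
    if (it != slots_per_block.end()) {
      limit = std::max<std::size_t>(1, it->second);
    }

    Profile* target = nullptr;
    for (Profile& p : profiles) {
      if (p.used_slots[c.block] < limit) {
        target = &p;
        break;
      }
    }
    if (target == nullptr) {
      profiles.emplace_back();
      target = &profiles.back();
    }
    target->counters.push_back(c.name);
    ++target->used_slots[c.block];
  }
  return profiles;
}
```

Under these assumptions, five counters on a block with two slots plus two counters on a block with four slots pack into three profiles, so three sample calls replace seven.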

* Address PR review feedback

This commit addresses all review comments from the initial PR:

1. Fix division by zero risk in debug logging
   - Added check for empty counters vector before calculating compression ratio
   - Avoids potential division by zero when logging profile creation stats

2. Improve thread safety for statistics tracking
   - Changed static uint64_t to std::atomic<uint64_t> for thread-safe counters
   - Prevents race conditions in multi-threaded sampling scenarios

3. Remove unused variable
   - Removed unused profile_index variable that was incremented but never used
   - Cleaned up dead code

4. Clean up code formatting
   - Removed extra blank lines for consistency
   - Applied formatting fixes across modified files

5. Refactor code duplication between rocp_lookup and rocp_lookup_bulk
   - Created apply_field_transformation() helper function
   - Eliminates ~70 lines of duplicated switch statement logic
   - Centralizes field transformation logic in single location
   - Makes future maintenance easier

6. Document non-rocprofiler metrics handling
   - Added comments explaining how bulk lookup handles special cases
   - Clarifies that non-profiler fields like KFD_ID are handled in transformation

All changes maintain backward compatibility and pass compilation.
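
Review items 1 and 2 can be sketched as follows. The names `g_sample_calls`, `record_sample_call`, `sample_call_count`, and `compression_percent` are hypothetical and do not come from the commit; the sketch only shows the general pattern of an atomic statistics counter and a ratio calculation guarded against an empty input.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical statistic shared by sampling threads. std::atomic makes the
// increment safe without a mutex; a plain static uint64_t would be a data race.
static std::atomic<std::uint64_t> g_sample_calls{0};

void record_sample_call() {
  // Relaxed ordering is enough for a pure counter that guards no other data.
  g_sample_calls.fetch_add(1, std::memory_order_relaxed);
}

std::uint64_t sample_call_count() {
  return g_sample_calls.load(std::memory_order_relaxed);
}

// Percentage of sample calls saved by packing counter_count counters into
// profile_count profiles. Returns 0 for an empty input instead of dividing
// by zero.
double compression_percent(std::size_t counter_count, std::size_t profile_count) {
  if (counter_count == 0) {
    return 0.0;
  }
  return 100.0 * (1.0 - static_cast<double>(profile_count) /
                            static_cast<double>(counter_count));
}
```

With 13 counters and 3 profiles this yields roughly 77%, which matches the figure quoted in the first part of the commit message.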

Co-Authored-By: Ben Welton <bwelton@amd.com>

---------

Co-authored-by: Ben Welton <bwelton@amd.com>
Co-authored-by: Adam Pryor <61172547+adam360x@users.noreply.github.com>
2025-12-17 07:56:33 -06:00
Yazen AL Musaffar 277072f241 Fix for unexpected behavior by ECC_UNCORRECT field (#1088) 2025-12-08 12:07:00 -06:00
Yazen AL Musaffar 16b9160034 [RDC] [SWDEV-551280] RDC to include Error Counters (#1087)
* rdc error counter

* RDC error counters

* fix

* Updates

* updated field names

Signed-off-by: yalmusaf_amdeng <yalmusaf@amd.com>

---------

Signed-off-by: yalmusaf_amdeng <yalmusaf@amd.com>
Co-authored-by: yalmusaf_amdeng <yalmusaf@amd.com>
2025-12-03 15:22:18 -06:00
Yazen AL Musaffar c0d773c47b Fix for created rdc groups not listing when running rdci dmon & rdci group -l -u (#1983)
Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
2025-12-03 15:21:17 -06:00
jonatluu 6b8aae3796 Enable Lintian Support rocm-systems (#1578)
* draft testing fix for no copyright file and no changelog

* test fix no-changelog no-copyright

* changelog copyright fix

* remove utils.cmake

* rocr lintian

* lintian overrides, copyright, changelog install

* fix lintian overrides install

* comp_type static fix and remove debug logs

* syntax error

* update static build check

* update file permissions to 0755 to fix error control-file-has-bad-permissions 0664 != 0755

* fix lintian errors in rdc and remove logs from roctracer

* lintian error fix rocprofiler

* fix lintian error

* move lintian overrides install

* lintian errors fix

* move lintian overrides install

* use changelog already provided by rdc

* fix formatting use existing changelog if provided

* fix formatting use changelog in rocprofiler

* remove overrides. Use existing changelog and copyright

* resolve merge conflict

* update license for hsa-rocr. Use NCSA license

* install license
2025-11-20 11:38:39 -05:00
Swati Rawat cb257ab9f7 [rdc] Replace readme link rdc -> rocm-systems/projects/rdc (#1758)
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-11-14 13:19:26 +01:00
Dmitrii a2cff3c84d [RDC] Fix GPU_COUNT metric to only count GPUs (#1453)
* [RDC] Fix GPU_COUNT metric to only count GPUs
* [RDC] Clean up float->double casts

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-10-30 12:50:47 -05:00
Dmitrii e0ec72ccdd [rdc] Bump rocprofiler-sdk requirement to 1.1.0 (#1610)
Fixes RDC builds broken by #1563
2025-10-30 10:06:45 -04:00
Dmitrii 0575606e49 chore: [rdc] Add copyright notice (#1098) 2025-09-24 09:07:20 -07:00
Swati Rawat 4d74be5d55 Update install.rst (#1035) 2025-09-18 11:51:36 -04:00
Maisam Arif a29e1d08d4 Remove SLES SP (#891) 2025-09-17 13:18:04 -04:00
Dmitrii 8abe24d3b0 rdc: Add CPU support and CPU metrics infrastructure (#770) 2025-09-12 16:14:38 -05:00
Joseph Macaranas 696881ae82 LICENSE clean up (#919)
- Clean up and standardization of MIT licenses after discussion with legal team.
- Update README.md with blurb for top-level files.
- MIT License explicitly mentioned for relevant projects.
- Removal of years.
- Copyright attribution should be to `Advanced Micro Devices, Inc.` and not `AMD ROCm(TM) Software`
- Removal of `All rights reserved.`
- Reduce line width of the text for readability.
- Add clear visual separators for additional licenses.
- Convert text files to markdown format for aforementioned separators.
- Update build scripts to point to renamed files.
- Fixed SMI doc references

Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-09-10 12:06:14 -04:00
Dmitrii a2d3f4a0e0 rdc: Profiler - improve metrics path detection (#333)
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-13 12:33:17 -05:00
srawat 954fd3318e Update conf.py
[ROCm/rdc commit: e3eb0f71b1]
2025-08-05 20:08:07 -05:00
Galantsev, Dmitrii 2785dc21ec Update changelog for 7.0 release
Co-authored-by: Rawat, Swati <Swati.Rawat@amd.com>


[ROCm/rdc commit: 394c634e42]
2025-08-05 20:07:23 -05:00
Galantsev, Dmitrii 2d41f97290 Bump version to 1.2.0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 543543ff1b]
2025-08-05 20:06:12 -05:00
Galantsev, Dmitrii b574154bce CI - Use LSTT machine to enable labeling
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 31dfc0fcce]
2025-07-30 17:15:31 -05:00
Galantsev, Dmitrii 45e62ada3d Profiler - Add metrics location
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 0a19a7ffc1]
2025-07-30 16:59:44 -05:00
Galantsev, Dmitrii 907f52629c CI - Disable RVS to build rocprofiler from staging
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6cd870e3b5]
2025-07-30 16:36:55 -05:00
Galantsev, Dmitrii 758adbc1a3 Profiler - Update counter definitions to match changed api
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 8f3a232613]
2025-07-23 23:27:04 -05:00
Galantsev, Dmitrii 213ccc7e72 RVS - Fix iet_stress by disabling logging
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 3f2f92a37a]
2025-07-22 16:02:14 -05:00
Galantsev, Dmitrii 8fc1d27ecd Profiler - Remove UUID metric
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 059451d48f]
2025-07-22 14:55:28 -05:00
Pryor, Adam bf01498af7 [SWDEV-541958] Fix config (#217)
* [SWDEV-541958] Fix config

Change-Id: I6703821747ade5adb993ab7f386f3658db8a3357

* fixes

Change-Id: I0a1c7d96452d9b2ccb6401b77d73398a67518e91

[ROCm/rdc commit: 6a356e7bb1]
2025-07-21 15:05:49 -05:00
Luca Bruni ef65d48149 Add missing header inclusion for C builds
[ROCm/rdc commit: 5ae7eeb355]
2025-07-18 12:58:47 -05:00
Galantsev, Dmitrii cccfe3e0f1 README - Add libcap-dev dependency
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: c401a6bed6]
2025-07-18 12:51:55 -05:00
Galantsev, Dmitrii ad1021d830 FORMAT - Bump gersemi version
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: a1a3e304ba]
2025-07-18 12:47:21 -05:00
Galantsev, Dmitrii f4801e4d25 FORMAT - Use official clang-format repo for pre-commit
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: b136d290e7]
2025-07-18 12:47:21 -05:00
Pryor, Adam 010ac416b1 [SWDEV-379269] Add all gpus as default to dmon (#211)
Change-Id: Idb17e9018c39479830a4366f2002d02725d66873

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/rdc commit: 816f7a850f]
2025-07-15 16:03:28 -05:00
Pryor, Adam 07346922f5 Adam/bill cleanup (#209)
Co-authored-by: Bill(Shuzhou) Liu <shuzhou.liu@amd.com>


[ROCm/rdc commit: ca9d8c4bae]
2025-07-07 15:41:22 -05:00
Jonathan Luu a03fbdd66a SWDEV-531400 Remove file reorganization backwards compatibility (rdc)
[ROCm/rdc commit: 463d6b60d5]
2025-06-30 15:20:26 -07:00
Galantsev, Dmitrii 78fa862c48 fixup! CHORE - Ignore gersemi commit
[ROCm/rdc commit: 035cd2c371]
2025-06-27 17:34:28 -05:00
Galantsev, Dmitrii 3a762a63fb CI - Fix cmake-format
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: f6023092c7]
2025-06-27 17:25:51 -05:00
Galantsev, Dmitrii 5342e6cf22 CHORE - Ignore gersemi commit
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: a57edc0598]
2025-06-27 17:25:51 -05:00
Galantsev, Dmitrii 1d55c1d820 CMAKE - Format with gersemi
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 40545dcb49]
2025-06-27 17:25:51 -05:00
Galantsev, Dmitrii 6b39188f89 CMAKE - Add gersemi to replace cmake-format
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 680b7a8dd8]
2025-06-27 17:25:51 -05:00
Galantsev, Dmitrii 89a495e493 Profiler - Remove rocprofiler-v1 remnants
Also force unset HSA_TOOLS_LIB so it doesn't break rocprofiler-sdk

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e73eaf8115]
2025-06-27 13:15:52 -05:00
Galantsev, Dmitrii bb0c4b7653 Python - Add entitycodec
Change-Id: I9dc7f5786e2c5ee5f9756cad7cb12387d05982ae
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: cae49cf4f7]
2025-06-24 17:01:43 -05:00
Galantsev, Dmitrii 5151fe9649 CMAKE - CONFIGURE -> CONFIG
Change-Id: I716f713363469091e944bdda5ecd6886a3a43aa1
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 502fcef7b3]
2025-06-24 17:01:43 -05:00
Pryor, Adam d075194597 [SWDEV-531379] Fix config (#183)
* [SWDEV-531379] Fix config

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Ie1bd6903235016a185dd93fbac0a87658fb12a62

* Fix group field find

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I1f8c62615327df4b5ca916b158b4882a3d5a59d0

* fixes

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: I971f3e12e293ea9e5d4d67db64d8d7217b87561c

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/rdc commit: 8663702737]
2025-06-09 13:55:15 -05:00
Galantsev, Dmitrii ad14980e9a Profiler - Add partition support
NOTE: GPU ordering used is not the same as in HSA/HIP.

GPUs are ordered via amdsmi and then GPU_ID fields are compared to map
GPU partitions to each other.

Change-Id: If379214f5281d7d5ee98515b3e5ba7affc2e2197
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 85b619b2f0]
2025-06-03 19:34:00 -05:00
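
The partition mapping noted in the commit above (order GPUs via amdsmi, then compare GPU_ID fields) might look roughly like the sketch below. It assumes the GPU_ID values have already been read from both sides into plain vectors; `map_by_gpu_id` and its parameters are hypothetical, and no amdsmi or rocprofiler-sdk calls are shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical helper: smi_gpu_ids holds GPU_ID values in amdsmi order,
// agent_gpu_ids holds GPU_ID values in the profiler's own agent order.
// Returns amdsmi index -> agent index for every GPU_ID found on both sides.
std::map<std::size_t, std::size_t> map_by_gpu_id(
    const std::vector<std::uint64_t>& smi_gpu_ids,
    const std::vector<std::uint64_t>& agent_gpu_ids) {
  // Index the agent side once so each lookup is O(1) on average.
  std::unordered_map<std::uint64_t, std::size_t> agent_index;
  for (std::size_t i = 0; i < agent_gpu_ids.size(); ++i) {
    agent_index.emplace(agent_gpu_ids[i], i);  // first occurrence wins
  }

  std::map<std::size_t, std::size_t> mapping;
  for (std::size_t i = 0; i < smi_gpu_ids.size(); ++i) {
    auto it = agent_index.find(smi_gpu_ids[i]);
    if (it != agent_index.end()) {
      mapping[i] = it->second;
    }
    // Devices without a matching agent are simply left out of the mapping.
  }
  return mapping;
}
```

Because the result is keyed by the amdsmi index, callers keep the amdsmi ordering, which, as the commit notes, is not the same as the HSA/HIP ordering.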
Galantsev, Dmitrii a14c15ea28 Profiler - Update to 1.0
Change-Id: Iee6d5e7a87a5eb8eed61adccf6729e4d6a144bf8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 2adc8f82c6]
2025-06-03 19:34:00 -05:00
Galantsev, Dmitrii 0fe3b50f76 Fix missing #include <array>
Change-Id: Ife8efb2957b177b98dbf7efd60213c18623141c8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6d94b767bb]
2025-06-03 19:34:00 -05:00
Pryor, Adam 331f648ba0 RDC Event Process Start/Stop Fix (#193)
Change-Id: Ib68f9909f2a6e0a1e5764298f1012a2bcf7ce1fc

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/rdc commit: 76e9846bb1]
2025-06-03 18:07:37 -05:00
Pryor, Adam 151b0301f1 [SWDEV-535739] Align RDC with amdsmi 26.0 (#191)
* Align RDC with amdsmi 26.0.0
* Remove RDCI_IOLINK_TYPE_NUMIOLINKTYPES

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Change-Id: Ib7f2a22bd9544e0bf74afb1ed8d8f8b79b129b1a

[ROCm/rdc commit: cc7ccf507a]
2025-06-02 18:27:19 -05:00
Maisam Arif 5bf0d39a23 Bump AMD-SMI Version
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2707585cbe49f8b14f18c679080293bc05a151bd


[ROCm/rdc commit: 16e31aae65]
2025-06-02 18:23:43 -05:00