* fix: resolve crash when profiling TensorFlow GPU application
* incorporate review comments
* updated min_rows from 3 to 2 for threads table validation as internal threads are not profiled and are now correctly bypassed
## Motivation
Missing CODEOWNERS for ROCProfiler-SDK
## Technical Details
Add CODEOWNERS for rocprofiler-sdk project
## JIRA ID
## Test Plan
## Test Result
## Submission Checklist
- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
## Add Full Build Capability to theROCK for HIP
### Summary
This PR adds full build support to **theROCK** for HIP-related changes, ensuring that all components are built.
### Changes
- Enabled full build coverage for the following projects:
  - `projects/clr`
  - `projects/hip`
  - `projects/hip-tests`
  - `projects/rocr-runtime`
- Updated build configuration to include all targets for the above projects.
- Ensured rocm-libraries is pulled to build optional components.
### Motivation
These changes are required to support HIP development and testing within theROCK by ensuring all components are built together. This improves reliability and integration testing.
* Make cached perfetto traces the default
* Improve cached data and perfetto traces in order to be more aligned with E2E tests
* Addressing PR comments and findings
* Force early instrumentation bundle instantiation
* Sync up instrumented containers with thread growth data
* Revert ompvv number of host threads to default 8
* Fixed counter track namings for amd-smi
* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
Currently, if the input file name already exists, the tool
appends output to the existing file. Added overwrite, append,
and no (discard) options to choose from.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
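The three behaviours described above can be sketched roughly as follows. This is only an illustration under assumptions: the function names `open_output` and `write_report`, the `if_exists` parameter, and the policy strings are hypothetical and are not taken from the tool's actual interface.

```python
import os


def open_output(path, if_exists="append"):
    """Open `path` for writing according to a (hypothetical) policy name.

    Returns a writable file object, or None when the output is discarded.
    """
    if if_exists not in ("overwrite", "append", "no"):
        raise ValueError(f"unknown policy: {if_exists!r}")
    if not os.path.exists(path):
        # Nothing to collide with, so every policy simply creates the file.
        return open(path, "w")
    if if_exists == "overwrite":
        return open(path, "w")  # truncate the existing file
    if if_exists == "append":
        return open(path, "a")  # keep existing content, add to the end
    return None  # "no": leave the existing file untouched


def write_report(path, text, if_exists="append"):
    """Write `text` to `path`; return False if the output was discarded."""
    handle = open_output(path, if_exists)
    if handle is None:
        return False
    with handle:
        handle.write(text)
    return True
```

In this sketch the default stays `append`, matching the behaviour the commit message describes as the current one.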
* Install rocm-dev in rocprofiler-compute-tarball.yml workflow
* Update paths for push and PR for rocprofiler-compute-tarball.yml
* Add ROCm dependencies to disttest job
* cmake fix binary link creation and fix format
* Use python3 instead of python3.9 in RHEL 8 and RHEL 9 workflows
* set default python3 to python3.9 in rhel8
* Try alternatives setup for python3 in RHEL8 env
* Add pip install cmake to debug RHEL8 issue
* Remove python3.11 in RHEL8 workflow
* Add back comment regarding RHEL8
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
* Improve Iteration multiplexing
* Improve iteration multiplexing documentation by adding usage note and
listing caveats
* Bugfixes for iteration multiplexing
* Use merge iteration multiplexing in analysis webui and db mode
* Do not remove Dispatch_ID column in merge iteration multiplexing
since it is needed for analysis of top dispatches based on
duration
* Bugfixes for analysis logic
* Graceful handling of missing counters in case of iteration
multiplexing
* Improved warnings when metrics could not be calculated due to
missing counter data
* Fix the check to prevent showing table when a column is full of
N/A
* Improve detection of empty values when metric evaluation fails
due to missing counter data
* Bugfixes for profile logic
* Fix kernel filtering during roofline benchmark phase
* Update changelog for bugfixes
* Remove unnecessary columns when merging dispatches for iteration multiplexing
* bugfix
* Better analysis warnings
* fix to_std() in parser
* Use median in merge iteration multiplex
* Address review comments
* Fix cmake formatting
* fix None handling of parser util functions
* Enable stochastic counter accuracy test
* fix cmake formatting
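As a rough illustration of the median-based merge and the handling of missing counters mentioned in the commits above, the idea might look like the sketch below. It is not the tool's merge logic: the function name `merge_iterations` and the dict-per-iteration layout are assumptions made for the example.

```python
import math
from statistics import median


def merge_iterations(samples):
    """Merge per-iteration counter readings into one row using the median.

    samples: list of dicts mapping counter name -> value. A counter may be
    absent (or None) in iterations where it was not collected.
    """
    merged = {}
    names = sorted({name for sample in samples for name in sample})
    for name in names:
        values = [s[name] for s in samples if s.get(name) is not None]
        # A counter with no usable readings becomes NaN instead of raising,
        # so downstream metric code can detect it and warn.
        merged[name] = median(values) if values else math.nan
    return merged
```

Using the median rather than the mean keeps a single outlier iteration from skewing the merged value.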
* test: add unit tests for common utilities from PR #1249
* incorporate review comments specific to tests formatting
* use filesystem API instead of std::system for safer cleanup
* Add ghc/filesystem submodule v1.5.14 for portable C++17 filesystem support
* fix: add cmake/GhcFilesystem.cmake for CI submodule auto-checkout
* incorporate review comment
* incorporate review comment
* Add HasExpertSchedMode device prop
* Add unit tests for HasExpertSchedMode
* Add gfx12 check for HasExpertSchedMode prop
* Update gfx major version check and test for ExpertSchedMode
* Minor fix and ROCr version bump
* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h
* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h
* Apply suggestion from @dayatsin-amd
* Apply suggestion from @dayatsin-amd
---------
Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
* Optimize RDC counter sampling with greedy packing algorithm
This change significantly reduces the number of rocprofiler-sdk sample calls
by implementing a greedy packing algorithm that groups multiple counters into
the minimal number of hardware profiles.
Key improvements:
- Implement greedy packing algorithm to combine counters into minimal profiles
- Add ProfileSet structure to manage packed counter configurations
- Cache packed profile sets for reuse across queries
- Group telemetry field requests by GPU for bulk processing
- Reduce sample calls by ~35% (from 100 to 65 for typical workloads)
Performance impact:
- 13 counters now packed into 3 profiles (77% compression)
- Reduces overhead from profile creation and context switching
- More efficient utilization of hardware counter resources
Implementation details:
- Added create_profiles_for_counters() using greedy algorithm
- Added sample_counters_with_packing() for bulk sampling
- Modified telemetry layer to use rocp_lookup_bulk()
- Preserves all field transformations and special handling
Testing shows successful packing with expected performance gains.
No functional changes to external APIs or behavior.
Co-Authored-By: Ben Welton <bwelton@amd.com>
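A minimal sketch of the greedy packing idea follows, under a simplified model that is an assumption of this example: each counter belongs to one hardware block, and each block offers a fixed number of slots per profile. The name `pack_counters`, the block names, and the slot limits are hypothetical; the real constraints enforced by rocprofiler-sdk may differ.

```python
from collections import Counter


def pack_counters(counters, slots_per_block):
    """First-fit greedy packing of counters into profiles.

    counters: list of (name, block) pairs.
    slots_per_block: dict mapping block -> max counters of that block that
        fit in a single profile (hypothetical hardware limit).
    Returns a list of profiles, each a list of counter names.
    """
    profiles = []  # each entry: (names, Counter of used slots per block)
    for name, block in counters:
        limit = slots_per_block[block]
        if limit < 1:
            raise ValueError(f"block {block!r} has no usable slots")
        for names, used in profiles:
            if used[block] < limit:
                # First existing profile with a free slot for this block.
                names.append(name)
                used[block] += 1
                break
        else:
            # No existing profile can take this counter: open a new one.
            profiles.append(([name], Counter({block: 1})))
    return [names for names, _ in profiles]
```

For example, four counters on a block with two slots plus two counters on another block with two slots pack into two profiles rather than six single-counter samples.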
* Address PR review feedback
This commit addresses all review comments from the initial PR:
1. Fix division by zero risk in debug logging
- Added check for empty counters vector before calculating compression ratio
- Avoids potential division by zero when logging profile creation stats
2. Improve thread safety for statistics tracking
- Changed static uint64_t to std::atomic<uint64_t> for thread-safe counters
- Prevents race conditions in multi-threaded sampling scenarios
3. Remove unused variable
- Removed unused profile_index variable that was incremented but never used
- Cleaned up dead code
4. Clean up code formatting
- Removed extra blank lines for consistency
- Applied formatting fixes across modified files
5. Refactor code duplication between rocp_lookup and rocp_lookup_bulk
- Created apply_field_transformation() helper function
- Eliminates ~70 lines of duplicated switch statement logic
- Centralizes field transformation logic in single location
- Makes future maintenance easier
6. Document non-rocprofiler metrics handling
- Added comments explaining how bulk lookup handles special cases
- Clarifies that non-profiler fields like KFD_ID are handled in transformation
All changes maintain backward compatibility and pass compilation.
Co-Authored-By: Ben Welton <bwelton@amd.com>
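The division-by-zero guard from item 1 can be illustrated with a short sketch. The function name `compression_ratio` and the definition of the ratio (fraction of per-counter samples saved by packing) are assumptions of this example, chosen because they reproduce the 13 counters to 3 profiles figure quoted for the original change.

```python
def compression_ratio(num_counters, num_profiles):
    """Fraction of sample calls saved by packing counters into profiles.

    Returns None when there are no counters, so callers can skip the log
    line instead of dividing by zero.
    """
    if num_counters == 0:
        return None
    return 1.0 - num_profiles / num_counters
```

With 13 counters in 3 profiles this gives roughly 0.77, and an empty counter list yields `None` rather than an exception.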
---------
Co-authored-by: Ben Welton <bwelton@amd.com>
Co-authored-by: Adam Pryor <61172547+adam360x@users.noreply.github.com>
* hsakmt: Expose CWSR and Control stack sizes
This is better than hardcoding values and hoping that they align with
KFD's definitions
Signed-off-by: Kent Russell <kent.russell@amd.com>
* hsakmt: Use CwsrSize and CtlStackSize if available
If KFD is providing the CwsrSize and CtlStackSize, use the maximum
of those and the old calculations for the ctx_save_restore_size
and ctl_stack_size defined in the queue
Signed-off-by: Kent Russell <kent.russell@amd.com>
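The selection rule described above, sketched in Python for brevity. The function name `effective_size` and the convention that a missing value is reported as `None` or `0` are assumptions of this example, not the hsakmt implementation.

```python
def effective_size(kfd_reported, legacy_computed):
    """Pick the size to use for a queue buffer.

    kfd_reported: size provided by KFD, or None/0 when the kernel does not
        report it (assumed convention for this sketch).
    legacy_computed: size from the older in-library calculation.
    """
    if not kfd_reported:
        return legacy_computed
    # Taking the maximum means the buffer is never smaller than what either
    # side expects.
    return max(kfd_reported, legacy_computed)
```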
* hsakmt: Add warning when ABI<1.20 on GFX1151
CwsrSize and CtlStackSize are reported by KFD ABI 1.20. GFX1151
specifically may have some issues if these regions are misaligned, so
report a strong warning during topology initialization if the system is
GFX1151 but is using KFD ABI < 1.20
Signed-off-by: Kent Russell <kent.russell@amd.com>
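The warning condition can be sketched as a small predicate. The function name `needs_cwsr_warning` and the representation of the ABI version as a `(major, minor)` tuple are assumptions of this example.

```python
def needs_cwsr_warning(gfx_target, abi_version):
    """Return True when a GFX1151 system runs a KFD ABI older than 1.20.

    abi_version: (major, minor) tuple of integers.
    """
    return gfx_target == "gfx1151" and abi_version < (1, 20)
```

Comparing tuples rather than floats matters here: as a float, 1.9 is greater than 1.20, while as a version, 1.9 is older than 1.20.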
---------
Signed-off-by: Kent Russell <kent.russell@amd.com>
* Enable Lintian Support for ROCM-SMI
* Enable Lintian Support for ROCMINFO
* Updated Lintian Override File Processing
* Update UT Fix for Lintian rocmsmi,rocminfo
* Update UT Fixes, Review Comments
* Update Review Comments - removed extra white spaces, added error check for gzip, date commands
* Update Review Comments - Correcting License Type
* Sync Lintian ChangeLog
* Changelog data sync enhanced
* Update Review Comments, UT fix
* white space cleanup - precommit check
* Run pre-commit's whitespace related hooks on projects/amdsmi
In order for pre-commit to be useful, everything needs to meet a common
baseline.
* Add whitespace back to Changelog for formatting
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Ignore __CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__. This should not be relying
on declarations from the clang builtin headers. There is no issue declaring
the same intrinsics multiple times. This will enable removal of declarations
from the clang builtin headers.
* added graceful errors/exit in profile/analyze roofline.csv
* edit if statement truth
* restore if statement truth (roofline_csv needs at least 2 rows)
* addressed comments and skipped showing roof metrics when data invalid
* fix workload merge
* changed warning to error
* removed redundant variable definition
* added roofline csv validate check in TUI
* add test cases to test validation function
* ruff format
* simplified TUI roofline handling
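A rough sketch of the kind of row-count check these commits describe is shown below. The function name `validate_roofline_csv` and the choice to count every non-empty row (header included) toward the minimum of two are assumptions of this example, not the tool's actual validation function.

```python
import csv
import io


def validate_roofline_csv(text, min_rows=2):
    """Check that CSV text has at least `min_rows` non-empty rows.

    Returns (ok, message); message is empty when the data is usable.
    """
    rows = [
        row
        for row in csv.reader(io.StringIO(text))
        if any(cell.strip() for cell in row)
    ]
    if len(rows) < min_rows:
        return False, (
            f"roofline.csv has {len(rows)} non-empty row(s); "
            f"expected at least {min_rows}"
        )
    return True, ""
```

Returning a message instead of raising lets the caller decide whether to report an error and exit or to skip the roofline metrics.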
* Update README documentation links for clarity and consistency across projects
- Changed links in the README files for `clr`, `hipother`, and `hip-tests` to use relative paths instead of absolute URLs, improving navigation within the repository.
* Update CONTRIBUTING documentation to use relative links for improved navigation
- Changed absolute URLs to relative paths in the CONTRIBUTING.md files for the hip and hipother projects, enhancing consistency and ease of access within the repository.
* Rock: hip-tests installation path remains the same for Linux and Windows
On theRock, the installation path is the same on Linux and Windows:
share/hip/catch_tests
On the internal Windows build, hip-tests is installed to catch_tests;
a flag passed internally controls the path.