## Motivation

The `validate-rccl-*` tests were failing because "RCCL Comm" counters were not being written to perfetto traces when using the new cached-perfetto approach.

## Technical Details

Root Cause: `write_perfetto_counter_track()` in `rccl.cpp` was only called when `config::get_use_perfetto()` returned true, which requires `ROCPROFSYS_TRACE_LEGACY=ON`. As a result, RCCL counters were not captured with the new trace cache approach.

Solution: Integrated RCCL with the trace cache system.

Changes to `source/lib/rocprof-sys/library/rocprofiler-sdk/rccl.cpp`:

- Added a `cache_rccl_comm_data_events<Track>()` function that stores RCCL comm data via `pmc_event_with_sample` with `category::comm_data`.
- Modified `tool_tracing_callback_rccl()` to always cache events for the new perfetto approach, while preserving the legacy `write_perfetto_counter_track()` calls for backward compatibility.

Changes to `tests/rocprof-sys-testing.cmake`:

- Added `rccl_api` to `ROCPROFSYS_ROCM_DOMAINS` to enable RCCL API callback tracing.

Handler verification: `perfetto_processor_t` already has a handler for `ROCPROFSYS_CATEGORY_COMM_DATA` in `m_pmc_track_map` that processes the cached events.
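The control-flow change described above can be illustrated with a minimal, self-contained sketch. It is not the code from `rccl.cpp`: `comm_sample`, `sketch_state`, `record_before_fix()`, and `record_after_fix()` are hypothetical stand-ins, where `use_legacy_perfetto` plays the role of `config::get_use_perfetto()`, `cache` plays the role of the trace cache, and `legacy_counters` plays the role of the output of `write_perfetto_counter_track()`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sample type; the real code stores data via pmc_event_with_sample.
struct comm_sample
{
    std::string   track;         // counter track name, e.g. "RCCL Comm"
    std::uint64_t timestamp_ns;  // time of the RCCL call
    std::uint64_t bytes;         // payload size reported for the call
};

// Hypothetical state standing in for the tool configuration and its sinks.
struct sketch_state
{
    bool                     use_legacy_perfetto = false;  // ~ config::get_use_perfetto()
    std::vector<comm_sample> cache;                        // ~ trace cache
    std::vector<comm_sample> legacy_counters;              // ~ legacy counter track
};

// Behaviour before the change: the sample is dropped unless legacy tracing is on.
void record_before_fix(sketch_state& state, const comm_sample& sample)
{
    assert(!sample.track.empty());
    if(state.use_legacy_perfetto) state.legacy_counters.push_back(sample);
}

// Behaviour after the change: always cache, and keep the legacy write as well.
void record_after_fix(sketch_state& state, const comm_sample& sample)
{
    assert(!sample.track.empty());
    state.cache.push_back(sample);
    if(state.use_legacy_perfetto) state.legacy_counters.push_back(sample);
}
```

With `use_legacy_perfetto` left at `false`, `record_before_fix()` records nothing, which mirrors the missing "RCCL Comm" counters, while `record_after_fix()` still places the sample in `cache` for the cached-perfetto path to process later.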
ROCm Systems
Welcome to the ROCm Systems super-repo. This repository consolidates multiple ROCm systems projects into a single repository to streamline development, CI, and integration. The first set of projects focuses on requirements for building PyTorch.
Super-repo Status and CI Health
This table provides the current status of the migration of specific ROCm systems projects as well as a pointer to their current CI health.
Key:
- Completed: Fully migrated and integrated. This super-repo should be considered the source of truth for this project. The old repo may still be used for release activities.
- In Progress: Migration, testing, or integration is ongoing. Please do not submit new pull requests to the project's individual repo; develop on the super-repo instead.
- Pending: Not yet started or in the early planning stages. The individual repo should be considered the source of truth for this project.
Tentative migration schedule
| Component | Tentative Date |
|---|---|
*Remaining schedule to be determined.
TheRock CI Status
Note: TheRock CI performs multi-component testing on top of builds that use TheRock build system.
Nomenclature
Project names have been standardized to match the casing and punctuation of released packages. This removes inconsistent camel-casing and underscores used in legacy repositories.
Structure
The repository is organized as follows:
projects/
amdsmi/
aqlprofile/
clr/
hip/
hipother/
hip-tests/
rccl/
rdc/
rocm-core/
rocminfo/
rocmsmilib/
rocprofiler/
rocprofiler-compute/
rocprofiler-register/
rocprofiler-sdk/
rocprofiler-systems/
rocrruntime/
rocshmem/
roctracer/
- Each folder under projects/ corresponds to a ROCm systems project that was previously maintained in a standalone GitHub repository and released as distinct packages.
- Each folder under shared/ contains code that existed in its own repository and is used as a dependency by multiple projects, but did not produce its own distinct packages in previous ROCm releases.
Goals
- Enable unified build and test workflows across ROCm libraries.
- Facilitate shared tooling, CI, and contributor experience.
- Improve integration, visibility, and collaboration across ROCm library teams.
Getting Started
To begin contributing or building, see the CONTRIBUTING.md guide. It includes setup instructions, sparse-checkout configuration, development workflow, and pull request guidelines.
License
This super-repo contains multiple subprojects, each of which retains the license under which it was originally published.
📁 Refer to the LICENSE, LICENSE.md, or LICENSE.txt file within each projects/ or shared/ directory for specific license terms.
📄 Refer to the header notice in individual files outside projects/ or shared/ folders for their specific license terms.
Note: The root of this repository does not define a unified license across all components.
Questions or Feedback?
We're happy to help!