habajpai-amd 7e74d163fd [rocprof-sys] Fix RCCL comm_data counters in rocpd output (#2607)
## Motivation

The validate-rccl-* tests were failing because "RCCL Comm" counters were not being written to perfetto traces when using the new cached-perfetto approach.

## Technical Details

Root Cause: The write_perfetto_counter_track() call in rccl.cpp was only made when config::get_use_perfetto() returned true, which requires ROCPROFSYS_TRACE_LEGACY=ON. As a result, RCCL counters were never captured under the new trace cache approach.

Solution: Integrated RCCL with the trace cache system.

Changes to source/lib/rocprof-sys/library/rocprofiler-sdk/rccl.cpp:

- Added cache_rccl_comm_data_events<Track>() function to store RCCL comm data via pmc_event_with_sample with category::comm_data
- Modified tool_tracing_callback_rccl() to always cache events for the new perfetto approach, while preserving the legacy write_perfetto_counter_track() calls for backward compatibility
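
The control-flow change described above can be illustrated with a minimal, self-contained sketch. Every type and function name in it (comm_sample, trace_cache, legacy_writer, record_before, record_after) is a hypothetical stand-in and not the actual rocprof-sys code. In the real change the corresponding pieces are cache_rccl_comm_data_events<Track>(), write_perfetto_counter_track() and config::get_use_perfetto(), whose signatures are not reproduced here.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one RCCL comm data sample.
struct comm_sample
{
    std::string   track;         // counter track name (illustrative)
    std::uint64_t timestamp_ns;  // time of the RCCL call
    std::uint64_t bytes;         // payload size reported for the call
};

// Hypothetical stand-in for the trace cache used by the new approach.
struct trace_cache
{
    std::vector<comm_sample> events;
    void store(comm_sample s) { events.push_back(std::move(s)); }
};

// Hypothetical stand-in for the legacy direct perfetto writer.
struct legacy_writer
{
    std::vector<comm_sample> written;
    void write_counter(const comm_sample& s) { written.push_back(s); }
};

// Old behavior (sketch): the sample is dropped unless legacy tracing is on.
inline void
record_before(bool use_legacy, legacy_writer& legacy, trace_cache&,
              const comm_sample& s)
{
    if(use_legacy) legacy.write_counter(s);
}

// New behavior (sketch): always cache; the legacy write is kept as an
// additional, optional step for backward compatibility.
inline void
record_after(bool use_legacy, legacy_writer& legacy, trace_cache& cache,
             const comm_sample& s)
{
    cache.store(s);
    if(use_legacy) legacy.write_counter(s);
}
```

With use_legacy set to false, record_before leaves both containers empty, whereas record_after still places the sample in the cache. That matches the failure mode in the Motivation section, assuming the tests ran without ROCPROFSYS_TRACE_LEGACY=ON.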

Changes to tests/rocprof-sys-testing.cmake:

- Added rccl_api to ROCPROFSYS_ROCM_DOMAINS to enable RCCL API callback tracing

Handler verification: The perfetto_processor_t already has a handler for ROCPROFSYS_CATEGORY_COMM_DATA in m_pmc_track_map that processes the cached events.
2026-01-22 15:38:19 -05:00