## Motivation
Enable UCX communication tracing and collection of communication metadata.
## Technical Details
Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns.
- Implemented function interception for UCX functions across multiple categories using the gotcha component.
- Extended the comm_data component to track UCX send/recv operations.
- Added ucx_send and ucx_recv labels for Perfetto counter tracks.
- Integrated UCX data tracking with the existing MPI/RCCL tracking infrastructure.
- Added ROCPROFSYS_USE_UCX configuration option (enabled by default).
- Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers are not found.
- Updated all Dockerfiles to include UCX dependencies.
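
The interception-plus-accounting idea described above can be illustrated with a small conceptual sketch. This is not the gotcha-based C++ implementation: it is a Python analogy in which a decorator stands in for the wrapper, `comm_bytes` is a hypothetical in-memory stand-in for the Perfetto counter tracks, and `fake_send`/`fake_recv` are made-up placeholders for transport-layer calls. Only the `ucx_send` and `ucx_recv` labels come from the description.

```python
import functools
from collections import defaultdict

# Hypothetical stand-in for the counter tracks; the labels "ucx_send"
# and "ucx_recv" are taken from the change description.
comm_bytes = defaultdict(int)


def trace_comm(label):
    """Wrap a function so each call adds its payload size to `label`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(buffer, *args, **kwargs):
            # Record the payload size, then forward to the wrapped call.
            comm_bytes[label] += len(buffer)
            return func(buffer, *args, **kwargs)
        return wrapper
    return decorator


@trace_comm("ucx_send")
def fake_send(buffer, dest):
    """Hypothetical placeholder for a transport-layer send."""
    return len(buffer)


@trace_comm("ucx_recv")
def fake_recv(buffer, source):
    """Hypothetical placeholder for a transport-layer receive."""
    return len(buffer)


fake_send(b"hello", dest=1)
fake_send(b"world!", dest=2)
fake_recv(bytearray(16), source=1)
```

After these calls, `comm_bytes` holds the accumulated byte counts per label, which is the kind of value a counter track would plot over time.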
* [SWDEV-559965] Update Changelog for amd-smi set --power-cap
Updated the Changelog to mention flexible argument
ordering for the power cap type in amdsmi power cap set.
Corrected the Changelog documentation on the PPT1 reset
power_cap command.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
## Motivation
- Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
- Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake`
- Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
- Added `ROCPROFSYS_GFX_TARGETS` as a build argument. It specifies the offloading architectures that GPU examples should compile for. If empty, it defaults to the architectures detected on the build system.
- GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
- OMPVV offload tests now only compile if `amdflang` version is `>= 20`
- Improve link time by reducing the number of GFX targets that binaries need to support.
- RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
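
The behavior described for `check_rocminfo` (match a regex, or return the filtered output) can be sketched as follows. This is only an illustration of the logic in Python, not the CMake function itself: `check_output`, its parameters, and the `SAMPLE` text are hypothetical, and the sample is a made-up, abbreviated stand-in for real `rocminfo` output.

```python
import re

# Made-up, abbreviated stand-in for tool output (assumption).
SAMPLE = "Agent 1\n  Name: AMD EPYC\nAgent 2\n  Name: gfx90a\n"


def check_output(output, pattern=None, get_output=False):
    """Return True/False for a regex match, or the filtered text.

    With get_output=True the matching lines are returned instead of a
    boolean; with pattern=None no filtering is applied.
    """
    lines = output.splitlines()
    if pattern is not None:
        lines = [line for line in lines if re.search(pattern, line)]
    if get_output:
        return "\n".join(lines)
    return bool(lines)


found = check_output(SAMPLE, r"gfx90a")
missing = check_output(SAMPLE, r"gfx1100")
filtered = check_output(SAMPLE, r"gfx", get_output=True)
```

Here `found` is true, `missing` is false, and `filtered` contains only the line mentioning a `gfx` name.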
* Use TheRock nightly builds in testing container
* Add HIP_DEVICE_LIB_PATH env var for hipcc to work
* Add HIP_PLATFORM env var for cmake hip package
* Add tarball placeholder
* Add -f to curl command to fail on HTTP error
* SWDEV-549518 - Enable logging dynamically through HIP APIS.
* SWDEV-549518 - Adding ROCProfiler related new API changes.
* rocprofiler-sdk changes for hip api additions.
---------
Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>
* Removed attach tool library path
* Support new attach/detach API
* New attach/detach API was introduced in
https://github.com/ROCm/rocm-systems/pull/1653
* Provide backward compatibility with the old API
* Stabilize attach/detach tests by adding sleep to help workload get
ready for attachment
* Fix typo in test name
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
* Analysis database v1.2.0
* `pc_sampling` and `roofline_data` tables should relate to the `kernel` table instead of the `workload` table
* Remove `kernel_name` fields from the `pc_sampling` and `roofline_data` tables
* Add kernel existence check for roofline data to prevent KeyError (#2536)
* Initial plan
* Add kernel existence check for roofline data to prevent KeyError
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
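
A minimal sketch of the kind of existence check described above, assuming roofline rows are linked to kernels through a name-to-id mapping. The function name, the row layout, and the sample data are hypothetical and are not taken from the actual change.

```python
def link_roofline_rows(kernel_ids, roofline_rows):
    """Attach kernel ids to roofline rows, skipping unknown kernels.

    kernel_ids: dict mapping kernel name -> kernel id (assumed layout).
    roofline_rows: list of dicts with "kernel_name" and "value" keys.
    """
    linked, skipped = [], []
    for row in roofline_rows:
        name = row["kernel_name"]
        # Check membership first; indexing directly would raise KeyError
        # for a kernel that has roofline data but no kernel entry.
        if name not in kernel_ids:
            skipped.append(name)
            continue
        linked.append({"kernel_id": kernel_ids[name], "value": row["value"]})
    return linked, skipped


kernel_ids = {"kernel_a": 1}
rows = [
    {"kernel_name": "kernel_a", "value": 1.5},
    {"kernel_name": "kernel_missing", "value": 2.5},
]
linked, skipped = link_roofline_rows(kernel_ids, rows)
```

Returning the skipped names alongside the linked rows keeps the mismatch visible to the caller instead of silently dropping data.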
* Optimize analysis performance
* Refactor database schema: separate metric definitions from kernels
Reorganize the database ORM to decouple metric definitions from kernel
objects. This improves the schema design by:
- Rename Metric -> MetricDefinition and Value -> MetricValue for clarity
- Move metric definitions from kernel-level to workload-level, since
metric definitions are shared across kernels
- Update relationships: MetricDefinition belongs to Workload,
MetricValue references both MetricDefinition and Kernel
- Refactor metric_view to join through the new schema structure
- Update test fixtures to use renamed table and class names
- Update documentation with new example output using nbody workload
- Regenerate database schema and views diagrams
* Add min and max aggregation in kernel_view
* Add primary key id from tables into the view
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
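
The relationships described in this refactor can be sketched with an in-memory SQLite database. This is a simplified illustration under stated assumptions, not the project's actual ORM or schema: the table and column names (`metric_definition`, `metric_value`, `metric_id`, and so on), the view body, and the sample metric are hypothetical; only the workload name `nbody` and the overall shape (definitions per workload, values referencing both a definition and a kernel) come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workload (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE kernel (
    id INTEGER PRIMARY KEY,
    workload_id INTEGER REFERENCES workload(id),
    name TEXT
);
-- Definitions live at workload level and are shared across kernels.
CREATE TABLE metric_definition (
    id INTEGER PRIMARY KEY,
    workload_id INTEGER REFERENCES workload(id),
    name TEXT
);
-- Each value references both its definition and its kernel.
CREATE TABLE metric_value (
    id INTEGER PRIMARY KEY,
    metric_id INTEGER REFERENCES metric_definition(id),
    kernel_id INTEGER REFERENCES kernel(id),
    value REAL
);
CREATE VIEW metric_view AS
SELECT mv.id AS value_id, k.id AS kernel_id, k.name AS kernel_name,
       md.id AS metric_id, md.name AS metric_name, mv.value AS value
FROM metric_value mv
JOIN metric_definition md ON md.id = mv.metric_id
JOIN kernel k ON k.id = mv.kernel_id;
""")

conn.execute("INSERT INTO workload VALUES (1, 'nbody')")
conn.executemany("INSERT INTO kernel VALUES (?, 1, ?)",
                 [(1, "kernel_a"), (2, "kernel_b")])
conn.execute("INSERT INTO metric_definition VALUES (1, 1, 'example_metric')")
conn.executemany(
    "INSERT INTO metric_value (metric_id, kernel_id, value) VALUES (1, ?, ?)",
    [(1, 2.0), (1, 5.0), (2, 3.0)],
)

# Min/max aggregation per kernel and metric, read through the view.
rows = conn.execute("""
    SELECT kernel_name, metric_name, MIN(value), MAX(value)
    FROM metric_view
    GROUP BY kernel_id, metric_id
    ORDER BY kernel_id
""").fetchall()
definition_count = conn.execute(
    "SELECT COUNT(*) FROM metric_definition").fetchone()[0]
```

Note that a single `metric_definition` row serves both kernels, which is the point of moving definitions from kernel level to workload level.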
This reverts commit 7b00d3a89b.
The workaround is no longer needed - root cause fixed in:
- rocm-smi-lib (PR #2531): Made devInfoTypesStrings file-local static
- amdsmi (PR #2575): Added visibility("hidden") attribute
* Update readme general section and citation version and date.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Minor change to project title; changing it now so it is not forgotten, but we are waiting on feedback about the citation from R&D.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Edit citation from R&D feedback
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Fix set/get access failure for VMM on Windows
* Separate code paths for Linux and Windows to avoid using import/export calls on Windows
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* Add put to all PEs from all lanes concurrently
* Remove wg_init, use size_t for size params, and use 64-bit data exchange (more
bits for verification masking)
* Rename to flood-test; add put, putnbi, p, get, getnbi, and g variants; count time
correctly
* Add flood tester to the testing script
* Add to the GDA test case without the _g variant, which is not implemented.
[ROCm/rocshmem commit: cca7872bcf]