## Motivation
Enable UCX communication tracing and communication metadata
## Technical Details
Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns.
- Implements function interception for UCX functions across multiple categories using gotcha component.
- Extended comm_data component to track UCX send/recv operations - Added ucx_send and ucx_recv labels for Perfetto counter tracks. Integrated UCX data tracking with existing MPI/RCCL tracking infrastructure.
- Added ROCPROFSYS_USE_UCX configuration option (enabled by default).
- Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers not found.
- Updated all Dockerfiles to include UCX dependencies.
* [SWDEV-559965] Update Changelog for amd-smi set --power-cap
Updated Changelog to mention flexible argument
ordering for power cap type in amdsmi power cap set.
Corrected Changelog documentation on PPT1 reset
power_cap command.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
## Motivation
- Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
- Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake`
- Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
- Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has.
- GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
- OMPVV offload tests now only compile if `amdflang` version is `>= 20`
- Improve link time by reducing the number of GFX targets that binaries need to support.
- RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
* Use TheRock nightly builds in testing container
* Add HIP_DEVICE_LIB_PATH env var for hipcc to work
* Add HIP_PLATFORM env var for cmake hip package
* Add tarball placeholder
* Add -f to curl command to fail on HTTP error
* SWDEV-549518 - Enable logging dynamically through HIP APIS.
* SWDEV-549518 - Adding ROCProfiler related new API changes.
* rocprofiler-sdk changes for hip api additions.
---------
Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>
* Removed attach tool library path
* Support new attach/detach API
* New attach/detach API was introduced in
https://github.com/ROCm/rocm-systems/pull/1653
* Provide backward compatibility with old api
* Stabilize attach/detach tests by adding sleep to help workload get
ready for attachment
* Fix typo in test name
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
* Analysis database v1.2.0
* `pc_sampling` and `roofline_data` tables should relate to `kernel` table instead of `workload` table
* Remove `kernel_name` fields in `pc_sampling` and `roofline_data` table
* Add kernel existence check for roofline data to prevent KeyError (#2536)
* Initial plan
* Add kernel existence check for roofline data to prevent KeyError
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
* Optimize analysis performance
* Refactor database schema: separate metric definitions from kernels
Reorganize the database ORM to decouple metric definitions from kernel
objects. This improves the schema design by:
- Rename Metric -> MetricDefinition and Value -> MetricValue for clarity
- Move metric definitions from kernel-level to workload-level, since
metric definitions are shared across kernels
- Update relationships: MetricDefinition belongs to Workload,
MetricValue
references both MetricDefinition and Kernel
- Refactor metric_view to join through the new schema structure
- Update test fixtures to use renamed table and class names
- Update documentation with new example output using nbody workload
- Regenerate database schema and views diagrams
* Add min amd max aggregation in kernel_view
* Add primary key id from tables into the view
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
This reverts commit 7b00d3a89b.
The workaround is no longer needed - root cause fixed in:
- rocm-smi-lib (PR #2531): Made devInfoTypesStrings file-local static
- amdsmi (PR #2575): Added visibility("hidden") attribute
* Update readme general section and citation version and date.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Minor change to project title- changing now to not forget but we are waiti8ng on feedback about citation from r&d.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Edit citation from R&D feedback
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Fix set/get access failure for VMM on windows
* seperate code paths for linux and windows to avoid using import/export calls in windows
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* Add put to all pes from all lanes concurrently
* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)
* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly
* Add flood tester to the testing script
* add to gda test case w/o the _g variant that is not implemented.
[ROCm/rocshmem commit: cca7872bcf]
* Add put to all pes from all lanes concurrently
* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)
* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly
* Add flood tester to the testing script
* add to gda test case w/o the _g variant that is not implemented.
* Improve native tool discovery and partition detection
- Enhanced native tool path resolution to support CMAKE_INSTALL_LIBDIR variations
(lib, lib64, lib32, etc.) using glob pattern matching
- Extracted path variables to avoid duplication in error messages
- Improved error message clarity by showing exact paths searched for .so and .cpp files
- Simplified code path construction using consistent Path.resolve().parents[x] syntax
- Fixed redundant partition warnings on pre-MI300 GPUs by adding architecture check
- Only query compute/memory partition on MI300+ series (gfx940+)
- Added proper type hints for gpu_arch parameter
- Moved gpu_info extraction after soc_info to ensure gpu_arch is available
- Improved code comments for MI300 series threshold
* Handle gpu arch like a hex string
GCC does not support anonymous structs with members that have non-trivial constructors. This commit changes the header to remove the union when compiling with gcc. This should be a non-breaking change for other compilers.
- use the reduce_psync buffers for synchronization in allreduce, not the
barrier_psync.
- execute a wwg barrier after the allreduce operation. After internal
discussion it was determined that it is required for correctness.
[ROCm/rocshmem commit: 6f512e92a5]
- use the reduce_psync buffers for synchronization in allreduce, not the
barrier_psync.
- execute a wwg barrier after the allreduce operation. After internal
discussion it was determined that it is required for correctness.
Tests were relying on floats for calculating ulp values when validating the output. This is not correct given that the calculations are done using Float16. The fix is to update the test framework to use fp16 ulp instead.
* byteswap<T> returns by value
* replace hand-rolled implementations with Clang __builtin_bswap<N> intrinsics
* new high-level interface endian::to_be, endian::from_be, etc. to indicate conversion direction
[ROCm/rocshmem commit: cf8b72a047]
* byteswap<T> returns by value
* replace hand-rolled implementations with Clang __builtin_bswap<N> intrinsics
* new high-level interface endian::to_be, endian::from_be, etc. to indicate conversion direction
* attach: Formalize ROCAttach API
- Make ROCAttach public with public headers
- Change detach to take a PID
- attach and detach are now reentrant
- Cleanup of states and signal handling in ptrace session
- Fixes mixed up definition of ROCPROF_ATTACH_TOOL_LIBRARY
- ROCPROF_ATTACH_TOOL_LIBRARY now always means the tool library loaded by the attachment target
- ROCPROF_ATTACH_LIBRARY refers to the library used to perform attachment
- Add direct call of rocprof-attach
- Fix python library call of rocprof-attach
- Function now named attach(), changed from main()
* attach: rocprof-compute ROCAttach updates
- Update to new library names
- Correct usage of C lib detach
* attach: add test for rocattach
- Disable ASan, TSan, and UBSan for the new parallel-attach test
- Lower log level for LSan tests, existing behavior from other tests
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Motivation
We wish to avoid triggering full Jenkins runs for docs-only PRs, as this takes up testing resources and slows development time. rocm_ci_caller.yml already excludes some docs-only changes, but this can be improved to exclude them along more paths.
Technical Details
The checks that rocm_ci_caller.yml uses to determine if a changed file in a PR is worth a Jenkins run has been increased to exclude more paths and more file suffixes.
JIRA ID
AIROCDOC-78, AIROCDOC-424
Test Plan
Created a test branch users/dsclear/shorten_workflows_test_root with the changes in this PR, branched from develop.
Branched users/dsclear/shorten_workflows_test_bin_3 and users/dsclear/shorten_workflows_test_text_3 from users/dsclear/shorten_workflows_test_root.
Modified users/dsclear/shorten_workflows_test_bin_3 to add two .h files, and submitted a PR into users/dsclear/shorten_workflows_test_root (Test PR, do not merge. Test PR to test Jenkins CI/CD modifications. #2613).
Modified users/dsclear/shorten_workflows_test_text_3 to add a new .txt file, and submitted a PR into users/dsclear/shorten_workflows_test_root (Test PR, do not merge. Test PR to test Jenkins CI/CD modifications (docs only). #2614).
Test Result
The test PR in step 3 caused rocm_ci_caller.yml to attempt to trigger Jenkins, as this is a 'non-docs' change.
The test PR in step 4 had the attempt to trigger Jenkins skipped, as this is a 'docs-only' change.
* Fix the amdgpu version string comparison
The intention behind it was to avoid showing the string if it's not
got information.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
* Display the kernel version in amd-smi output
This is an interesting debugging point, especially in the case of
not having a DKMS package installed.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Moving os_kernel_version to static --driver
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
---------
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>