Commit graph

74754 Commits

Author SHA1 Message Date
Mythreya Kuricheti 73df3f12b3 use message instead of warning for nccl.h C++ check (#2128)
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 0dc31b1a4a]
2026-01-20 14:21:38 -07:00
Mythreya Kuricheti 0dc31b1a4a use message instead of warning for nccl.h C++ check (#2128)
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-20 14:21:38 -07:00
Kian Cossettini 7c9361190b [rocprofiler-systems] Fix MPI recv_data calculation (#2694)
Fix incorrect `mpi_recv` calculation. It was using `_send_size` instead of `_recv_size` for `mpi_recv`.
2026-01-20 16:17:22 -05:00
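The mix-up fixed in 7c9361190b can be pictured with a minimal Python sketch. This is not the rocprofiler-systems code, which this log does not show: `CommCounters` and `record_sendrecv` are hypothetical names, and only `_send_size`, `_recv_size`, and `mpi_recv` come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class CommCounters:
    """Hypothetical running totals of bytes sent and received."""

    mpi_send: int = 0
    mpi_recv: int = 0


def record_sendrecv(counters: CommCounters, _send_size: int, _recv_size: int) -> None:
    """Add one combined send/receive call to the totals."""
    counters.mpi_send += _send_size
    # The bug pattern described in the commit would add _send_size here.
    counters.mpi_recv += _recv_size


counters = CommCounters()
record_sendrecv(counters, _send_size=1024, _recv_size=256)
```

With asymmetric sizes like these, the receive total only comes out right when it is fed from `_recv_size`.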
Allen Hubbe 3edd56ca23 gda ionic: ccqe cleanup and error check (#389)
Delete unreachable ccqe polling path, ionic_poll_wave_ccqe().
Move cqe error check to ionic_quiet_internal_ccqe().

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>

[ROCm/rocshmem commit: 6b00964f32]
2026-01-20 15:26:53 -05:00
Allen Hubbe 6b00964f32 gda ionic: ccqe cleanup and error check (#389)
Delete unreachable ccqe polling path, ionic_poll_wave_ccqe().
Move cqe error check to ionic_quiet_internal_ccqe().

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
2026-01-20 15:26:53 -05:00
Nusrat Islam 96f6029a1b revert memcpy use for direct AG (#2146)
Co-authored-by: Islam <nusislam@amd.com>

[ROCm/rccl commit: f3c5156bbf]
2026-01-20 13:58:28 -06:00
Nusrat Islam f3c5156bbf revert memcpy use for direct AG (#2146)
Co-authored-by: Islam <nusislam@amd.com>
2026-01-20 13:58:28 -06:00
German Andryeyev db792fac37 SWDEV-558849 - Add support for static linking with ROCR (#2659) 2026-01-20 14:53:01 -05:00
mberenjk 9ee8fb0aa9 Merge pull request #2136 from mberenjk/mberenjk/nccl-sync-2.28.3
Merge remote-tracking branch 'nccl/master' 2.28.3 into develop

[ROCm/rccl commit: 2fdcceaabb]
2026-01-20 11:38:11 -08:00
mberenjk 2fdcceaabb Merge pull request #2136 from mberenjk/mberenjk/nccl-sync-2.28.3
Merge remote-tracking branch 'nccl/master' 2.28.3 into develop
2026-01-20 11:38:11 -08:00
Alysa Liu 9139f5a241 Revert "rocr: Switch back to legacy IPC (#1744)" (#2676)
This reverts commit 7e4b62290c.
2026-01-20 14:34:10 -05:00
Marzieh Berenjkoub d7293281f3 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 858b4e76eb]
2026-01-20 13:04:02 -06:00
Marzieh Berenjkoub 858b4e76eb Merge remote-tracking branch 'nccl/master' into develop 2026-01-20 13:04:02 -06:00
Ioannis Assiouras 59aa56a340 hip-issue-3876 : Take into account thread-local capture mode in checks for valid capture (#2177) 2026-01-20 18:42:27 +00:00
Sajina PK 15c82d6da8 [rocprofiler-system]: Enable UCX Communication API tracing (#2306)
## Motivation

Enable UCX communication tracing and communication metadata 

## Technical Details

Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns.

- Implements function interception for UCX functions across multiple categories using gotcha component.
- Extended comm_data component to track UCX send/recv operations.
- Added ucx_send and ucx_recv labels for Perfetto counter tracks. Integrated UCX data tracking with existing MPI/RCCL tracking infrastructure.
- Added ROCPROFSYS_USE_UCX configuration option (enabled by default).
- Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers not found.
- Updated all Dockerfiles to include UCX dependencies.
2026-01-20 13:16:43 -05:00
Bindhiya Kanangot Balakrishnan 72f0a41658 [SWDEV-559965] Update Changelog for power cap type (#2647)
* [SWDEV-559965] Update Changelog for amd-smi set --power-cap

Updated Changelog to mention flexible argument
ordering for power cap type in amdsmi power cap set.
Corrected Changelog documentation on PPT1 reset
power_cap command.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-20 11:28:09 -06:00
Rakesh Roy 5049efdd75 Reset HIP_VERSION_PATCH to 0 (#2590) 2026-01-20 22:54:20 +05:30
Kian Cossettini 698ac6b8bc [rocprofiler-systems] Add build option for "examples" to specify gfx-arch (#2626)
## Motivation
 - Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
 - Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake` 
 - Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
 - Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has.
 - GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
 - OMPVV offload tests now only compile if `amdflang` version is `>= 20`
 - Improve link time by reducing the number of GFX targets that binaries need to support.
   - RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
2026-01-20 12:13:21 -05:00
German Andryeyev 3af2bf4952 Merge branch 'develop' into amd/dev/gandryey/SWDEV-558849 2026-01-20 12:04:53 -05:00
vedithal-amd 4a5cbbfba5 [rocprofiler-compute] Fix kernel/dispatch filtering (#2479)
* Fix kernel/dispatch filtering in GUI

* Disallow --kernel and --dispatch filtering in analyze --gui mode since
  GUI frontend offers dropdown menu for kernel and dispatch filtering
    * Update CHANGELOG and documentation

* Gracefully handle N/A values

* Ensure workload path is valid before using it in GUI

* Ignore kernel filters if dispatch filters provided

* Add documentation for dispatch filtering overriding kernel filtering

* Fix typo

* Fix documentation

* remove unnecessary whitespace

* Address review comments

* Allow kernel/dispatch filtering with --gui

* Address review comments

* Address review comments

* Update CHANGELOG

* Fix formatting
2026-01-20 10:02:31 -05:00
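The precedence rule from 4a5cbbfba5 ("Ignore kernel filters if dispatch filters provided") could look roughly like the sketch below. It is an illustration under assumptions, not rocprofiler-compute code: `select_dispatches`, the record layout, and the kernel names are all hypothetical.

```python
from typing import Optional


def select_dispatches(
    dispatches: list[dict],
    kernel_names: Optional[list[str]] = None,
    dispatch_ids: Optional[list[int]] = None,
) -> list[dict]:
    """Return the dispatch records that pass the active filter."""
    if dispatch_ids:
        # Dispatch filters take precedence; kernel filters are ignored.
        wanted_ids = set(dispatch_ids)
        return [d for d in dispatches if d["dispatch_id"] in wanted_ids]
    if kernel_names:
        wanted_names = set(kernel_names)
        return [d for d in dispatches if d["kernel"] in wanted_names]
    # No filter given: keep everything.
    return list(dispatches)


# Hypothetical records for illustration only.
records = [
    {"dispatch_id": 0, "kernel": "vecadd"},
    {"dispatch_id": 1, "kernel": "matmul"},
    {"dispatch_id": 2, "kernel": "vecadd"},
]
by_both = select_dispatches(records, kernel_names=["vecadd"], dispatch_ids=[1])
by_kernel = select_dispatches(records, kernel_names=["vecadd"])
```

When both filters are passed, only dispatch 1 is returned even though its kernel does not match the kernel filter.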
vedithal-amd a926660670 [rocprofiler-compute] Use TheRock nightly builds in testing container (#2661)
* Use TheRock nightly builds in testing container

* Add HIP_DEVICE_LIB_PATH env var for hipcc to work

* Add HIP_PLATFORM env var for cmake hip package

* Add tarball placeholder

* Add -f to curl command to fail on HTTP error
2026-01-20 09:54:38 -05:00
Edgar Gabriel 55e2b501d3 replace memset with hipMemset (#390)
[ROCm/rocshmem commit: bc70ce551c]
2026-01-20 08:14:25 -06:00
Edgar Gabriel bc70ce551c replace memset with hipMemset (#390) 2026-01-20 08:14:25 -06:00
marantic-amd 51f49d8835 Add notice for the newly deprecated env variables (#2690) 2026-01-20 13:59:31 +01:00
Milan Radosavljevic b533f56197 Add automatic PyTorch library discovery for Python applications (#2623)
* Add automatic PyTorch library discovery for Python applications (#2623)
2026-01-20 08:42:49 +01:00
David Galiffi c83b3aae07 Fix Python Formatting (#2679)
Updated version of black to 26.1.0 updated some formatting rules

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-19 21:26:50 -05:00
jamessiddeley-amd 25090e003f [rocprof-compute] Pin ruff version for consistent formatting (#2680)
* pin ruff versions each to current latest

* Update rocprofiler-compute-formatting.yml

* Downgrade .pre-commit-config.yaml to match develop
2026-01-19 19:10:02 -05:00
Karthik Jayaprakash 99c3a06f4e SWDEV-549518 - Enable logging dynamically through HIP APIS. (#1079)
* SWDEV-549518 - Enable logging dynamically through HIP APIS.

* SWDEV-549518 - Adding ROCProfiler related new API changes.

* rocprofiler-sdk changes for hip api additions.

---------

Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>
2026-01-19 16:16:14 -05:00
marandje 9f37cd6309 SWDEV-1 - Fix hipMemPoolTrimTo failing tests (#2628) 2026-01-19 21:10:15 +00:00
abchoudh-amd dd149d3957 [rocprofiler-compute] Support new attach/detach API (#2642)
* Removed attach tool library path

* Support new attach/detach API

* New attach/detach API was introduced in
  https://github.com/ROCm/rocm-systems/pull/1653

* Provide backward compatibility with old api

* Stabilize attach/detach tests by adding sleep to help workload get
  ready for attachment

* Fix typo in test name

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
2026-01-19 16:00:14 -05:00
SakaSitharammurthy 1c5aa2d4e7 [SWDEV-567099] Updated 'amdsmi list --cpu all' command (#2519)
Signed-off-by: Saka, Sitharam Murthy <SitharamMurthy.Saka@amd.com>
2026-01-19 14:56:59 -06:00
vedithal-amd 0254181f42 [rocprofiler-compute] Analysis Database Schema Improvements (v1.2.0) (#2526)
* Analysis database v1.2.0

* `pc_sampling` and `roofline_data` tables should relate to `kernel` table instead of `workload` table

* Remove `kernel_name` fields in `pc_sampling` and `roofline_data` table

* Add kernel existence check for roofline data to prevent KeyError (#2536)

* Initial plan

* Add kernel existence check for roofline data to prevent KeyError

Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>

* Optimize analysis performance

* Refactor database schema: separate metric definitions from kernels

Reorganize the database ORM to decouple metric definitions from kernel
objects. This improves the schema design by:

- Rename Metric -> MetricDefinition and Value -> MetricValue for clarity
- Move metric definitions from kernel-level to workload-level, since
  metric definitions are shared across kernels
- Update relationships: MetricDefinition belongs to Workload, and
  MetricValue references both MetricDefinition and Kernel
- Refactor metric_view to join through the new schema structure
- Update test fixtures to use renamed table and class names
- Update documentation with new example output using nbody workload
- Regenerate database schema and views diagrams

* Add min and max aggregation in kernel_view

* Add primary key id from tables into the view

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
2026-01-19 15:25:43 -05:00
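The relationships described in 0254181f42 (MetricDefinition belongs to Workload; MetricValue references both MetricDefinition and Kernel) can be sketched with the standard-library `sqlite3` module. This is a simplified sketch, not the project's actual ORM or schema: the table and column names, the kernel names, and `example_metric` are assumptions, and the min/max query is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workload (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE kernel (
        id INTEGER PRIMARY KEY,
        workload_id INTEGER REFERENCES workload(id),
        name TEXT
    );
    -- Definitions live at workload level because kernels share them.
    CREATE TABLE metric_definition (
        id INTEGER PRIMARY KEY,
        workload_id INTEGER REFERENCES workload(id),
        name TEXT
    );
    -- Each value points at both its definition and its kernel.
    CREATE TABLE metric_value (
        id INTEGER PRIMARY KEY,
        metric_definition_id INTEGER REFERENCES metric_definition(id),
        kernel_id INTEGER REFERENCES kernel(id),
        value REAL
    );
    INSERT INTO workload VALUES (1, 'nbody');
    INSERT INTO kernel VALUES (1, 1, 'bodyForce'), (2, 1, 'integrate');
    INSERT INTO metric_definition VALUES (1, 1, 'example_metric');
    INSERT INTO metric_value VALUES (1, 1, 1, 10.0), (2, 1, 2, 30.0);
    """
)

# Join through the definition to aggregate one metric across kernels.
row = conn.execute(
    """
    SELECT d.name, MIN(v.value), MAX(v.value)
    FROM metric_value AS v
    JOIN metric_definition AS d ON d.id = v.metric_definition_id
    JOIN kernel AS k ON k.id = v.kernel_id
    GROUP BY d.id
    """
).fetchone()
```

Storing the definition once per workload means its name is not repeated for every kernel; each kernel contributes only a row in `metric_value`.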
systems-assistant[bot] 88f07baa92 SWDEV-493792 - add split barriers for grid_group (#508)
* SWDEV-493792 - add split barriers for grid_group

* add tests

* Update change log

* Add Navi4 split barrier

* Update docs

* Use new Catch2 Approx macro

* Update split_barrier.cc to check for coop groups

---------

Co-authored-by: Jatin Chaudhary <jatchaud@amd.com>
Co-authored-by: Jatin Chaudhary <51944368+cjatin@users.noreply.github.com>
2026-01-19 09:17:00 -08:00
lloginov-amd e49b501e9a Add scratch memory support (#2211) 2026-01-19 16:24:30 +01:00
Gopesh Bhardwaj 1ac805cb35 [rocprofiler-sdk][Documentation] Updating CHANGELOG for 7.2 (#2573)
* Updating CHANGELOG for 7.2

* Updated CHANGELOG

* Addressed feedback

* Addressed Feedback

* Updated based on review comments

* Update installation steps and documentation links

Updated installation documentation and links to latest repository.

* Addressed Feedback

* Updated CHANGELOG

* Addressed feedback

* updated CHANGELOG

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2026-01-17 14:55:55 +05:30
Aravind Ravikumar f336ad5133 Enable Robust Multi-Node RCCL Testing with cvs-sbatch and Improve CI Reliability (#2123)
* sbatch changes and TheRock SHA update

* Move tests location from /home to /apps/cvs_tests

* Add comments and move credential.ini file to /apps/cvs_tests

* Changed salloc reservation to rccl reservation

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>

[ROCm/rccl commit: 239d62f545]
2026-01-16 23:13:06 -05:00
Aravind Ravikumar 239d62f545 Enable Robust Multi-Node RCCL Testing with cvs-sbatch and Improve CI Reliability (#2123)
* sbatch changes and TheRock SHA update

* Move tests location from /home to /apps/cvs_tests

* Add comments and move credential.ini file to /apps/cvs_tests

* Changed salloc reservation to rccl reservation

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>
2026-01-16 23:13:06 -05:00
habajpai-amd b53c99669c Revert "fix: prevent double-free crash during process exit in amd-smi (#2213)" (#2640)
This reverts commit 7b00d3a89b.

The workaround is no longer needed - root cause fixed in:
- rocm-smi-lib (PR #2531): Made devInfoTypesStrings file-local static
- amdsmi (PR #2575): Added visibility("hidden") attribute
2026-01-16 16:08:52 -05:00
Rahul Vaidya 62dab32433 Update AlltoAll and AlltoAllv API for 2.28.3 compatibility (#161)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl-tests commit: a52452e891]
2026-01-16 11:28:40 -08:00
Rahul Vaidya a52452e891 Update AlltoAll and AlltoAllv API for 2.28.3 compatibility (#161)
Signed-off-by: ravaidya <ravaidya@amd.com>
2026-01-16 11:28:40 -08:00
cfallows-amd 005f07004d [rocprofiler-compute] Update README (#2589)
* Update readme general section and citation version and date.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Minor change to project title. Changing it now so it is not forgotten, but we are waiting on feedback about the citation from R&D.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Edit citation from R&D feedback

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2026-01-16 12:48:19 -05:00
Rahul Manocha 249a947dc7 Fix set/get access failure for VMM on windows (#2280)
* Fix set/get access failure for VMM on windows

* separate code paths for Linux and Windows to avoid using import/export calls on Windows

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-16 08:34:21 -08:00
akolliasAMD 2606c13155 Tests package (#384)
* added packaging for the tests and for the driver.sh

* making .sh files into programs so they keep permissions

[ROCm/rocshmem commit: e7269cb925]
2026-01-16 09:10:36 -07:00
akolliasAMD e7269cb925 Tests package (#384)
* added packaging for the tests and for the driver.sh

* making .sh files into programs so they keep permissions
2026-01-16 09:10:36 -07:00
German Andryeyev 07a6b45535 rocr: restore the original line 2026-01-16 11:05:24 -05:00
Aurelien Bouteiller ede2adfe49 new tester: put to all pes from all lanes concurrently (#112)
* Add put to all pes from all lanes concurrently

* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)

* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly

* Add flood tester to the testing script

* add to gda test case w/o the _g variant that is not implemented.

[ROCm/rocshmem commit: cca7872bcf]
2026-01-16 10:40:48 -05:00
Aurelien Bouteiller cca7872bcf new tester: put to all pes from all lanes concurrently (#112)
* Add put to all pes from all lanes concurrently

* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)

* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly

* Add flood tester to the testing script

* add to gda test case w/o the _g variant that is not implemented.
2026-01-16 10:40:48 -05:00
vedithal-amd f64d8e0f43 [rocprofiler-compute] Improve native tool discovery and partition detection (#2630)
* Improve native tool discovery and partition detection

- Enhanced native tool path resolution to support CMAKE_INSTALL_LIBDIR variations
  (lib, lib64, lib32, etc.) using glob pattern matching
- Extracted path variables to avoid duplication in error messages
- Improved error message clarity by showing exact paths searched for .so and .cpp files
- Simplified code path construction using consistent Path.resolve().parents[x] syntax

- Fixed redundant partition warnings on pre-MI300 GPUs by adding architecture check
- Only query compute/memory partition on MI300+ series (gfx940+)
- Added proper type hints for gpu_arch parameter
- Moved gpu_info extraction after soc_info to ensure gpu_arch is available
- Improved code comments for MI300 series threshold

* Handle gpu arch like a hex string
2026-01-16 10:36:19 -05:00
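The glob-based lookup mentioned in f64d8e0f43 (supporting `lib`, `lib64`, `lib32`, and similar `CMAKE_INSTALL_LIBDIR` values) might be sketched as follows. This is one possible shape rather than the project's implementation: `find_native_tool` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_native_tool(install_root: Path, filename: str) -> Optional[Path]:
    """Search lib, lib64, lib32, ... under install_root for filename."""
    # "lib*" covers the CMAKE_INSTALL_LIBDIR variations named in the commit.
    for libdir in sorted(install_root.glob("lib*")):
        candidate = libdir / filename
        if candidate.is_file():
            return candidate
    return None


# Usage: build a throwaway tree that installs into lib64.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib64").mkdir()
    (root / "lib64" / "libexample_tool.so").touch()
    found = find_native_tool(root, "libexample_tool.so")
    found_name = found.parent.name if found else None
    missing = find_native_tool(root, "libother.so")
```

Returning `None` instead of raising leaves it to the caller to build an error message that lists the paths searched, which is the behaviour the commit describes.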
Fábio Mestre e6236417f7 SWDEV-571222 - Fix bf16 headers on gcc (#2260)
GCC does not support anonymous structs with members that have non-trivial constructors. This commit changes the header to remove the union when compiling with gcc. This should be a non-breaking change for other compilers.
2026-01-16 15:02:48 +00:00
Edgar Gabriel 3ce10dc688 fix allreduce tester (#385)
- use the reduce_psync buffers for synchronization in allreduce, not the
  barrier_psync.
- execute a wwg barrier after the allreduce operation. After internal
  discussion it was determined that it is required for correctness.

[ROCm/rocshmem commit: 6f512e92a5]
2026-01-16 08:10:25 -06:00