Commit Graph

71617 Commits

Author SHA1 Message Date
Ameya Keshava Mallya 12ab8df3bc Add 'projects/rocshmem/' from commit '0496586829058af5cfd7f23acda2a6d0040da584'
git-subtree-dir: projects/rocshmem
git-subtree-mainline: 5fd976da70
git-subtree-split: 0496586829
2026-01-21 20:25:37 +00:00
vedithal-amd 5fd976da70 Fix typo in Bypass Req metric in 17.3 section for MI350 (#2704) 2026-01-21 15:00:23 -05:00
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.

Add some SRD logs that will be removed finally.

* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in negative tests.

* Remove SRD dump log from hip-rt
Let Rocr mipmap log be in condition.

* minor format change

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
Sam Ruscica 5daeb14582 SWDEV-547291 - Interop for OpenGL (#2350)
Updated to convert flags correctly

Added ObjectRegistry to track registered and mapped resources and incorporated it into hip_gl.

Added mip level check

Made functions static inline

Reworked validation to be clearer.
2026-01-21 09:08:55 -08:00
Gopesh Bhardwaj c563286f96 Update changelog for ROCprofiler-SDK 1.1.0 (#2717)
using only arch name
2026-01-21 20:15:39 +05:30
Kian Cossettini 28b2ade7d2 Update mentions of OpenMP to reflect newer implementation (#2701)
Update timemory examples in docs to use the `rocprofiler-sdk` API.
2026-01-21 07:18:51 -05:00
Jatin Chaudhary 0590a72d4b Rework clock based unit tests (#2646) 2026-01-21 10:55:33 +00:00
hongkzha-amd d94185c5b2 rocrtst: set HSA_ENABLE_INTERRUPT after TestExample creation (#2687)
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Co-authored-by: cfreeamd <166262151+cfreeamd@users.noreply.github.com>
2026-01-21 10:39:50 +08:00
Karthik Jayaprakash 6a84a00208 Use size_t datatype for global dimensions. (#2604) 2026-01-20 20:39:07 -05:00
JeniferC99 50e00d1b94 Update CODEOWNERS (#2705): add /project/amdsmi owner 2026-01-20 15:49:59 -08:00
yugang-amd 05a6d017c6 [ROCmInfo] docs: mono-repo changes and style edits (#2584)
* initial edits

* mono repo related updates

* standardize component name

* style edits

* more edits
2026-01-20 18:06:54 -05:00
Yiltan 0496586829 [Docs] Clarify ROCSHMEM_HEAP_SIZE (#392)
* clarify ROCSHMEM_HEAP_SIZE

* Apply suggestions from code review

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

---------

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
2026-01-20 17:22:18 -05:00
Kian Cossettini 7c9361190b [rocprofiler-systems] Fix MPI recv_data calculation (#2694)
Fix incorrect `mpi_recv` calculation. It was using `_send_size` instead of `_recv_size` for `mpi_recv`.
2026-01-20 16:17:22 -05:00
Allen Hubbe 6b00964f32 gda ionic: ccqe cleanup and error check (#389)
Delete unreachable ccqe polling path, ionic_poll_wave_ccqe().
Move cqe error check to ionic_quiet_internal_ccqe().

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
2026-01-20 15:26:53 -05:00
German Andryeyev db792fac37 SWDEV-558849 - Add support for static linking with ROCR (#2659) 2026-01-20 14:53:01 -05:00
Alysa Liu 9139f5a241 Revert "rocr: Switch back to legacy IPC (#1744)" (#2676)
This reverts commit 7e4b62290c.
2026-01-20 14:34:10 -05:00
Ioannis Assiouras 59aa56a340 hip-issue-3876 : Take into account thread-local capture mode in checks for valid capture (#2177) 2026-01-20 18:42:27 +00:00
Sajina PK 15c82d6da8 [rocprofiler-system]: Enable UCX Communication API tracing (#2306)
## Motivation

Enable UCX communication tracing and communication metadata 

## Technical Details

Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns.

- Implements function interception for UCX functions across multiple categories using gotcha component.
- Extended comm_data component to track UCX send/recv operations. Added ucx_send and ucx_recv labels for Perfetto counter tracks. Integrated UCX data tracking with existing MPI/RCCL tracking infrastructure.
- Added ROCPROFSYS_USE_UCX configuration option (enabled by default).
- Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers not found.
- Updated all Dockerfiles to include UCX dependencies.
2026-01-20 13:16:43 -05:00
Bindhiya Kanangot Balakrishnan 72f0a41658 [SWDEV-559965] Update Changelog for power cap type (#2647)
* [SWDEV-559965] Update Changelog for amd-smi set --power-cap

Updated Changelog to mention flexible argument
ordering for power cap type in amdsmi power cap set.
Corrected Changelog documentation on PPT1 reset
power_cap command.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-20 11:28:09 -06:00
Rakesh Roy 5049efdd75 Reset HIP_VERSION_PATCH to 0 (#2590) 2026-01-20 22:54:20 +05:30
Kian Cossettini 698ac6b8bc [rocprofiler-systems] Add build option for "examples" to specify gfx-arch (#2626)
## Motivation
 - Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
 - Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake` 
 - Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
 - Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has.
 - GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
 - OMPVV offload tests now only compile if `amdflang` version is `>= 20`
 - Improve link time by reducing the number of GFX targets that binaries need to support.
   - RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
2026-01-20 12:13:21 -05:00
vedithal-amd 4a5cbbfba5 [rocprofiler-compute] Fix kernel/dispatch filtering (#2479)
* Fix kernel/dispatch filtering in GUI

* Disallow --kernel and --dispatch filtering in analyze --gui mode since
  GUI frontend offers dropdown menu for kernel and dispatch filtering
    * Update CHANGELOG and documentation

* Gracefully handle N/A values

* Ensure workload path is valid before using it in GUI

* Ignore kernel filters if dispatch filters provided

* Add documentation for dispatch filtering overriding kernel filtering

* Fix typo

* Fix documentation

* remove unnecessary whitespace

* Address review comments

* Allow kernel/dispatch filtering with --gui

* Address review comments

* Address review comments

* Update CHANGELOG

* Fix formatting
2026-01-20 10:02:31 -05:00
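The precedence rule mentioned in the commit above (dispatch filters override kernel filters) can be sketched as follows. This is a minimal illustration only: `select_dispatches`, the `dispatch_id` and `kernel` keys, and the data layout are hypothetical and are not taken from rocprofiler-compute.

```python
from typing import Optional


def select_dispatches(
    dispatches: list[dict],
    kernel_filter: Optional[set[str]] = None,
    dispatch_filter: Optional[set[int]] = None,
) -> list[dict]:
    """Hypothetical helper: apply at most one of the two filters."""
    # Dispatch filters take precedence, so the kernel filter is ignored
    # whenever a dispatch filter is provided.
    if dispatch_filter:
        return [d for d in dispatches if d["dispatch_id"] in dispatch_filter]
    if kernel_filter:
        return [d for d in dispatches if d["kernel"] in kernel_filter]
    # No filter given: return everything unchanged.
    return list(dispatches)


dispatches = [
    {"dispatch_id": 0, "kernel": "k_a"},
    {"dispatch_id": 1, "kernel": "k_b"},
    {"dispatch_id": 2, "kernel": "k_a"},
]
both = select_dispatches(dispatches, kernel_filter={"k_a"}, dispatch_filter={1})
```

Here `both` contains only dispatch 1, even though its kernel is not in the kernel filter.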
vedithal-amd a926660670 [rocprofiler-compute] Use TheRock nightly builds in testing container (#2661)
* Use TheRock nightly builds in testing container

* Add HIP_DEVICE_LIB_PATH env var for hipcc to work

* Add HIP_PLATFORM env var for cmake hip package

* Add tarball placeholder

* Add -f to curl command to fail on HTTP error
2026-01-20 09:54:38 -05:00
Edgar Gabriel bc70ce551c replace memset with hipMemset (#390) 2026-01-20 08:14:25 -06:00
marantic-amd 51f49d8835 Add notice for the newly deprecated env variables (#2690) 2026-01-20 13:59:31 +01:00
Milan Radosavljevic b533f56197 Add automatic PyTorch library discovery for Python applications (#2623)
* Add automatic PyTorch library discovery for Python applications (#2623)
2026-01-20 08:42:49 +01:00
David Galiffi c83b3aae07 Fix Python Formatting (#2679)
Updated version of black to 26.1.0 updated some formatting rules

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-19 21:26:50 -05:00
jamessiddeley-amd 25090e003f [rocprof-compute] Pin ruff version for consistent formatting (#2680)
* pin ruff versions each to current latest

* Update rocprofiler-compute-formatting.yml

* Downgrade .pre-commit-config.yaml to match develop
2026-01-19 19:10:02 -05:00
Karthik Jayaprakash 99c3a06f4e SWDEV-549518 - Enable logging dynamically through HIP APIS. (#1079)
* SWDEV-549518 - Enable logging dynamically through HIP APIS.

* SWDEV-549518 - Adding ROCProfiler related new API changes.

* rocprofiler-sdk changes for hip api additions.

---------

Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>
2026-01-19 16:16:14 -05:00
marandje 9f37cd6309 SWDEV-1 - Fix hipMemPoolTrimTo failing tests (#2628) 2026-01-19 21:10:15 +00:00
abchoudh-amd dd149d3957 [rocprofiler-compute] Support new attach/detach API (#2642)
* Removed attach tool library path

* Support new attach/detach API

* New attach/detach API was introduced in
  https://github.com/ROCm/rocm-systems/pull/1653

* Provide backward compatibility with old api

* Stabilize attach/detach tests by adding sleep to help workload get
  ready for attachment

* Fix typo in test name

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
2026-01-19 16:00:14 -05:00
SakaSitharammurthy 1c5aa2d4e7 [SWDEV-567099] Updated 'amdsmi list --cpu all' command (#2519)
Signed-off-by: Saka, Sitharam Murthy <SitharamMurthy.Saka@amd.com>
2026-01-19 14:56:59 -06:00
vedithal-amd 0254181f42 [rocprofiler-compute] Analysis Database Schema Improvements (v1.2.0) (#2526)
* Analysis database v1.2.0

* `pc_sampling` and `roofline_data` tables should relate to `kernel` table instead of `workload` table

* Remove `kernel_name` fields in `pc_sampling` and `roofline_data` table

* Add kernel existence check for roofline data to prevent KeyError (#2536)

* Initial plan

* Add kernel existence check for roofline data to prevent KeyError

Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>

* Optimize analysis performance

* Refactor database schema: separate metric definitions from kernels

Reorganize the database ORM to decouple metric definitions from kernel
objects. This improves the schema design by:

- Rename Metric -> MetricDefinition and Value -> MetricValue for clarity
- Move metric definitions from kernel-level to workload-level, since
  metric definitions are shared across kernels
- Update relationships: MetricDefinition belongs to Workload,
  MetricValue references both MetricDefinition and Kernel
- Refactor metric_view to join through the new schema structure
- Update test fixtures to use renamed table and class names
- Update documentation with new example output using nbody workload
- Regenerate database schema and views diagrams

* Add min and max aggregation in kernel_view

* Add primary key id from tables into the view

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
2026-01-19 15:25:43 -05:00
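The relationships described in the schema commit above (metric definitions owned by the workload, metric values pointing at both a definition and a kernel) can be sketched with an in-memory SQLite database. The table and column names below are assumptions chosen for illustration; they are not the project's actual ORM models or view definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workload (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE kernel (
        id INTEGER PRIMARY KEY,
        workload_id INTEGER NOT NULL REFERENCES workload(id),
        name TEXT NOT NULL
    );
    -- Defined once per workload and shared by all of its kernels.
    CREATE TABLE metric_definition (
        id INTEGER PRIMARY KEY,
        workload_id INTEGER NOT NULL REFERENCES workload(id),
        name TEXT NOT NULL
    );
    -- One row per (definition, kernel) pair.
    CREATE TABLE metric_value (
        id INTEGER PRIMARY KEY,
        metric_definition_id INTEGER NOT NULL REFERENCES metric_definition(id),
        kernel_id INTEGER NOT NULL REFERENCES kernel(id),
        value REAL
    );
    """
)
conn.execute("INSERT INTO workload VALUES (1, 'nbody')")
conn.executemany("INSERT INTO kernel VALUES (?, 1, ?)", [(1, "k_a"), (2, "k_b")])
conn.execute("INSERT INTO metric_definition VALUES (1, 1, 'metric_x')")
conn.executemany(
    "INSERT INTO metric_value VALUES (?, 1, ?, ?)", [(1, 1, 10.0), (2, 2, 20.0)]
)

# Join through the definition and the kernel, as a metric view would.
rows = conn.execute(
    """
    SELECT k.name, d.name, v.value
    FROM metric_value AS v
    JOIN metric_definition AS d ON d.id = v.metric_definition_id
    JOIN kernel AS k ON k.id = v.kernel_id
    ORDER BY k.name
    """
).fetchall()
definition_count = conn.execute("SELECT COUNT(*) FROM metric_definition").fetchone()[0]
```

With this layout a single `metric_definition` row serves both kernels, which is the deduplication the commit message describes.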
systems-assistant[bot] 88f07baa92 SWDEV-493792 - add split barriers for grid_group (#508)
* SWDEV-493792 - add split barriers for grid_group

* add tests

* Update change log

* Add Navi4 split barrier

* Update docs

* Use new Catch2 Approx macro

* Update split_barrier.cc to check for coop groups

---------

Co-authored-by: Jatin Chaudhary <jatchaud@amd.com>
Co-authored-by: Jatin Chaudhary <51944368+cjatin@users.noreply.github.com>
2026-01-19 09:17:00 -08:00
lloginov-amd e49b501e9a Add scratch memory support (#2211) 2026-01-19 16:24:30 +01:00
Gopesh Bhardwaj 1ac805cb35 [rocprofiler-sdk][Documentation] Updating CHANGELOG for 7.2 (#2573)
* Updating CHANGELOG for 7.2

* Updated CHANGELOG

* Addressed feedback

* Addressed Feedback

* Updated based on review comments

* Update installation steps and documentation links

Updated installation documentation and links to latest repository.

* Addressed Feedback

* Updated CHANGELOG

* Addressed feedback

* updated CHANGELOG

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2026-01-17 14:55:55 +05:30
habajpai-amd b53c99669c Revert "fix: prevent double-free crash during process exit in amd-smi (#2213)" (#2640)
This reverts commit 7b00d3a89b.

The workaround is no longer needed - root cause fixed in:
- rocm-smi-lib (PR #2531): Made devInfoTypesStrings file-local static
- amdsmi (PR #2575): Added visibility("hidden") attribute
2026-01-16 16:08:52 -05:00
cfallows-amd 005f07004d [rocprofiler-compute] Update README (#2589)
* Update readme general section and citation version and date.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Minor change to project title; changing it now so it is not forgotten, but we are waiting on feedback about the citation from R&D.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Edit citation from R&D feedback

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2026-01-16 12:48:19 -05:00
Rahul Manocha 249a947dc7 Fix set/get access failure for VMM on windows (#2280)
* Fix set/get access failure for VMM on windows

* separate code paths for Linux and Windows to avoid using import/export calls on Windows

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-16 08:34:21 -08:00
akolliasAMD e7269cb925 Tests package (#384)
* added packaging for the tests and for the driver.sh

* making .sh files into programs so they keep permissions
2026-01-16 09:10:36 -07:00
Aurelien Bouteiller cca7872bcf new tester: put to all pes from all lanes concurrently (#112)
* Add put to all pes from all lanes concurrently

* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)

* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly

* Add flood tester to the testing script

* add to gda test case w/o the _g variant that is not implemented.
2026-01-16 10:40:48 -05:00
vedithal-amd f64d8e0f43 [rocprofiler-compute] Improve native tool discovery and partition detection (#2630)
* Improve native tool discovery and partition detection

- Enhanced native tool path resolution to support CMAKE_INSTALL_LIBDIR variations
  (lib, lib64, lib32, etc.) using glob pattern matching
- Extracted path variables to avoid duplication in error messages
- Improved error message clarity by showing exact paths searched for .so and .cpp files
- Simplified code path construction using consistent Path.resolve().parents[x] syntax

- Fixed redundant partition warnings on pre-MI300 GPUs by adding architecture check
- Only query compute/memory partition on MI300+ series (gfx940+)
- Added proper type hints for gpu_arch parameter
- Moved gpu_info extraction after soc_info to ensure gpu_arch is available
- Improved code comments for MI300 series threshold

* Handle gpu arch like a hex string
2026-01-16 10:36:19 -05:00
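Two ideas from the commit above can be sketched briefly: searching every `lib*` directory with a glob pattern, and reading a `gfx` architecture suffix as a hexadecimal number before comparing it with a threshold. Both helpers, their names, and the directory layout are hypothetical and are not the rocprofiler-compute implementation. Note that a bare numeric comparison would also accept targets such as `gfx1100`, so real code would need an additional family check; the commit message does not show how the project handles that.

```python
from pathlib import Path


def find_native_tool(install_root: Path, tool_name: str) -> list[Path]:
    """Hypothetical helper: look for tool_name in every lib* directory."""
    # "lib*" matches lib, lib64, lib32, ... so the lookup does not depend
    # on the CMAKE_INSTALL_LIBDIR value chosen at build time.
    # It also matches names such as "libexec", which is harmless here.
    return sorted(p for p in install_root.glob(f"lib*/{tool_name}") if p.is_file())


def supports_partition_query(gpu_arch: str, threshold: int = 0x940) -> bool:
    """Hypothetical helper: True when the gfx number, read as hex, is >= threshold."""
    # "gfx90a" is not a decimal number, so the suffix is parsed as hexadecimal.
    return int(gpu_arch.removeprefix("gfx"), 16) >= threshold
```

Under these assumptions, `supports_partition_query("gfx90a")` is false because `0x90a` is below `0x940`, while `supports_partition_query("gfx942")` is true.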
Fábio Mestre e6236417f7 SWDEV-571222 - Fix bf16 headers on gcc (#2260)
GCC does not support anonymous structs with members that have non-trivial constructors. This commit changes the header to remove the union when compiling with gcc. This should be a non-breaking change for other compilers.
2026-01-16 15:02:48 +00:00
Edgar Gabriel 6f512e92a5 fix allreduce tester (#385)
- use the reduce_psync buffers for synchronization in allreduce, not the
  barrier_psync.
- execute a wwg barrier after the allreduce operation. After internal
  discussion it was determined that it is required for correctness.
2026-01-16 08:10:25 -06:00
Fábio Mestre 7794ac9ac6 [hip-tests] Fix Float16 accuracy tests (#2178)
Tests were relying on floats for calculating ulp values when validating the output. This is not correct given that the calculations are done using Float16. The fix is to update the test framework to use fp16 ulp instead.
2026-01-16 13:25:11 +00:00
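To illustrate why the ULP used for validation has to match the format under test, the sketch below computes the spacing between adjacent representable values for IEEE 754 binary16 and binary32. `ulp_for_format` is a hypothetical helper written for this note, not code from hip-tests, and it only covers normal, finite, non-zero values.

```python
import math

# Explicit mantissa (fraction) bits of IEEE 754 binary16 and binary32.
FP16_MANTISSA_BITS = 10
FP32_MANTISSA_BITS = 23


def ulp_for_format(x: float, mantissa_bits: int) -> float:
    """Spacing between adjacent representable values near x.

    Hypothetical helper for illustration; valid for normal, finite,
    non-zero x only (subnormals and overflow are ignored).
    """
    # frexp returns (m, e) with abs(x) == m * 2**e and 0.5 <= m < 1,
    # so the binade containing x starts at 2**(e - 1).
    _, e = math.frexp(abs(x))
    return math.ldexp(1.0, e - 1 - mantissa_bits)


fp16_ulp = ulp_for_format(1.0, FP16_MANTISSA_BITS)
fp32_ulp = ulp_for_format(1.0, FP32_MANTISSA_BITS)
```

Near 1.0 the binary16 ULP is 2^-10 and the binary32 ULP is 2^-23, a factor of 8192, so a tolerance expressed in float ULPs is far too strict for results computed in Float16.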
Kian Cossettini 9f014db6a4 [rocprofiler-systems] Update install path for examples (#2625)
* Update install path for examples to `share/rocprofiler-systems/examples`

----

Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-15 21:51:16 -05:00
Omri Mor 885e41ec62 ionic: fix byteswap functions (added in #345), missed in #368 (#388) 2026-01-15 14:19:19 -08:00
Omri Mor cf8b72a047 Replace byteswap interface to align with C++23 std::byteswap (#368)
* byteswap<T> returns by value
* replace hand-rolled implementations with Clang __builtin_bswap<N> intrinsics
* new high-level interface endian::to_be, endian::from_be, etc. to indicate conversion direction
2026-01-15 13:03:01 -08:00
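The interface shape described in the commit above (a byteswap that returns by value, plus direction-named helpers) can be modeled in a few lines. This is a Python sketch of the semantics only; the project's code is C++, and the function signatures below, including the explicit `width` argument, are assumptions made for illustration.

```python
import sys


def byteswap(value: int, width: int) -> int:
    """Return value with its `width` bytes reversed (value must be unsigned)."""
    return int.from_bytes(value.to_bytes(width, "little"), "big")


def to_be(value: int, width: int) -> int:
    """Convert a host-order integer to big-endian byte order."""
    # On a big-endian host this is a no-op; otherwise the bytes are reversed.
    return value if sys.byteorder == "big" else byteswap(value, width)


def from_be(value: int, width: int) -> int:
    """Convert a big-endian integer to host order (the same operation as to_be)."""
    return to_be(value, width)


swapped = byteswap(0x12345678, 4)
```

Naming the helpers after the conversion direction keeps call sites readable even though `to_be` and `from_be` perform the same byte reversal.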
Mark Meserve 8760fb4976 attach: Formalize ROCAttach API (#1653)
* attach: Formalize ROCAttach API

- Make ROCAttach public with public headers
- Change detach to take a PID
  - attach and detach are now reentrant
- Cleanup of states and signal handling in ptrace session
- Fixes mixed up definition of ROCPROF_ATTACH_TOOL_LIBRARY
  - ROCPROF_ATTACH_TOOL_LIBRARY now always means the tool library loaded by the attachment target
  - ROCPROF_ATTACH_LIBRARY refers to the library used to perform attachment
- Add direct call of rocprof-attach
- Fix python library call of rocprof-attach
  - Function now named attach(), changed from main()

* attach: rocprof-compute ROCAttach updates

- Update to new library names
- Correct usage of C lib detach

* attach: add test for rocattach

- Disable ASan, TSan, and UBSan for the new parallel-attach test
- Lower log level for LSan tests, existing behavior from other tests

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
2026-01-15 14:32:14 -06:00
dsclear-amd 2482bff0b7 Excludes (more) docs-only changes from .azuredevops/rocm_ci_caller.yml. (#2615)
## Motivation

We wish to avoid triggering full Jenkins runs for docs-only PRs, as this takes up testing resources and slows development. rocm_ci_caller.yml already excludes some docs-only changes, but this can be improved to exclude them along more paths.

## Technical Details

The checks that rocm_ci_caller.yml uses to determine whether a changed file in a PR is worth a Jenkins run have been extended to exclude more paths and more file suffixes.

## JIRA ID

AIROCDOC-78, AIROCDOC-424

## Test Plan

1. Created a test branch users/dsclear/shorten_workflows_test_root with the changes in this PR, branched from develop.
2. Branched users/dsclear/shorten_workflows_test_bin_3 and users/dsclear/shorten_workflows_test_text_3 from users/dsclear/shorten_workflows_test_root.
3. Modified users/dsclear/shorten_workflows_test_bin_3 to add two .h files, and submitted a PR into users/dsclear/shorten_workflows_test_root (Test PR, do not merge. Test PR to test Jenkins CI/CD modifications. #2613).
4. Modified users/dsclear/shorten_workflows_test_text_3 to add a new .txt file, and submitted a PR into users/dsclear/shorten_workflows_test_root (Test PR, do not merge. Test PR to test Jenkins CI/CD modifications (docs only). #2614).

## Test Result

The test PR in step 3 caused rocm_ci_caller.yml to attempt to trigger Jenkins, as this is a 'non-docs' change.
The test PR in step 4 had the attempt to trigger Jenkins skipped, as this is a 'docs-only' change.
2026-01-15 14:54:20 -05:00
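The docs-only decision described in the commit above can be sketched as a small classifier over the changed file list. The suffix and directory sets below are hypothetical placeholders; the real rules live in rocm_ci_caller.yml and are not reproduced in the commit message.

```python
from pathlib import PurePosixPath

# Hypothetical values for illustration; not the lists used by the workflow.
DOC_SUFFIXES = {".md", ".rst", ".txt"}
DOC_DIRS = {"docs"}


def is_docs_only_file(path: str) -> bool:
    """True when the file looks like documentation by suffix or directory."""
    p = PurePosixPath(path)
    # A real filter would need exceptions (for example CMakeLists.txt ends
    # in .txt but is build configuration); this sketch ignores them.
    return p.suffix.lower() in DOC_SUFFIXES or bool(DOC_DIRS & set(p.parts[:-1]))


def should_trigger_ci(changed_files: list[str]) -> bool:
    """Trigger the full CI run when at least one changed file is not docs-only."""
    return any(not is_docs_only_file(f) for f in changed_files)
```

This mirrors the two test PRs in the commit: a change that adds `.h` files triggers the run, while a change that adds only a `.txt` file is skipped.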