Commit Graph

74722 Commits

Author SHA1 Message Date
David Galiffi 5a7a32bc12 Bump version from 1.4.0 to 1.5.0 (#2834)
Bumping the version now that 7.10 has branched.
2026-01-24 01:10:27 -05:00
Jonathan R. Madsen 5a930e9c25 [rocprofiler-sdk] Fix fmt::join build errors (#2505) 2026-01-23 22:01:45 -06:00
vedithal-amd aa5dfb98f9 MI350 Fix L2 cache to HBM read counters/metrics (#2501)
* Fix rocprofiler-sdk metrics definition

* Use TCC_EA0_RDREQ_128B instead of TCC_BUBBLE counter for L2 cache to
  HBM counters and metrics

* Update MI350 counter definitions
    * FETCH_SIZE
    * BANDWIDTH_EA

* Update MI350 metrics definitions
    * System Speed of Light, L2-Fabric Read BW
    * Roofline Plot Points, AI (Arithmetic Intensity) HBM
    * Roofline Performance Rates, HBM Bandwidth

* Remove redundant definition for gfx950 and fix BANDWIDTH_EA definition

Test HBM bandwidth metric for memcopy workload

* Add memcopy.cpp workload

* Add metric validation test suite to validate HBM Bandwidth metric for
  memcopy workload

* Move gpu_soc() to test_utils.py for better re-usability

* Update TUI analysis config

* Fix hbm bandwidth formula for mi350 in calc_ai_profile

Co-authored-by: Alysa Liu <Alysa.Liu@amd.com>
2026-01-23 15:56:24 -05:00
vedithal-amd 809eca7616 [rocprofiler-compute] Pin dependencies and fix test configuration paths, remove setuptools dependency (#2821)
* Pin dependencies and fix test paths for package layout

- Pin all dependencies in requirements.txt to specific versions to ensure stability and reproducibility.
- Update test_autogen_config.py to correctly resolve source paths for both development and installed package layouts.
- Validated compatibility with Python 3.9, 3.10, 3.11, and 3.12.

* Remove setuptools dependency since we don't support pip install and instead use cmake
2026-01-23 15:46:58 -05:00
German Andryeyev 2b1b41f4da AIRUNTIME-32 - Add try/catch around all HIP API calls (#2822) 2026-01-23 14:53:20 -05:00
SaleelK 340f3aa887 clr: Implement dynamic stream to HWq logic (#1958)
* clr: Implement dynamic stream to HW queue assignment

This change implements dynamic stream to hardware queue (HWq) mapping
with the following features:

* Queue depth heuristics with weights for optimal HWq assignment
* Make last used queue sticky for better locality
* Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to
  pipe mapping based on creation order (single process per device only,
  as pipe ID is statically assigned by runtime)
* More aggressive heuristic usage for better queue distribution
* Extend dynamic queues support for all stream priorities

Environment variables:
* DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - depth heuristics, 2 -
  depth+pipe heuristics
* DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation

* clr: Clean up last_used_queue_
2026-01-23 10:40:54 -08:00
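The queue depth heuristic and the sticky last used queue described in the commit above can be illustrated with a minimal sketch. This is not the clr implementation: `HwQueue`, `STICKY_BONUS`, and `pick_queue` are hypothetical names, and the weighting is an assumption made only to show the idea of preferring shallow queues while favouring the queue a stream used last.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HwQueue:
    index: int  # position of the hardware queue in the pool (hypothetical)
    depth: int  # number of commands currently in flight (hypothetical)


# Hypothetical weight: how much deeper the last used queue may be
# before another queue is preferred over it.
STICKY_BONUS = 1


def pick_queue(queues: List[HwQueue], last_used: Optional[int]) -> int:
    """Return the index of the queue with the lowest weighted depth."""

    def cost(q: HwQueue):
        bonus = STICKY_BONUS if q.index == last_used else 0
        # Ties are broken by the lower queue index.
        return (q.depth - bonus, q.index)

    return min(queues, key=cost).index


if __name__ == "__main__":
    pool = [HwQueue(0, 2), HwQueue(1, 2), HwQueue(2, 2)]
    # With equal depths, the sticky bonus keeps the stream on queue 2.
    print(pick_queue(pool, last_used=2))
```

With equal depths the stream stays on its last used queue; once that queue is clearly deeper than another, the shallower queue wins.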
systems-assistant[bot] 89170521f5 Bump rocm-docs-core from 1.29.0 to 1.31.3 in /docs/sphinx (#2752)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.29.0 to 1.31.3.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.29.0...v1.31.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.31.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2026-01-23 09:59:09 -07:00
Julia Jiang 28ef3b4b9e swdev-564412 - fix seg faults in atomic operation tests with multi-gpus (#2541) 2026-01-23 11:26:27 -05:00
cfallows-amd 62dd4d114d [rocprofiler-compute] Fixes for roofline when used with iteration multiplexing (#2635)
* Added iteration_multiplex_impute_counters on pmc data; the GUI dataframe did not implement this in the build_layout method previously
* Created a Workload() in profile mode post-processing so that the roofline html standalone plot can be generated; this will be removed once the roofline plot is moved to the analyze phase in a future release
* Added the iteration_multiplexing run parameter to roofline object init so that we can accurately parse the dataframe if the option was used during profiling; this helps us avoid reading nan values in certain dispatches that did not get imputed in calc_ai_profile
* Cleaned up unused legacy code and adjusted method parameters to assist in moving roofline plotting to analyze mode in a future release
* Updated the iteration multiplexing data imputation algorithm to impute counters for ungrouped dispatches at the end based on the previous group. However, this won't work if there are no dispatches that can be grouped (i.e. number of dispatches < number of counter buckets)

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-23 11:10:46 -05:00
Tao Sang fc4422d73b SWDEV-1 Fix vulkan test build failure in Windows (#2512)
Fix vulkan test build failure in Windows
2026-01-23 10:55:17 -05:00
systems-assistant[bot] 563776a949 [GDA/IONIC] only ring doorbell on active lanes (#2727)
Co-authored-by: Yiltan <yiltan@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2026-01-23 09:26:46 -05:00
Jin Jung 48347bc857 SWDEV-572679 - Fix hipGraphicsGLRegisterImage (#2475) 2026-01-23 06:25:10 -08:00
jamessiddeley-amd 69281bbcf4 [rocprofiler-compute] Threshold Based Clamping in Analyze Stage (#2565)
* add threshold clamping function + parse in parser.py (with I/O)

* implemented hybrid threshold solution

* update changelog

* removed absolute threshold hybrid approach; restored relative threshold + warn

* edited warning msg, threshold -> 1%

* update changelog

* added 2 test cases

* ran master workflow yaml config files

* added to FAQ

* Revert "ran master workflow yaml config files"

This reverts commit 75a670e14d6f1619ebbda0ec218755ccbe0d22b1.

* update FAQ

* update config hashes

* Broke down long functions into Class with sub-functions

* ruff format

* addressed comments
2026-01-23 00:54:54 -05:00
marantic-amd 7af2dba741 [rocprof-sys] Align rocpd symbols of the same counter types (#2675)
## Motivation

In order for Optiq to be able to detect that counter tracks are of the same type, we aligned `info_pmc` symbol naming across the tracks of the same type. Being able to know this will be useful for grouping and categorizing similar types of counter tracks and for setting up a consistent y-axis scale when plotting the values on charts.

## Technical Details

Replace unique and/or ordered symbol names with a counter-common symbol name that is the same for all counters of the same type; the counter track name remains the unique identifier for each counter track. For example, the "symbol" field was "JpegAct_0" but is now "JpegAct".
2026-01-23 00:36:08 -05:00
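The symbol alignment in the commit above can be sketched as follows. This is an illustration only, assuming the per-track suffix is always `_<digits>` as in the "JpegAct_0" example; `common_symbol` and `group_tracks` are hypothetical helpers, not rocprof-sys functions, and the real change may derive the common name differently.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: the unique part of a symbol is a trailing "_<digits>" suffix.
_TRAILING_INDEX = re.compile(r"_\d+$")


def common_symbol(track_symbol: str) -> str:
    """Strip the trailing numeric index so same-type counters share a symbol."""
    return _TRAILING_INDEX.sub("", track_symbol)


def group_tracks(track_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group unique track names by their counter-common symbol."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in track_names:
        groups[common_symbol(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(group_tracks(["JpegAct_0", "JpegAct_1", "VcnAct_0"]))
```

A consumer can then pick one y-axis scale per group while still addressing each track by its unique name.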
marantic-amd 956a73c4c8 [rocprof-sys] Use fmt APIs to construct strings instead of JOIN (#2643)
## Motivation

With the introduction of the new logging system based on the `spdlog` library, there is an opportunity to replace the `timemory`-dependent JOIN implementation with the `fmt` library's `format` and `join` APIs, which ship as part of `spdlog`.

## Technical Details

Use `fmt` provided APIs to properly format and package strings.
2026-01-23 00:34:58 -05:00
habajpai-amd 3318c540ea fix roctx range markers not paired correctly in rocpd output (#2793)
## Motivation

Fix roctx range markers (Push/Pop, Start/Stop) not being displayed correctly in rocpd output. The Visualizer was showing only Stop/Pop events as instant markers instead of proper duration ranges with labels, while Perfetto output displayed them correctly.

## Technical Details

In `tool_tracing_callback_stop()`, the rocpd/database output was using `user_data->value` (timestamp of the Pop/Stop event) instead of `begin_ts` (corrected timestamp from the corresponding Push/Start event) when calling `cache_region()`.

The Perfetto output already used `begin_ts` correctly (line 818). This change aligns the rocpd output with the Perfetto behavior by using `begin_ts` instead of `user_data->value` (line 887).

Updated rocpd validation rules
2026-01-22 23:47:43 -05:00
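The pairing problem fixed in the commit above can be illustrated with a small sketch. It is not the rocprofiler code: `pair_ranges` and the event tuples are hypothetical, and the sketch only shows why a region must start at the timestamp of the matching Push/Start event rather than at the timestamp of the Pop/Stop event.

```python
from typing import List, Tuple

Event = Tuple[str, str, int]   # (kind, label, timestamp); kind is "push" or "pop"
Region = Tuple[str, int, int]  # (label, begin_ts, end_ts)


def pair_ranges(events: List[Event]) -> List[Region]:
    """Pair push/pop events into duration regions using a stack."""
    stack: List[Tuple[str, int]] = []
    regions: List[Region] = []
    for kind, label, ts in events:
        if kind == "push":
            stack.append((label, ts))
        elif kind == "pop" and stack:
            name, begin_ts = stack.pop()
            # Use the timestamp saved at push time as the region start.
            # Using ts for both ends would collapse the range into an
            # instant marker.
            regions.append((name, begin_ts, ts))
    return regions


if __name__ == "__main__":
    trace = [("push", "outer", 10), ("push", "inner", 20),
             ("pop", "", 30), ("pop", "", 50)]
    print(pair_ranges(trace))
```

Each emitted region carries the label and start time from its opening event, so a viewer can draw it as a duration range.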
Sajina PK 6cf9b217d3 Update CHANGELOG.md (#2813)
Modified the release version for the XGMI/PCIe feature.
2026-01-22 17:48:30 -05:00
Nilesh M Negi cba2e3de2f Add rccl-reviewers to CODEOWNERS file (#2816)
* Add rccl-reviewers to CODEOWNERS file
* Re-order lines based on copilot suggestion
2026-01-22 16:29:48 -05:00
yugang-amd 576fbaef74 Fix link for global enum and defines (#2802) 2026-01-22 16:17:25 -05:00
habajpai-amd 7e74d163fd [rocprof-sys] Fix RCCL comm_data counters in rocpd output (#2607)
## Motivation

The validate-rccl-* tests were failing because "RCCL Comm" counters were not being written to perfetto traces when using the new cached-perfetto approach.

## Technical Details

Root Cause: The write_perfetto_counter_track() in rccl.cpp was only called when config::get_use_perfetto() returned true, which requires ROCPROFSYS_TRACE_LEGACY=ON. This meant RCCL counters weren't captured with the new trace cache approach.

Solution: Integrated RCCL with the trace cache system:

Changes to source/lib/rocprof-sys/library/rocprofiler-sdk/rccl.cpp:

- Added cache_rccl_comm_data_events<Track>() function to store RCCL comm data via pmc_event_with_sample with category::comm_data
- Modified tool_tracing_callback_rccl() to always cache events for new perfetto approach, while preserving legacy write_perfetto_counter_track() calls for backward compatibility

Changes to tests/rocprof-sys-testing.cmake:

- Added rccl_api to ROCPROFSYS_ROCM_DOMAINS to enable RCCL API callback tracing

Handler verification: The perfetto_processor_t already has a handler for ROCPROFSYS_CATEGORY_COMM_DATA in m_pmc_track_map that processes the cached events.
2026-01-22 15:38:19 -05:00
akolliasAMD 9caa248bfd added rocshmem-reviewers on CODEOWNERS file (#2809) 2026-01-22 14:12:56 -05:00
Danylo Lytovchenko 7f906ced10 Revert "rocr: SVMPrefetch to a particular numa node (#1063)" (#2792)
This reverts commit 0ba5a01baa.
2026-01-22 17:26:19 +01:00
vstojilj 9d84e31d23 Fix invalid use of hipEventElapsedTime (#2698) 2026-01-22 17:08:43 +01:00
systems-assistant[bot] 87ea43b642 SWDEV-540597 - Reset last error to avoid its impact in next iteration. (#558)
* SWDEV-540597 - Reset last error to avoid its impact in next iteration.

* SWDEV-540597 - Bypass compiler error as we need to call hipGetLastError without checking error to reset last error.

---------

Co-authored-by: Jaydeep Patel <jaydeepkumar.patel@amd.com>
2026-01-22 15:52:36 +05:30
ammallya ea94716e23 Migrating rccl and rccl-tests (#2750)
* Migrating rccl and rccl-tests

* Adding missing submodules for rccl
2026-01-21 18:16:19 -08:00
Ameya Keshava Mallya e4367dd053 Merge remote-tracking branch 'origin/develop' into preserved/rccl 2026-01-22 02:15:20 +00:00
David Yat Sin 5267cd334b rocr: Refactor SDMA object creation (#2629)
Refactor SDMA object creation and add comment to clarify why GCR is not
needed on DXG.
2026-01-21 21:09:56 -05:00
German Andryeyev d902429f1f rocr/hsakmt/wsl Move WSL under ROCR hsakmt. (#2638)
## Motivation
ROCR on Windows uses the WSL implementation as its codebase. We want to make
sure Windows changes can continue to work with WSL and share the same
core implementation. Hence, it's easier to maintain the code under the
same rocm-system infrastructure and automate all builds/tests in the
future.

## Technical Details
The new files are a copy of https://github.com/ROCm/librocdxg/ with
preserved history. Native windows support and clean-ups will be added in
the following check-ins.

The same command lines can be used to build WSL under libhsakmt folder
for now.
```
# Set the Windows SDK path (adjust version number if different)
export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.26100.0/'
 
# Build the library
mkdir -p build
cd build
cmake .. -DWIN_SDK="${win_sdk}/shared"
make
sudo make install

```
## JIRA ID
SWDEV-558849

## Test Plan
N/A

## Test Result
N/A

## Submission Checklist
2026-01-21 20:00:33 -05:00
German Andryeyev 196baa4321 rocr: Fix static build in Windows (#2660) 2026-01-21 18:44:51 -05:00
Ameya Keshava Mallya 8861267e7a Merge remote-tracking branch 'origin/develop' into preserved/rccl-tests 2026-01-21 22:04:28 +00:00
Sunday Clement 0ba5a01baa rocr: SVMPrefetch to a particular numa node (#1063)
In order for the hipMemPrefetchAsync_v2() API to work, rocr needs to
migrate the requested page ranges to the particular NUMA node in
question, via move_pages().

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2026-01-21 16:52:15 -05:00
ammallya 1dfc679821 Migration of rocshmem (#2742) 2026-01-21 13:31:26 -08:00
Ameya Keshava Mallya c375529158 Merge remote-tracking branch 'origin/develop' into preserved/rocshmem 2026-01-21 21:27:47 +00:00
Ioannis Assiouras f05a33968f SWDEV-570500 - Fixed graph node to stream scheduling in multistream path (#2596) 2026-01-21 20:48:46 +00:00
ammallya 7cab5ea514 Add rocshmem to labeler (#2724) 2026-01-21 12:47:46 -08:00
pghoshamd 8d73201a35 AILIROCR-4 Fix double free of tma_region (#2678) 2026-01-21 15:31:10 -05:00
pghoshamd 793755532f SWDEV-561708 Initial shared queue pool apis (#1614)
* SWDEV-561708 Initial shared queue pool apis

* Validate params; some fixes in callback function (but still needs to be checked)

* Dtor cleanup

* minor

* Enable profiling; remove callback since aql_queue takes care of it

* setPriority and setCuMask APIs updated for counted queues

* Increasing step and minor version for rocprofiler

* Tests for CountedQueueManager

* tests

* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending

* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added

* Changed to ASSERT_SUCCESS macros for all tests

* Ring buffer overflow test added

* tests fixed; cleanup added at hsa_shutdown

* priority conversion table changes

* Compiler warnings fixed

* Rewrite 1 test; add desc and improve SetUp() code

* Improvement

* Unified getinfo for both counted and non-counted queues

* Address PR feedback

* Addressing feedback: memleak, data type mismatch, documentation

* improve comment

* format

* Missing HSA_API macros for roctracer

* Revert "Addressing feedback: memleak, data type mismatch, documentation"

This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.

* Improving acquire api doc

* release api doc improved

* error codes for release api doc
2026-01-21 15:30:04 -05:00
Ameya Keshava Mallya f1b313780b Merge commit 'a52452e891d5dc07c83cf4edaea01ae4ab684b3a' into develop 2026-01-21 20:29:41 +00:00
Ameya Keshava Mallya 8d996cc05f Merge commit '3d4813d99196bb349eccd50a925e2addc8f1622c' into develop 2026-01-21 20:28:14 +00:00
Ameya Keshava Mallya 12ab8df3bc Add 'projects/rocshmem/' from commit '0496586829058af5cfd7f23acda2a6d0040da584'
git-subtree-dir: projects/rocshmem
git-subtree-mainline: 5fd976da70
git-subtree-split: 0496586829
2026-01-21 20:25:37 +00:00
mberenjk 6743f00777 applying the changes from net_ib.cc to rocm_net_ib.cc to ensure DMABUF-disabled configurations are respected. (#2152)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 3d4813d991]
2026-01-21 12:11:56 -08:00
mberenjk 3d4813d991 applying the changes from net_ib.cc to rocm_net_ib.cc to ensure DMABUF-disabled configurations are respected. (#2152)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2026-01-21 12:11:56 -08:00
vedithal-amd 5fd976da70 Fix typo in Bypass Req metric in 17.3 section for MI350 (#2704) 2026-01-21 15:00:23 -05:00
mberenjk 7069fc936f Adding a check to respect DMABUF being disabled by the user (#2076)
Co-authored-by: Marzieh Berenjkoub <mberenjk@.amd.com>

[ROCm/rccl commit: 9a443f3054]
2026-01-21 11:08:12 -08:00
mberenjk 9a443f3054 Adding a check to respect DMABUF being disabled by the user (#2076)
Co-authored-by: Marzieh Berenjkoub <mberenjk@.amd.com>
2026-01-21 11:08:12 -08:00
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.

Add some SRD logs that will be removed finally.

* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in negative tests.

* Remove SRD dump log from hip-rt
Make the ROCr mipmap log conditional.

* minor format change

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
Sam Ruscica 5daeb14582 SWDEV-547291 - Interop for OpenGL (#2350)
Updated to convert flags correctly

Added ObjectRegistry to track registered and mapped resources and incorporated it into hip_gl.

Added mip level check

Made functions static inline

Reworked validation to be clearer.
2026-01-21 09:08:55 -08:00
Nilesh M Negi 244047310e [DEVICE] Switch to amd-smi from rocm-smi (#1759)
* Use amd-smi instead of rocm-smi for ROCM_VERSION >= 7.11.0

[ROCm/rccl commit: cd745b1f4b]
2026-01-21 09:05:47 -06:00
Nilesh M Negi cd745b1f4b [DEVICE] Switch to amd-smi from rocm-smi (#1759)
* Use amd-smi instead of rocm-smi for ROCM_VERSION >= 7.11.0
2026-01-21 09:05:47 -06:00
Gopesh Bhardwaj c563286f96 Update changelog for ROCprofiler-SDK 1.1.0 (#2717)
using only arch name
2026-01-21 20:15:39 +05:30