Commit graph

74699 commits

Author SHA1 Message Date
systems-assistant[bot] 87ea43b642 SWDEV-540597 - Reset last error to avoid its impact in next iteration. (#558)
* SWDEV-540597 - Reset last error to avoid its impact in next iteration.

* SWDEV-540597 - Bypass compiler error as we need to call hipGetLastError without checking error to reset last error.

---------

Co-authored-by: Jaydeep Patel <jaydeepkumar.patel@amd.com>
2026-01-22 15:52:36 +05:30
ammallya ea94716e23 Migrating rccl and rccl-tests (#2750)
* Migrating rccl and rccl-tests

* Adding missing submodules for rccl
2026-01-21 18:16:19 -08:00
Ameya Keshava Mallya e4367dd053 Merge remote-tracking branch 'origin/develop' into preserved/rccl 2026-01-22 02:15:20 +00:00
David Yat Sin 5267cd334b rocr: Refactor SDMA object creation (#2629)
Refactor SDMA object creation and add comment to clarify why GCR is not
needed on DXG.
2026-01-21 21:09:56 -05:00
German Andryeyev d902429f1f rocr/hsakmt/wsl Move WSL under ROCR hsakmt. (#2638)
## Motivation
ROCR on Windows uses the WSL implementation as its codebase. We want to make
sure Windows changes continue to work with WSL and share the same
core implementation. Hence, it is easier to maintain the code under the
same rocm-system infrastructure and automate all builds/tests in the
future.

## Technical Details
The new files are a copy of https://github.com/ROCm/librocdxg/ with
preserved history. Native Windows support and clean-ups will be added in
the following check-ins.

The same command lines can be used to build WSL under libhsakmt folder
for now.
```
# Set the Windows SDK path (adjust version number if different)
export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.26100.0/'
 
# Build the library
mkdir -p build
cd build
cmake .. -DWIN_SDK="${win_sdk}/shared"
make
sudo make install

```
## JIRA ID
SWDEV-558849

## Test Plan
N/A

## Test Result
N/A

## Submission Checklist
2026-01-21 20:00:33 -05:00
German Andryeyev 196baa4321 rocr: Fix static build in Windows (#2660) 2026-01-21 18:44:51 -05:00
Ameya Keshava Mallya 8861267e7a Merge remote-tracking branch 'origin/develop' into preserved/rccl-tests 2026-01-21 22:04:28 +00:00
Sunday Clement 0ba5a01baa rocr: SVMPrefetch to a particular numa node (#1063)
In order for the hipMemPrefetchAsync_v2() API to work, we need rocr to
migrate the requested ranges of pages to the particular NUMA node in
question, via move_pages().

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2026-01-21 16:52:15 -05:00
ammallya 1dfc679821 Migration of rocshmem (#2742) 2026-01-21 13:31:26 -08:00
Ameya Keshava Mallya c375529158 Merge remote-tracking branch 'origin/develop' into preserved/rocshmem 2026-01-21 21:27:47 +00:00
Ioannis Assiouras f05a33968f SWDEV-570500 - Fixed graph node to stream scheduling in multistream path (#2596) 2026-01-21 20:48:46 +00:00
ammallya 7cab5ea514 Add rocshmem to labeler (#2724) 2026-01-21 12:47:46 -08:00
pghoshamd 8d73201a35 AILIROCR-4 Fix double free of tma_region (#2678) 2026-01-21 15:31:10 -05:00
pghoshamd 793755532f SWDEV-561708 Initial shared queue pool apis (#1614)
* SWDEV-561708 Initial shared queue pool apis

* Validate params; some fixes in callback function (but still needs to be checked)

* Dtor cleanup

* minor

* Enable profiling; remove callback since aql_queue takes care of it

* setPriority and setCuMask APIs updated for counted queues

* Increasing step and minor version for rocprofiler

* Tests for CountedQueueManager

* tests

* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending

* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added

* Changed to ASSERT_SUCCESS macros for all tests

* Ring buffer overflow test added

* tests fixed; cleanup added at hsa_shutdown

* priority conversion table changes

* Compiler warnings fixed

* Rewrite 1 test; add desc and improve SetUp() code

* Improvement

* Unified getinfo for both counted and non-counted queues

* Address PR feedback

* Addressing feedback: memleak, data type mismatch, documentation

* improve comment

* format

* Missing HSA_API macros for roctracer

* Revert "Addressing feedback: memleak, data type mismatch, documentation"

This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.

* Improving acquire api doc

* release api doc improved

* error codes for release api doc
2026-01-21 15:30:04 -05:00
Ameya Keshava Mallya f1b313780b Merge commit 'a52452e891d5dc07c83cf4edaea01ae4ab684b3a' into develop 2026-01-21 20:29:41 +00:00
Ameya Keshava Mallya 8d996cc05f Merge commit '3d4813d99196bb349eccd50a925e2addc8f1622c' into develop 2026-01-21 20:28:14 +00:00
Ameya Keshava Mallya 12ab8df3bc Add 'projects/rocshmem/' from commit '0496586829058af5cfd7f23acda2a6d0040da584'
git-subtree-dir: projects/rocshmem
git-subtree-mainline: 5fd976da70
git-subtree-split: 0496586829
2026-01-21 20:25:37 +00:00
mberenjk 6743f00777 applying the changes from net_ib.cc to rocm_net_ib.cc to ensure DMABUF-disabled configurations are respected. (#2152)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 3d4813d991]
2026-01-21 12:11:56 -08:00
mberenjk 3d4813d991 applying the changes from net_ib.cc to rocm_net_ib.cc to ensure DMABUF-disabled configurations are respected. (#2152)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2026-01-21 12:11:56 -08:00
vedithal-amd 5fd976da70 Fix typo in Bypass Req metric in 17.3 section for MI350 (#2704) 2026-01-21 15:00:23 -05:00
mberenjk 7069fc936f Adding a check to respect DMABUF being disabled by the user (#2076)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 9a443f3054]
2026-01-21 11:08:12 -08:00
mberenjk 9a443f3054 Adding a check to respect DMABUF being disabled by the user (#2076)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2026-01-21 11:08:12 -08:00
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.

Add some SRD logs that will be removed finally.

* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in negative tests.

* Remove SRD dump log from hip-rt
Make the Rocr mipmap log conditional.

* minor format change

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
Sam Ruscica 5daeb14582 SWDEV-547291 - Interop for OpenGL (#2350)
Updated to convert flags correctly

Added ObjectRegistry to track registered and mapped resources and incorporated it into hip_gl.

Added mip level check

Made functions static inline

Reworked validation to be more clear.
2026-01-21 09:08:55 -08:00
Nilesh M Negi 244047310e [DEVICE] Switch to amd-smi from rocm-smi (#1759)
* Use amd-smi instead of rocm-smi for ROCM_VERSION >= 7.11.0

[ROCm/rccl commit: cd745b1f4b]
2026-01-21 09:05:47 -06:00
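The version gate in the commit above (amd-smi for ROCM_VERSION >= 7.11.0, rocm-smi otherwise) only works if versions are compared numerically, component by component. The following is a minimal sketch of that idea, not the code from the commit: the helper names `parse_version` and `pick_smi_tool` are hypothetical, and the log does not show how the commit itself performs the check.

```python
def parse_version(text):
    """Split a dotted version string such as '7.11.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pick_smi_tool(rocm_version, threshold="7.11.0"):
    """Return the tool name for a given ROCm version (hypothetical helper).

    Tuples of ints compare element by element, so (7, 11, 0) > (7, 9, 0).
    """
    if parse_version(rocm_version) >= parse_version(threshold):
        return "amd-smi"
    return "rocm-smi"


if __name__ == "__main__":
    for version in ("7.9.0", "7.11.0", "8.0.0"):
        print(version, "->", pick_smi_tool(version))
    # A plain string comparison would misorder these two versions,
    # because the character '9' sorts after '1'.
    print("string compare '7.9.0' >= '7.11.0':", "7.9.0" >= "7.11.0")
```

Comparing the raw strings would treat 7.9.0 as newer than 7.11.0, which is why the sketch converts each component to an integer first.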
Nilesh M Negi cd745b1f4b [DEVICE] Switch to amd-smi from rocm-smi (#1759)
* Use amd-smi instead of rocm-smi for ROCM_VERSION >= 7.11.0
2026-01-21 09:05:47 -06:00
Gopesh Bhardwaj c563286f96 Update changelog for ROCprofiler-SDK 1.1.0 (#2717)
using only arch name
2026-01-21 20:15:39 +05:30
Kian Cossettini 28b2ade7d2 Update mentions of OpenMP to reflect newer implementation (#2701)
Update timemory examples in docs to use the `rocprofiler-sdk` API.
2026-01-21 07:18:51 -05:00
Jatin Chaudhary 0590a72d4b Rework clock based unit tests (#2646) 2026-01-21 10:55:33 +00:00
hongkzha-amd d94185c5b2 rocrtst: set HSA_ENABLE_INTERRUPT after TestExample creation (#2687)
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Co-authored-by: cfreeamd <166262151+cfreeamd@users.noreply.github.com>
2026-01-21 10:39:50 +08:00
Karthik Jayaprakash 6a84a00208 Use size_t datatype for global dimensions. (#2604) 2026-01-20 20:39:07 -05:00
JeniferC99 50e00d1b94 Update CODEOWNERS (#2705): add /project/amdsmi owner 2026-01-20 15:49:59 -08:00
yugang-amd 05a6d017c6 [ROCmInfo] docs: mono-repo changes and style edits (#2584)
* initial edits

* mono repo related updates

* standardize component name

* style edits

* more edits
2026-01-20 18:06:54 -05:00
Yiltan 55aab4d62e [Docs] Clarify ROCSHMEM_HEAP_SIZE (#392)
* clarify ROCSHMEM_HEAP_SIZE

* Apply suggestions from code review

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

---------

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: 0496586829]
2026-01-20 17:22:18 -05:00
Yiltan 0496586829 [Docs] Clarify ROCSHMEM_HEAP_SIZE (#392)
* clarify ROCSHMEM_HEAP_SIZE

* Apply suggestions from code review

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

---------

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
2026-01-20 17:22:18 -05:00
prasanna-amd 520f309bb1 fix potential segfaults due to use after malloc fails (#2137)
* fix potential segfaults

* replace NULL with nullptr

---------

Co-authored-by: Prasannakumar Murugesan <prmuruge@amd.com>

[ROCm/rccl commit: 4a32ec2501]
2026-01-20 14:11:29 -08:00
prasanna-amd 4a32ec2501 fix potential segfaults due to use after malloc fails (#2137)
* fix potential segfaults

* replace NULL with nullptr

---------

Co-authored-by: Prasannakumar Murugesan <prmuruge@amd.com>
2026-01-20 14:11:29 -08:00
prasanna-amd bb47eee7cc fix bug in reduce kernel bfloat16 for ROCm >= 6.0 (#2139)
Co-authored-by: Prasannakumar Murugesan <prmuruge@amd.com>
As part of an earlier commit, bfloat16 handling in the reduce kernel for FuncMinMax fell into the generic/default template, which is used when there is no SPECIALIZE_REDUCE for a particular type. This generic template does a bitwise integer comparison, which broke bfloat16 ops.
Change the else-if statement to an else statement so that it covers both ROCm versions < 6.0 and >= 6.0 (with ROCm > 6.0, device.h already typedefs __hip_bfloat16 to hip_bfloat16, so no special case is needed here).

[ROCm/rccl commit: fa366ac03f]
2026-01-20 14:07:20 -08:00
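The commit above attributes the bfloat16 min/max failure to a generic template that compares values as integers bit by bit. A small sketch of why that goes wrong is below. It is an illustration only, not RCCL code: the helper names are hypothetical, bfloat16 is modeled as the upper 16 bits of a float32, and the integer comparison is assumed to be unsigned (a signed comparison also misorders values, in that case two negative numbers).

```python
import struct


def float_to_bf16_bits(x):
    """Truncate a float32 to its upper 16 bits, which is the bfloat16 layout."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bf16_bits_to_float(bits):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


def max_bitwise(a_bits, b_bits):
    """Max by comparing raw bit patterns as unsigned integers (the broken path)."""
    return a_bits if a_bits > b_bits else b_bits


def max_numeric(a_bits, b_bits):
    """Max by comparing the decoded floating-point values (the intended result)."""
    if bf16_bits_to_float(a_bits) > bf16_bits_to_float(b_bits):
        return a_bits
    return b_bits


if __name__ == "__main__":
    a = float_to_bf16_bits(-2.0)  # sign bit set, so the pattern is a large integer
    b = float_to_bf16_bits(1.0)
    print("bitwise max:", bf16_bits_to_float(max_bitwise(a, b)))
    print("numeric max:", bf16_bits_to_float(max_numeric(a, b)))
```

Because the sign bit is the most significant bit, every negative value has a larger unsigned bit pattern than every positive value, so the bitwise version picks -2.0 over 1.0.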
prasanna-amd fa366ac03f fix bug in reduce kernel bfloat16 for ROCm >= 6.0 (#2139)
Co-authored-by: Prasannakumar Murugesan <prmuruge@amd.com>
As part of an earlier commit, bfloat16 handling in the reduce kernel for FuncMinMax fell into the generic/default template, which is used when there is no SPECIALIZE_REDUCE for a particular type. This generic template does a bitwise integer comparison, which broke bfloat16 ops.
Change the else-if statement to an else statement so that it covers both ROCm versions < 6.0 and >= 6.0 (with ROCm > 6.0, device.h already typedefs __hip_bfloat16 to hip_bfloat16, so no special case is needed here).
2026-01-20 14:07:20 -08:00
dependabot[bot] 48d1530205 Bump pynacl from 1.5.0 to 1.6.2 in /docs/sphinx (#2127)
Bumps [pynacl](https://github.com/pyca/pynacl) from 1.5.0 to 1.6.2.
- [Changelog](https://github.com/pyca/pynacl/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/pynacl/compare/1.5.0...1.6.2)

---
updated-dependencies:
- dependency-name: pynacl
  dependency-version: 1.6.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: f38665ac9a]
2026-01-20 14:30:10 -07:00
dependabot[bot] f38665ac9a Bump pynacl from 1.5.0 to 1.6.2 in /docs/sphinx (#2127)
Bumps [pynacl](https://github.com/pyca/pynacl) from 1.5.0 to 1.6.2.
- [Changelog](https://github.com/pyca/pynacl/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/pynacl/compare/1.5.0...1.6.2)

---
updated-dependencies:
- dependency-name: pynacl
  dependency-version: 1.6.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-20 14:30:10 -07:00
dependabot[bot] c2fd82c02d Bump rocm-docs-core from 1.26.0 to 1.29.0 in /docs/sphinx (#2051)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.26.0 to 1.29.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.26.0...v1.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 131900c264]
2026-01-20 14:28:59 -07:00
dependabot[bot] 131900c264 Bump rocm-docs-core from 1.26.0 to 1.29.0 in /docs/sphinx (#2051)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.26.0 to 1.29.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.26.0...v1.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-20 14:28:59 -07:00
dependabot[bot] a1bb4108c1 Bump urllib3 from 2.5.0 to 2.6.3 in /docs/sphinx (#2130)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.3)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: d94ecb7772]
2026-01-20 14:27:31 -07:00
dependabot[bot] d94ecb7772 Bump urllib3 from 2.5.0 to 2.6.3 in /docs/sphinx (#2130)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.3)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-20 14:27:31 -07:00
Mythreya Kuricheti 73df3f12b3 use message instead of warning for nccl.h C++ check (#2128)
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 0dc31b1a4a]
2026-01-20 14:21:38 -07:00
Mythreya Kuricheti 0dc31b1a4a use message instead of warning for nccl.h C++ check (#2128)
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-20 14:21:38 -07:00
Kian Cossettini 7c9361190b [rocprofiler-systems] Fix MPI recv_data calculation (#2694)
Fix incorrect `mpi_recv` calculation. It was using `_send_size` instead of `_recv_size` for `mpi_recv`.
2026-01-20 16:17:22 -05:00
Allen Hubbe 3edd56ca23 gda ionic: ccqe cleanup and error check (#389)
Delete unreachable ccqe polling path, ionic_poll_wave_ccqe().
Move cqe error check to ionic_quiet_internal_ccqe().

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>

[ROCm/rocshmem commit: 6b00964f32]
2026-01-20 15:26:53 -05:00
Allen Hubbe 6b00964f32 gda ionic: ccqe cleanup and error check (#389)
Delete unreachable ccqe polling path, ionic_poll_wave_ccqe().
Move cqe error check to ionic_quiet_internal_ccqe().

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
2026-01-20 15:26:53 -05:00