76333 Commits

Author SHA1 Message Date
Longlong Yao 5ebe95d5b2 librocdxg: query total shared GPU memory
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:14:55 +08:00
Longlong Yao 6652313128 librocdxg: Add Strix and Strix Halo support
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:13:22 +08:00
Longlong Yao a2c5e19624 librocdxg: add interface to query segment info
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:08:12 +08:00
Longlong Yao 26cf8c8298 librocdxg: add interface to query segment info
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:08:12 +08:00
Bindhiya Kanangot Balakrishnan 641fa27699 [SWDEV-566543] Fix param validation in FrequenciesRead test (#2430)
Fixed incorrect error code expectation in FrequenciesRead
test when calling amdsmi_get_gpu_pci_bandwidth() with nullptr
parameter.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-23 15:38:25 -08:00
Ioannis Assiouras 49b8900158 SWDEV-558849 - keep the lastEnqueueCommand_ when PAL backend is enabled (#2320) 2025-12-23 21:24:09 +00:00
ammallya c2c4d4c1f5 Revert "Adding full build capability to theROCK for HIP changes (#2003)" (#2441)
This reverts commit 0a52f5c101.

Reverts #2003

MIOpen build failures on Windows are blocking unrelated file changes.
2025-12-23 13:01:08 -08:00
vedithal-amd 61fd728fdb [rocprofiler-compute] Faster counter accuracy testing (#2420)
* Faster counter accuracy testing

* Better handle SPI_CSN_* metrics for lesser than MI350 series

* Use metric filtering to collect only relevant counters for comparison

* Ensure all workload folders are deleted after testing is completed

* Don't use clean_existing=False

* Add manual test for all counter accuracy
2025-12-23 13:13:53 -05:00
vedithal-amd d7302d6c1c [rocprofiler-compute] Test env. vars. in rocprofiler-sdk backend (#2414)
* Test env. vars. in rocprofiler-sdk backend

* Improve rocprofiler-sdk backend test case to check for env. vars. and
  ensure we do not overwrite irrelevant env. vars.

* Remove unnecessary usage of ROCPROF_INDIVIDUAL_XCC_MODE env. var.

* Formatting fixes

* Test fixes

* Remove redundant code in tests

* Remove usage of utils_mod and use utils instead, this prevents
  duplicate imports
2025-12-23 13:13:28 -05:00
vedithal-amd 588773f9bf [rocprofiler-compute] Fix for multi process workload profiling (#2418)
* Fix for multi process workload profiling

Native counter collection tool updates:
    * Do not dump empty counter data for a process
    * Use PID instead of UUID for dumped csv files to facilitate correlation
    * Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
      native tool) files
    * Handle merging multiple pairs of csv (from sdk tool) and csv (from
      native tool) files

Rocpd output format updates:
    * Merge multiple rocpd databases into a single csv
    * Reset dispatch id and kernel id for unique dispatches and unique
      kernels respectively
    * Retain multiple rocpd databases per run for multi process workloads

* Add test case for multiprocess profiling using rocflop workload

* Add rocflop

* Fix native counter csv to rocprofv3 csv conversion

* Use kernel_id instead of dispatch_id to correlate native counter csv
  and kernel trace csv

* python formatting using ruff 0.14 instead of 0.13
2025-12-23 13:12:18 -05:00
Corey Derochie f221a1ae08 Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi (#2028)
* Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi

* Added `amd-smi static --driver`

* Update docs/how-to/troubleshooting-rccl.rst

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: f942810959]
2025-12-23 21:22:11 +05:30
Corey Derochie f942810959 Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi (#2028)
* Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi

* Added `amd-smi static --driver`

* Update docs/how-to/troubleshooting-rccl.rst

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2025-12-23 21:22:11 +05:30
Karthikeyan Arumugam bb599d8ed7 Add support for AMD AINIC within RCCL default internal network plugin. (#2078)
* Added support for AMD ROCm net-ib alongside vanilla net-ib, with auto-generation to detect conflicts early during NCCL sync and enable future customizations.
* Integrated AMD AINIC support in RCCL for out-of-the-box usage, leveraging performance improvements by default, channel pinning for optimal pipeline performance, and extended support for 32B in-line CTS messages.
* Implemented internal derivation of AINIC-specific flags when RCCL AINIC environment parameter is set, and checks before initializing AINIC net-ib methods.
* Included snapshot of auto-generated ROCm net-ib file (src/transport/net_ib_rocm.cc) for reference.
* Fixed typos in RCCL param API (RCCL_AINIC_ROCE) and dlclose.
* Updated plugin loading logic:
  * Load internal ROCmIB plugin only when NCCL_NET_PLUGIN is not set.
  * Load default internal net-ib only when not AINIC and no external plugin env is set.

[ROCm/rccl commit: 9f4651f20f]
2025-12-23 10:33:10 -05:00
Karthikeyan Arumugam 9f4651f20f Add support for AMD AINIC within RCCL default internal network plugin. (#2078)
* Added support for AMD ROCm net-ib alongside vanilla net-ib, with auto-generation to detect conflicts early during NCCL sync and enable future customizations.
* Integrated AMD AINIC support in RCCL for out-of-the-box usage, leveraging performance improvements by default, channel pinning for optimal pipeline performance, and extended support for 32B in-line CTS messages.
* Implemented internal derivation of AINIC-specific flags when RCCL AINIC environment parameter is set, and checks before initializing AINIC net-ib methods.
* Included snapshot of auto-generated ROCm net-ib file (src/transport/net_ib_rocm.cc) for reference.
* Fixed typos in RCCL param API (RCCL_AINIC_ROCE) and dlclose.
* Updated plugin loading logic:
  * Load internal ROCmIB plugin only when NCCL_NET_PLUGIN is not set.
  * Load default internal net-ib only when not AINIC and no external plugin env is set.
2025-12-23 10:33:10 -05:00
marandje 3e49440495 SWDEV-555178 - Calculate phys mem offset for remap range (#1879) 2025-12-23 10:27:42 +01:00
Milan Radosavljevic 719556fbba [rocprofiler-systems] Add SIGKILL delay option (#2384)
## Motivation

When profiling multi-process applications where a parent process sends SIGKILL to child processes, the termination can occur before the profiler has a chance to flush collected data. This PR introduces a configurable delay before SIGKILL signals are forwarded, allowing profiling data to be captured before process termination. This is a workaround.

## Technical Details

- Added new configuration setting `ROCPROFSYS_KILL_DELAY` (default: 0 seconds) to specify a delay before SIGKILL signals are forwarded to other processes
- Implemented `kill_gotcha` component that intercepts the `kill()` system call
- The gotcha only delays SIGKILL signals sent to external processes (pid > 0 and not self)
- Integrated `kill_gotcha_t` into the `preinit_bundle_t` for early initialization
2025-12-22 21:17:57 -05:00
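The interception rule described in this commit (delay only SIGKILL, and only for targets with pid > 0 that are not the caller) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the rocprofiler-systems implementation: `parse_kill_delay`, `should_delay`, and `delayed_kill` are hypothetical names, and the real component hooks `kill()` through a gotcha wrapper rather than a plain function.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: turn the text of a delay setting (for example the
// value of ROCPROFSYS_KILL_DELAY) into whole seconds. Missing, malformed,
// or negative values fall back to the documented default of 0.
inline int parse_kill_delay(const char* value) {
    if (value == nullptr) return 0;
    char* end = nullptr;
    const long seconds = std::strtol(value, &end, 10);
    if (end == value || seconds < 0) return 0;
    return static_cast<int>(seconds);
}

// Only SIGKILL sent to another, specific process is delayed:
// pid > 0 excludes process groups and broadcasts, and the caller is skipped.
inline bool should_delay(pid_t target, int sig, pid_t self) {
    return sig == SIGKILL && target > 0 && target != self;
}

// Hypothetical wrapper: sleep first when the rule applies, then forward
// the signal unchanged to the real kill().
inline int delayed_kill(pid_t target, int sig, int delay_seconds) {
    if (delay_seconds > 0 && should_delay(target, sig, getpid())) {
        std::this_thread::sleep_for(std::chrono::seconds(delay_seconds));
    }
    return kill(target, sig);
}
```

With a delay of 0 the wrapper forwards every signal immediately, which matches the default behaviour stated in the commit message.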
Young Hui - AMD 37e3b8a3db [rocpd] Write rocpd yaml files as a list, even when only 1 file (#2288) 2025-12-22 17:56:59 -05:00
Omri Mor 56bfb13644 QueuePair: prefix bnxt functions and variables (#373)
[ROCm/rocshmem commit: f5940f6b9a]
2025-12-22 14:46:17 -08:00
Omri Mor f5940f6b9a QueuePair: prefix bnxt functions and variables (#373) 2025-12-22 14:46:17 -08:00
Omri Mor c43dc136f3 [Bugfix] GDA/bnxt: release SQ lock before return (#372)
* bnxt_post_wqe_amo_single with fetching = true would return
  before releasing the send queue lock, resulting in a deadlock.
* Release the send queue lock before returning from the function.

[ROCm/rocshmem commit: 016e08120a]
2025-12-22 12:05:00 -08:00
Omri Mor 016e08120a [Bugfix] GDA/bnxt: release SQ lock before return (#372)
* bnxt_post_wqe_amo_single with fetching = true would return
  before releasing the send queue lock, resulting in a deadlock.
* Release the send queue lock before returning from the function.
2025-12-22 12:05:00 -08:00
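The class of bug fixed here, an early return that skips the unlock, can be illustrated on the host side with a scoped lock. This is only a sketch under assumptions: `SendQueue` and `post_wqe` are hypothetical stand-ins, and the real `bnxt_post_wqe_amo_single` runs in device code where `std::mutex` is not available, so the actual fix releases the lock explicitly before returning.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a send queue protected by a lock.
struct SendQueue {
    std::mutex lock;
    std::uint64_t posted = 0;
};

// The buggy shape would be: lock(); ...; if (fetching) return value;
// unlock(); so the fetching path never reaches unlock().
// Here the guard releases the lock on every return path.
inline std::uint64_t post_wqe(SendQueue& sq, bool fetching) {
    std::lock_guard<std::mutex> guard(sq.lock);
    ++sq.posted;
    if (fetching) {
        return sq.posted;  // early return; guard still unlocks
    }
    return 0;
}
```

Calling `post_wqe` repeatedly with `fetching = true` would hang on the second call if the early return leaked the lock; with the guard, each call completes.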
habajpai-amd 447025011a [Rocprof-Sys] Resolve crash when profiling TensorFlow GPU application (#2381)
* fix: resolve crash when profiling TensorFlow GPU application

* incorporate review comments

* updated min_rows from 3 to 2 for threads table validation as internal threads are not profiled and are now correctly bypassed
2025-12-22 14:00:55 -05:00
Ammar ELWazir 1f8e8e3fbf Add CODEOWNERS for rocprofiler-sdk project (#2427)
## Motivation

Missing CODEOWNERS for ROCProfiler-SDK

## Technical Details

Add CODEOWNERS for rocprofiler-sdk project

## Submission Checklist

- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2025-12-22 12:16:09 -05:00
Gopesh Bhardwaj 9141f26905 [Documentation] updating roctx library linkage documentation (#2251) 2025-12-22 10:36:13 -05:00
ammallya 0a52f5c101 Adding full build capability to theROCK for HIP changes (#2003)
## Add Full Build Capability to theROCK for HIP

### Summary
This PR adds full build support to **theROCK** for HIP-related changes, ensuring that all components are built.

### Changes
- Enabled full build coverage for the following projects:
  - `projects/clr`
  - `projects/hip`
  - `projects/hip-tests`
  - `projects/rocr-runtime`
- Updated build configuration to include all targets for the above projects.
- Ensured rocm-libraries is pulled to build optional components.

### Motivation
These changes are required to support HIP development and testing within theROCK by ensuring all components are built together. This improves reliability and integration testing.
2025-12-22 05:31:32 -08:00
marantic-amd ba1380a75d Put cached perfetto traces as default one (#2138)
* Put cached perfetto traces as default one

* Improve cached data and perfetto traces in order to be more aligned with E2E tests

* Addressing PR comments and findings

* Force early instrumentation bundle instantiation

* Sync-up instrumented containers with thread growth data

* Revert ompvv number of host threads to default 8

* Fixed counter track namings for amd-smi

* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
2025-12-22 12:47:35 +01:00
Aleksandar Djordjevic 7da3275b42 [rocprofiler-systems] Improve metadata parsing (#2238)
* Improve metadata JSON parsing
* Fix string ownership
2025-12-22 12:30:51 +01:00
Kutovoi, Vadim a4b99485a9 gda/ro: validate and exit cleanly when forced GDA config is invalid (#354)
* gda: validate and exit cleanly when forced GDA config is invalid

* ro: validate and exit cleanly when forced RO config is invalid

[ROCm/rocshmem commit: 80a710ac0a]
2025-12-22 10:54:33 +00:00
Kutovoi, Vadim 80a710ac0a gda/ro: validate and exit cleanly when forced GDA config is invalid (#354)
* gda: validate and exit cleanly when forced GDA config is invalid

* ro: validate and exit cleanly when forced RO config is invalid
2025-12-22 10:54:33 +00:00
abchoudh-amd 5b241f3e61 Fixed ctests (#2406) 2025-12-22 13:12:58 +05:30
Omri Mor ed38201b90 gda: fix incorrect casts from void* to uintptr_t (#369)
[ROCm/rocshmem commit: e8fc5e67c4]
2025-12-19 16:18:49 -08:00
Omri Mor e8fc5e67c4 gda: fix incorrect casts from void* to uintptr_t (#369) 2025-12-19 16:18:49 -08:00
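The commit title does not say which casts were wrong, so the following is a general illustration only, not the rocshmem change itself. It shows the usual C++ way to turn a `void*` into a `std::uintptr_t` for address arithmetic; `to_address` and `offset_from` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Convert a pointer to an integer wide enough to hold an address.
// reinterpret_cast to std::uintptr_t is the supported conversion;
// casting through a narrower integer type can truncate the address.
inline std::uintptr_t to_address(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Byte distance from base to p, computed on integer addresses.
// Assumes p does not precede base within the same buffer.
inline std::uintptr_t offset_from(const void* base, const void* p) {
    return to_address(p) - to_address(base);
}
```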
Geo Min 3635953cd8 Revert "Adding org var and dynamic selection of targets (#2317)" (#2416)
This reverts commit c9ac018395.
2025-12-19 14:51:53 -08:00
cadolphe-amd 14c949a827 SWDEV-572676 - adjust tile size to 32 in Unit_hipCGThreadBlockTileType for Navi4x (#2379)
* SWDEV-572676 - adjust tile size to 32 for Navi4x

* SWDEV-572676 - change tile size from fixed value to warp size
2025-12-19 16:43:34 -05:00
Sourabh U Betigeri d552491985 SWDEV-572329 - Remove barrier packet (#2304) 2025-12-19 13:37:48 -08:00
Sourabh U Betigeri fdc1660dfa SWDEV-565304 - Pass numa node to migrate pages correctly (#1729)
* SWDEV-565304 - Pass cpuId of the thread currently running

* SWDEV-565304 - Numa id to be returned

* SWDEV-565304 - Numa id to be returned
2025-12-19 13:36:53 -08:00
Geo Min c199df6b96 Revert "Adding org var and dynamic runner selection (#2106)" (#2114)
This reverts commit 4f7698c27e.

[ROCm/rccl commit: 4f474a7389]
2025-12-19 12:53:09 -08:00
Geo Min 4f474a7389 Revert "Adding org var and dynamic runner selection (#2106)" (#2114)
This reverts commit 2e193aed68.
2025-12-19 12:53:09 -08:00
Matt Arsenault 0c0d8dc974 SWDEV-548892 - Stop using __ockl_lane_id (#2186)
__lane_id already exists and is identical.
2025-12-19 20:34:55 +01:00
systems-assistant[bot] 7c989ac022 [SWDEV-525635] Updated output file handling options (#1896)
Currently, if the input file name already exists, the tool
appends output to the existing file. Added overwrite, append,
or no (discard) options to choose from.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-19 13:10:42 -06:00
Dimple Prajapati e21c087f2a [BugFix] Fix rocshmem_get_device_ctx to return ctx_opaque pointer (#359)
Changed rocshmem_get_device_ctx() to properly copy the full rocshmem_ctx_t
structure and return only the ctx_opaque pointer instead of trying to
copy directly to a void pointer.

The prior implementation could cause undefined behavior or memory corruption
because it copied 16 bytes of data into 8 bytes. It worked so far because the ctx_opaque
field is at the proper offset, but the incorrect memcpy would overwrite other allocations
and cause issues.
This fixes the context memory handling when passing device context from
host to device kernels.

[ROCm/rocshmem commit: cf6a53e81c]
2025-12-19 10:01:02 -05:00
Dimple Prajapati cf6a53e81c [BugFix] Fix rocshmem_get_device_ctx to return ctx_opaque pointer (#359)
Changed rocshmem_get_device_ctx() to properly copy the full rocshmem_ctx_t
structure and return only the ctx_opaque pointer instead of trying to
copy directly to a void pointer.

The prior implementation could cause undefined behavior or memory corruption
because it copied 16 bytes of data into 8 bytes. It worked so far because the ctx_opaque
field is at the proper offset, but the incorrect memcpy would overwrite other allocations
and cause issues.
This fixes the context memory handling when passing device context from
host to device kernels.
2025-12-19 10:01:02 -05:00
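The size mismatch described in this commit can be sketched as follows. The two-pointer `Ctx` layout and the `get_ctx_opaque` helper are assumptions made for illustration, not the actual `rocshmem_ctx_t` definition or the library's function. The unsafe shape copies `sizeof(Ctx)` bytes straight into a single `void*`; the safe shape copies into a full-size local structure and returns only the one field.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical context layout: two pointers, so 16 bytes on a typical
// 64-bit platform, while a lone void* destination holds only 8.
struct Ctx {
    void* ctx_opaque;
    void* other_field;  // placeholder for the remaining context state
};
static_assert(sizeof(Ctx) == 2 * sizeof(void*), "sketch assumes no padding");

// Unsafe shape (shown only as a comment):
//   void* out;
//   std::memcpy(&out, src, sizeof(Ctx));  // writes past the end of out
//
// Safe shape: copy the whole structure into a correctly sized local,
// then hand back just the opaque pointer.
inline void* get_ctx_opaque(const void* src) {
    Ctx local;
    std::memcpy(&local, src, sizeof(Ctx));
    return local.ctx_opaque;
}
```

The unsafe version can appear to work because the first bytes copied are the wanted pointer, which is consistent with the commit's note that the field sits at the proper offset.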
Aurelien Bouteiller dde4902844 Fix driver.sh script for systems where neither amd-smi nor rocm-smi is (#370)
found

[ROCm/rocshmem commit: 5eaa152010]
2025-12-19 10:00:11 -05:00
Aurelien Bouteiller 5eaa152010 Fix driver.sh script for systems where neither amd-smi nor rocm-smi is (#370)
found
2025-12-19 10:00:11 -05:00
dependabot[bot] 750d3f8b2e Bump urllib3 from 2.5.0 to 2.6.0 in /docs/sphinx (#365)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocshmem commit: 166a591216]
2025-12-19 09:55:42 -05:00
dependabot[bot] 166a591216 Bump urllib3 from 2.5.0 to 2.6.0 in /docs/sphinx (#365)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-19 09:55:42 -05:00
habajpai-amd 7b00d3a89b fix: prevent double-free crash during process exit in amd-smi (#2213) 2025-12-19 11:56:40 +05:30
Sourabh U Betigeri 883fdfb820 Revert "clr: Minor fixes for error return" (#2399)
- This reverts commit 8dd8436e43c7f0d062fd73252bf61c35615d181d.
- Resolve MIOpen test failures observed in TheRock
- TheRock Issue: ROCm/TheRock#2642
- rocm-systems issue: #2400
2025-12-18 18:40:13 -05:00
alexander-sannikov 8bc2e81e9a Tuning: use constant value for CorrectionFactor tables
[ROCm/rccl commit: 50568dc93d]
2025-12-18 18:55:03 +00:00
alexander-sannikov 50568dc93d Tuning: use constant value for CorrectionFactor tables 2025-12-18 18:55:03 +00:00