76333 commits

Author SHA1 Message Date
Mythreya Kuricheti 3e5749ea59 [rocprofiler-sdk][CI] Update codeql rocm version (#1909)
* [rocprofiler-sdk][CI] Update codeql rocm version

* Build HIP from source
2025-12-02 12:17:59 -06:00
Wenkai Du 3e650467fa Use one side stream per process (#2063)
* Use one side stream per process

* Handle multiple GPUs per process

* Reset stream when not found

* Address review comments

* Fix missing mutex initializer

[ROCm/rccl commit: 185e78a8f0]
2025-12-02 10:03:15 -08:00
Wenkai Du 185e78a8f0 Use one side stream per process (#2063)
* Use one side stream per process

* Handle multiple GPUs per process

* Reset stream when not found

* Address review comments

* Fix missing mutex initializer
2025-12-02 10:03:15 -08:00
Yiltan 0f32739b52 Updated important missing environment variables (#344)
[ROCm/rocshmem commit: 8b350a51fe]
2025-12-02 11:40:30 -05:00
Yiltan 8b350a51fe Updated important missing environment variables (#344) 2025-12-02 11:40:30 -05:00
Aurelien Bouteiller 1e3a161c74 MLX5 cards have a vendor-id that does not match the pci-vendor-id for (#342)
some reason.

Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>

[ROCm/rocshmem commit: 0f7da76018]
2025-12-02 11:32:37 -05:00
Aurelien Bouteiller 0f7da76018 MLX5 cards have a vendor-id that does not match the pci-vendor-id for (#342)
some reason.

Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>
2025-12-02 11:32:37 -05:00
Jin Jung 3e25decb39 SWDEV-565719 - Enable GL Interop Tests on Windows (#1808)
* SWDEV-565719 - Enable GL Interop Tests on Windows

* SWDEV-565719 - Define USE_EGL feature

* Update projects/hip-tests/catch/unit/gl_interop/gl_interop_common.hh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* SWDEV-565719 - Add glewInit

* SWDEV-565719 - Remove incorrect glutExit()

* Refactor GL interop build configuration

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-12-02 08:19:07 -08:00
Alysa Liu 3a7b5571c0 kfdtest: Replace pthread with std::thread (#1448)
* kfdtest: Replace pthread with std::thread

Modify concurrent kfdtest to use std::thread
instead of pthread, eventually modify KFDTestLaunch
to take in a member function of test instance
instead of static function.

Convert KFDQMTest to pass in member function for
multi-gpu kfdtest.

* kfdtest: Convert KFDPerfCountersTest to use std::thread

Convert KFDPerfCountersTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDGraphicsInterop to use std::thread

Convert KFDGraphicsInterop to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDGWSTest to use std::thread

Convert KFDGWSTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDCWSRTest to use std::thread

Convert KFDCWSRTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDEventTest to use std::thread

Convert KFDEventTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDExceptionTest to use std::thread

Convert KFDExceptionTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDLocalMemoryTest to use std::thread

Convert KFDLocalMemoryTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDMemoryTest to use std::thread

Convert KFDMemoryTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDSVMRangeTest to use std::thread

Convert KFDSVMRangeTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDHWSTest to use std::thread

Convert KFDHWSTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Remove pthread multigpu test structure

Remove older multi-gpu test framework which
uses pthread.
2025-12-02 10:25:21 -05:00
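A minimal sketch of the pattern this commit describes: launching a member function of a test instance on one `std::thread` per GPU instead of passing a static function to pthread. The names `MultiGpuTest`, `RunOnGpu`, and `LaunchPerGpu` are hypothetical stand-ins, not the actual kfdtest classes or the real `KFDTestLaunch` signature.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a test fixture; not the real kfdtest class.
class MultiGpuTest {
public:
    explicit MultiGpuTest(int gpu_count) : results_(gpu_count, 0) {}

    // Per-GPU test body. Because it is a member function, it can reach
    // instance state directly instead of unpacking a void* argument.
    void RunOnGpu(int gpu_index) { results_[gpu_index] = gpu_index + 1; }

    const std::vector<int>& results() const { return results_; }

private:
    // One slot per GPU; each thread writes only its own element.
    std::vector<int> results_;
};

// Hypothetical launcher: runs `body` once per GPU index, each on its own
// thread, and joins every thread before returning.
template <typename T>
void LaunchPerGpu(T& test, void (T::*body)(int), int gpu_count) {
    std::vector<std::thread> workers;
    workers.reserve(gpu_count);
    for (int i = 0; i < gpu_count; ++i) {
        // std::thread invokes (test.*body)(i) on the new thread.
        workers.emplace_back(body, &test, i);
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```

A caller would write something like `LaunchPerGpu(test, &MultiGpuTest::RunOnGpu, gpu_count);`. Compared with pthread, no static trampoline or `void*` argument struct is needed.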
Alysa Liu 81df45d896 rocrtst: Add test for filter ROCR_VISIBLE_DEVICES (#2016)
Improve test coverage for amd_filter_device.cpp.
2025-12-02 10:15:03 -05:00
vedithal-amd 5bbb72f516 Add JIRA ID to pull request template (#2099) 2025-12-02 10:14:27 -05:00
lancesix 7ba0f1bdcf ROCr-runtime: Allow core dumps for gfx115x (#1827)
Core dumps are not supported for gfx110x, but should be possible for
gfx115x.  The current code disables core dumps completely for all gfx11xx
agents; relax this to allow gfx115x.
2025-12-02 11:51:29 +00:00
Jaydeep cf91f5f77a SWDEV-568926 - Destroy queue_inactive_signal directly. (#2059) 2025-12-02 09:55:58 +05:30
Ioannis Assiouras 65b769ee16 SWDEV-569101 - increase signal list size to at least DEBUG_HIP_GRAPH_BATCH_SIZE (#2084) 2025-12-01 18:52:51 -08:00
SaleelK c105dcd05b clr: Use graph segment scheduling to process HIP Graphs (#1372)
* clr: Use graph segment scheduling to process HIP Graphs

* Add a broader path to use capture packet capture for all topologies
* Refactor code
* Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING to toggle new vs classic path,
  Enabled by default

* clr: Few fixes and improvements

* clr: Detect complex graphs to take classic path

* Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING=2 to force segment scheduling
  path

* clr: Fix a corner-case stack corruption

* clr: Track commands of segments instead of snapshots

* clr: Fix Batch dispatch logic

* Track fence_dirty_ flag for command of other streams
* Dependency resolution markers can now accommodate a dirty fence on cross
  streams

---------

Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>
2025-12-01 12:49:26 -08:00
Kutovoi, Vadim dde9e2464e gda: add check for active interfaces when selecting the GDA backend (#327)
* gda: add check for active interfaces when selecting the GDA backend

* fix __func__ macro in rocshmem_ctx_pe_quiet

* gda: switch to more generic RDMA NIC term in has_active_ib_interface

* gda: add active MLX5 and Pensando vendor ID checks for backend selection

[ROCm/rocshmem commit: 29000a5644]
2025-12-01 15:49:25 -05:00
Kutovoi, Vadim 29000a5644 gda: add check for active interfaces when selecting the GDA backend (#327)
* gda: add check for active interfaces when selecting the GDA backend

* fix __func__ macro in rocshmem_ctx_pe_quiet

* gda: switch to more generic RDMA NIC term in has_active_ib_interface

* gda: add active MLX5 and Pensando vendor ID checks for backend selection
2025-12-01 15:49:25 -05:00
Bindhiya Kanangot Balakrishnan a627c12501 [SWDEV-566465] Fix json output for amdsmi reset (#2043)
Fixed json output for reset command.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-01 13:30:32 -06:00
Adel Johar 2c243feb1b [Docs] Move environment variables to separate page (#341)
[ROCm/rocshmem commit: ba77bdd9a6]
2025-12-01 14:25:27 -05:00
Adel Johar ba77bdd9a6 [Docs] Move environment variables to separate page (#341) 2025-12-01 14:25:27 -05:00
dependabot[bot] 2f96618210 Bump rocm-docs-core[api_reference] from 1.30.0 to 1.30.1 in /docs/sphinx (#209)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.30.0 to 1.30.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.30.0...v1.30.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.30.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocjpeg commit: a1f8e1e3b5]
2025-12-01 09:56:25 -08:00
dependabot[bot] a1f8e1e3b5 Bump rocm-docs-core[api_reference] from 1.30.0 to 1.30.1 in /docs/sphinx (#209)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.30.0 to 1.30.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.30.0...v1.30.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.30.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-01 09:56:25 -08:00
cfallows-amd 29a7591791 Update RHEL8/9 workflow with latest rocm 7.1.1 links (#2060)
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-12-01 11:14:33 -05:00
Yiltan 77ba8cc76c Cleanup readme.md
[ROCm/rocshmem commit: 774159a08f]
2025-12-01 10:43:18 -05:00
Yiltan 774159a08f Cleanup readme.md 2025-12-01 10:43:18 -05:00
marantic-amd 3b11e01716 Perfetto traces from cached data (#1704)
## Motivation

The idea is to unify how and where we store our traces. The current implementation uses `trace_cache` for rocpd traces, but Perfetto handling is inlined in each module. This change gives us a single point in the code where we collect data, process it, and store it in the desired format. That lets us declutter the code further and have a single point of responsibility and a single point of failure.

## Technical Details

A new `processor` (perfetto_post_processing.cpp) is added to the `trace_cache`; its purpose is to use the cached data to populate Perfetto tracks. The cache manager owns the instance of this processor and manages its lifetime.
2025-12-01 09:59:16 -05:00
Kian Cossettini b506c75f28 [rocprof-sys] Fix roctx wall clock tree, change timemory push/pop to use proper category, and add roctx as valid domain choice (#2062)
While working on this ticket, I also noticed that the program would segfault when ROCPROFSYS_ROCM_DOMAINS=roctx, even though the docs say this is allowed. Went ahead and fixed that.

Also noticed that timemory push/pop in rocprofiler-sdk.cpp was always using category::rocm_marker_api instead of CategoryT. Fixed that as well.
2025-12-01 09:50:58 -05:00
Yiltan 2079193495 Update docs for GDA (#337)
[ROCm/rocshmem commit: 5606fdafd6]
2025-12-01 09:38:11 -05:00
Yiltan 5606fdafd6 Update docs for GDA (#337) 2025-12-01 09:38:11 -05:00
Kian Cossettini ae29018bb0 [rocprofiler-systems] Enable HOST OMPVV runtime-instrumentation CTests (#1970)
* Enable HOST ompvv runtime-instrumentation ctests

* Fix rocprofiler-systems-avail-regex-negation test failure

* Exclude problematic function from instrumentation

* Make push pop skip an env option for ctests

* Remove SKIP_PUSH_POP_CHECK from argument parse

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-12-01 09:26:24 -05:00
vstojilj 77f58ceb9f SWDEV-558557 - Remove duplicate nodes when capturing hipMemcpyAsync (#1226) 2025-12-01 11:25:13 +01:00
Horatio Zhang 5983ccefa5 librocdxg: Return DXG fd in amdgpu_device_get_fd
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Flora Cui <flora.cui@amd.com>
2025-12-01 14:29:20 +08:00
Horatio Zhang 6bb933f820 librocdxg: Add drm metadata related interface
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Flora Cui <flora.cui@amd.com>
2025-12-01 14:29:13 +08:00
Milan Radosavljevic ee7305e795 [rocprof-sys] Add test cleanup fixtures for binary-rewrite and runtime-instrument tests (#2012)
- Added `binary-rewrite-cleanup` and `runtime-instrument-cleanup` tests that remove instrumented binaries and output directories using `cmake -E rm -rf`
- Implemented CMake test fixtures (`FIXTURES_SETUP` and `FIXTURES_CLEANUP`) to establish proper test ordering:
  - `binary-rewrite` sets up the `binary-rewrite-fixture`
  - `binary-rewrite-run` and validation tests require this fixture
  - `binary-rewrite-cleanup` performs cleanup for this fixture
  - Same pattern applied for `runtime-instrument`
- Extended `ROCPROFILER_SYSTEMS_ADD_PYTHON_TEST` to accept `FIXTURES_REQUIRED` parameter
- Updated validation tests to require appropriate cleanup fixtures based on test name pattern matching
- Added fixture requirements to Python code-coverage tests
2025-11-28 18:51:54 -05:00
abchoudh-amd fd61b0f507 Add CU Utilization and deprecate Active CUs (#1822)
* ChangeLog

* Deprecation notice in old arch

* Deprecation notice current arch

* New config hash

* Added Config deltas

* Added metric description
2025-11-28 11:32:25 -05:00
vedithal-amd 3f2fbc18e9 [rocprofiler-compute] Only depend on amdsmi in profile phase (#2044)
* Only depend on amdsmi in profile phase

* amdsmi interface tests should have common prefix for easier testing
2025-11-28 11:32:00 -05:00
Flora Cui 0761dd0146 librocdxg: Increase AQL frame size calculation
to prevent PM4 command buffer overflow

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
Part-of: <http://10.67.69.192/wsl/rocr-runtime/-/merge_requests/113>
2025-11-28 14:53:07 +08:00
Flora Cui 437e4b092e librocdxg: Convert all CmdUtil methods to static
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
Part-of: <http://10.67.69.192/wsl/rocr-runtime/-/merge_requests/113>
2025-11-28 14:52:56 +08:00
Honglei Huang 68c8e111ae rocr/format: Fix clang-format-diff.py path in format script (#1757)
Update the format script to use absolute path for clang-format-diff.py
instead of relative path. This ensures the script works correctly
regardless of the current working directory when executed.

- Change from './clang-format-diff.py' to '${root}/projects/rocr-runtime/clang-format-diff.py'
- Improves script reliability and portability

Signed-off-by: Honglei Huang <honghuan@amd.com>
2025-11-28 09:21:12 +08:00
Honglei Huang aaa06e1609 libhsakmt/virtio: add non SVM mode in libhsakmt virtio driver and many fixes (#1756)
* libhsakmt/virtio: change shmem size to 80

Some DGPU props have a lot of information,
so it is necessary to increase the size of shmem.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: use BO handle instead of pointer in memory registration

Change vhsakmt_map_to_gpu() return type from void* to vhsakmt_bo_handle
to properly handle buffer object information. This allows access to
both the host address and resource ID needed for memory registration.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Improve memory mapping logic

- Update vhsakmt_mappable() to check NoAddress flag and require HostAccess
- Remove mappable checks in cpu_map/unmap to allow all BOs to be mapped
- Set BO flags properly in vhsakmt_alloc_memory and scratch memory creation
- Ensure scratch memory is correctly flagged for proper handling

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: add no svm mode for libhsakmt virtio

Add a no-SVM mode for the libhsakmt virtio driver. In no-SVM mode, userptrs
must be managed by the UMD, so add an interval tree to manage them.

New Features:
- Add augmented red-black tree based interval tree implementation
  * Implement RB-tree insertion, deletion, and color balancing
  * Provide interval query for fast overlapping range lookup
  * Based on Linux kernel's augmented rbtree implementation

- Improve userptr memory management
  * Use interval tree to efficiently track userptr memory regions
  * Support finding registered memory within given address ranges
  * Optimize memory mapping and unmapping performance

Signed-off-by: Honglei Huang <honghuan@amd.com>

---------

Signed-off-by: Honglei Huang <honghuan@amd.com>
2025-11-28 09:20:43 +08:00
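A rough illustration of the overlapping-range lookup that the no-SVM userptr tracking above relies on. This sketch deliberately swaps in a `std::map` of non-overlapping half-open ranges rather than the augmented red-black interval tree the commit implements. `UserptrRegistry` and its methods are hypothetical names, not libhsakmt APIs. It assumes C++17 and does not guard against `start + size` overflowing.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Half-open address range [start, end).
struct Range {
    std::uint64_t start;
    std::uint64_t end;
};

// Hypothetical registry of registered userptr regions.
class UserptrRegistry {
public:
    // Registers [start, start + size). Returns false if the range is empty
    // or overlaps a range that is already registered.
    bool Insert(std::uint64_t start, std::uint64_t size) {
        if (size == 0 || FindOverlap(start, size)) return false;
        ranges_[start] = start + size;
        return true;
    }

    // Removes the range that begins exactly at `start`, if any.
    bool Erase(std::uint64_t start) { return ranges_.erase(start) > 0; }

    // Returns a registered range overlapping [start, start + size), if any.
    std::optional<Range> FindOverlap(std::uint64_t start,
                                     std::uint64_t size) const {
        if (size == 0) return std::nullopt;
        const std::uint64_t end = start + size;
        // Ranges beginning at or after `end` cannot overlap the query, so
        // only the last range beginning before `end` needs to be checked.
        // Because stored ranges never overlap each other, that range also
        // has the largest end among all ranges beginning before `end`.
        auto it = ranges_.lower_bound(end);
        if (it == ranges_.begin()) return std::nullopt;
        it = std::prev(it);
        if (it->second > start) return Range{it->first, it->second};
        return std::nullopt;
    }

private:
    std::map<std::uint64_t, std::uint64_t> ranges_;  // start -> end
};
```

Both structures give logarithmic lookups; an augmented tree additionally supports overlapping stored intervals, which this simplified map-based version rejects at insertion.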
Pratik Basyal 792ecc1a83 Formatting fixed (#1691) 2025-11-27 18:55:45 -05:00
corey-derochie-amd 8e3f60e080 Add copyright to src/device/symmetric/all_reduce.cuh (#2080)
[ROCm/rccl commit: 4acd0f64ea]
2025-11-27 14:29:21 -07:00
corey-derochie-amd 4acd0f64ea Add copyright to src/device/symmetric/all_reduce.cuh (#2080) 2025-11-27 14:29:21 -07:00
Giovanni Lenzi Baraldi 0e04fdd571 Workaround for SWDEV-559598. Enabling more thread trace tests. (#1336)
* Workaround for SWDEV-559598

* gfx11 fix
2025-11-27 20:03:38 +01:00
vstojilj 259010f2d5 SWDEV-491253 - Create stream capture test for kernel APIs (#1189) 2025-11-27 17:40:11 +01:00
vstojilj 1c09c87cc7 SWDEV-564927 - Allow sizeBytes to be 0 when hipMemsetAsync is captured (#1849) 2025-11-27 17:13:33 +01:00
Kian Cossettini 63713f01e0 [rocprofiler-systems] Add Fortran MPI CTests (#1172)
* Add MPI CTests (use gfortran)

* Add proper regex check

* Skip Runtime-Instrument due to incompatibility with MPI

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Sajina Kandy <sputhala@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-27 10:32:09 -05:00
vedithal-amd 2e10041210 Fix sL1D values in memory chart (#2037) 2025-11-27 09:13:19 -05:00
Godavarthy Surya, Anusha 2e1c37a926 SWDEV-490861 - Remove recursion and extra loop in hipGraphLaunch (#1792) 2025-11-27 10:25:08 +00:00
Ammar ELWazir ed42157c31 Fixing Code Object Data Race and Thread Safety & Adding validation test (#2014) 2025-11-26 20:28:52 -06:00