76333 Commits

Author SHA1 Message Date
vedithal-amd bb5fd1d4ae [rocprofiler-compute] Update analysis db for visualizer integration (#1548)
* Analysis db changes for visualizer

* Add support for per kernel analysis metrics

* Add support for dispatch timeline visualization

* Show median instead of mean of dispatch duration in kernel view

* Add test case to validate analysis db schema

* Analysis db schema update
    * Add Kernel table and make Metric and Dispatch table its children
    * Kernel table is a child of Workload table
    * Update metric_view to show kernel_name column
    * Add dispatch timestamps to Dispatch table for dispatch timeline
      visualization
    * Update kernel_view to show duration_ns_median instead of mean
      duration

* Add mean duration in kernel view

* update changelog

---------

Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
2025-11-03 09:25:12 -05:00
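
The table hierarchy described in the commit above (Workload, then Kernel, then Metric and Dispatch) can be sketched as follows. This is a minimal illustration only, not the actual rocprofiler-compute schema: every column name except `kernel_name` is an assumption, and the median is computed in Python for simplicity.

```python
import sqlite3
import statistics

# Hypothetical, simplified schema. Only the parent/child relationships and
# the kernel_name column come from the commit message; the rest is assumed.
SCHEMA = """
CREATE TABLE workload (
    workload_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE kernel (
    kernel_id   INTEGER PRIMARY KEY,
    workload_id INTEGER NOT NULL REFERENCES workload(workload_id),
    kernel_name TEXT NOT NULL
);
CREATE TABLE dispatch (
    dispatch_id     INTEGER PRIMARY KEY,
    kernel_id       INTEGER NOT NULL REFERENCES kernel(kernel_id),
    start_timestamp INTEGER NOT NULL,  -- nanoseconds (assumed unit)
    end_timestamp   INTEGER NOT NULL
);
CREATE TABLE metric (
    metric_id INTEGER PRIMARY KEY,
    kernel_id INTEGER NOT NULL REFERENCES kernel(kernel_id),
    name      TEXT NOT NULL,
    value     REAL
);
"""


def build_demo_db():
    """Create an in-memory database with one kernel and three dispatches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO workload VALUES (1, 'demo')")
    conn.execute("INSERT INTO kernel VALUES (1, 1, 'vecadd')")
    for i, (start, end) in enumerate([(0, 10), (20, 50), (60, 160)], start=1):
        conn.execute("INSERT INTO dispatch VALUES (?, 1, ?, ?)", (i, start, end))
    conn.commit()
    return conn


def kernel_durations(conn):
    """Return {kernel_name: (median_ns, mean_ns)} over all dispatches."""
    rows = conn.execute(
        "SELECT k.kernel_name, d.end_timestamp - d.start_timestamp "
        "FROM dispatch AS d JOIN kernel AS k ON k.kernel_id = d.kernel_id"
    ).fetchall()
    by_kernel = {}
    for name, duration in rows:
        by_kernel.setdefault(name, []).append(duration)
    return {
        name: (statistics.median(values), statistics.mean(values))
        for name, values in by_kernel.items()
    }


if __name__ == "__main__":
    print(kernel_durations(build_demo_db()))
```

With the demo durations of 10, 30, and 100 ns, the single long dispatch pulls the mean well above the median, which is one plausible reason a kernel view would show both.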
vedithal-amd dbb361c606 [rocprofiler-compute] fix parser to prevent missing metrics in analysis mode (#1613)
* fix parser

* fix parser

* fix parser

---------

Co-authored-by: fei.zheng <fei.zheng@amd.com>
Co-authored-by: ywang103-amd <ywang103@amd.com>
2025-11-03 09:23:22 -05:00
Ahmed Khan acf64be514 Remove duplicate MaxEmptyLinesToKeep from clang-format (#2016)
[ROCm/rccl commit: caffd013f6]
2025-11-02 21:44:27 -06:00
Ahmed Khan caffd013f6 Remove duplicate MaxEmptyLinesToKeep from clang-format (#2016) 2025-11-02 21:44:27 -06:00
Victor Zhang 437ce0b8df fix atomics SystemTest() use after free (#1595) 2025-11-02 21:45:44 -05:00
arvindcheru fb1d32c15c SWDEV-530465 Update share/doc/<pkgnm> License Folder for hsa-rocr (#923)
* SWDEV-530465 Update share/doc/<pkgnm> License Folder for hsa-rocr
* Review Comments Updated - reverted to usage of DOCDIR
2025-10-31 23:21:22 -04:00
Jeff Jiang 9f857d54f0 Added logging control (#667)
* * rocDecode: Added logging control
 - Message output from the core components is now controlled by the logging level, which can be set by an environment variable or other methods.

* * rocDecode/Logging control: Fixed a typo.

* * rocDecode/Logging control: Removed reference to the logger class from RocVideoDecoder utility, which results in build error on non-source install environment.

* * rocDecode/Logging control: Improved some wording in the docs.

[ROCm/rocdecode commit: 60e6c585ff]
2025-10-31 20:50:33 -04:00
Jeff Jiang 60e6c585ff Added logging control (#667)
* * rocDecode: Added logging control
 - Message output from the core components is now controlled by the logging level, which can be set by an environment variable or other methods.

* * rocDecode/Logging control: Fixed a typo.

* * rocDecode/Logging control: Removed reference to the logger class from RocVideoDecoder utility, which results in build error on non-source install environment.

* * rocDecode/Logging control: Improved some wording in the docs.
2025-10-31 20:50:33 -04:00
Aurelien Bouteiller e622398337 install_dependencies pip issues with ubuntu 24 (#302)
* The install_dependencies script would fail on ubuntu 24.04
they changed how pip works so we need to create a venv first now

* Fix install_dependencies for ubuntu 22

* Make sure we build in the builddir and install in the installdir
combine installdir for ucx and ompi when user-provided by INSTALL_DIR
retain prior behavior if not overridden to avoid breaking CI scripts

[ROCm/rocshmem commit: e155af8704]
2025-10-31 16:34:36 -04:00
Aurelien Bouteiller e155af8704 install_dependencies pip issues with ubuntu 24 (#302)
* The install_dependencies script would fail on ubuntu 24.04
they changed how pip works so we need to create a venv first now

* Fix install_dependencies for ubuntu 22

* Make sure we build in the builddir and install in the installdir
combine installdir for ucx and ompi when user-provided by INSTALL_DIR
retain prior behavior if not overridden to avoid breaking CI scripts
2025-10-31 16:34:36 -04:00
lmoriche f5bbb09c0d clr: SWDEV-547890 - Maintain an MQD for the emulated AQL queue (#1316)
* clr: SWDEV-547890 - Maintain an MQD for the emulated AQL queue

To simplify the shader debugger implementation, maintain the relevant
parts of the emulated AQL queue's MQD (amd_queue_t): read_dispatch_id,
write_dispatch_id, compute_tmpring_size.

With this MQD, the shader debugger can handle the emulated AQL queue
the same way it does the real AQL queue, no specialization is required.

* clr: SWDEV-547890 - Conservatively update the MQD's read_dispatch_id

The read_dispatch_id cannot be smaller than the current aql_packet_id
- hsa_queue.size for the debugger to work correctly.

The read_dispatch_id really should be updated when the CmdBuf is marked
as complete. Left a FIXME to address it in a future commit.
2025-10-31 16:07:02 -04:00
Satyanvesh Dittakavi f332888366 SWDEV-560304 - Fix segfault with invalid stream (#1360) 2025-11-01 00:04:44 +05:30
David Galiffi 5850d5b973 Updating documentation (#1602)
* Update rocprof-sys-feature-set.rst

* Update configuring-runtime-options.rst
2025-10-31 14:30:25 -04:00
Jaydeep 10763f0e7a SWDEV-559505 - Enable back memset optimization and handle the cases when setParam can change the number of AQL packets for memset graph node. (#1320)
Co-authored-by: jaydeeppatel1111 <jaypatel@amd.com>
2025-10-31 22:49:14 +05:30
Ossian O'Reilly b9de7baaa9 Update README.md (#1611)
* Update README.md

Add missing directory in git sparse-checkout instructions

* Update README.md typo

---------

Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
2025-10-31 13:16:09 -04:00
Nilesh M Negi dd625edf56 Revert "[GEN/BUILD] Refactor generate.py and reduce build time for older archs (#2006)" (#2021)
This reverts commit 40f3faead0.

[ROCm/rccl commit: 62ab7a22d7]
2025-10-31 10:04:12 -05:00
Nilesh M Negi 62ab7a22d7 Revert "[GEN/BUILD] Refactor generate.py and reduce build time for older archs (#2006)" (#2021)
This reverts commit bed7cdf863.
2025-10-31 10:04:12 -05:00
David DeBonis 3e750f0f57 Single-node AllGather and ReduceScatter Optimization (#2019)
* Single-node performance tuning

* Normalizing value to individual rank

[ROCm/rccl commit: 63d5846452]
2025-10-31 08:59:46 -06:00
David DeBonis 63d5846452 Single-node AllGather and ReduceScatter Optimization (#2019)
* Single-node performance tuning

* Normalizing value to individual rank
2025-10-31 08:59:46 -06:00
Yiltan 3535ce8c0a Alltoall linear parallel Optimization (#303)
[ROCm/rocshmem commit: 8dd2112ec8]
2025-10-31 10:26:44 -04:00
Yiltan 8dd2112ec8 Alltoall linear parallel Optimization (#303) 2025-10-31 10:26:44 -04:00
Yiltan 2f8a1c02a4 [GDA] Implement internal_direct_barrier_wg (#299)
[ROCm/rocshmem commit: 5f87bb061b]
2025-10-31 10:26:24 -04:00
Yiltan 5f87bb061b [GDA] Implement internal_direct_barrier_wg (#299) 2025-10-31 10:26:24 -04:00
Yiannis Papadopoulos 37bbc9062a rocr/aie: Detect AIE architecture and marketing name (#1459)
* rocr/aie: Detect AIE architecture and marketing name

* rocr/aie: Modernize code, update comments
2025-10-31 09:10:18 -05:00
Yiannis Papadopoulos 82d68fc772 rocrtst: Assume that AIE agent memory is system RAM (#1231) 2025-10-31 09:10:00 -05:00
Kian Cossettini 883caf2719 [rocprofiler-systems] Overhaul skip condition of implicit_task and add ROCPD validation test (#1589)
- Add rocpd validation check and fix implicit_task check
- SWDEV-562896
2025-10-31 09:59:23 -04:00
Ioannis Assiouras 1dd0237cb2 SWDEV-563752 - Allow hipMemLocationTypeHost in hipMemSetAccess even if memory was created on the device (#1620)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-10-31 13:57:36 +00:00
Arm Patinyasakdikul 54194a17c3 Added ERROR message class to handle fatal error messages. (#2002)
* Added ERROR message class to handle fatal error messages.

The new ERROR message class prints the message at all debug levels,
including none.

Change some of the fatal error messages to use ERROR instead of WARN.

Added new error handler function to print out more meaningful error
message in the future.

* Added CHANGELOG entry.

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Change to no longer reuse NONE as ERROR. ERROR is now a separate class.

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 1ce83d5cc0]
2025-10-30 16:14:20 -05:00
Arm Patinyasakdikul 1ce83d5cc0 Added ERROR message class to handle fatal error messages. (#2002)
* Added ERROR message class to handle fatal error messages.

The new ERROR message class prints the message at all debug levels,
including none.

Change some of the fatal error messages to use ERROR instead of WARN.

Added new error handler function to print out more meaningful error
message in the future.

* Added CHANGELOG entry.

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Change to no longer reuse NONE as ERROR. ERROR is now a separate class.

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-10-30 16:14:20 -05:00
ywang103-amd 24cb8c4deb fix crashes related to metric generator and add copyright (#1608)
* fix crash created by path and arg for pc_sampling and add copyright for mat_mul

* resolve format issue of line too long

* bugfixes

* copy gfx9 config template to analysis config in src

---------

Co-authored-by: Wang <ywang103@ctr2-alola-login-01.amd.com>
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-10-30 16:36:56 -04:00
gabrpham_amdeng 9739611239 Fixed Namespace has no attribute 'pcie' error in set command
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-10-30 15:17:48 -05:00
gabrpham_amdeng 4d29ff8a2d Fixed Namespace has no attribute 'pcie' error in set command
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 9739611239]
2025-10-30 15:17:48 -05:00
adapryor 4abb69f9d9 Fix evicted_time 2025-10-30 14:01:44 -05:00
adapryor 5c95a1485f Fix evicted_time
[ROCm/amdsmi commit: 4abb69f9d9]
2025-10-30 14:01:44 -05:00
Arm Patinyasakdikul 03e92dc942 Added copyrights for Palamida scan 7.2. (#2018)
[ROCm/rccl commit: 84fdcab68a]
2025-10-30 13:33:20 -05:00
Arm Patinyasakdikul 84fdcab68a Added copyrights for Palamida scan 7.2. (#2018) 2025-10-30 13:33:20 -05:00
isaki001 9bccbcd619 P2p batching hang-fix (#2011)
* prevent batching when send/recv bytes don't match, restore bit reversal for channel to part mapping, prevent batching beyond 32 nodes

* correct computation for channel to part mapping

* update changelog

* disabling p2p-batching by default

[ROCm/rccl commit: 641c0eb51c]
2025-10-30 13:32:01 -05:00
isaki001 641c0eb51c P2p batching hang-fix (#2011)
* prevent batching when send/recv bytes don't match, restore bit reversal for channel to part mapping, prevent batching beyond 32 nodes

* correct computation for channel to part mapping

* update changelog

* disabling p2p-batching by default
2025-10-30 13:32:01 -05:00
Dmitrii a2cff3c84d [RDC] Fix GPU_COUNT metric to only count GPUs (#1453)
* [RDC] Fix GPU_COUNT metric to only count GPUs
* [RDC] Clean up float->double casts

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-10-30 12:50:47 -05:00
Galantsev, Dmitrii a375479386 Use system gtest instead of building from source
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-10-30 12:38:11 -05:00
Galantsev, Dmitrii adaf3c9966 Use system gtest instead of building from source
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: a375479386]
2025-10-30 12:38:11 -05:00
Galantsev, Dmitrii ad20d57162 Find libamd_smi.so and librocm-core.so relative to wrapper.py
Allow amdsmi to find libamd_smi.so and librocm-core.so relative to
amdsmi_wrapper.py location.

The amdsmi_wrapper.py file is located in
_rocm_sdk_core/share/amd_smi/amdsmi and the libraries are in
_rocm_sdk_core/lib/libamd_smi.so.26.
_rocm_sdk_core/lib/librocm-core.so.1.
2025-10-30 12:35:06 -05:00
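
The relative lookup described in the commit above could look roughly like this. It is a sketch based only on the directory layout quoted in the message; the function name `find_sdk_libs` is hypothetical and is not the actual amdsmi code.

```python
from pathlib import PurePosixPath


def find_sdk_libs(wrapper_path,
                  lib_names=("libamd_smi.so.26", "librocm-core.so.1")):
    """Derive library paths from the wrapper location (hypothetical helper).

    Assumed layout, taken from the commit message:
      <root>/share/amd_smi/amdsmi/amdsmi_wrapper.py
      <root>/lib/<library>
    """
    wrapper = PurePosixPath(wrapper_path)
    # parents[0] = amdsmi, [1] = amd_smi, [2] = share, [3] = <root>
    sdk_root = wrapper.parents[3]
    return [sdk_root / "lib" / name for name in lib_names]


if __name__ == "__main__":
    demo = "/opt/_rocm_sdk_core/share/amd_smi/amdsmi/amdsmi_wrapper.py"
    for lib in find_sdk_libs(demo):
        print(lib)
```

In a real wrapper the starting point would presumably be `__file__`, and each candidate path would be checked for existence before being passed to a loader such as `ctypes.CDLL`.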
Galantsev, Dmitrii 55f999f3ce Find libamd_smi.so and librocm-core.so relative to wrapper.py
Allow amdsmi to find libamd_smi.so and librocm-core.so relative to
amdsmi_wrapper.py location.

The amdsmi_wrapper.py file is located in
_rocm_sdk_core/share/amd_smi/amdsmi and the libraries are in
_rocm_sdk_core/lib/libamd_smi.so.26.
_rocm_sdk_core/lib/librocm-core.so.1.


[ROCm/amdsmi commit: ad20d57162]
2025-10-30 12:35:06 -05:00
isaki001 678366f5e2 gfx950 multi-node tuning for LL/LL128 (#1953)
* increased LL threshold for gfx950 AR to 256KB

* AG/RS proto threshold update

[ROCm/rccl commit: 72996e4d9f]
2025-10-30 12:08:12 -05:00
isaki001 72996e4d9f gfx950 multi-node tuning for LL/LL128 (#1953)
* increased LL threshold for gfx950 AR to 256KB

* AG/RS proto threshold update
2025-10-30 12:08:12 -05:00
Allen Hubbe fa7841f0d4 functional_tests: n, nskip, nloop, nlarge options (#297)
To make the functional tests more useful for benchmarking, allow user to
specify the number of loops and related parameters via command options.

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>

[ROCm/rocshmem commit: ed91c8cce2]
2025-10-30 11:54:49 -04:00
Allen Hubbe ed91c8cce2 functional_tests: n, nskip, nloop, nlarge options (#297)
To make the functional tests more useful for benchmarking, allow user to
specify the number of loops and related parameters via command options.

Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
2025-10-30 11:54:49 -04:00
Bindhiya Kanangot Balakrishnan a2aae5e8a9 [SWDEV-558046] Fix topology weight corruption due to casting
The out-of-bounds write corrupted the next field, which was weight.
Fixed by reading into a temporary and then assigning safely.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-10-30 10:49:38 -05:00
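
The bug class described in the commit above (a wide write through a narrow field spilling into its neighbor) can be reproduced in miniature with `ctypes`. This is an illustration under assumptions: the `Link` structure, its `hops` field, and both helper functions are hypothetical and do not come from the amdsmi source.

```python
import ctypes


class Link(ctypes.Structure):
    # Hypothetical layout: a 32-bit field followed directly by weight.
    _fields_ = [("hops", ctypes.c_uint32), ("weight", ctypes.c_uint32)]


def unsafe_store(link, value):
    """Bug pattern: copy 8 bytes into the address of a 4-byte field."""
    wide = ctypes.c_uint64(value)
    # hops is at offset 0, so the extra 4 bytes land on weight.
    ctypes.memmove(ctypes.addressof(link), ctypes.byref(wide),
                   ctypes.sizeof(wide))


def safe_store(link, value):
    """Fix pattern: read into a correctly sized temporary, then assign."""
    tmp = ctypes.c_uint64(value)
    link.hops = tmp.value & 0xFFFFFFFF  # explicit narrowing, weight untouched


if __name__ == "__main__":
    bad = Link(hops=0, weight=40)
    unsafe_store(bad, 5)
    print("after unsafe store:", bad.hops, bad.weight)

    good = Link(hops=0, weight=40)
    safe_store(good, 5)
    print("after safe store:", good.hops, good.weight)
```

The write stays inside the 8-byte structure here, so the demo is memory-safe while still showing the neighboring field being overwritten.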
Bindhiya Kanangot Balakrishnan 9973a6b324 [SWDEV-558046] Fix topology weight corruption due to casting
The out-of-bounds write corrupted the next field, which was weight.
Fixed by reading into a temporary and then assigning safely.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: a2aae5e8a9]
2025-10-30 10:49:38 -05:00
Bertan Dogancay 40f3faead0 [GEN/BUILD] Refactor generate.py and reduce build time for older archs (#2006)
[ROCm/rccl commit: bed7cdf863]
2025-10-30 11:45:53 -04:00