76333 Commits

Author SHA1 Message Date
David Galiffi 38a81ac4e3 Update VERSION to 1.3.0 (#1368) 2025-10-15 23:12:10 -04:00
David Galiffi b75423b173 Update installation and ROCPD documentation (#1300)
* Updating install doc page

* Removing the Quick Start page

* Add documentation for rocpd output

* Update links to reference rocm-systems repo

* Update README.md

Installation instructions references ROCm Docs link.

* Updated git clone instructions

Back to using https to clone the repository

* Fix formatting

* Update projects/rocprofiler-systems/docs/how-to/understanding-rocprof-sys-output.rst

* Add reference to "rocpd" section to the "Profiling Python" section

* Update CONTRIBUTING.md

* For ROCPD, document minimum version of SDK.

* Update CHANGELOGS

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update CHANGELOG.md

Updated based on feedback from docs team

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md.

Simplify and remove setup information overlapping with the "rocm-systems" contributing documentation.

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Update CHANGELOG.md

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-10-15 23:11:46 -04:00
Dimple Prajapati 6c4325d131 Add host API for enqueuing barrier on given stream (#274)
* add host API for enqueuing barrier on given stream

[ROCm/rocshmem commit: a44b581997]
2025-10-15 14:29:07 -07:00
Dimple Prajapati a44b581997 Add host API for enqueuing barrier on given stream (#274)
* add host API for enqueuing barrier on given stream
2025-10-15 14:29:07 -07:00
Young Hui - AMD 02bf0a8492 [rocprofiler-compute] Source files updated to reference super-repo URL (#1330)
* source files updated to reference super-repo URL
2025-10-15 15:35:11 -04:00
Young Hui - AMD 161e44c425 [rocprof-compute] Documentation changes for move to super-repo for 7.1 (#1329)
- also remove json output mention in docs
2025-10-15 15:32:54 -04:00
vedithal-amd 454e935448 Fix docker compose (#1323)
Co-authored-by: Yanyao Wang <yanywang@amd.com>
2025-10-15 14:26:56 -05:00
vedithal-amd ecf0d32644 Update CHANGELOG.md for ROCm 7.1.0 release (#1362) 2025-10-15 14:25:34 -05:00
Alysa Liu 4342579645 libhsakmt: Fix memory leak for events_page metadata (#807) 2025-10-15 14:52:40 -04:00
Alysa Liu d5cbdc104d rocrtst: Add Memory_Async_Copy_On_Engine Test (#885)
Increase test coverage involving:
hsa_amd_memory_get_preferred_copy_engine()
hsa_amd_memory_copy_engine_status()
hsa_amd_memory_async_copy_on_engine()
2025-10-15 14:51:54 -04:00
alex-breslow-amd a5256e6219 MSCCL: Unland PR1788 + Fix for MSCCL Data Corruption (#1960)
- Earlier fix PR1788 is no longer necessary after ROCr fix and pre-ROCr fix workaround
- Inserts an s_waitcnt vmcnt(0), which fixes a data corruption issue in MSCCL

[ROCm/rccl commit: 154350baaf]
2025-10-15 10:32:25 -07:00
alex-breslow-amd 154350baaf MSCCL: Unland PR1788 + Fix for MSCCL Data Corruption (#1960)
- Earlier fix PR1788 is no longer necessary after ROCr fix and pre-ROCr fix workaround
- Inserts an s_waitcnt vmcnt(0), which fixes a data corruption issue in MSCCL
2025-10-15 10:32:25 -07:00
Saurabh Verma 31a7f3d5dd Update gfx9_primitives.h and gfx9_block_table.h to use gc_9_4_2_offset.h (#859)
* Initial commit

* Replaced gc_9_2_1_sh_mask.h with gc_9_4_2_sh_mask.h

* properly replace gc_9_2_1_sh_mask.h and gc_9_2_1_offset.h for all gfx9 asics
2025-10-15 12:13:35 -05:00
gilbertlee-amd bb85692891 Enabling gdrcopy option for gfx950 (#1955)
[ROCm/rccl commit: fedddb452c]
2025-10-15 10:55:25 -06:00
gilbertlee-amd fedddb452c Enabling gdrcopy option for gfx950 (#1955) 2025-10-15 10:55:25 -06:00
Venkateshwar Reddy Kandula 9404178ea5 [rocprofiler-sdk][CI] rhel sles workflow fix (#1373)
* bug fix.

* add backslash

* add export for path, bug
2025-10-15 11:48:59 -05:00
alex-breslow-amd 455d516dc4 [gfx950] Make bypassing __threadfence the default for multinode. (#1947)
* Gate based on ROCM version, safe for ROCm 7.0.2 and beyond.
* Updates naming to gfx9CheapFenceOff since we use this for gfx942 and gfx950.  Thanks Nilesh.
* Add info logging statement to NCCL_INIT to print whether enabled when INFO logging is enabled.

[ROCm/rccl commit: c70f5b4621]
2025-10-15 09:15:36 -07:00
alex-breslow-amd c70f5b4621 [gfx950] Make bypassing __threadfence the default for multinode. (#1947)
* Gate based on ROCM version, safe for ROCm 7.0.2 and beyond.
* Updates naming to gfx9CheapFenceOff since we use this for gfx942 and gfx950.  Thanks Nilesh.
* Add info logging statement to NCCL_INIT to print whether enabled when INFO logging is enabled.
2025-10-15 09:15:36 -07:00
adapryor a64e9b4ac4 [SWDEV-560778] Update gpu metrics factory to return a new pointer every time 2025-10-15 11:00:44 -05:00
adapryor cda730140f [SWDEV-560778] Update gpu metrics factory to return a new pointer every time
[ROCm/amdsmi commit: a64e9b4ac4]
2025-10-15 11:00:44 -05:00
Mythreya Kuricheti ac8adbacff [CI][rocprofiler-sdk] Fix codeql jobs (#1366) 2025-10-15 10:34:29 -05:00
Saurabh Verma 946385d0ff Reverts #1379 and properly migrates the docs (#1381)
Reverts #1379 and properly migrates the docs

---------

Co-authored-by: Matt Williams <matt.williams@amd.com>
2025-10-15 10:48:27 -04:00
Saurabh Verma b6a187aed1 migrate aqlprofile docs 7.0.1 from standalone repo (#1379)
This PR migrates the aqlprofile/docs folder from standalone repo to monorepo
Link to the docs branch:
https://github.com/ROCm/aqlprofile/commits/docs/7.0.1

---------

Co-authored-by: Matt Williams <matt.williams@amd.com>
Co-authored-by: pbhandar-amd <138039281+pbhandar-amd@users.noreply.github.com>
2025-10-15 10:01:36 -04:00
Gerardo Hernandez fc5551a724 SWDEV-536360 - fix another bullet point in reduce sync operations section not being displayed on its own line (#1374) 2025-10-15 14:51:43 +01:00
Danylo Lytovchenko 59a30bb117 Add ignore revs file (#1126)
* Add ignore revs file

* Fix rev file name
2025-10-15 13:57:56 +02:00
ajanicijamd 259ef6348b Fixed issues with nic-performance test (#1168)
- On some hosts the wget can finish too soon and PAPI doesn't catch even a single network event.
- On some hosts, there are multiple default NICs and the scripts didn't work in that case.
- The test script was writing the output of wget to /tmp directory, which causes a problem if another user tries to run the same test. Because the output file with the same name already exists in the same directory, but with a different owner, the test fails

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-10-14 23:45:08 -04:00
Gerardo Hernandez bfbc48bb0e SWDEV-536360 - fix bullet points in reduce sync operations section not being displayed on different lines in the browser (#1346) 2025-10-14 22:02:34 +01:00
axie_amdeng dde482d224 rocr: unitialized size variable caused huge memory/space allocation (#1232)
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
2025-10-14 16:57:10 -04:00
Mythreya Kuricheti 765d9026c7 [CI][rocprofiler-sdk] Workflow improvements (#1341) 2025-10-14 15:21:55 -05:00
Ajay GunaShekar 0ac37de373 SWDEV-555665 - fix hip-tests for windows (#1028)
* SWDEV-555665 -enable fixed windows tests
2025-10-14 08:39:49 -07:00
isaki001 6d151d4e21 gfx950 channel tuning for ReduceScatter and AllGather (#1940)
* add channel thresholds to override channel-count adjustments

[ROCm/rccl commit: 0f99fd84a3]
2025-10-14 09:50:44 -05:00
isaki001 0f99fd84a3 gfx950 channel tuning for ReduceScatter and AllGather (#1940)
* add channel thresholds to override channel-count adjustments
2025-10-14 09:50:44 -05:00
Satyanvesh Dittakavi 9d32badcb7 SWDEV-545950 - Update indentation in hip_prof_str.h for hipStreamCopyAttributes (#1352) 2025-10-14 17:35:17 +05:30
Ioannis Assiouras 538ebc5409 SWDEV-556877 - Ensure pinned memory is released if hsa copy fails (#1137) 2025-10-14 10:08:49 +01:00
amd-srinivas1 092279449e SWDEV-546345-[catch2][dtest]-Tests for hipMemSetD2DXX Apis(Memory management) (#896)
* SWDEV-546345-Added tests for memsetd2dxx apis

* SWDEV-546345-Optimized the code.

* SWDEV-546345-Optimized the code.

* SWDEV-546345-Addressed review comments

* SWDEV-546345-Updated code.
2025-10-14 10:47:59 +05:30
SaleelK cc18890fe8 clr: Reset barrier_value_packet_ at init (#1162) 2025-10-13 22:01:46 -07:00
Wenkai Du 75a69211a0 Add all_reduce_bias_perf to support All Reduce with Bias (#130)
Use dynamic symbol loading of ncclAllReduceWithBias

Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com>

[ROCm/rccl-tests commit: db6ea5a594]
2025-10-13 16:09:10 -05:00
Wenkai Du db6ea5a594 Add all_reduce_bias_perf to support All Reduce with Bias (#130)
Use dynamic symbol loading of ncclAllReduceWithBias

Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com>
2025-10-13 16:09:10 -05:00
mberenjk 433251272b fixing the ar_with_bias test issue when running rccl-tests (#1912)
* fixing the AR_With_Bias issue when running rccl-tests

[ROCm/rccl commit: e738c03e39]
2025-10-13 13:58:21 -07:00
mberenjk e738c03e39 fixing the ar_with_bias test issue when running rccl-tests (#1912)
* fixing the AR_With_Bias issue when running rccl-tests
2025-10-13 13:58:21 -07:00
alex-breslow-amd d51ed2fdfd Dump compiler-determined GPU kernel resource usage (#1965)
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)

[ROCm/rccl commit: ff209e5b19]
2025-10-13 11:24:42 -05:00
alex-breslow-amd ff209e5b19 Dump compiler-determined GPU kernel resource usage (#1965)
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)
2025-10-13 11:24:42 -05:00
vstojilj f964f45902 SWDEV-553920 - Disable and fix failing tests (#1133) 2025-10-13 16:38:27 +02:00
vstojilj bfedf63575 SWDEV-552537 - Fix nvidia build failures (#1125)
* SWDEV-552537 - Fix nvidia build failures

* Add string header to fix hip-tests

---------

Co-authored-by: Branislav Brzak <branislav.brzak@amd.com>
Co-authored-by: Danylo Lytovchenko <danylo.lytovchenko@amd.com>
2025-10-13 09:20:17 +02:00
amd-srinivas1 b86b676514 SWDEV-553447-[catch2][dtest]-Add hipDeviceMallocUncached to hipMemCreate and hipMemMap flags (#857)
* SWDEV-547367-Updated tests to work with hipMemAllocationTypeUncached

* SWDEV-553447-Updated tests of hipMemMap

* SWDEV-553447-Resolved merge conflicts

---------

Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>
2025-10-12 22:05:02 +05:30
Satyanvesh Dittakavi 46e683d41a SWDEV-545950 - Add hipStreamCopyAttributes API Implementation (#914)
* SWDEV-545950 - Add hipStreamCopyAttributes API Implementation

* Add unit test for hipStreamCopyAttributes API

* Add ChangeLog and nvidia mapping for the API

* Update rocprofiler-sdk with new HIP API details

* [rocprofiler-sdk] handle hipStreamCopyAttributes in stream tracing service

- this new HIP function has multiple stream arguments and needs to be skipped because it does not have an explicit create/destroy/set functionality

* Update HIP_RUNTIME_API_TABLE_STEP_VERSION in clr and rocprofiler-sdk

* Resolve merge conflicts

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-10-12 19:57:05 +05:30
Geo Min 3ead4ca4a1 fixing group id (#1975)
[ROCm/rccl commit: 97f2665da2]
2025-10-10 16:40:44 -07:00
Geo Min 97f2665da2 fixing group id (#1975) 2025-10-10 16:40:44 -07:00
David Yat Sin 7f79d0febc rocr: Set signal memory allocations to NonPaged (#1219)
Set memory allocation to non-paged to avoid issues caused when CP tries
to access signals after page has been migrated.
2025-10-10 17:35:15 -04:00
Mythreya Kuricheti 24a62a2ab3 [rocprofiler-sdk] Add codeowner for api-trace.h (#1933)
[ROCm/rccl commit: 3000f0e837]
2025-10-10 16:29:17 -05:00