David Galiffi
38a81ac4e3
Update VERSION to 1.3.0 ( #1368 )
2025-10-15 23:12:10 -04:00
David Galiffi
b75423b173
Update installation and ROCPD documentation ( #1300 )
...
* Updating install doc page
* Removing the Quick Start page
* Add documentation for rocpd output
* Update links to reference rocm-systems repo
* Update README.md
Installation instructions reference the ROCm Docs link.
* Updated git clone instructions
Back to using https to clone the repository
* Fix formatting
* Update projects/rocprofiler-systems/docs/how-to/understanding-rocprof-sys-output.rst
* Add reference to "rocpd" section to the "Profiling Python" section
* Update CONTRIBUTING.md
* For ROCPD, document minimum version of SDK.
* Update CHANGELOGS
Signed-off-by: David Galiffi <David.Galiffi@amd.com >
* Update CHANGELOG.md
Updated based on feedback from docs team
* Update CONTRIBUTING.md
* Update CONTRIBUTING.md.
Simplify and remove setup information overlapping with the "rocm-systems" contributing documentation.
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Update CHANGELOG.md
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
* Apply suggestion from @prbasyal-amd
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com >
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com >
2025-10-15 23:11:46 -04:00
Dimple Prajapati
6c4325d131
Add host API for enqueuing barrier on given stream ( #274 )
...
* add host API for enqueuing barrier on given stream
[ROCm/rocshmem commit: a44b581997 ]
2025-10-15 14:29:07 -07:00
Dimple Prajapati
a44b581997
Add host API for enqueuing barrier on given stream ( #274 )
...
* add host API for enqueuing barrier on given stream
2025-10-15 14:29:07 -07:00
Young Hui - AMD
02bf0a8492
[rocprofiler-compute] Source files updated to reference super-repo URL ( #1330 )
...
* source files updated to reference super-repo URL
2025-10-15 15:35:11 -04:00
Young Hui - AMD
161e44c425
[rocprof-compute] Documentation changes for move to super-repo for 7.1 ( #1329 )
...
- also remove json output mention in docs
2025-10-15 15:32:54 -04:00
vedithal-amd
454e935448
Fix docker compose ( #1323 )
...
Co-authored-by: Yanyao Wang <yanywang@amd.com >
2025-10-15 14:26:56 -05:00
vedithal-amd
ecf0d32644
Update CHANGELOG.md for ROCm 7.1.0 release ( #1362 )
2025-10-15 14:25:34 -05:00
Alysa Liu
4342579645
libhsakmt: Fix memory leak for events_page metadata ( #807 )
2025-10-15 14:52:40 -04:00
Alysa Liu
d5cbdc104d
rocrtst: Add Memory_Async_Copy_On_Engine Test ( #885 )
...
Increase test coverage involving:
hsa_amd_memory_get_preferred_copy_engine()
hsa_amd_memory_copy_engine_status()
hsa_amd_memory_async_copy_on_engine()
2025-10-15 14:51:54 -04:00
alex-breslow-amd
a5256e6219
MSCCL: Unland PR1788 + Fix for MSCCL Data Corruption ( #1960 )
...
- Earlier fix PR1788 is no longer necessary after ROCr fix and pre-ROCr fix workaround
- Inserts an s_waitcnt vmcnt(0), which fixes a data corruption issue in MSCCL
[ROCm/rccl commit: 154350baaf ]
2025-10-15 10:32:25 -07:00
alex-breslow-amd
154350baaf
MSCCL: Unland PR1788 + Fix for MSCCL Data Corruption ( #1960 )
...
- Earlier fix PR1788 is no longer necessary after ROCr fix and pre-ROCr fix workaround
- Inserts an s_waitcnt vmcnt(0), which fixes a data corruption issue in MSCCL
2025-10-15 10:32:25 -07:00
Saurabh Verma
31a7f3d5dd
Update gfx9_primitives.h and gfx9_block_table.h to use gc_9_4_2_offset.h ( #859 )
...
* Initial commit
* Replaced gc_9_2_1_sh_mask.h with gc_9_4_2_sh_mask.h
* properly replace gc_9_2_1_sh_mask.h and gc_9_2_1_offset.h for all gfx9 asics
2025-10-15 12:13:35 -05:00
gilbertlee-amd
bb85692891
Enabling gdrcopy option for gfx950 ( #1955 )
...
[ROCm/rccl commit: fedddb452c ]
2025-10-15 10:55:25 -06:00
gilbertlee-amd
fedddb452c
Enabling gdrcopy option for gfx950 ( #1955 )
2025-10-15 10:55:25 -06:00
Venkateshwar Reddy Kandula
9404178ea5
[rocprofiler-sdk][CI] rhel sles workflow fix ( #1373 )
...
* bug fix.
* add backslash
* add export for path, bug
2025-10-15 11:48:59 -05:00
alex-breslow-amd
455d516dc4
[gfx950] Make bypassing __threadfence the default for multinode. ( #1947 )
...
* Gate based on ROCM version, safe for ROCm 7.0.2 and beyond.
* Updates naming to gfx9CheapFenceOff since we use this for gfx942 and gfx950. Thanks Nilesh.
* Add info logging statement to NCCL_INIT to print whether enabled when INFO logging is enabled.
[ROCm/rccl commit: c70f5b4621 ]
2025-10-15 09:15:36 -07:00
alex-breslow-amd
c70f5b4621
[gfx950] Make bypassing __threadfence the default for multinode. ( #1947 )
...
* Gate based on ROCM version, safe for ROCm 7.0.2 and beyond.
* Updates naming to gfx9CheapFenceOff since we use this for gfx942 and gfx950. Thanks Nilesh.
* Add info logging statement to NCCL_INIT to print whether enabled when INFO logging is enabled.
2025-10-15 09:15:36 -07:00
adapryor
a64e9b4ac4
[SWDEV-560778] Update gpu metrics factory to return a new pointer every time
2025-10-15 11:00:44 -05:00
adapryor
cda730140f
[SWDEV-560778] Update gpu metrics factory to return a new pointer every time
...
[ROCm/amdsmi commit: a64e9b4ac4 ]
2025-10-15 11:00:44 -05:00
Mythreya Kuricheti
ac8adbacff
[CI][rocprofiler-sdk] Fix codeql jobs ( #1366 )
2025-10-15 10:34:29 -05:00
Saurabh Verma
946385d0ff
Reverts #1379 and properly migrates the docs ( #1381 )
...
Reverts #1379 and properly migrates the docs
---------
Co-authored-by: Matt Williams <matt.williams@amd.com >
2025-10-15 10:48:27 -04:00
Saurabh Verma
b6a187aed1
migrate aqlprofile docs 7.0.1 from standalone repo ( #1379 )
...
This PR migrates the aqlprofile/docs folder from standalone repo to monorepo
Link to the docs branch:
https://github.com/ROCm/aqlprofile/commits/docs/7.0.1
---------
Co-authored-by: Matt Williams <matt.williams@amd.com >
Co-authored-by: pbhandar-amd <138039281+pbhandar-amd@users.noreply.github.com >
2025-10-15 10:01:36 -04:00
Gerardo Hernandez
fc5551a724
SWDEV-536360 - fix another bullet point in reduce sync operations section not being displayed on its own line ( #1374 )
2025-10-15 14:51:43 +01:00
Danylo Lytovchenko
59a30bb117
Add ignore revs file ( #1126 )
...
* Add ignore revs file
* Fix rev file name
2025-10-15 13:57:56 +02:00
ajanicijamd
259ef6348b
Fixed issues with nic-performance test ( #1168 )
...
- On some hosts the wget can finish too soon and PAPI doesn't catch even a single network event.
- On some hosts, there are multiple default NICs and the scripts didn't work in that case.
- The test script was writing the output of wget to the /tmp directory, which causes a problem if another user tries to run the same test: an output file with the same name already exists in that directory, but with a different owner, so the test fails.
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com >
2025-10-14 23:45:08 -04:00
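The shared-/tmp collision described in the commit above can be avoided in general by letting the system pick a unique file name for each run. The following is a minimal sketch of that idea, not the fix that was actually committed. The function name `create_private_output_file` and the `wget-output-` prefix are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def create_private_output_file(prefix="wget-output-", suffix=".log"):
    """Create an empty, uniquely named file owned by the current user.

    mkstemp() picks a name that does not exist yet, so a leftover file
    created by another user cannot collide with this run.
    (Hypothetical helper, not taken from the nic-performance test.)
    """
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix)
    os.close(fd)  # the caller (e.g. a download step) reopens the path by name
    return path


if __name__ == "__main__":
    first = create_private_output_file()
    second = create_private_output_file()
    try:
        # Two runs get two different paths, so neither can block the other.
        print(first, second)
    finally:
        os.remove(first)
        os.remove(second)
```

A fixed name such as /tmp/output.log is shared by every user on the host, whereas a name produced by `mkstemp` is created fresh and is readable and writable only by the user who created it.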
Gerardo Hernandez
bfbc48bb0e
SWDEV-536360 - fix bullet points in reduce sync operations section not being displayed on different lines in the browser ( #1346 )
2025-10-14 22:02:34 +01:00
axie_amdeng
dde482d224
rocr: uninitialized size variable caused huge memory/space allocation ( #1232 )
...
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com >
2025-10-14 16:57:10 -04:00
Mythreya Kuricheti
765d9026c7
[CI][rocprofiler-sdk] Workflow improvements ( #1341 )
2025-10-14 15:21:55 -05:00
Ajay GunaShekar
0ac37de373
SWDEV-555665 - fix hip-tests for windows ( #1028 )
...
* SWDEV-555665 - enable fixed windows tests
2025-10-14 08:39:49 -07:00
isaki001
6d151d4e21
gfx950 channel tuning for ReduceScatter and AllGather ( #1940 )
...
* add channel thresholds to override channel-count adjustments
[ROCm/rccl commit: 0f99fd84a3 ]
2025-10-14 09:50:44 -05:00
isaki001
0f99fd84a3
gfx950 channel tuning for ReduceScatter and AllGather ( #1940 )
...
* add channel thresholds to override channel-count adjustments
2025-10-14 09:50:44 -05:00
Satyanvesh Dittakavi
9d32badcb7
SWDEV-545950 - Update indentation in hip_prof_str.h for hipStreamCopyAttributes ( #1352 )
2025-10-14 17:35:17 +05:30
Ioannis Assiouras
538ebc5409
SWDEV-556877 - Ensure pinned memory is released if hsa copy fails ( #1137 )
2025-10-14 10:08:49 +01:00
amd-srinivas1
092279449e
SWDEV-546345-[catch2][dtest]-Tests for hipMemSetD2DXX Apis(Memory management) ( #896 )
...
* SWDEV-546345-Added tests for memsetd2dxx apis
* SWDEV-546345-Optimized the code.
* SWDEV-546345-Optimized the code.
* SWDEV-546345-Addressed review comments
* SWDEV-546345-Updated code.
2025-10-14 10:47:59 +05:30
SaleelK
cc18890fe8
clr: Reset barrier_value_packet_ at init ( #1162 )
2025-10-13 22:01:46 -07:00
Wenkai Du
75a69211a0
Add all_reduce_bias_perf to support All Reduce with Bias ( #130 )
...
Use dynamic symbol loading of ncclAllReduceWithBias
Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com >
[ROCm/rccl-tests commit: db6ea5a594 ]
2025-10-13 16:09:10 -05:00
Wenkai Du
db6ea5a594
Add all_reduce_bias_perf to support All Reduce with Bias ( #130 )
...
Use dynamic symbol loading of ncclAllReduceWithBias
Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com >
2025-10-13 16:09:10 -05:00
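Dynamic symbol loading, as mentioned in the commit above, generally means looking a function up by name at run time, so the program still works against a library build that does not export it. The sketch below illustrates that pattern in Python with `ctypes`. It is an analogue written under assumptions, not the rccl-tests implementation. The helper `find_optional_symbol` is hypothetical; only the symbol name `ncclAllReduceWithBias` comes from the commit message.

```python
import ctypes


def find_optional_symbol(library, name):
    """Return the named function from a loaded library, or None if it is absent.

    ctypes raises AttributeError when the lookup fails, which is turned
    into None here so callers can fall back or skip the feature.
    (Hypothetical helper for illustration.)
    """
    try:
        return getattr(library, name)
    except AttributeError:
        return None


if __name__ == "__main__":
    # CDLL(None) exposes the symbols already loaded into this process (POSIX only).
    process = ctypes.CDLL(None)
    for symbol in ("strlen", "ncclAllReduceWithBias"):
        found = find_optional_symbol(process, symbol) is not None
        print(symbol, "available" if found else "missing")
```

Resolving the symbol at run time instead of link time lets one binary run against libraries with and without the optional function, at the cost of a None check before each use.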
mberenjk
433251272b
fixing the ar_with_bias test issue when running rccl-tests ( #1912 )
...
* fixing the AR_With_Bias issue when running rccl-tests
[ROCm/rccl commit: e738c03e39 ]
2025-10-13 13:58:21 -07:00
mberenjk
e738c03e39
fixing the ar_with_bias test issue when running rccl-tests ( #1912 )
...
* fixing the AR_With_Bias issue when running rccl-tests
2025-10-13 13:58:21 -07:00
alex-breslow-amd
d51ed2fdfd
Dump compiler-determined GPU kernel resource usage ( #1965 )
...
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)
[ROCm/rccl commit: ff209e5b19 ]
2025-10-13 11:24:42 -05:00
alex-breslow-amd
ff209e5b19
Dump compiler-determined GPU kernel resource usage ( #1965 )
...
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)
2025-10-13 11:24:42 -05:00
vstojilj
f964f45902
SWDEV-553920 - Disable and fix failing tests ( #1133 )
2025-10-13 16:38:27 +02:00
vstojilj
bfedf63575
SWDEV-552537 - Fix nvidia build failures ( #1125 )
...
* SWDEV-552537 - Fix nvidia build failures
* Add string header to fix hip-tests
---------
Co-authored-by: Branislav Brzak <branislav.brzak@amd.com >
Co-authored-by: Danylo Lytovchenko <danylo.lytovchenko@amd.com >
2025-10-13 09:20:17 +02:00
amd-srinivas1
b86b676514
SWDEV-553447-[catch2][dtest]-Add hipDeviceMallocUncached to hipMemCreate and hipMemMap flags ( #857 )
...
* SWDEV-547367-Updated tests to work with hipMemAllocationTypeUncached
* SWDEV-553447-Updated tests of hipMemMap
* SWDEV-553447-Resolved merge conflicts
---------
Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com >
2025-10-12 22:05:02 +05:30
Satyanvesh Dittakavi
46e683d41a
SWDEV-545950 - Add hipStreamCopyAttributes API Implementation ( #914 )
...
* SWDEV-545950 - Add hipStreamCopyAttributes API Implementation
* Add unit test for hipStreamCopyAttributes API
* Add ChangeLog and nvidia mapping for the API
* Update rocprofiler-sdk with new HIP API details
* [rocprofiler-sdk] handle hipStreamCopyAttributes in stream tracing service
- this new HIP function has multiple stream arguments and needs to be skipped because it does not have an explicit create/destroy/set functionality
* Update HIP_RUNTIME_API_TABLE_STEP_VERSION in clr and rocprofiler-sdk
* Resolve merge conflicts
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com >
2025-10-12 19:57:05 +05:30
Geo Min
3ead4ca4a1
fixing group id ( #1975 )
...
[ROCm/rccl commit: 97f2665da2 ]
2025-10-10 16:40:44 -07:00
Geo Min
97f2665da2
fixing group id ( #1975 )
2025-10-10 16:40:44 -07:00
David Yat Sin
7f79d0febc
rocr: Set signal memory allocations to NonPaged ( #1219 )
...
Set memory allocation to non-paged to avoid issues caused when CP tries
to access signals after page has been migrated.
2025-10-10 17:35:15 -04:00
Mythreya Kuricheti
24a62a2ab3
[rocprofiler-sdk] Add codeowner for api-trace.h ( #1933 )
...
[ROCm/rccl commit: 3000f0e837 ]
2025-10-10 16:29:17 -05:00