Replicating https://github.com/ROCm/TheRock/pull/2147#discussion_r2528008441
## Motivation
Fixes https://github.com/ROCm/TheRock/issues/875, where Windows builds failed intermittently while uploading to S3 with a `SignatureDoesNotMatch` error. The cause is special characters in the AWS access keys generated by the `configure-aws-credentials` action: these keys are passed to `aws-cli` through Windows environment variables, which do not carry the special characters correctly. More details below.
## Technical Details
https://github.com/ROCm/TheRock/issues/875#issuecomment-3530851762
In summary, Windows workflows now set the `special-characters-workaround` option to true for the `configure-aws-credentials` action. The action then regenerates access keys until they contain no special characters that might not pass correctly through Windows environment variables.
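The regenerate-until-clean idea can be sketched in Python as below. This is not the action's actual implementation: `fetch_clean_credentials`, the fake credential source, and treating any character outside `[A-Za-z0-9]` as "special" are assumptions for illustration. Only the access key ID and secret access key are checked, on the assumption that a session token is base64-like and would almost always fail the check.

```python
import re
from typing import Callable, Dict

# Assumption: any character outside [A-Za-z0-9] counts as "special".
_SPECIAL = re.compile(r"[^A-Za-z0-9]")

# Assumption: only these two fields are checked; a session token is
# base64-like and would almost always contain '/', '+' or '='.
_CHECKED_FIELDS = ("AccessKeyId", "SecretAccessKey")


def has_special_characters(value: str) -> bool:
    """Return True if value contains any non-alphanumeric character."""
    return _SPECIAL.search(value) is not None


def fetch_clean_credentials(
    fetch: Callable[[], Dict[str, str]], max_attempts: int = 10
) -> Dict[str, str]:
    """Call fetch() until the checked fields are alphanumeric only.

    `fetch` is a hypothetical stand-in for one credential request.
    """
    for _ in range(max_attempts):
        creds = fetch()
        if not any(has_special_characters(creds[f]) for f in _CHECKED_FIELDS):
            return creds
    raise RuntimeError(f"no clean credentials after {max_attempts} attempts")


# Usage with a fake source: the first secret contains '/' and '+'.
_fake = iter([
    {"AccessKeyId": "ASIAEXAMPLE1", "SecretAccessKey": "abc/def+ghi"},
    {"AccessKeyId": "ASIAEXAMPLE2", "SecretAccessKey": "abcdefGHI123"},
])
creds = fetch_clean_credentials(lambda: next(_fake))
print(creds["AccessKeyId"])  # ASIAEXAMPLE2
```

Bounding the number of attempts keeps a persistently unlucky credential source from looping forever.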
## Test Plan
Observe CI.
## Test Result
TBD.
## Submission Checklist
- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
* rocr: Fix exception on AsyncEventControl init
Fix exception on init when compiling in release mode.
* rocr: Fix crash when interrupts are disabled
Fix segfault due to assert for signal->EopEvent() being false when
HSA_ENABLE_INTERRUPT=0. Use Signal::WaitMultiple(..) when interrupts are
disabled.
---------
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
* SWDEV-533237 Added test cases for hipOccupancyAvailableDynamicSMemPerBlock API
* SWDEV-533237 : Added test cases for hipOccupancyAvailableDynamicSMemPerBlock
* SWDEV-533237 : Addressed review comments for hipOccupancyAvailableDynamicSMemPerBlock API test cases
---------
Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>
Handled NUMA data, including CPU and socket lists, bitmask,
and affinity, for CSV format.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 1b027d15bd]
* Added Python & C APIs for new node devices. Currently these are functional for node 0 only.
- amdsmi_get_node_handle
- amdsmi_get_npm_info
* Added `amd-smi node` CLI for Node Power Management
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: f8e4771363]
* Forward ctest labels from the execution test to the validation test.
* Adjust test validation parameters for amd_smi samples
The actual number of samples varies depending on the GPU. This test
only validates the presence of the samples.
Changes:
- Simplified reset calls
- Updated static limit N/A values to all possible data
(helps csv format be consistent)
- Fixed broken unit format in static output
- get_power_cap() had min/max values swapped, and the return
was missing two fields
- Updated changelog to reflect all changes
Change-Id: I23713471b984f52085372486c6e6ff852e2f42f8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 00a893d299]
* for multinode gfx950, extend AR LL128 up to 256MB, extend RS LL128 up to 8MB per rank, extend AG LL up to 64KB per rank
* don't override direct allgather threshold if set to -1
* restore 2-node AR simple at earlier message sizes than higher multi-node AR
* extend range of LL for single-node RS on gfx950
* update algo/proto for multi-node allreduce on gfx942
* set single-node AR on gfx950 to Tree LL for KB message sizes
* decrease threshold for single node Tree for gfx950 AR
[ROCm/rccl commit: 0d09f86608]
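The multi-node gfx950 cutoffs in the first bullet can be read as a size-based protocol table. The Python sketch below is a hypothetical model, not RCCL's tuning code: the table layout, `pick_protocol`, treating "per rank" as the total size divided evenly by the rank count, and falling back to the Simple protocol above each cutoff are all assumptions. Choices below each range (for example Tree LL for small single-node messages) are not modeled.

```python
KB = 1 << 10
MB = 1 << 20

# Hypothetical table for multi-node gfx950, using the cutoffs from the
# commit message: (protocol, upper limit in bytes, limit is per rank?).
MULTINODE_GFX950_LIMITS = {
    "allreduce": ("LL128", 256 * MB, False),
    "reducescatter": ("LL128", 8 * MB, True),
    "allgather": ("LL", 64 * KB, True),
}


def pick_protocol(collective: str, total_bytes: int, nranks: int) -> str:
    """Return the protocol for a message, assuming Simple above each cutoff."""
    proto, limit, per_rank = MULTINODE_GFX950_LIMITS[collective]
    # Assumption: "per rank" means the total message split evenly across ranks.
    size = total_bytes // nranks if per_rank else total_bytes
    return proto if size <= limit else "Simple"


print(pick_protocol("allreduce", 128 * MB, 16))     # LL128
print(pick_protocol("allreduce", 512 * MB, 16))     # Simple
print(pick_protocol("reducescatter", 64 * MB, 16))  # LL128 (4 MB per rank)
print(pick_protocol("allgather", 2 * MB, 16))       # Simple (128 KB per rank)
```

Keeping each cutoff in one table entry means a retune like the ones above touches a single number per collective.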
- Updated Python integration test to account for PPT1 support changes
- Updated set/reset power-cap input format
- Adjusted Python API and updated C++ API test
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
[ROCm/amdsmi commit: 18faddf6f3]
The argparse 'choices' parameter was receiving a comma-separated string
instead of a list, causing it to treat individual characters as valid
choices rather than complete tokens like 'SPX', 'DPX', etc.
Fixed by removing the unnecessary join() operation in
get_accelerator_choices_types_indices() to return the list directly.
This matches the pattern used by get_memory_partition_types().
Now 'amd-smi set -C DPX' and other partition commands work correctly.
[ROCm/amdsmi commit: f6b1cb9024]
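The failure mode can be reproduced with plain `argparse`. Depending on the Python version, a string passed as `choices` is checked either by substring containment or character by character; in both cases a single character such as `D` is accepted. `get_accelerator_choices_types_indices()` below is a hypothetical stand-in that returns a fixed list, not the amd-smi implementation, and `accepts` is an illustrative helper.

```python
import argparse
import contextlib
import io


def get_accelerator_choices_types_indices():
    # Hypothetical stand-in: the real amd-smi helper queries the device.
    return ["SPX", "DPX", "TPX", "QPX", "CPX"]


def accepts(choices, value):
    """Return True if argparse accepts `value` for -C with these choices."""
    parser = argparse.ArgumentParser(prog="amd-smi-sketch")
    parser.add_argument("-C", choices=choices)
    try:
        # Silence argparse's usage/error output when it rejects the value.
        with contextlib.redirect_stderr(io.StringIO()):
            parser.parse_args(["-C", value])
    except SystemExit:  # argparse exits on an invalid choice
        return False
    return True


fixed = get_accelerator_choices_types_indices()  # list of full tokens
buggy = ",".join(fixed)                          # "SPX,DPX,TPX,QPX,CPX"

print(accepts(fixed, "DPX"))  # True
print(accepts(fixed, "D"))    # False
print(accepts(buggy, "D"))    # True: a single character slips through
```

Returning the list directly lets argparse compare whole tokens, which is why the fix is simply to drop the `join()`.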
The test verifies that all shared memory objects for
IPC events used internally by HIP are properly cleaned
up after use and do not leave persistent files in /dev/shm.
Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
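The kind of check this test performs can be sketched as a before/after snapshot of `/dev/shm`. The Python below uses `multiprocessing.shared_memory` as a stand-in for HIP's internal IPC event objects, which it does not touch; `leaked_shm_entries` and `ipc_like_workload` are hypothetical names. On systems without `/dev/shm` the snapshot is empty and the check passes trivially, and a snapshot diff can also pick up files created concurrently by other processes.

```python
import os
from multiprocessing import shared_memory

SHM_DIR = "/dev/shm"


def shm_entries() -> set:
    """Names currently present in /dev/shm (empty if the directory is absent)."""
    return set(os.listdir(SHM_DIR)) if os.path.isdir(SHM_DIR) else set()


def leaked_shm_entries(workload) -> set:
    """Run workload() and return the /dev/shm entries it left behind."""
    before = shm_entries()
    workload()
    return shm_entries() - before


def ipc_like_workload() -> None:
    # Stand-in for IPC event usage: create a segment, use it, clean it up.
    seg = shared_memory.SharedMemory(create=True, size=64)
    seg.buf[0] = 1
    seg.close()
    seg.unlink()  # Without this, the segment's file persists in /dev/shm on Linux.


leaked = leaked_shm_entries(ipc_like_workload)
print("leaked:", sorted(leaked))
```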