Use fork/waitpid to isolate API call and detect SIGKILL from kernel
Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
[ROCm/amdsmi commit: a044536b8d]
* Added Product Serial Number to the raw_bytes cper entries
* Added Product Serial Number to the Python API return
---------
Signed-off-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
* Added Product Serial Number to the raw_bytes cper entries
* Added Product Serial Number to the Python API return
---------
Signed-off-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 05ea00dcc4]
* [SWDEV-563828] Fix incorrect help text for --perf-determinism flag to indicate it expects GFXCLK frequency in MHz
---------
Signed-off-by: Billakanti, Koushik <Koushik.Billakanti@amd.com>
* [SWDEV-563828] Fix incorrect help text for --perf-determinism flag to indicate it expects GFXCLK frequency in MHz
---------
Signed-off-by: Billakanti, Koushik <Koushik.Billakanti@amd.com>
[ROCm/amdsmi commit: 23f68555db]
* Add XGMI and PCIe metrics to the profiling data
Add support for AMD XGMI (GPU-to-GPU interconnect) and PCIe
metrics:
* XGMI link width in bits
* XGMI link speed in GT/s
* Per-link read bandwidth (KB)
* Per-link write bandwidth (KB)
- Add new categories for PCIe metrics:
* PCIe link width
* PCIe link speed in GT/s
* Accumulated bandwidth (MB)
* Instantaneous bandwidth (MB/s)
* Fix VCN/JPEG insert logic
* Modify the gpu_metrics struct to accomodate XCP structure
* Add ctest automation for gpu interconnect metrics
* Refactor to move gpu_metrics struct and serialization to another file
* Possible fix for timeout in CI
Fix redundant skip check in ctest
Add xgmi and pcie option in rocprof-sys-avail.
* Change2: Address review comments
Change ctest sampling to avoid timeout
Change variable name and code structuring
* Add option in ctest to run rocprof-sys-run without rewrite
Run transferbench with rocprof-sys-run without sampling
* Change3: Fix sample insert bug and address review comments
xgmi and pci support check
renaming variables
additional hip_api validation in rocpd
* Reduce the load from the trnasferBench sample
The CI builds were timing out when flushing a big temporary file to the
DB: (2720824.23 KB / 2720.82 MB / 2.72 GB)...
Added check for GCC versions prior to 9.0 and
link with libstdc++fs when needed. This fixes
undefined symbols on older systems like Deb10
with GCC 8.3.0.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Added check for GCC versions prior to 9.0 and
link with libstdc++fs when needed. This fixes
undefined symbols on older systems like Deb10
with GCC 8.3.0.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/amdsmi commit: e1b3d5f02e]
Replicating https://github.com/ROCm/TheRock/pull/2147#discussion_r2528008441
## Motivation
Fixes https://github.com/ROCm/TheRock/issues/875 which is the issue where Windows builds would fail randomly when uploading to s3 with the `SignatureDoesNotMatch` error as a result of special characters existing in the AWS Access Keys generated by the `configure-aws-credentials` action that is passed through Windows environment variables to `aws-cli`. More details below.
## Technical Details
https://github.com/ROCm/TheRock/issues/875#issuecomment-3530851762
In summary, in Windows workflows, the `special-characters-workaround` option is set to true for the `configure-aws-credentials` action which will regenerate access keys until there are no special characters that may not be passable through windows environment variables correctly.
## Test Plan
Observe CI.
## Test Result
TBD.
## Submission Checklist
- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
* rocr: Fix exception on AsyncEventControl init
Fix exception on init when compiling with in release mode.
* rocr: Fix crash when interrupts are disabled
Fix segfault due to assert for signal->EopEvent() being false when
HSA_ENABLE_INTERRUPT=0. Use Signal::WaitMultiple(..) when interrupt is
disabled.
---------
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
* SWDEV-533237 Added test cases for hipOccupancyAvailableDynamicSMemPerBlock API
* SWDEV-533237 : Added test cases for hipOccupancyAvailableDynamicSMemPerBlock
* SWDEV-533237 : Addressed review comments for hipOccupancyAvailableDynamicSMemPerBlock aip test cases
---------
Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>
Handled numa data - including cpu and socket list, bitmask,
and affinity for csv format.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Handled numa data - including cpu and socket list, bitmask,
and affinity for csv format.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 1b027d15bd]
* Added Python & C API's for new node devices. Currently these are functional for node 0 only.
- amdsmi_get_node_handle
- amdsmi_get_npm_info
* Added `amd-smi node` CLI for Node Power Management
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
* Added Python & C API's for new node devices. Currently these are functional for node 0 only.
- amdsmi_get_node_handle
- amdsmi_get_npm_info
* Added `amd-smi node` CLI for Node Power Management
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: f8e4771363]
* Forward ctest labels from the execution test to the validation test.
* Adjust test validation parameters for amid_smi samples
The actual number of samples will vary depending on the GPU. This test
is just to validate the presence of the samples
Changes:
- Simplified reset calls
- Updated static limit N/A values to all possible data
(helps csv format be consistent)
- Unit format was broken on static
- get_power_cap() had min/max values swapped, and the return
was missing two fields
- Updated changelog to reflect all changes
Change-Id: I23713471b984f52085372486c6e6ff852e2f42f8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Changes:
- Simplified reset calls
- Updated static limit N/A values to all possible data
(helps csv format be consistent)
- Unit format was broken on static
- get_power_cap() had min/max values swapped, and the return
was missing two fields
- Updated changelog to reflect all changes
Change-Id: I23713471b984f52085372486c6e6ff852e2f42f8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 00a893d299]
- Updated python integration test to account for PPT1 support changes
- Updated set/reset power-cap input format
- Adjusted python API and updated C++ API test
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
- Updated python integration test to account for PPT1 support changes
- Updated set/reset power-cap input format
- Adjusted python API and updated C++ API test
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
[ROCm/amdsmi commit: 18faddf6f3]