* Removing FP8 from peak VALU datatypes - PEAK_OPS_DATATYPES.
* Similar change for BF16.
* Roofline binaries from rocm-amdgpu-bench generated 10/22.
https://github.com/ROCm/rocm-amdgpu-bench/commit/2113ef1f5eada8a4a6e44e6d07fd6abac9b0a3f8
Bins include change that removes FP8 and BF16 peak VALU benchmarks.
Built and tested on rhel8, azl3, ubuntu22.04, sles15sp6.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Re-committing the bins
accidentally copied over bins from the wrong folder earlier, caught by Gowthami during testing.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Updated changelog
* gersemi fix
* Changelog corrected.
* Changelog fix.
* Adding this to the 7.2.0 section to be picked up in an RC build.
* Moving changelog entry into unreleasesd section - team reconfirmed cutoff date after I requested this change so I am just quickly correcting my mistake in my ask.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
* refactor: duplicated path helpers into common/path.hpp
* update rocprof-sys-instrument to use shared path utility
* Add path::realpath(std::string[, std::string*]) helper function in common/path.hpp for binaries
* common: centralize remove_env implementation in environment.hpp
* remove unused includes from rocprof-sys binaries and argparse
* changing set to unordered_set wherever sorting is not required and additional cleanup
* review comment incorporated
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* copilot review for remove_env incorporated
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
ldconfig is run during rocm-opencl package installation.
Installing libamdocl.so in /opt/rocm-xxx/lib exposes all ROCm libraries when /opt/rocm/lib is added to ldconfig.
To prevent this, libamdocl.so is now installed in /opt/rocm-xxx/lib/opencl.
ldconfig will use the updated path, limiting exposure to only libamdocl.so library.
Co-authored-by: raramakr <raramakr@amd.com>
* Split roofline tests
* Use N/A for missing values
* Test eval_expression for no valid data
* Fixed tests
* Updated Changelog for N/A
* Fixed platform specific test failure
* Use native tool for counter collection
* Add native counter collection tool which uses rocprofiler-sdk C++
library public API to get counter collection data
* This is enabled by default, unless --no-native-tool option is
provided or ROCPROF=rocprofv3 env. var. is provided
* This tool is only supported for ROCm version >=7.x.x
* This tool is not supported for attach/detach scenario
* Build native tool shared object during build time
* If using rocprof-compute without building then runtime compilation of
t push native tool shared object is performed
* rocprofiler-sdk tools is still used for services other than counter
collection and data collected by native tool is merged into the
rocpd/csv output of rocprofiler-sdk tool
* Make `rocpd` choice the default choice for `--format-rocprof-output`
option
* If `rocpd` public API from rocprofiler-sdk library is not present,
then fallback to `csv` choice
* In this case only `pmc_perf.csv` is written in workload folder
instead of multiple `csv` files for each profiling run
* Remove `json` choice from `--format-rocprof-output` option since it
functions identical to `csv` option
* Rename option `--rocprofiler-sdk-library-path` to
`--rocprofiler-sdk-tool-path` since we LD_PRELOAD the
rocprofiler-sdk tool shared object and not the rocprofiler-sdk library
shared object
* Fix the meaning of `--dispatch` option in `profile` mode to mention
dispatch iteration filtering instead of dispatch id filtering
* --dispatch option in analyze mode does dispatch id filtering
* Move standalone binary creation logic from cmake file to docker file
* fix native counter collection tool during attach/detach
* improve logging
* fix attach detach with native tool
* fix attach detach with native tool
* do not support attach/detach in native tool
* Update changelog
* add standalone binary creation functionality in cmake
* address review comments
* address review comments
* fix formatting
* address review comments
* Adding paths for cmake to search. Also updated min. cmake requirement to 3.21 as this was when hip was supported.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Update hip compiler ID check, sometimes comes up as Clang, sometimes ROCMClang- depends on setup.
Updated formatting.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* RHEL8.10 unable to compile due to defaulting to old c++ version, need to force c++17
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Updating changelog per docs team recommendations
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Apply suggestions from code review to changelog
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
* Do not required HIP complier to build native counter collection tool
* fix cmake
* gersemi formatting on latest cmake change
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* ex ci updated dependencies to include rocprofiler-sdk, but cmake was still not capturing the path- there was a commit that added to the cmake_prefix_path entry that specified rocprof-sdk's cmake location ut was too specific for the search paths in find_package's config mode.
removing the cmake_prefix_path var and adding hints to find_package call instead, and specifying config mode so it knows how to construct the search paths
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* gersemi run for formatting
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Still need prefix path, should not have been removed in last commit but does need to be shortened to just the rocm path to allow for find_package config mode to do the job
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* include cstdint for uint32_t
* Run formatting on helper.cpp
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Remove rocm 7.2 release stuff from version and changelog and handle it in separate pr
* fix version
* fix changelog
* fix changelog
* run ruff formatter
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* fix rocprofiler-sdk attach so path
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Force tencentos to use rhel-based bin since tencent is branched off of centos, which is branch of fedora.
Verified rocprof-compute run correctly selects bin to use, and the roofline benchmark values look similar between runs on rhel vs tencentos4 docker images on same system.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Add user start/stop bool
* Update documentation for user-api
* Update projects/rocprofiler-systems/source/lib/rocprof-sys-dl/dl.cpp
Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>
* Format fix
---------
Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>
* attach: rename librocprofv3-attach
- Renames library to librocprofiler-sdk-rocattach
- ROCAttach library will be formalized and documented in future commit
* Address review comments
- Rename rocprofv3-attach.py to rocprof-attach.py
- Use common filesystem.hpp in rocattach
* Fix component name typo
* Doc fixup
---------
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
* roll back json file processing logic for pc sampling
* format cmake files
* Revert "format cmake files"
This reverts commit e64df65a8f30abcb6738e3a0d7ffd4270bd1d302.
## Overview and rationale
This reverts https://github.com/ROCm/rocm-systems/pull/1886, which...
* Re-applies https://github.com/ROCm/rocm-systems/pull/1866
* Reverts https://github.com/ROCm/rocm-systems/pull/1728
(So it restores the [`amdgpu-windows-interop/`](https://github.com/ROCm/rocm-systems/tree/develop/shared/amdgpu-windows-interop) folder back to the state from a few weeks ago)
The rationale for this change is at https://github.com/ROCm/rocm-systems/pull/1866:
> Last PAL update broke applications on gfx12 Windows.
## Cross-repository change details
That PR failed to build but was merged with this explanation:
> TheRock CI Windows build fails as expected with this revert.
>
> References to these PAL members need to be stripped out in a patch on TheRock.
>
> ```
> 11.3 C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(152): error C2039: 'RegisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4 C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> 11.4 C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(195): error C2039: 'UnregisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4 C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> ```
The patch in TheRock was updated in https://github.com/ROCm/TheRock/pull/2154. This rolls forward by updating the ref for TheRock.
That original PR could have been sequenced differently to avoid a build break - perhaps by
* Pointing to a branch in TheRock with the patch rebased
* Deleting the patch in the workflows here but holding a local copy of the path to be applied in workflows
* Landing the patch as a normal commit instead of carrying it at all
## Test plan
1. Watch TheRock CI here (https://github.com/ROCm/rocm-systems/actions/runs/19447202693/job/55644411119?pr=1893)
2. Build locally:
```bash
# In rocm-systems
git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0001-Revert-SWDEV-543498-Some-compute-Ubertrace-profiles-.patch
git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0003-Use-is_versioned-true-consistently-in-both-Comgr-Loa.patch
git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0006-Explicitly-load-libamdhip64.so.7.patch
# Note: the build fails with the observed errors if patch 0001 is not applied!
# In TheRock
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=cl.exe -DCMAKE_CXX_COMPILER=cl.exe \
-DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
-DPython3_EXECUTABLE=d:/projects/TheRock/.venv/Scripts/python \
-DTHEROCK_ROCM_SYSTEMS_SOURCE_DIR=d:/projects/TheRock/../rocm-systems \ # IMPORTANT
-DTHEROCK_AMDGPU_FAMILIES=gfx110X-all \
-DBUILD_TESTING=ON \
-DTHEROCK_ENABLE_ALL=ON \
-Damd-llvm_BUILD_TYPE=RelWithDebInfo \
-S D:/projects/TheRock \
-B D:/projects/TheRock/build \
-G Ninja
cmake --build D:/projects/TheRock/build --target hip-clr
# [build] Build finished with exit code 0
cmake --build D:/projects/TheRock/build --target ocl-clr+dist
# [build] Build finished with exit code 0
```
Use fork/waitpid to isolate API call and detect SIGKILL from kernel
Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
[ROCm/amdsmi commit: a044536b8d]
* Added Product Serial Number to the raw_bytes cper entries
* Added Product Serial Number to the Python API return
---------
Signed-off-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
* Added Product Serial Number to the raw_bytes cper entries
* Added Product Serial Number to the Python API return
---------
Signed-off-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 05ea00dcc4]
* [SWDEV-563828] Fix incorrect help text for --perf-determinism flag to indicate it expects GFXCLK frequency in MHz
---------
Signed-off-by: Billakanti, Koushik <Koushik.Billakanti@amd.com>
* [SWDEV-563828] Fix incorrect help text for --perf-determinism flag to indicate it expects GFXCLK frequency in MHz
---------
Signed-off-by: Billakanti, Koushik <Koushik.Billakanti@amd.com>
[ROCm/amdsmi commit: 23f68555db]
* Add XGMI and PCIe metrics to the profiling data
Add support for AMD XGMI (GPU-to-GPU interconnect) and PCIe
metrics:
* XGMI link width in bits
* XGMI link speed in GT/s
* Per-link read bandwidth (KB)
* Per-link write bandwidth (KB)
- Add new categories for PCIe metrics:
* PCIe link width
* PCIe link speed in GT/s
* Accumulated bandwidth (MB)
* Instantaneous bandwidth (MB/s)
* Fix VCN/JPEG insert logic
* Modify the gpu_metrics struct to accomodate XCP structure
* Add ctest automation for gpu interconnect metrics
* Refactor to move gpu_metrics struct and serialization to another file
* Possible fix for timeout in CI
Fix redundant skip check in ctest
Add xgmi and pcie option in rocprof-sys-avail.
* Change2: Address review comments
Change ctest sampling to avoid timeout
Change variable name and code structuring
* Add option in ctest to run rocprof-sys-run without rewrite
Run transferbench with rocprof-sys-run without sampling
* Change3: Fix sample insert bug and address review comments
xgmi and pci support check
renaming variables
additional hip_api validation in rocpd
* Reduce the load from the trnasferBench sample
The CI builds were timing out when flushing a big temporary file to the
DB: (2720824.23 KB / 2720.82 MB / 2.72 GB)...