* Initial refactoring work, including the use of build targets and settable MSCCLPP_ROOT, MSCCLPP_SOURCE, and MSCCLPP_APPLY_PATCHES.
* Another large refactor of the MSCCLPP CMake code to make every portion a target with appropriate dependencies. This should cover all paths to the final target: starting from a full mscclpp install, from custom mscclpp and/or json source code, or from submodules plus optional patches.
* Update whitespace in Findmscclpp_nccl_static.cmake
---------
Co-authored-by: Corey Derochie <corey.derochie@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Convert a subset of the ctest tests to pytest for use in TheRock CI.
Add a new CMake flag `ROCPROFSYS_INSTALL_TESTING` to control test suite installation.
- pytest package will be installed to share/rocprofiler-systems/tests
- all compiled examples are put in share/rocprofiler-systems/examples
- all test-related scripts are put in share/rocprofiler-systems/tests
- see README.md in share/rocprofiler-systems/tests
* Pin versions in requirements-test.txt
- Validated compatibility with the version pins in requirements.txt
- Validated compatibility with pytest, ctest, automatic test suite
- Validated compatibility with Python 3.9, 3.10, 3.11, and 3.12.
* Remove unused mock dependency
* Initial cleanup of compute workflows and skeleton of ghcr workflow
* Add containers-ci.yml, update opensuse and rhel dockerfiles
* Rename id in rocprofiler-compute-ghcr.yml
* Add newline to end of containers-ci.yml
* Update action versions for rocprofiler-compute-ghcr.yml
* Switch back to SHA for action versions
* Add conda set solver classic fix to compute CI dockerfiles
* Update conda install for compute Dockerfiles
* Change opensuse version to 15.6 in containers-ci.yml
* Add fix for ubuntu noble to compute Dockerfile.ubuntu.ci
* Add default distro and version to Dockerfile.ubuntu.ci
* Updated regex for tarball version
* Remove Python3.8 from compute CI Dockerfiles
* Change RHEL 9.4 to 9, add retry for compute workflow
* Revert name change for compute rhel workflow
* Update path naming
* Remove binutils-gold from Dockerfile.opensuse.ci
* Remove conda python installs from Dockerfile.ci files in compute
* Change CMake version to 3.21 in compute Dockerfile.ci files
* Update checkout actions from v4 to v5
Integrates rocm-kpack runtime library for loading device code from
external kpack archives at HIP initialization time.
Changes:
- Add optional kpack_params_ member to FatBinaryInfo for HIPK metadata
- Parse HIPK magic (0x4B504948) in digestFatBinary to detect kpack'd binaries
- Add ExtractKpackBinary() to load code objects via kpack_load_code_object()
- Wire up kpack cache lifecycle in hip_global.cpp
- Track kpack allocations for proper cleanup
- Support multi-TU binaries via bundle_index (co_index parameter)
The ROCM_KPACK_ENABLED cmake flag controls whether kpack support is compiled
in. When disabled, HIPK binaries return hipErrorNotSupported.
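As a sanity check on the magic value: 0x4B504948 stored little-endian is the byte sequence 'H' 'I' 'P' 'K'. Below is a minimal sketch of such a header check. It assumes the magic occupies the first four bytes of the fat binary blob and is compared in host (little-endian) byte order. `is_hipk_binary` is a hypothetical helper, not the actual digestFatBinary code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Magic value quoted above; on a little-endian host its bytes spell "HIPK".
constexpr std::uint32_t kHipkMagic = 0x4B504948u;

// Hypothetical helper: true if the blob starts with the HIPK magic.
// Assumption: the magic sits at offset 0 in host byte order.
bool is_hipk_binary(const void* blob, std::size_t size) {
  if (blob == nullptr || size < sizeof(std::uint32_t)) return false;
  std::uint32_t magic = 0;
  std::memcpy(&magic, blob, sizeof(magic));  // avoids unaligned/aliasing reads
  return magic == kHipkMagic;
}
```

A blob that fails this check would go down the regular fat-binary path. How a detected HIPK blob is handled when ROCM_KPACK_ENABLED is off (hipErrorNotSupported) is not modeled here.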
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
This test launches threads that call HIP APIs, immediately detaches
them, and returns from main(). This causes multiple issues: when main()
returns, global variables are destroyed while the threads may still be
running, which leads to random segfaults.
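The usual remedy is to keep the std::thread objects and join them before main() returns, so that no worker outlives the globals it touches. The sketch below is minimal. `worker` and `g_completed` are hypothetical stand-ins for the test's HIP calls and global state; no HIP APIs are used.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for global state used by the worker threads.
std::atomic<int> g_completed{0};

// Hypothetical stand-in for a thread body that would call HIP APIs.
void worker() { g_completed.fetch_add(1, std::memory_order_relaxed); }

// Launches n workers and joins all of them instead of detaching, so by the
// time this returns no worker can still be touching g_completed.
int run_workers(int n) {
  std::vector<std::thread> threads;
  threads.reserve(static_cast<std::size_t>(n));
  for (int i = 0; i < n; ++i) threads.emplace_back(worker);
  for (std::thread& t : threads) t.join();  // join, never detach
  return g_completed.load();
}
```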
* add additional runtime checks and gfx1201 fix
This commit contains three fixes:
- raise the maximum number of open files at the beginning of the run to
the maximum allowed by the system
- check for large BAR support. We do not abort if it is not available,
but print a warning.
- for gfx1201, do not use uncached memory at the moment.
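On POSIX systems, raising the open-file limit is typically done with getrlimit/setrlimit on RLIMIT_NOFILE. The sketch below assumes "the max. allowed by the system" means the process's hard limit; the actual tool may choose differently. `raise_open_file_limit` is a hypothetical name.

```cpp
#include <sys/resource.h>

// Raises the soft RLIMIT_NOFILE (max. open file descriptors) to the hard
// limit, the most an unprivileged process may request. Returns false if the
// limits could not be read or updated, so the caller can warn instead.
bool raise_open_file_limit() {
  rlimit lim{};
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return false;
  if (lim.rlim_cur == lim.rlim_max) return true;  // already at the maximum
  lim.rlim_cur = lim.rlim_max;
  return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}
```

This is Linux-oriented: on some platforms the hard limit can be RLIM_INFINITY, which setrlimit may reject as a soft limit for file descriptors.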
* Change get_arch_name to return const char*
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix C++ `new` syntax
Not sure how it compiled before.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* use snprintf instead of strncpy
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
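The reason for the switch is that strncpy does not null-terminate when the source fills the destination buffer. snprintf, by contrast, always terminates for a nonzero size and truncates safely. The sketch below is minimal: `copy_arch_name` is a hypothetical helper, and the buffer sizes in the usage are arbitrary.

```cpp
#include <cstddef>
#include <cstdio>

// Copies src into dst (capacity dst_size), always null-terminating and
// truncating if needed. Returns true if the whole string fit.
bool copy_arch_name(char* dst, std::size_t dst_size, const char* src) {
  if (dst == nullptr || dst_size == 0) return false;
  // snprintf writes at most dst_size - 1 chars plus the terminator and
  // returns the length the untruncated string would have had.
  const int needed = std::snprintf(dst, dst_size, "%s", src);
  return needed >= 0 && static_cast<std::size_t>(needed) < dst_size;
}
```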
* destructor cleanup
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add const keyword
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Pin dependencies and fix test paths for package layout
- Pin all dependencies in requirements.txt to specific versions to ensure stability and reproducibility.
- Update test_autogen_config.py to correctly resolve source paths for both development and installed package layouts.
- Validated compatibility with Python 3.9, 3.10, 3.11, and 3.12.
* Remove setuptools dependency, since we don't support pip install and use CMake instead
* clr: Implement dynamic stream to HW queue assignment
This change implements dynamic stream to hardware queue (HWq) mapping
with the following features:
* Queue depth heuristics with weights for optimal HWq assignment
* Make last used queue sticky for better locality
* Use HWq-to-pipe mapping: gfx9 assigns queues to pipes round-robin in
creation order (single process per device only, as the pipe ID is
statically assigned by the runtime)
* More aggressive heuristic usage for better queue distribution
* Extend dynamic queues support for all stream priorities
Environment variables:
* DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - depth heuristics,
2 - depth + pipe heuristics
* DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore the priority requested at stream creation
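The toy model below illustrates the selection logic described above. Everything in it is hypothetical: `HwQueue`, `pick_queue`, and the weights are illustrative, not the CLR implementation. Only the environment variable name, its 0/1/2 meanings, the sticky last-used queue, and the round-robin pipe mapping come from this change.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical model of a hardware queue (not the CLR data structure).
struct HwQueue {
  int pipe;      // pipe assigned when the queue was created
  double depth;  // outstanding work currently queued
};

// Round-robin queue-to-pipe mapping in creation order, as described for gfx9.
int pipe_for_queue(int creation_index, int num_pipes) {
  return creation_index % num_pipes;
}

// DEBUG_HIP_DYNAMIC_QUEUE: 0 = disabled, 1 = depth, 2 = depth + pipe.
int dynamic_queue_mode() {
  const char* v = std::getenv("DEBUG_HIP_DYNAMIC_QUEUE");
  return v != nullptr ? std::atoi(v) : 0;
}

// Returns the index of the queue to use for the next submission.
// The weights are illustrative placeholders, not the runtime's values.
std::size_t pick_queue(const std::vector<HwQueue>& qs, std::size_t last_used,
                       int mode, int num_pipes) {
  if (mode == 0 || qs.empty()) return last_used;  // dynamic mapping disabled
  constexpr double kDepthWeight = 1.0;
  constexpr double kPipeWeight = 0.5;
  // Total outstanding work per pipe, used only by the pipe heuristic.
  std::vector<double> pipe_load(static_cast<std::size_t>(num_pipes), 0.0);
  for (const HwQueue& q : qs) pipe_load[q.pipe] += q.depth;
  auto score = [&](const HwQueue& q) {
    double s = kDepthWeight * q.depth;
    if (mode == 2) s += kPipeWeight * pipe_load[q.pipe];
    return s;
  };
  // Start from the last used queue: it is kept unless another queue scores
  // strictly lower, which makes it "sticky" on ties.
  std::size_t best = last_used < qs.size() ? last_used : 0;
  double best_score = score(qs[best]);
  for (std::size_t i = 0; i < qs.size(); ++i) {
    const double s = score(qs[i]);
    if (s < best_score) {
      best = i;
      best_score = s;
    }
  }
  return best;
}
```

Starting the search from the last used queue is what makes ties resolve toward locality. Mode 2 can break those ties by steering work away from busy pipes.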
* clr: Clean up last_used_queue_
* Added iteration_multiplex_impute_counters on PMC data; the GUI dataframe previously did not apply this in the build_layout method
* Created a Workload() in profile-mode post-processing so that the standalone roofline HTML plot can be generated; this will be removed once the roofline plot moves to the analyze phase in a future release
* Added the iteration_multiplexing run parameter to the roofline object init so that the dataframe can be parsed accurately if the option was used during profiling; this avoids reading NaN values in dispatches that were not imputed in calc_ai_profile
* Cleaned up unused legacy code and adjusted method parameters to help move roofline plotting to analyze mode in a future release
* Updated the iteration multiplexing imputation algorithm to impute counters for ungrouped dispatches at the end based on the previous group. This does not work if no dispatches can be grouped (i.e., the number of dispatches is less than the number of counter buckets)
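The simplified model below illustrates the grouping arithmetic. Its assumptions are ours, not the tool's: dispatches form consecutive groups of `num_buckets`, each group collectively covers every counter, and a missing value is filled from whichever dispatch in the group measured it. Trailing dispatches that do not fill a whole group borrow from the previous group. With fewer dispatches than buckets there is no group to borrow from, which matches the limitation noted above. The actual implementation works on dataframes, and `impute_counters` is a hypothetical name.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// One dispatch's counter values; std::nullopt = not collected in this pass.
using Dispatch = std::vector<std::optional<double>>;

// Merges dispatches [begin, end): for each counter, the first measured value wins.
Dispatch merge_group(const std::vector<Dispatch>& ds, std::size_t begin,
                     std::size_t end, std::size_t num_counters) {
  Dispatch merged(num_counters);
  for (std::size_t d = begin; d < end; ++d)
    for (std::size_t c = 0; c < num_counters; ++c)
      if (!merged[c] && ds[d][c]) merged[c] = ds[d][c];
  return merged;
}

// Fills missing counters in place. Returns false if there is no full group.
bool impute_counters(std::vector<Dispatch>& ds, std::size_t num_buckets,
                     std::size_t num_counters) {
  if (num_buckets == 0) return false;
  const std::size_t full_groups = ds.size() / num_buckets;
  if (full_groups == 0) return false;  // fewer dispatches than buckets
  Dispatch merged;
  for (std::size_t g = 0; g < full_groups; ++g) {
    const std::size_t begin = g * num_buckets, end = begin + num_buckets;
    merged = merge_group(ds, begin, end, num_counters);
    for (std::size_t d = begin; d < end; ++d)
      for (std::size_t c = 0; c < num_counters; ++c)
        if (!ds[d][c]) ds[d][c] = merged[c];
  }
  // Trailing ungrouped dispatches reuse the last full group's values.
  for (std::size_t d = full_groups * num_buckets; d < ds.size(); ++d)
    for (std::size_t c = 0; c < num_counters; ++c)
      if (!ds[d][c]) ds[d][c] = merged[c];
  return true;
}
```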
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
## Motivation
To let Optiq detect that counter tracks are of the same type, we aligned `info_pmc` symbol naming across tracks of the same type. This is useful for grouping and categorizing similar counter tracks and for setting a consistent y-axis scale when plotting their values on charts.
## Technical Details
Replace unique and/or ordered symbol names with a common symbol name shared by all counters of the same type; the counter track name remains the unique identifier for each counter track. For example, the "symbol" field was "JpegAct_0" and is now "JpegAct".
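Conceptually, the shared symbol is the old per-instance symbol with its instance suffix dropped. The sketch below assumes that suffix always has the form `_<digits>`, as in the `JpegAct_0` example. `common_symbol_name` is a hypothetical helper, not the code that now emits the symbol.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Strips a trailing "_<digits>" instance suffix: "JpegAct_0" -> "JpegAct".
// Names without such a suffix are returned unchanged.
std::string common_symbol_name(const std::string& name) {
  const std::size_t pos = name.find_last_of('_');
  if (pos == std::string::npos || pos + 1 == name.size()) return name;
  for (std::size_t i = pos + 1; i < name.size(); ++i)
    if (!std::isdigit(static_cast<unsigned char>(name[i]))) return name;
  return name.substr(0, pos);
}
```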
## Motivation
With the introduction of the new logging system based on the `spdlog` library, there is an opportunity to replace the `timemory`-dependent JOIN implementation with the `fmt` library's `format` and `join` APIs, which ship as part of `spdlog`.
## Technical Details
Use `fmt` provided APIs to properly format and package strings.
## Motivation
Fix roctx range markers (Push/Pop, Start/Stop) not being displayed correctly in rocpd output. The Visualizer was showing only Stop/Pop events as instant markers instead of proper duration ranges with labels, while Perfetto output displayed them correctly.
## Technical Details
In `tool_tracing_callback_stop()`, the rocpd/database output was using `user_data->value` (timestamp of the Pop/Stop event) instead of `begin_ts` (corrected timestamp from the corresponding Push/Start event) when calling `cache_region()`.
The Perfetto output already used `begin_ts` correctly (line 818). This change aligns the rocpd output with the Perfetto behavior by using `begin_ts` instead of `user_data->value` (line 887).
Updated rocpd validation rules
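The simplified model below shows the corrected behavior: the Push timestamp is remembered and used as the region's begin time when the matching Pop arrives, rather than the Pop's own timestamp. `RangeTracker` and `Region` are hypothetical; in the real code the begin timestamp is `begin_ts` and the region is recorded through `cache_region()`.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical region record: a labeled [begin, end] duration.
struct Region {
  std::string label;
  std::uint64_t begin_ns;
  std::uint64_t end_ns;
};

// Models nested Push/Pop ranges on one thread. Start/Stop ranges are keyed
// by range id instead of nesting and are not modeled here.
class RangeTracker {
 public:
  void push(std::string label, std::uint64_t ts) {
    open_.push_back({std::move(label), ts, 0});
  }
  // Closes the innermost open range. The region begins at the Push
  // timestamp; using the Pop timestamp would yield a zero-length marker.
  std::optional<Region> pop(std::uint64_t ts) {
    if (open_.empty()) return std::nullopt;
    Region r = std::move(open_.back());
    open_.pop_back();
    r.end_ns = ts;
    return r;
  }

 private:
  std::vector<Region> open_;
};
```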
## Motivation
The validate-rccl-* tests were failing because "RCCL Comm" counters were not being written to perfetto traces when using the new cached-perfetto approach.
## Technical Details
Root cause: write_perfetto_counter_track() in rccl.cpp was only called when config::get_use_perfetto() returned true, which requires ROCPROFSYS_TRACE_LEGACY=ON. As a result, RCCL counters were not captured with the new trace cache approach.
Solution: Integrated RCCL with the trace cache system:
Changes to source/lib/rocprof-sys/library/rocprofiler-sdk/rccl.cpp:
- Added cache_rccl_comm_data_events<Track>() function to store RCCL comm data via pmc_event_with_sample with category::comm_data
- Modified tool_tracing_callback_rccl() to always cache events for the new Perfetto approach, while preserving the legacy write_perfetto_counter_track() calls for backward compatibility
Changes to tests/rocprof-sys-testing.cmake:
- Added rccl_api to ROCPROFSYS_ROCM_DOMAINS to enable RCCL API callback tracing
Handler verification: The perfetto_processor_t already has a handler for ROCPROFSYS_CATEGORY_COMM_DATA in m_pmc_track_map that processes the cached events.
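The control-flow change fits in a small model: the cache write becomes unconditional, and the legacy Perfetto write stays behind the flag. All names below are hypothetical stand-ins for `cache_rccl_comm_data_events<Track>()`, `write_perfetto_counter_track()`, and `config::get_use_perfetto()`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one RCCL comm-data sample.
struct CommSample {
  std::uint64_t timestamp_ns;
  double value;
};

// Hypothetical sinks: the trace cache (new approach) and the legacy track.
struct Sinks {
  std::vector<CommSample> trace_cache;   // read later by the perfetto processor
  std::vector<CommSample> legacy_track;  // written directly in legacy mode
};

// Models the callback after the fix: always cache; write the legacy counter
// track only when legacy Perfetto output (ROCPROFSYS_TRACE_LEGACY=ON) is on.
void on_rccl_comm_sample(Sinks& sinks, const CommSample& s, bool use_legacy_perfetto) {
  sinks.trace_cache.push_back(s);
  if (use_legacy_perfetto) sinks.legacy_track.push_back(s);
}
```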
* SWDEV-540597 - Reset the last error so it does not affect the next iteration.
* SWDEV-540597 - Work around a compiler error: hipGetLastError must be called without checking its result in order to reset the last error.
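Reading the last error is what clears it, so the return value is deliberately ignored. If the result type is marked `[[nodiscard]]` (HIP headers may do so for `hipError_t`, depending on version and language standard), an explicit cast to void documents the intent and silences the diagnostic. Below is a self-contained model, with a hypothetical `get_last_error()` in place of `hipGetLastError()`.

```cpp
// Hypothetical error type standing in for hipError_t.
enum [[nodiscard]] Error { kSuccess = 0, kInvalidValue = 1 };

// Thread-local "last error", mimicking hipGetLastError() semantics:
// reading it returns the stored error and resets it to success.
thread_local Error g_last_error = kSuccess;

Error get_last_error() {
  const Error e = g_last_error;
  g_last_error = kSuccess;
  return e;
}

// Clears any pending error before the next iteration. The void cast tells
// the compiler the result is intentionally unused.
void reset_last_error() { static_cast<void>(get_last_error()); }
```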
---------
Co-authored-by: Jaydeep Patel <jaydeepkumar.patel@amd.com>
## Motivation
ROCR on Windows uses the WSL implementation as its codebase. We want to
make sure Windows changes continue to work with WSL and share the same
core implementation. Keeping the code under the same rocm-system
infrastructure makes it easier to maintain and lets us automate all
builds/tests in the future.
## Technical Details
The new files are a copy of https://github.com/ROCm/librocdxg/ with
history preserved. Native Windows support and clean-ups will be added in
follow-up check-ins.
For now, the same commands can be used to build WSL under the libhsakmt
folder:
```
# Set the Windows SDK path (adjust version number if different)
export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.26100.0/'
# Build the library
mkdir -p build
cd build
cmake .. -DWIN_SDK="${win_sdk}/shared"
make
sudo make install
```
## JIRA ID
SWDEV-558849
## Test Plan
N/A
## Test Result
N/A
For the hipMemPrefetchAsync_v2() API to work, ROCR needs to migrate the
requested page ranges to the target NUMA node via move_pages().
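move_pages() takes an array of individual page addresses, so a requested byte range must first be expanded into page-aligned addresses. The sketch below is Linux-only and assumes the caller supplies a power-of-two page size. `pages_in_range` and `migrate_to_node` are hypothetical helpers, not ROCR's code. The call goes through `syscall(SYS_move_pages, ...)` to avoid a libnuma dependency; libnuma's wrapper is `move_pages()` in `<numaif.h>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <sys/syscall.h>
#include <unistd.h>

// Returns the page-aligned start address of every page overlapping
// [addr, addr + size). page_size must be a power of two.
std::vector<void*> pages_in_range(const void* addr, std::size_t size,
                                  std::size_t page_size) {
  std::vector<void*> pages;
  if (size == 0) return pages;
  const std::uintptr_t first = reinterpret_cast<std::uintptr_t>(addr);
  const std::uintptr_t start = first & ~(page_size - 1);
  const std::uintptr_t end = first + size;
  for (std::uintptr_t p = start; p < end; p += page_size)
    pages.push_back(reinterpret_cast<void*>(p));
  return pages;
}

// Asks the kernel to move the calling process's pages (pid 0) to `node`.
// status[i] receives each page's resulting node or a negative errno.
// Returns 0 on success, -1 with errno set on failure.
long migrate_to_node(std::vector<void*>& pages, int node, std::vector<int>& status) {
  constexpr long kMpolMfMove = 1 << 1;  // MPOL_MF_MOVE from <numaif.h>
  std::vector<int> nodes(pages.size(), node);
  status.assign(pages.size(), 0);
  return syscall(SYS_move_pages, 0L, static_cast<unsigned long>(pages.size()),
                 pages.data(), nodes.data(), status.data(), kMpolMfMove);
}
```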
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>