## Motivation
ROCR on Windows uses WSL implementation as the codebase. We want to make
sure Windows changes can continue to work with WSL and share the same
core implementation. Hence, it's easier to maintain the code under the
same rocm-system infrastructure and automate all builds/tests in the
future.
## Technical Details
The new files is the copy of https://github.com/ROCm/librocdxg/ with
preserved history. Native windows support and clean-ups will be added in
the following check-ins.
The same command lines can be used to build WSL under libhsakmt folder
for now.
```
# Set the Windows SDK path (adjust version number if different)
export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.26100.0/'
# Build the library
mkdir -p build
cd build
cmake .. -DWIN_SDK="${win_sdk}/shared"
make
sudo make install
```
## JIRA ID
SWDEV-558849
## Test Plan
N/A
## Test Result
N/A
## Submission Checklist
In order for hipMemPrefetchAysnc_v2() api to work, we need rocr to
migrates the ranges of pages requested to the particular NUMA node in
question, via move_pages().
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
* SWDEV-561708 Initial shared queue pool apis
* Validate params; some fixes in callback function (but still needs to be checked)
* Dtor cleanup
* minor
* Enable profiling; remove callback since aql_queue takes care of it
* setPriority and setCuMask APIs updated for counted queues
* Increasing step and minor version for rocprofiler
* Tests for CountedQueueManager
* tests
* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending
* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added
* Changed to ASSERT_SUCCESS macros for all tests
* RIng buffer overflow test added
* tests fixed; cleanup added at hsa_shutdown
* priority conversion table changes
* Compiler warnings fixed
* Rewrite 1 test; add desc and improve SetUp() code
* Improvement
* Unififed getinfo for both counted and non-counted queues
* Address PR feedback
* Addressing feedback: memleak, data type mismatch, documentation
* improve comment
* format
* Missing HSA_API macros for roctracer
* Revert "Addressing feedback: memleak, data type mismatch, documentation"
This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.
* Improving acquire api doc
* release api doc improved
* error codes for release api doc
* SWDEV-555889 - Support mipmap on rocr
Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.
Add some SRD logs that will be removed finally.
* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.
* Rewrite view logic
* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in nagative tests.
* Remove SRD dump log from hip-rt
Let Rocr mipmap log be in condition.
* minor format chang
* Exclude mipmap tests for mi200+ which don't support mipmap.
Updated to convert flags correctly
Added ObjectRegistry to track registered and mapped resources and incorporated it into hip_gl.
Added mip level check
Made functions static in-line
Reworked validation to be more clear.
Co-authored-by: Prasannakumar Murugesan <prmuruge@amd.com>
As part of an earlier commit, bfloat16 handling in reduce kernel for FuncMinMax fell into generic/default template when there is no SPECIALIZE_REDUCE for a particular type, this generic template does a bitwise integer comparison and it broke bfloat16 ops.
change the else-if statement to else statement, that way it covers both ROCm version < 6.0 and >= 6.0 (with ROCm > 6.0, device.h already typedefs __hip_bfloat16 to hip_bfloat16, so no special case is needed here).
## Motivation
Enable UCX communication tracing and communication metadata
## Technical Details
Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns.
- Implements function interception for UCX functions across multiple categories using gotcha component.
- Extended comm_data component to track UCX send/recv operations - Added ucx_send and ucx_recv labels for Perfetto counter tracks. Integrated UCX data tracking with existing MPI/RCCL tracking infrastructure.
- Added ROCPROFSYS_USE_UCX configuration option (enabled by default).
- Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers not found.
- Updated all Dockerfiles to include UCX dependencies.
* [SWDEV-559965] Update Changelog for amd-smi set --power-cap
Updated Changelog to mention flexible argument
ordering for power cap type in amdsmi power cap set.
Corrected Changelog documentation on PPT1 reset
power_cap command.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
## Motivation
- Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
- Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake`
- Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
- Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has.
- GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
- OMPVV offload tests now only compile if `amdflang` version is `>= 20`
- Improve link time by reducing the number of GFX targets that binaries need to support.
- RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.