## Motivation
So that Optiq can detect which counter tracks are of the same type, we aligned `info_pmc` symbol naming across tracks of the same type. This is useful for grouping and categorizing similar counter tracks and for setting a consistent y-axis scale when plotting their values on charts.
## Technical Details
Replace unique and/or ordered symbol names with a counter-common symbol name that is the same for all counters of the same type. The counter track name remains the unique identifier for each counter track. For example, the "symbol" field was "JpegAct_0" but is now "JpegAct".
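As an illustration of the renaming described above, here is a minimal sketch that strips a trailing `_<digits>` suffix from a symbol. The helper name `common_symbol` and the suffix rule are assumptions made for this example; the actual change may derive the common name differently.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the PR): drops a trailing "_<digits>"
// instance suffix so counters of one type share a symbol,
// e.g. "JpegAct_0" becomes "JpegAct".
inline std::string common_symbol(const std::string& symbol)
{
    const auto pos = symbol.rfind('_');
    // No underscore, leading underscore, or nothing after it: keep as-is.
    if(pos == std::string::npos || pos == 0 || pos + 1 == symbol.size())
        return symbol;
    // Only strip when everything after the underscore is a digit.
    for(auto i = pos + 1; i < symbol.size(); ++i)
    {
        if(!std::isdigit(static_cast<unsigned char>(symbol[i]))) return symbol;
    }
    return symbol.substr(0, pos);
}
```

Names without a numeric suffix, such as "JpegAct" itself, pass through unchanged, so the helper is safe to apply to every symbol.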
## Motivation
With the introduction of the new logging system based on the `spdlog` library, there is an opportunity to replace the `timemory`-dependent JOIN implementation with the `fmt` library's `format` and `join` APIs, which ship as part of `spdlog`.
## Technical Details
Use `fmt` provided APIs to properly format and package strings.
## Motivation
Fix roctx range markers (Push/Pop, Start/Stop) not being displayed correctly in rocpd output. The Visualizer was showing only Stop/Pop events as instant markers instead of proper duration ranges with labels, while Perfetto output displayed them correctly.
## Technical Details
In `tool_tracing_callback_stop()`, the rocpd/database output was using `user_data->value` (timestamp of the Pop/Stop event) instead of `begin_ts` (corrected timestamp from the corresponding Push/Start event) when calling `cache_region()`.
The Perfetto output already used `begin_ts` correctly (line 818). This change aligns the rocpd output with the Perfetto behavior by using `begin_ts` instead of `user_data->value` (line 887).
Updated rocpd validation rules
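The pairing of Push/Start with Pop/Stop can be sketched as follows. This is a simplified model, not the actual `tool_tracing_callback_stop()` code: the `region` struct and `range_stack` class are hypothetical names used only to show why the begin timestamp must come from the matching Push event.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for a completed range; not the real rocpd schema.
struct region
{
    std::string   label;
    std::uint64_t begin_ts;
    std::uint64_t end_ts;
};

// Hypothetical tracker that pairs each Pop with the most recent Push.
class range_stack
{
public:
    void push(std::string label, std::uint64_t ts)
    {
        m_open.emplace_back(std::move(label), ts);
    }

    // Returns a region whose begin_ts is the Push timestamp, not the Pop one.
    // Using the Pop timestamp for both ends would collapse the range into an
    // instant marker.
    std::optional<region> pop(std::uint64_t ts)
    {
        if(m_open.empty()) return std::nullopt;
        auto [label, begin_ts] = std::move(m_open.back());
        m_open.pop_back();
        return region{ std::move(label), begin_ts, ts };
    }

private:
    std::vector<std::pair<std::string, std::uint64_t>> m_open;
};
```

With nested ranges, each `pop()` closes the innermost open range, so a region pushed at 150 and popped at 200 is reported with a duration of 50 rather than 0.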
## Motivation
The validate-rccl-* tests were failing because "RCCL Comm" counters were not being written to perfetto traces when using the new cached-perfetto approach.
## Technical Details
Root Cause: The write_perfetto_counter_track() in rccl.cpp was only called when config::get_use_perfetto() returned true, which requires ROCPROFSYS_TRACE_LEGACY=ON. This meant RCCL counters weren't captured with the new trace cache approach.
Solution: Integrated RCCL with the trace cache system:
Changes to source/lib/rocprof-sys/library/rocprofiler-sdk/rccl.cpp:
- Added cache_rccl_comm_data_events<Track>() function to store RCCL comm data via pmc_event_with_sample with category::comm_data
- Modified tool_tracing_callback_rccl() to always cache events for new perfetto approach, while preserving legacy write_perfetto_counter_track() calls for backward compatibility
Changes to tests/rocprof-sys-testing.cmake:
- Added rccl_api to ROCPROFSYS_ROCM_DOMAINS to enable RCCL API callback tracing
Handler verification: The perfetto_processor_t already has a handler for ROCPROFSYS_CATEGORY_COMM_DATA in m_pmc_track_map that processes the cached events.
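The control flow of the fix can be sketched as below. The types `comm_sample` and `trace_sinks` and the function `on_rccl_comm` are hypothetical stand-ins for the real cache and legacy writer; the sketch only shows that caching is now unconditional while the legacy path stays gated on the legacy setting.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sample type; the real code stores pmc_event_with_sample.
struct comm_sample
{
    std::string   track;
    std::uint64_t timestamp;
    std::uint64_t bytes;
};

// Hypothetical sinks standing in for the trace cache and the legacy writer.
struct trace_sinks
{
    std::vector<comm_sample> cached;  // new cached-perfetto path
    std::vector<comm_sample> legacy;  // legacy direct-perfetto path
};

inline void on_rccl_comm(trace_sinks& sinks, const comm_sample& sample,
                         bool use_legacy_perfetto)
{
    // Always cache, so counters reach the trace without the legacy mode.
    sinks.cached.push_back(sample);
    // Preserve the legacy write for backward compatibility.
    if(use_legacy_perfetto) sinks.legacy.push_back(sample);
}
```

Before the fix, the equivalent of the first `push_back` did not exist, so a run without the legacy setting recorded nothing.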
* SWDEV-540597 - Reset last error to avoid its impact in next iteration.
* SWDEV-540597 - Bypass compiler error as we need to call hipGetLastError without checking error to reset last error.
---------
Co-authored-by: Jaydeep Patel <jaydeepkumar.patel@amd.com>
## Motivation
ROCR on Windows uses the WSL implementation as its codebase. We want to
make sure Windows changes continue to work with WSL and share the same
core implementation. It is therefore easier to maintain the code under
the same rocm-system infrastructure and to automate all builds/tests in
the future.
## Technical Details
The new files are a copy of https://github.com/ROCm/librocdxg/ with
history preserved. Native Windows support and clean-ups will be added in
the following check-ins.
For now, the same command lines can be used to build WSL under the
libhsakmt folder.
```
# Set the Windows SDK path (adjust version number if different)
export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.26100.0/'
# Build the library
mkdir -p build
cd build
cmake .. -DWIN_SDK="${win_sdk}/shared"
make
sudo make install
```
## JIRA ID
SWDEV-558849
## Test Plan
N/A
## Test Result
N/A
For the hipMemPrefetchAsync_v2() API to work, ROCr needs to migrate
the requested range of pages to the particular NUMA node in
question, via move_pages().
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
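Since move_pages() takes an array of per-page addresses, a requested byte range first has to be expanded into the pages it touches. The sketch below shows one way to do that expansion; the helper `pages_in_range` is hypothetical and is not taken from the ROCr change, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: lists the page-aligned addresses covered by
// [base, base + size). The result is the kind of array a caller would pass
// as the `pages` argument of move_pages(), one entry per page.
inline std::vector<std::uintptr_t> pages_in_range(std::uintptr_t base,
                                                  std::size_t    size,
                                                  std::size_t    page_size)
{
    // The page size must be a non-zero power of two for the mask to work.
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    std::vector<std::uintptr_t> pages;
    if(size == 0) return pages;

    const std::uintptr_t mask  = ~(static_cast<std::uintptr_t>(page_size) - 1);
    const std::uintptr_t first = base & mask;               // round down
    const std::uintptr_t last  = (base + size - 1) & mask;  // page of last byte

    for(std::uintptr_t page = first;; page += page_size)
    {
        pages.push_back(page);
        if(page == last) break;
    }
    return pages;
}
```

A range that starts mid-page and spans one page length touches two pages, which is why the first address is rounded down rather than up.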
* SWDEV-561708 Initial shared queue pool apis
* Validate params; some fixes in callback function (but still needs to be checked)
* Dtor cleanup
* minor
* Enable profiling; remove callback since aql_queue takes care of it
* setPriority and setCuMask APIs updated for counted queues
* Increasing step and minor version for rocprofiler
* Tests for CountedQueueManager
* tests
* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending
* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added
* Changed to ASSERT_SUCCESS macros for all tests
* Ring buffer overflow test added
* tests fixed; cleanup added at hsa_shutdown
* priority conversion table changes
* Compiler warnings fixed
* Rewrite 1 test; add desc and improve SetUp() code
* Improvement
* Unified getinfo for both counted and non-counted queues
* Address PR feedback
* Addressing feedback: memleak, data type mismatch, documentation
* improve comment
* format
* Missing HSA_API macros for roctracer
* Revert "Addressing feedback: memleak, data type mismatch, documentation"
This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.
* Improving acquire api doc
* release api doc improved
* error codes for release api doc
* SWDEV-555889 - Support mipmap on rocr
Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.
Add some SRD logs that will be removed finally.
* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.
* Rewrite view logic
* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in negative tests.
* Remove SRD dump log from hip-rt
Let Rocr mipmap log be in condition.
* minor format change
* Exclude mipmap tests for mi200+ which don't support mipmap.
Updated to convert flags correctly
Added ObjectRegistry to track registered and mapped resources and incorporated it into hip_gl.
Added mip level check
Made functions static inline
Reworked validation to be more clear.
Co-authored-by: Prasannakumar Murugesan <prmuruge@amd.com>
As part of an earlier commit, bfloat16 handling in the reduce kernel for FuncMinMax fell into the generic/default template, which is used when there is no SPECIALIZE_REDUCE for a particular type. This generic template does a bitwise integer comparison, which broke bfloat16 ops.
Change the else-if statement to an else statement so that it covers both ROCm versions < 6.0 and >= 6.0 (with ROCm > 6.0, device.h already typedefs __hip_bfloat16 to hip_bfloat16, so no special case is needed here).
[ROCm/rccl commit: fa366ac03f]
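To see why an integer comparison of the raw bits gives wrong min/max results for bfloat16, consider the host-side sketch below. It models bfloat16 as the upper 16 bits of an IEEE-754 float and compares two negative values. The helper names are made up for this example, and comparing as `uint16_t` is an assumption about what the generic template does; a signed 16-bit comparison fails the same way for this pair of values.

```cpp
#include <cstdint>
#include <cstring>

// bfloat16 modeled as the top 16 bits of a 32-bit float (truncation).
inline std::uint16_t float_to_bf16_bits(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

inline float bf16_bits_to_float(std::uint16_t raw)
{
    const std::uint32_t bits  = static_cast<std::uint32_t>(raw) << 16;
    float               value = 0.0f;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// What a generic integer template effectively computes on the raw bits.
inline std::uint16_t max_bitwise(std::uint16_t a, std::uint16_t b)
{
    return a > b ? a : b;
}

// What a bfloat16-aware specialization should compute.
inline std::uint16_t max_numeric(std::uint16_t a, std::uint16_t b)
{
    return bf16_bits_to_float(a) > bf16_bits_to_float(b) ? a : b;
}
```

For -1.0 (bits 0xBF80) and -2.0 (bits 0xC000), the bitwise version picks 0xC000, that is -2.0, while the numeric maximum is -1.0. Negative floats sort in reverse order when their bit patterns are treated as integers.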