* Use our own counter definitions
* Do not depend on rocprofiler-sdk counter definitions
* Add missing counter definitions for the MI100, MI200, MI300, and MI350 series
* Add counters based on the register specification
* This restores some metrics that were previously missing
* Enable SQC_DCACHE_INFLIGHT_LEVEL counter and associated metrics
* Enable the TCP_TCP_LATENCY counter and its associated counter for all GPUs
except MI300
* Update TCC_EA_* counters for MI100 to TCC_EA0_*
* Update MI100 metrics which depend on TCC_EA0_* counters
* Enable accumulation counters for MI100
* Improve rocprof list avail usage to give a clearer view of the supported
counters
* Update CHANGELOG
* Move accumulation counters to counter definition
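The counter-definition changes above can be pictured with a small model. This is a hypothetical Python sketch, not the project's actual definition format: the counter set, `counters_for()`, and `metric_available()` are invented for illustration. It shows three ideas from the list: definitions owned by the tool, MI100 exposing `TCC_EA0_*` instead of `TCC_EA_*`, and `TCP_TCP_LATENCY` being enabled everywhere except MI300.

```python
# Hypothetical sketch: per-series counter definitions owned by the tool
# instead of being queried from rocprofiler-sdk. Counter set is illustrative.
BASE_COUNTERS = {"SQC_DCACHE_INFLIGHT_LEVEL", "TCP_TCP_LATENCY", "TCC_EA_RDREQ"}

OLD_PREFIX = "TCC_EA_"
MI100_PREFIX = "TCC_EA0_"


def counters_for(series: str) -> set:
    """Return the (illustrative) counters defined for a GPU series."""
    counters = set(BASE_COUNTERS)
    if series == "MI300":
        # TCP_TCP_LATENCY is enabled for every series except MI300.
        counters.discard("TCP_TCP_LATENCY")
    if series == "MI100":
        # MI100 exposes TCC_EA0_* rather than TCC_EA_*.
        counters = {
            MI100_PREFIX + c[len(OLD_PREFIX):] if c.startswith(OLD_PREFIX) else c
            for c in counters
        }
    return counters


def metric_available(required: set, series: str) -> bool:
    """A metric can be computed only if every counter it needs is defined."""
    return required <= counters_for(series)
```

In this model, an MI100 metric that still references `TCC_EA_RDREQ` is reported as unavailable, which is why the MI100 metrics that depend on these counters also have to be rewritten against `TCC_EA0_*`.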
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
* Remove .git folder and git command check in cmake
* Update docker container to work in monorepo
* Update docker container to mount the top level folder in monorepo
* Change CDash project
* Fix CI
* Fix AQLProfile CDash
RHEL 8 is now supported until end of support (EOS), so rebuild the roofline binaries for ROCm 7 on RHEL 8 (they were previously built for ROCm 7 on RHEL 9).
Remove roofline-rhel9-rocm7 and replace it with the new roofline-rhel8-rocm7.
Update the check for the roofline binary.
Revert documentation references to the minimum supported RHEL version back to RHEL 8.
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Add "Utilization" to the names of bandwidth-related metrics whose unit
is Percent
* Change the unit of bandwidth metrics to Gbps
* Update metric formulas to use the total duration as the denominator instead of the normalization unit
* Update metric descriptions
* Update metric units
* Update CHANGELOG
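The formula change above (dividing by total duration rather than by a normalization unit) can be illustrated with a minimal Python sketch. The function names are hypothetical, and it assumes "Gbps" means gigabits per second and that durations are measured in nanoseconds; the real metric formulas may use different counters and units.

```python
def bandwidth_gbps(total_bytes: int, total_duration_ns: float) -> float:
    """Average bandwidth over the whole run, in gigabits per second.

    Assumptions: "Gbps" is read as gigabits per second, and the duration is
    in nanoseconds. The denominator is the total duration, not a count of
    normalization units such as kernels or waves.
    """
    if total_duration_ns <= 0:
        raise ValueError("total_duration_ns must be positive")
    # One bit per nanosecond is 1e9 bits per second, i.e. 1 Gbit/s.
    return total_bytes * 8 / total_duration_ns


def bandwidth_utilization_percent(achieved_gbps: float, peak_gbps: float) -> float:
    """Percent-valued metric: achieved bandwidth as a share of peak."""
    return 100.0 * achieved_gbps / peak_gbps
```

For example, 1 GB moved over 1 s gives 8 Gbps, and 8 Gbps against a 16 Gbps peak reads as 50 % utilization. Keeping the raw rate and the percent-of-peak value as separately named metrics is what the "Utilization" rename distinguishes.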
Problem with the original test:
- Created circular dependencies between queues:
* Queue1: Kernel A → Barrier(waits for signal_2) → Kernel C
* Queue2: Barrier(waits for signal_1) → Kernel B → sets signal_2
- With strict "one kernel at a time" serialization, this created deadlock:
* Queue1 executed Kernel A, then blocked on barrier waiting for signal_2
* Serializer switched to Queue2, but Queue2 was blocked waiting for signal_1
* Neither queue could proceed: Queue1 needed Queue2's Kernel B to complete,
but Queue2 couldn't start until Queue1 finished completely
- The test would hang indefinitely in hsa_signal_wait_relaxed() waiting for signal_2
Solution implemented:
- Reordered packet submission to eliminate circular dependencies
- Ensured signal producers execute before consumers need them:
* Kernel A produces signal_1 before Queue2's barrier needs it
* Kernel B produces signal_2 before Queue1's continuation needs it
- Dependencies now flow forward without cycles, allowing serializer progress
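The deadlock and its fix can be reproduced with a toy model. This is a simplified Python sketch, not the test's HSA code: `Packet` and `run_serialized()` are hypothetical, and the model assumes a strict serializer that runs packets one at a time in global submission order, so a barrier whose signal is not yet set blocks everything behind it. Kernel A setting signal_1 is taken from the solution description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    queue: str
    name: str
    waits_on: Optional[str] = None  # barrier: signal that must already be set
    sets: Optional[str] = None      # kernel: signal set when it completes


def run_serialized(order: List[Packet]) -> List[str]:
    """Strict serializer model: packets run one at a time in submission order.

    A barrier whose signal has not been set can never be satisfied, because
    no later packet is allowed to start ahead of it. Raises RuntimeError.
    """
    signaled = set()
    executed = []
    for pkt in order:
        if pkt.waits_on is not None and pkt.waits_on not in signaled:
            raise RuntimeError(
                f"deadlock: {pkt.queue} {pkt.name} waits on {pkt.waits_on}")
        if pkt.sets is not None:
            signaled.add(pkt.sets)
        executed.append(pkt.name)
    return executed


A = Packet("Queue1", "Kernel A", sets="signal_1")
B1 = Packet("Queue1", "Barrier(signal_2)", waits_on="signal_2")
C = Packet("Queue1", "Kernel C")
B2 = Packet("Queue2", "Barrier(signal_1)", waits_on="signal_1")
B = Packet("Queue2", "Kernel B", sets="signal_2")

# Original: all of Queue1, then all of Queue2 -> Queue1's barrier never clears.
ORIGINAL = [A, B1, C, B2, B]
# Reordered: each signal is produced before its consumer; per-queue order kept.
FIXED = [A, B2, B, B1, C]
```

Both orders keep each queue's packets in their original relative order; only the interleaving changes. That interleaving is what turns the cycle into a forward-only chain the serializer can drain.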
Refactoring changes:
- Extract common functionality into helper functions:
* create_completion_signal() for signal creation
* create_queue() for queue creation
* submit_kernel_packet() for kernel dispatch packets
* submit_barrier_packet() for barrier packets
- Add documentation explaining the expected execution pattern
- Simplify main(), making the dependency flow easier to follow
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: b5e1645a14]