vedithal-amd
|
aa5dfb98f9
|
MI350 Fix L2 cache to HBM read counters/metrics (#2501)
* Fix rocprofiler-sdk metrics definition
* Use TCC_EA0_RDREQ_128B instead of TCC_BUBBLE counter for L2 cache to
HBM counters and metrics
* Update MI350 counter definitions
* FETCH_SIZE
* BANDWIDTH_EA
* Update MI350 metrics definitions
* System Speed of Light, L2-Fabric Read BW
* Roofline Plot Points, AI (Arithmetic Intensity) HBM
* Roofline Performance Rates, HBM Bandwidth
* Remove redundant definition for gfx950 and fix BANDWIDTH_EA definition
Test HBM bandwidth metric for memcopy workload
* Add memcopy.cpp workload
* Add metric validation test suite to validate HBM Bandwidth metric for
memcopy workload
* Move gpu_soc() to test_utils.py for better re-usability
* Update TUI analysis config
* Fix hbm bandwidth formula for mi350 in calc_ai_profile
Co-authored-by: Alysa Liu <Alysa.Liu@amd.com>
|
2026-01-23 15:56:24 -05:00 |
|