2 Commits

Author SHA1 Message Date
vedithal-amd 27585a8a2b Support MI 350 profiling (#632)
* Add MI 350 hardware information

* Refactor MI GPU YAML file and corresponding interface

* Add SoC file for gfx950 architecture

* Add analysis report configs for MI 350 containing existing metrics

* Add placeholder None-valued metrics for previous architectures so that
  baseline comparison works

* Enable testing on MI 350

* Analysis config metric changes
    - SPI changes
        - Update metric formula for default SPI pipe counter
            - Use efficiently collected pipe wise SPI counters
        - Add SPI Wave Occupancy
        - Add Scheduler-Pipe Wave Utilization
        - Update formula for VGPR Writes
        - Add Scheduler-Pipe FIFO Full Rate
    - CPC changes
        - Add CPC SYNC FIFO Full Rate
        - Add CPC CANE Stall Rate
        - Add CPC ADC Utilization
    - SQ changes
        - Add VALU co-issue efficiency
        - Add F6F4 datatype metrics
        - Update formula for total FLOPs by adding F6F4 counters
        - Add LDS STORE / LOAD / ATOMIC metrics
        - Add LDS STORE / LOAD / ATOMIC bandwidth
        - Add LDS FIFO and TA ADDR / CMD / DATA FIFO full rates

* Collect TCP_TCP_LATENCY_sum only for gfx950 (MI 350)

* Do not inject SQ_ACCUM_PREV_HIRES unnecessarily

* Do not hardcode memory and shader clock speeds

* Write num_hbm_channels to sysinfo.csv instead of hbm_bw while profiling

* Move sysinfo.csv generation to the preprocessing step of profiling

* Add warnings to use --specs-correction for missing sysinfo.csv values during analysis phase

* Update CHANGELOG

* Analysis phase warning to use --specs-correction when needed

[ROCm/rocprofiler-compute commit: f9aa7be97c]
2025-04-03 02:21:18 -04:00
JoseSantosAMD e664f7abf4 Pytest add mi200 to analyze workloads (#334)
* Updated links in documentation. (#328)

Updated to reflect new GitHub organization.
Fixed broken links to GitHub pages.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* update branch for 2.x documentation builds

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>

* update checkout action and use concurrency instead of cancel-workflow-action

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>

* test addition of user option for container launch

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>

* remove --user option for container, try chown instead

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>

* fixing yaml syntax

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>

* reorder job step - start with checkout

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>

* restore missing run directive

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>

* Update workloads to include log.txt
Add missing MI200 workloads

Signed-off-by: Jose Santos <josantos@amd.com>

* Add vcopy workload for tests

Signed-off-by: Jose Santos <josantos@amd.com>

* Change exit codes for caught failures

Signed-off-by: Jose Santos <josantos@amd.com>

* reformat

Signed-off-by: Jose Santos <josantos@amd.com>

* Add pytest-xdist for pytest -n

Signed-off-by: Jose Santos <josantos@amd.com>

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>
Signed-off-by: Jose Santos <josantos@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Karl W. Schulz <karl.schulz@amd.com>

[ROCm/rocprofiler-compute commit: da506ad9b5]
2024-04-01 14:30:21 -05:00