* Add MI 350 hardware information
* Refactor MI GPU YAML file and corresponding interface
* Add SoC file for gfx950 architecture
* Add analysis report configs for MI 350 containing existing metrics
* Add placeholder None-valued metrics for previous architectures so that
baseline comparison works
* Enable testing on MI 350
* Analysis config metric changes
- SPI changes
  - Update metric formula for default SPI pipe counter
  - Use efficiently collected pipe-wise SPI counters
  - Add SPI Wave Occupancy
  - Add Scheduler-Pipe Wave Utilization
  - Update formula for VGPR Writes
  - Add Scheduler-Pipe FIFO Full Rate
- CPC changes
  - Add CPC SYNC FIFO Full Rate
  - Add CPC CANE Stall Rate
  - Add CPC ADC Utilization
- SQ changes
  - Add VALU co-issue efficiency
  - Add F6F4 datatype metrics
  - Update formula for total FLOPs by adding F6F4 counters
  - Add LDS STORE / LOAD / ATOMIC metrics
  - Add LDS STORE / LOAD / ATOMIC bandwidth
  - Add LDS FIFO and TA ADDR / CMD / DATA FIFO full rates
* Collect TCP_TCP_LATENCY_sum only for gfx950 (MI 350)
* Do not inject SQ_ACCUM_PREV_HIRES unnecessarily
* Do not hardcode memory and shader clock speeds
* Write num_hbm_channels to sysinfo.csv instead of hbm_bw while profiling
* Move sysinfo.csv generation to the pre-processing step of profiling
* Add warnings to use --specs-correction for missing sysinfo.csv values during analysis phase
* Update CHANGELOG
* Analysis phase warning to use --specs-correction when needed
[ROCm/rocprofiler-compute commit: f9aa7be97c]
In a wheel environment, rocprof-compute in the bin folder is not a soft link. To execute rocprof-compute from the bin folder, the system path must also include the dependency script paths, so this change adds them.
[ROCm/rocprofiler-compute commit: df2296529b]
Rebuild of rocm-amdgpu-bench roofline binaries for MI200/MI300 systems with rocm6.
Added datatype options to roofline feature.
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
[ROCm/rocprofiler-compute commit: 6cb5bcdbe9]
* Move console logging to logger function to avoid circular dependency in utils module
Signed-off-by: coleramos425 <colramos@amd.com>
* Apply python formatting
Signed-off-by: coleramos425 <colramos@amd.com>
* Remove the default StreamHandler before adding the custom one
If the default handler is not explicitly removed, it can cause duplicate output.
Signed-off-by: coleramos425 <colramos@amd.com>
* Fix lingering bugs from merge conflict resolution
Signed-off-by: coleramos425 <colramos@amd.com>
* Comply with python formatting and update pre-commit hook helper
Signed-off-by: coleramos425 <colramos@amd.com>
* Remove the redundant console_log call at the get_mi300_num_xcds() call; otherwise ALL MI200 profiling runs will print this message
Signed-off-by: coleramos425 <colramos@amd.com>
---------
Signed-off-by: coleramos425 <colramos@amd.com>
[ROCm/rocprofiler-compute commit: 04f92b72a9]
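The handler fix described in this commit can be illustrated with a minimal sketch using only the standard `logging` module. The function name `setup_console_handler` and the logger name are hypothetical and are not taken from the rocprof-compute sources; the sketch only shows the general pattern of dropping an existing plain `StreamHandler` before attaching a custom one.

```python
import logging
import sys


def setup_console_handler(logger_name: str = "example") -> logging.Logger:
    """Attach exactly one console handler to the named logger (illustrative only)."""
    logger = logging.getLogger(logger_name)

    # Remove plain StreamHandlers that are already attached; leaving them in
    # place would make every message appear once per handler.
    # The exact type check keeps subclasses such as FileHandler untouched.
    for handler in list(logger.handlers):
        if type(handler) is logging.StreamHandler:
            logger.removeHandler(handler)

    custom = logging.StreamHandler(sys.stdout)
    custom.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(custom)
    logger.setLevel(logging.INFO)
    # Stop records from also reaching handlers on ancestor loggers.
    logger.propagate = False
    return logger
```

Calling the function repeatedly on the same logger name leaves a single console handler attached, which is the property the commit is after.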
Added command line option to specify which datatype(s) to capture into the roofline PDF(s).
All datatypes are still collected by the roofline call where applicable, but only the specified datatypes are plotted into the PDF outputs. The selected datatypes are drawn into one graph, with FP and Int separated into two graphs if needed. A datatype that is not valid on a particular GPU architecture is skipped with an error message.
Default is FP32.
Reworked roofline calls and plotting to be general enough such that any new datatypes added into rocm-amdgpu-bench can easily be reflected in rocprof-compute with simple modifications in roofline_calc.py.
Adjusted ctest to reflect the expected default PDF outputs from roofline.
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
[ROCm/rocprofiler-compute commit: a492e92034]
* Delete static pmc files
* Counter parsing changes
- Move counter parsing logic to another function
- Fix counter parsing regex
- Log list of counters being collected
* Sanity check counters supported by rocprof
- Emit a warning instead of an error, since the list of counters that
rocprof reports as supported might be inaccurate
* Do not collect these counters
- TCP_TCP_LATENCY_sum (except for gfx908 and gfx90a)
- SQC_DCACHE_INFLIGHT_LEVEL
* Update logic of writing TCC channel counter definition yaml file
* Fix bug in capture_subprocess_output() utility function
- Make logging optional in capture_subprocess_output()
* Fix formatting and tests
* Update changelog
[ROCm/rocprofiler-compute commit: 58cf702d40]
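As a rough illustration of the counter parsing and sanity-check steps listed in this commit, the sketch below extracts counter names from a metric expression with a regular expression and warns (rather than fails) about counters missing from a supported list. The pattern, the block-prefix list, and the function names `extract_counters` and `check_supported` are assumptions made for this example; the actual regex and helpers in rocprof-compute may differ.

```python
import logging
import re

# Hypothetical pattern: a hardware-block prefix, an underscore, then the
# remainder of the counter name. The prefix list is illustrative only.
COUNTER_RE = re.compile(
    r"\b(?:SQ|SQC|SPI|CPC|CPF|GRBM|TA|TD|TCP|TCC)_[A-Za-z0-9_]+\b"
)


def extract_counters(expression):
    """Return the set of counter-like names found in a metric expression."""
    return set(COUNTER_RE.findall(expression))


def check_supported(requested, supported, logger=None):
    """Warn about requested counters absent from the supported list.

    A warning is used instead of an error because the supported list
    itself might be inaccurate. Returns the sorted missing names.
    """
    logger = logger or logging.getLogger(__name__)
    missing = sorted(set(requested) - set(supported))
    if missing:
        logger.warning(
            "Counters not reported as supported: %s", ", ".join(missing)
        )
    return missing
```

For example, `extract_counters("AVG(SQ_WAVES / GRBM_GUI_ACTIVE)")` yields the two counter names in that expression, and the missing counters are logged together on one line.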
* Clean up unused functions.
* Fix number of XCDs for MI300X CPX (core partition).
* Add support for memory partition mode.
* Modify total_xcd to adapt to all gpu models.
* Run black and isort.
* Make gpu_arch regex more generic.
* Add error checking for compute partition mode num xcds.
* Set gpu_chip_id as optional.
* Fix get_gpu_model.
---------
Signed-off-by: xuchen-amd <xuchen@amd.com>
[ROCm/rocprofiler-compute commit: 2e7f82aa13]
* Fix the error that caused the name passed with -n to be ignored in multi-node applications
* Format with isort and black
[ROCm/rocprofiler-compute commit: 23b42e90c9]
When using rocprof v3:
* Use --kernel-include-regex for kernel name filtering
* Use --kernel-iteration-range for kernel dispatch filtering
Update changelog
[ROCm/rocprofiler-compute commit: 45b8937d5d]
* rocprofv3 might not collect any counters for MI 100; handle this case gracefully to prevent test failures
[ROCm/rocprofiler-compute commit: 64ccd588de]
Set locale to C.utf8 instead of en_US.UTF-8
Avoid forcing the user to use en_US.UTF-8. Most Linux systems have C.utf8.
[ROCm/rocprofiler-compute commit: 96a25e8cbc]
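A minimal sketch of the locale change, using the standard `locale` module. The helper name `set_utf8_locale` and the fallback order after `C.utf8` are assumptions for illustration; the commit itself only states that `C.utf8` replaces `en_US.UTF-8`.

```python
import locale


def set_utf8_locale(candidates=("C.utf8", "C.UTF-8", "en_US.UTF-8")):
    """Try each candidate locale in order and return the first one accepted.

    Falls back to the plain "C" locale, which is always available,
    if none of the UTF-8 candidates exist on the system.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
            return name
        except locale.Error:
            # This locale is not installed here; try the next candidate.
            continue
    locale.setlocale(locale.LC_ALL, "C")
    return "C"
```

Trying `C.utf8` first avoids requiring users to generate `en_US.UTF-8` on systems where it is not installed.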
Added a debug log for the case where no FLOPs are recorded (total_flops is 0), in which case AI points are not plotted.
Removed a commented-out print statement that was not functional (it contained a call to a nonexistent method).
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
[ROCm/rocprofiler-compute commit: 1c237c1382]
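The zero-FLOPs case can be sketched as a small guard around the arithmetic intensity (AI) calculation, taking AI as FLOPs divided by bytes moved. The function name `arithmetic_intensity` and its signature are hypothetical; only the `total_flops` name comes from the commit message.

```python
import logging

logger = logging.getLogger(__name__)


def arithmetic_intensity(total_flops, total_bytes):
    """Return FLOPs per byte, or None when no point should be plotted."""
    if total_flops == 0:
        # Nothing to plot: record why at debug level instead of failing.
        logger.debug("No FLOPs recorded (total_flops is 0); skipping AI point.")
        return None
    if total_bytes <= 0:
        # Avoid division by zero when no memory traffic was measured.
        return None
    return total_flops / total_bytes
```

A caller would skip plotting whenever the function returns `None`.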
Higher versions (e.g. 0.4.1) have external dependencies that cause errors and force early exits without creating roofline plots.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
[ROCm/rocprofiler-compute commit: fd288e6d13]
* Analysis report block based filtering for profiling
* Profiling mode changes
- `-b` option now additionally accepts metric id(s), similar to `-b` option in analyze mode (e.g. 6, 6.2, 6.23)
- Only counters mentioned in the selected analysis report blocks will be collected
- Add parsing logic to identify hardware counters from analysis report blocks
- Add filtering logic to only write filtered counters in perfmon files
- Log counters that were not collected on one line
- `--list-metrics` option added in profile mode to list possible metric id(s) similar to analyze mode
- Write arguments provided during profiling in profiling_configuration.yaml file
* Analysis mode changes
- During analysis mode, only show report blocks selected during profiling
- If `-b` option is provided in analysis mode, then follow provided filters
- Do not show empty tables in analysis report
* Miscellaneous changes
- Update CHANGELOG
- Add test cases
  - Instruction mix report block filter
  - Instruction mix and Memory chart report block filter
  - Instruction mix report block filter and CPC hardware block filter
  - TA hardware block filter
  - --list-metrics in profile mode should work
- Move binary handler fixtures to conftest.py to avoid importing
fixtures
- cmake file in tests directory has been updated to compile sample/vmem.hip for testing
* Public documentation changes
- Use the term "Hardware report block" instead of "Hardware block"
- Add documentation for "--list-metrics" option in profile mode
- Add example of filtering by hardware report block such as instruction
mix and wavefront launch statistics
- Add deprecation warning for hardware component (sq, tcc) based filtering
[ROCm/rocprofiler-compute commit: 55cf0e237e]
* Fix post analysis gui in standalone binary (#591)
* Fix post analysis gui in standalone binary
* Add post analysis gui assets and required server libraries for GUI
server and web page
* Add port forwarding to docker test compose
* Update README to use `docker compose up` instead of `docker compose run`
to run containers with port forwarding and to leverage other
functionalities of docker compose
* Fix rocprofv1 output processing. (#588)
* Fix rocprof-compute binary name in package manager install docs
---------
Co-authored-by: vedithal-amd <Vignesh.Edithal@amd.com>
Co-authored-by: xuchen-amd <xuchen@amd.com>
[ROCm/rocprofiler-compute commit: 0aefd15b7b]
Adding FP8 datatype to roofline feature in rocprof-compute on MI300-based systems.
FP8 now shows in terminal output and roofline csv, and outputs a standalone PDF.
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
[ROCm/rocprofiler-compute commit: 848fa1dc18]
* Fix post analysis gui in standalone binary
* Add post analysis gui assets and required server libraries for GUI
server and web page
* Add port forwarding to docker test compose
* Update README to use `docker compose up` instead of `docker compose run`
to run containers with port forwarding and to leverage other
functionalities of docker compose
[ROCm/rocprofiler-compute commit: 0b3114fa88]