ROCProfiler-Register/Systems/Compute: The license file was recently renamed from LICENSE to LICENSE.md, which requires updating the CMake install module and all other locations that reference it.
- Clean up and standardization of MIT licenses after discussion with legal team.
- Update README.md with blurb for top-level files.
- MIT License explicitly mentioned for relevant projects.
- Removal of years.
- Copyright attribution should be to `Advanced Micro Devices, Inc.` and not `AMD ROCm(TM) Software`
- Removal of `All rights reserved.`
- Reduce line width of the text for readability.
- Add clear visual separators for additional licenses.
- Convert text files to markdown format for aforementioned separators.
- Update build scripts to point to renamed files.
- Fixed SMI doc references
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
- The graph nodes have been updated to capture the device ID from the capture stream or the current device when explicitly added.
- Update the device ID for the memcpy node, ensuring that the device where the memory is allocated is taken into account for H2D and D2H pinned operations.
Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
* hsa_agent is not provided by the new API (rocprofiler-sdk), so every device ends up with the same id;
when the gfxip is the same but the config differs, pm4factory cannot tell the devices apart. This fix uses gfxip and CU count as the cache key.
* Change comparison from gfxip to name in instances_fncomp_t
Updated comparison in instances_fncomp_t to use 'name' for backward compatibility with rocprofv2.
---------
Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
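The caching change described above can be illustrated with a minimal Python sketch. It is not the pm4factory code: `AgentInfo`, `FactoryCache`, and the string stored as the "factory" are hypothetical stand-ins. The only idea taken from the commit is that the cache is keyed on gfxip plus CU count instead of on an agent handle that is identical for every device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentInfo:
    """Hypothetical per-device description (not a real rocprofiler type)."""
    handle: int    # same value for every device in the failing scenario
    gfxip: str
    cu_count: int


class FactoryCache:
    """Hypothetical cache keyed on (gfxip, cu_count) rather than the handle."""

    def __init__(self):
        self._by_key = {}

    def get(self, agent):
        key = (agent.gfxip, agent.cu_count)
        if key not in self._by_key:
            # Placeholder for constructing a real per-configuration factory.
            self._by_key[key] = f"factory<{agent.gfxip}, {agent.cu_count} CUs>"
        return self._by_key[key]


cache = FactoryCache()
big = AgentInfo(handle=0, gfxip="gfx942", cu_count=304)
small = AgentInfo(handle=0, gfxip="gfx942", cu_count=228)
```

With a handle-based key, `big` and `small` would collide because both report handle 0. With the composite key they get separate entries, while two devices with the same gfxip and CU count still share one.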
* Make --roof-only, --block and --set mutually exclusive from each other
* Update help output and documentation
* Add sanitize function for checking profiler options
* Update filter blocks arguments when --set or --roof-only is provided
* Update filter_blocks in profiling_config.yaml based on --set option
* Log Filtered Sections instead of Report Sections and Set Selection
* Move soc class function calls from rocprof compute base class to profiler base class
* Fix bug in panel level filtering using --filter-block option
* Remove roofline specific pmc files
* Move microbenchmark entry point from gfx specific soc class to base soc class
* Run microbenchmarks only if block 4 is selected or roof only is selected; skip for mi100
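One way to make the three options mutually exclusive is a standard `argparse` mutually exclusive group, sketched below. This is an assumption about the approach, not the tool's actual parser: the `prog` name, help strings, and the example set name are hypothetical, and the real change may rely on the separate sanitize function mentioned above instead.

```python
import argparse


def build_parser():
    """Hypothetical parser showing --roof-only, --block and --set as exclusive."""
    parser = argparse.ArgumentParser(prog="profile-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--roof-only", action="store_true",
                       help="profile roofline data only")
    group.add_argument("--block", nargs="+", metavar="BLOCK",
                       help="one or more report blocks to profile")
    group.add_argument("--set", metavar="NAME",
                       help="a predefined set of metrics to profile")
    return parser


# A single option from the group parses normally.
args = build_parser().parse_args(["--block", "4", "5"])
```

Passing two options from the group, for example `--roof-only --set compute`, makes `argparse` print a usage error and exit with status 2.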
InterceptQueue::Submit had an "all-or-nothing" packet submission policy that
could cause infinite retry loops when the number of packets to submit exceeded
the available queue slots. When 504+ packets needed submission to a ~500-slot
queue, the system would:
1. Set submitted_count=0 (submit nothing)
2. Add retry barrier packet
3. Trigger async handler via StoreRelaxed
4. Attempt to submit overflow packets
5. Fail again due to same space constraints
6. Repeat
Solution:
Added partial packet submission capability during overflow processing while
preserving the original "all-or-nothing" behavior for normal operations.
When processing overflow packets and insufficient space exists for all packets,
the system now submits as many packets as possible rather than none.
The fix:
- Detects overflow processing via !overflow_.empty()
- Allows partial submission: submitted_count = free_slots - barrier_reservation
- Maintains atomicity guarantees for normal packet rewrites
- Prevents infinite retry loops by ensuring forward progress
This resolves deadlocks in high-throughput scenarios while maintaining
backward compatibility and the original design intent for packet rewrite
atomicity.
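The submission policy above can be modelled with a small Python simulation. This is a sketch of the policy only, not the InterceptQueue code: `submitted_count` and `drain` are hypothetical helpers, the barrier reservation is assumed to be one slot, and the queue is assumed to be fully drained between retries.

```python
def submitted_count(pending, free_slots, processing_overflow,
                    barrier_reservation=1):
    """How many of `pending` packets to submit in this attempt."""
    if pending <= free_slots:
        return pending                    # everything fits: submit all
    if not processing_overflow:
        return 0                          # normal path stays all-or-nothing
    # Overflow path: submit what fits, keeping room for the retry barrier.
    return max(0, free_slots - barrier_reservation)


def drain(total, capacity, allow_partial, max_rounds=10):
    """Retry until all packets are submitted or max_rounds is reached.

    Assumes every slot is free again at the start of each round.
    Returns (packets still pending, rounds used).
    """
    remaining = total
    rounds = 0
    processing_overflow = False
    while remaining > 0 and rounds < max_rounds:
        partial_ok = processing_overflow and allow_partial
        remaining -= submitted_count(remaining, capacity, partial_ok)
        processing_overflow = remaining > 0   # leftovers go to overflow
        rounds += 1
    return remaining, rounds


with_fix = drain(504, 500, allow_partial=True)      # (0, 3)
without_fix = drain(504, 500, allow_partial=False)  # (504, 10)
```

In this model the first attempt submits nothing, the second submits 499 packets from the overflow list, and the third submits the remaining 5. Without partial submission the pending count never drops, which corresponds to the retry loop described above.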
* SWDEV-551080
* Fix the condition for taking the shader path; the size check was moved
incorrectly
* Also account for a bitmask returned for preferred engines
New test that does a memory_copy, and right after has the shader access
the data. This verifies that the memory is coherent and that all the
probes and flushes were done correctly by the memory_copy.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
- Refactor deviceLocalAlloc arguments
- Refactor hostAlloc code, have cleaner interface
- The kern args buffer needs to have the execute flag set, as CP enforces this on
certain newer HW.
* Remove L2 channels from --list-metrics
--list-metrics moved to general options
List metrics for the current architecture
Filter blocks for metrics
Removed test for --list-metrics in profile mode
Test that the options don't throw errors
Fixed --config-dir error
Test stdout for command line options
Provide path list for loading panel configs
Show L2 Cache (per) channel metrics
Changed command line option names
Can show two levels only
Removed filtering blocks
Moved blocks to original position
Removed filter block tests
Removed filtering
Formatting fix
Readability enhancement
Test formatting
Filter L2 channels without sysinfo
Show available metrics for current arch
Intermediate commit
Fixed tests
Added argument sanitization
Added list_metrics to ctest
Merge conflict resolution
Updated test marker
Updated changelog
Fixed formatting
* Updated docs
* libhsakmt: Update ioctl version to 1.18
Sync with kernel ioctl version.
Also explicitly set the ioctl flag to KFD_PROC_FLAG_MFMA_HIGH_PRECISION
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
* libhsakmt: Sync ioctl header by adding kfd_ioctl_profiler
Sync with kernel ioctl version. Add kfd_ioctl_profiler.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
---------
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Move the kernel-filtering check so that roofline PDFs are labelled whenever kernels are filtered. Previously, the PDFs were labelled with the filtered kernel names from --kernel only when --kernel-names was passed.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>