Commit graph

460 commits

Author SHA1 Message Date
Trowbridge, Ian cc0c401615 Memory Allocation Counter Track Shows Total Allocation (#71)
* Counter track for memory allocation is now a running sum showing total allocation

* Address review comments

* Update source/lib/output/generatePerfetto.cpp

Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>

* Updated to reflect review comments

* Fix compilation errors on CI

* remove braces on scalar

* Fix struct compilation issues

* Removed name_to_id for sanitizer

---------

Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
2025-02-12 12:59:53 -06:00
Nagaraj, Sriraksha 5d0b220c37 target cu to string input (#198)
* target cu to string input

* review comments

* review comments
2025-02-12 12:51:39 -06:00
Kandula, Venkateshwar reddy 6427fbafc2 Accum_vgpr support in Rocprofv3 (#70)
* output accumulate vgpr count

* fix logic for computing accum_vgpr

* add accum_vgpr to csv.

* accumulation vgpr's docs and support for rocprofv3

* CHANGELOG.md

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
2025-02-12 10:47:46 -08:00
Bhardwaj, Gopesh 075d36eb82 output format envs doc update (#173) 2025-02-11 21:37:12 -06:00
Madsen, Jonathan 59b41ab5aa rocprofv3: Update rocprofv3 command line for ATT (#201)
* rocprofv3: suppress agent info when no data collected

* Update output config serialization

- full serialization of output configuration

* Update rocprofiler-sdk-att/tests

- add version and soversion
- change output directory
- generate libatt_decoder_summary
- disable tests instead of removing them

* Update rocprofv3 command-line

- make --att-library-path hidden by default
- simplify check_att_capability
- reorder pc sampling options
- add hidden --echo option
- remove ROCPROF_LIST_AVAIL_TOOL_LIBRARY from preload

* Add new rocprofv3 tests for specifying the ATT library path

* Tweak to rocprofv3-test-hsa-multiqueue-att tests

* Update rocprofv3 tool to enable output with att

* Fix standalone test installation

* Revert to fetchcontent_makeavailable to fetchcontent_populate

* Revert tests/common/CMakeLists.txt

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-11 18:10:48 -06:00
Madsen, Jonathan 6246ec4040 SDK: Agent UUIDs, agent runtime visibility, kernel symbol address (#154)
* [DO NOT MERGE] Misc UUID updates

- this is WIP

* Agent visibility

- Support for ROCR_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, CUDA_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL

* Update CHANGELOG

* tweak to rocprofiler_agent_runtime_visiblity_t

* Code object kernel address

- new fields in code_object_kernel_symbol_register_data_t
  - kernel_code_entry_byte_offset
  - kernel_address

* Support ROCR_VISIBLE_DEVICES reordering devices for HIP

* Addressed code review changes

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-11 14:36:23 -06:00
Madsen, Jonathan 3071199386 rocprofv3: do not abort if counter does not have dimensions (#150)
* rocprofv3: do not abort if counter does not have dimensions

* Relax error handling further in rocprofv3 metadata

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-11 14:31:25 -06:00
Kandula, Venkateshwar reddy 143f84fe6b [BUG FIX] store dimensions in counter id when used reduce operator (#181)
* save other dimension in counter id.

* Formatting

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
2025-02-11 13:05:57 -06:00
Madsen, Jonathan 070b659a9a Re-enable clang-tidy for core workflows + clang-tidy fixes (#197)
* Ensure the clang-tidy is updated + clang-tidy fixes

* update-ci workflow

* Enable clang-tidy checks

* Add extra logging to device counter collection samples

* Misc clang-tidy fixes

* Disable device counter collection samples for ThreadSanitizer

* Formatting

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-11 10:58:47 -06:00
Bhardwaj, Gopesh cdf22eba7d Adding pc sampling how to guide (#160)
* Adding pc sampling how to guide

* doc update

* Fixing indentation

* updating index

* updating doc

* updating doc

* Added field information

* Fixing Formatting

* fix formatting error

* Added json format for pc sampling

* feedback resolved

* formatting for text

* PC Sampling API doc

* Reformatted

* Note for shared systems

* update docs

* correcting relative path for cross-referencing

---------

Co-authored-by: vlaindic_amdeng <vladimir.indic@amd.com>
2025-02-10 20:33:05 -06:00
Welton, Benjamin 080b2ba451 [SWDEV-513658] Force HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG value to be used with HSA calls (#192)
* Force HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG value to be used with HSA calls

Fix for CI

* More tweaks

* Increase reproducible-runtime kernel sleep granularity

* Fix data race in synchronous device counter collection sample

* Update device counting service

- add get_active_context function

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-10 11:34:26 -06:00
Indic, Vladimir e67a4451d8 Show host-trap configurations only (#194) 2025-02-10 11:32:53 -06:00
Elwazir, Ammar 5410fabd3d Fixing Clang tidy errors (#195)
* Fixing Clang tidy errors

* format-fix

* Update code_object.hpp

* Clang Tidy Fixes on the whole Source folder

* Update source/CMakeLists.txt

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Addressing reviews

* Correcting the logic for parsing att counters

* Format Fix

* Update source/lib/rocprofiler-sdk-att/tests/dummy_decoder.cpp

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Update source/lib/rocprofiler-sdk-att/tests/standalone_tool_main.cpp

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Update source/lib/rocprofiler-sdk-tool/config.cpp

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Formatting

* Deactivate clang-tidy in source/lib/rocprofiler-sdk-att/tests

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-07 22:33:32 -06:00
Bhardwaj, Gopesh 7821657d65 SWDEV-510794 Adding MPI usage with rocprofv3 (#183)
* swdev-510794 Adding MPI usage with rocprofv3

* update doc

* Fixed build issues

* updating doc

* doc update

* Fixed Typos

* csv format

* change format to shell
2025-02-07 12:01:31 +05:30
Madsen, Jonathan e743bf5a93 Undefined behavior warnings caught by ROCPROFILER_DEFAULT_FAIL_REGEX (#23)
* Add regex for undefined behavior to ROCPROFILER_DEFAULT_FAIL_REGEX

- add UBSAN_OPTIONS to setup-sanitizer-env.sh

* Improve ROCPROFILER_DEFAULT_FAIL_REGEX

* Use -fno-sanitize-recover=undefined flag

- this compiler flag causes all undefined behavior errors to exit

* Revert ROCPROFILER_DEFAULT_FAIL_REGEX

* fix for shift overflow

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Manjunath-Jakaraddi <manjunath.jakaraddi@amd.com>
2025-02-06 08:55:57 -06:00
Madsen, Jonathan 0fbe6cc7b6 SDK: No bg thread if no clients use SDK (#123)
* SDK: No bg thread if no clients use SDK

* Update CHANGELOG

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-06 08:34:56 -06:00
U, Srihari 90ae424c57 Initialize extremes to max and min values (#184)
* Initialize extremes to max and min values

* Address review comment

* Adding clang format
2025-02-06 08:32:37 -06:00
Nagaraj, Sriraksha 03e5a1d9cc remove duplication (#190) 2025-02-06 08:31:53 -06:00
Elwazir, Ammar 02a519e84e 6.4 fixes for HSA and HIP (#191)
* Adding support for hsa_amd_signal_wait_all

* Fixes for HIP

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
2025-02-06 07:56:08 -06:00
Bhardwaj, Gopesh 12508b9521 Adding ROCTx usage doc (#159)
* Adding Roctx usage doc

* updated CHANGELOG

* doc update

* Fixing Related Pages issue

* updating doc

* updating docs

* Adding Resource naming section

* Fixed Formatting

* format fix

* format fix

* Fixing build due to incorrect indentation
2025-02-05 11:04:24 -06:00
Jakaraddi, Manjunath 9c89b475b0 SWDEV-506317: Kernel trace failing due to Code object errors (#170)
SWDEV-506317: Kernel trace failing
2025-02-04 18:01:42 -06:00
Elwazir, Ammar dd5c0ea257 Support new HIP APIs (#179)
* Adding New HIP APIs

* Format Fix

* Format Fix

* Removing changes from ostream and moving it to format

* Addressing Code Review Comments

* Versioning the new hip calls formatting

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
2025-02-04 15:50:18 -06:00
Welton, Benjamin 0c4a56c6bb [SWDEV-509876] Remove buffer requirement from device counting service (#132)
* [SWDEV-509876] Remove buffer requirement from device counting service

No longer require a buffer to be given when setting up device counting
service. This is to reduce performance overhead in cases where immediate
return of counting samples is being used (synchronous mode).

* Missed file

* Update source/include/rocprofiler-sdk/device_counting_service.h

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Update source/lib/rocprofiler-sdk/counters/controller.cpp

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Update source/lib/rocprofiler-sdk/counters/device_counting.cpp

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* Fixes for build

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-02-04 06:06:03 -06:00
Nagaraj, Sriraksha d4a51e4102 Adding att v3 support (#84)
* Adding att v3 support

* misc fix

* bug fix

* Python linting workflow and rules

* fix regex

* Adding temporary args

* fix temporary args

* fix format

* remove att_perfcounters from test input

* Review comments (#163)

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

* Revert "Review comments (#163)"

This reverts commit 9ef0f8e5a4489d5581255e1b70ced2aef5c1c1d0.

* Address review comments 2

* review changes

* review comments

* review

* cmake alias

* review

* review

* review

* review

* Enabling percounter in v3 script

* review

* formatting

* formatting

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Baraldi, Giovanni <Giovanni.Baraldi@amd.com>
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-02-04 04:05:38 -06:00
Madsen, Jonathan 72a27feb04 Fix HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG for ROCm < 6.3 (#178)
Fix HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG for ROCm < 6.4

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-04 04:05:19 -06:00
Madsen, Jonathan 7fcd80f744 Fix async memory copy validation tests (#182)
* Fix async copy validation test

- make the async copy tracing test work regardless of however many HSA memory copies the HIP memory copy decomposes into

* Fix rocprofv3 memory copy tests

* Fix compilation support for hipGraphBatchMemOpNodeGetParams

* Fix rocprofv3-test-summary-*-validate

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-04 02:41:19 -06:00
Madsen, Jonathan f3752faa0a Update HIP string formatting for ROCm 6.4.0 (#144)
Fix HIP data type stringify

- when ROCPROFILER_CI is not defined, provide default for case statements
- Add support for hipGraphNodeTypeBatchMemOp when HIP version is >= 6.4.0

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-01-30 20:51:52 -06:00
Kandula, Venkateshwar reddy 121901c321 add gfx12 for counter collection tests (#108)
* add gfx12 for counter def.

* Update continuous_integration.yml

* Update counter_defs.yaml

* commenting logging.

* Update ioctl.cpp

* add gfx12 to tests

* Update ioctl.cpp

* Add description to GFX12 GL2C_EA_RDREQ counter

* Updates from editor

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com>
2025-01-30 15:16:48 -06:00
Madsen, Jonathan 1f49d6c57b Partial fix of legacy rocprofiler project name (#110)
* Partial fix of legacy rocprofiler project name

* Formatting fix

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-01-30 13:43:35 -06:00
Kuricheti, Mythreya d43070bf08 Fix navi48 counter event IDs (#158)
* Initial fix for navi48 counters

* Add GL2C navi4x gfx12 counters
2025-01-30 13:40:25 -06:00
Baraldi, Giovanni 39db6d842f Fix for ATT context stop while packets are being processed (#171)
Fix for context stop while packets are being processed

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-01-29 11:06:32 -08:00
Welton, Benjamin 0d701cdaac [SWDEV-482060] Set execute permission for HSA allocated memory (#151)
We need execute permission for HSA memory (req for IB buffers).
Enforcement is upcoming which will break counter collection (see
ticket).

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
2025-01-28 16:46:22 -08:00
Indic, Vladimir e4d736839d Temporarily allow only host-trap sampling (#156) 2025-01-27 13:26:11 -06:00
Elwazir, Ammar 19a912d476 Fixing collection period rocprofv3 help message (#148)
Update rocprofv3.py
2025-01-24 08:39:40 -06:00
Rawat, Swati 97b7a6315d update copyright date to 2025 (#102)
* Update LICENSE

* Update conf.py

* Update copyright year

* [fix] Update copyright year

* Update copyright year "ROCm Developer Tools"

* Add license headers to c++ files

* Add license to *.py

* Update licenses in rocdecode sources

---------

Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Mythreya <mythreya.kuricheti@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-01-22 19:11:20 -06:00
Meadows, Lawrence 93f19cf5ca Add ELFABIVERSION_AMDGPU_HSA_V6 emitted by recent llvm compilers (#141)
Co-authored-by: Larry Meadows <lmeadows@amd.com>
2025-01-22 13:42:48 -08:00
Bhardwaj, Gopesh 73e7f8cfb1 ROCTx Documentation (#29)
* Add roctx doc

* Add roctx doxyfile input

* Update links and toc

* Build doxysphinx for both doxygen files

* Update scripts

* Generate roctx doxygen files

* Change doxygen path

to allow for 2 doxyfiles

* Make doxygen dir for script

* Call make _doxygen dir with p flag

* Create _doxygen dir in workflow

* Create doc dirs for doxygen

* Run update docs as sudo

* Fix typo in mkdir command

* Include graphviz for dot

* Install dot for docs CI

* Install dot as sudo due to permission denied

* Install doxygen via sudo

* Install doxysphinx

* Add postcheckout step to RTD to config and gen doxygen docs

* On RTD, update doxygen after creating env

* update docs.yml

* update docs.yml

* fixing build-docs-from-source

* Fixing build docs from source

* update docs.yml

* trying to fix readthedocs

* trying to fix readthedocs

* update docs.yml

* improve mainpage documentation

* update docs

* clang-format fix

---------

Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
2025-01-22 11:34:21 -06:00
Trowbridge, Ian 73e72bb088 Documentation Update to reflect that memory allocation trace records null pointers for free operations (#127)
Update documentation to reflect that null pointers can be recorded in free memory operations
2025-01-22 11:20:50 -06:00
Cheruvally, Aravindan 6bb60bf930 Enhance CMAKE install instructions with std install location/destination (#85)
* Enhancement - usage of package name flags commonly across for getting unique folder name

* Enhancements - updating libexec/pkg usage, avoid sbin

* CMAKE Format Update

* Python Format Update

* Revert "Enhancement - usage of package name flags commonly across for getting unique folder name"

This reverts commit 2dcd1ac5f22ab90112d90648e4b5dab5c54bc639.

* Review Comments - Revert PACKAGE_NAME usage

* Review Comments - Update source folders accordingly to new cmake install locations
2025-01-22 11:19:47 -06:00
Madsen, Jonathan 89cfb5317d Update docs jinja requirements (#118)
- Jinja < 3.1.5 has a sandbox breakout through malicious filenames
- Jinja < 3.1.5 has a sandbox breakout through indirect reference to format method

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-01-22 11:18:19 -06:00
Trowbridge, Ian 042c761b64 Changed memory_allocation.csv name to memory_allocation_trace.csv (#111) 2025-01-22 11:14:42 -06:00
Welton, Benjamin 3076660e60 Add gfx941/gfx942 to ValuPipeIssueUtil (#139) 2025-01-21 14:39:03 -08:00
Welton, Benjamin c6b52701c7 Add gfx940/gfx9 to ValuPipeIssueUtil (#138)
Was dropped, likely by mistake, in the transition to yaml
2025-01-21 12:12:39 -08:00
Baraldi, Giovanni 081419b745 Fix throw on repeated filename (#124)
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-01-20 17:24:28 -08:00
Trowbridge, Ian e307b89ca4 rocDecode API Tracing Support (#49)
* rocDecode API Tracing support

* Test bin file added to rocdecode. Need to add validate python methods

* Added option to not make rocDecode tests

* Added rocdecode and rocprofv3 tests

* Added csv test

* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI

* Add option to avoid building rocdecode tests

* Added option to avoid building rocdecode bin file

* Merge conflict error

* CMake files changed in response to review comments. Attempting to implement callbacks.

* Turned off test building for rocdecode

* Minor fixes for review comments

* Review comments

* Updated formatting

* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.

* Remove iterate args support

* Remove iterate-args

* enforce abi versioning in macro if

* Fix doc error

* removed spaces to fix indentation error

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
2025-01-17 14:42:25 -08:00
Baraldi, Giovanni 1f01526eed SWDEV-478762+FEAT-62196: Fix crash on AQL replay (#104)
* SWDEV-478762: Fix crash on replay

* Fix iteration range

* Format

* Refactor

* Addressing review comments

* Address review comments

* Formatting

* Format

* Refactor

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-01-17 12:23:48 -06:00
Jakaraddi, Manjunath dfee6489b1 SWDEV-500520: Updated documentation for hang issue (#79)
* SWDEV-500520: Updated documentation for hang issue

* Avoid fatal error when invalid metric is found

* removing invalid metrics

* clang formatting
2025-01-16 02:14:22 -08:00
Welton, Benjamin 536fbba627 [SWDEV-509659] Skip rocprof device counting tests if lacking permissions (#125)
* [SWDEV-509659] Skip rocprof device counting tests if lacking permissions

Skips non-intercept test if proper permissions are not obtained
(SYS_PERFMON). This should be the only test that fails due to permission
issues (others do not require the IOCTL to pass).

Regex match sample: https://regexr.com/8b29s

* Update source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* fix

* Update source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
2025-01-16 01:51:06 -06:00
Welton, Benjamin 71dc203b0c Small debug print fix in ioctl.cpp (#120)
* Small debug print fix in ioctl.cpp

Fix debug print statement to print agent id.

* formatting

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-01-15 18:00:04 -08:00
Madsen, Jonathan fae4ad614c Fix host function logging (#63)
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-01-15 19:54:35 -06:00