Commit graph

64483 commits

Author SHA1 Message Date
Jonathan R. Madsen 9278770b89 [rocprofiler-sdk] ROCpd GOTCHA Fix (#720)
* Update GOTCHA submodule

- public API for gotcha_init
- switch repo to ROCm/gotcha

* rocpd interop GOTCHA updates

- fix issues wrapping dlopen/dlsym
2025-09-23 10:45:56 -05:00
xuchen-amd 68cd123b0f [rocprofiler-compute][TUI] improve for cross-platform uses (#1007) 2025-09-23 10:59:29 -04:00
Ioannis Assiouras 97fc90c58f SWDEV-556250 - Added synchronization before validating the result in Unit_hipStreamLegacy* tests (#1062) 2025-09-23 14:14:51 +01:00
Kian Cossettini b2a026f134 Increase timeout for openmp-vv ctests (#1083)
- Set `SAMPLING_TIMEOUT` and `REWRITE_TIMEOUT` to 300 seconds for `openmp-vv` ctests.
2025-09-23 07:45:56 -04:00
systems-assistant[bot] 1e9d8abbf6 [rocpd] Convert to perfetto does not display scratch_memory correctly - SWDEV-542550 (#168)
Add scratch memory to pftrace generated with rocpd

----

Co-authored-by: Marko Crnobrnja <Marko.Crnobrnja@amd.com>
Co-authored-by: Aleksei Tumakaev <atumakae@amd.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
2025-09-23 09:55:30 +02:00
Ajay GunaShekar 93cfcb1a4e SWDEV-556658 - skip failing tests to have a clean build (#1082)
SWDEV-556658 - Linux failures
SWDEV-556645 - Windows failures
2025-09-22 20:49:18 -07:00
Ioannis Assiouras 97bc3af918 SWDEV-550882 - Add support for hipIpcMemLazyEnablePeerAccess (#817) 2025-09-23 00:05:51 +01:00
Jason Bonnell 8b52d71cc7 rocprofiler-systems - add gfx containers to ghcr (#883)
* Initial skeleton code for rocprofiler-systems-continuous-integration.yml

* Add python3-devel to opensuse and rhel ci images

* Update rocprofiler-systems-containers.yml to include TheRock tarballs

* Update pip install command for Dockerfile.ubuntu.ci

* Fix pip install again for Dockerfile.ubuntu.ci

* Remove skeleton workflow for CI

* Add new ci-gfx containers for TheRock installs

* Add set -e and pipefail to ci Dockerfiles to detect errors

* Upgrade pip in Dockerfile.ubuntu.ci

* revert pipefail set -e change

* Replace build-docker-ci.sh script with Docker step for ci-base

* Add support for gfx950, add containers-ci-gfx.yml

* Add working-directory to matrix setup steps

* Try changing containers-ci-gfx.yml

* make more changes to containers-ci-gfx.yml

* Remove build-docker-ci.sh script from gfx step, fix typo in Dockerfile

* Remove gfx110X and gfx120X for now

* Update ci-gfx docker workflow to use ghcr.io

* Temporary change to test one image

* Enable push to test out ghcr package

* Add labels to debug oauth issue

* add packages permissions to step

* add rocprofiler-systems-ghcr.yml workflow

* Remove cache from Docker push action step

* Add prefix to tag

* Add back gfx94X and gfx950 support, add back no push on PR

* Remove gfx container creation from rocprofiler-systems-containers.yml

* Add a gfx950 image for now

* Revert change
2025-09-22 16:58:55 -04:00
Jason Bonnell 9d90286371 rocprofiler-sdk CI workflow improvements (#956)
Update rocprofiler-sdk and aqlprofile CI workflows to improve readability
2025-09-22 16:47:16 -04:00
Mythreya Kuricheti 09c0470ed4 [aqlprofile] Add version info to public header (#706)
Add versioning information to public aqlprofile headers, and add API to query version at runtime
2025-09-22 11:02:42 -07:00
Venkateshwar Reddy Kandula a4effb81a9 [rocprofiler-sdk][CI] install libva-amdgpu-dev in requirements CodeQL Job (#1038)
* install libva-amdgpu.

* Add rocprofiler-sdk-codeql.yml to paths

* Update rocprofiler-sdk-codeql.yml

* update requirements for rocm_release_compatibility job.

* address comments.

---------

Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
2025-09-22 12:17:03 -05:00
cfallows-amd 9819e1cbfc Refactor roofline binary detection (#933)
* Simplify the roofline binary pickup process by determining which base distribution the system OS is based off of, and select the correct binary.
* Add more OS distribution support to roofline by modifying the detection parameters and adding an AZL binary
* Update changelog to include roofline support additions

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-09-22 12:04:20 -04:00
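The base-distribution detection described in the roofline commit above (#933) could look roughly like the following. This is a minimal sketch, not the rocprofiler-compute code: it assumes detection reads the `ID` and `ID_LIKE` fields of an os-release style file, and the function names and the family table (including the `azl` entry) are hypothetical.

```python
import shlex

# Hypothetical mapping from distribution IDs to a base family.
# The real families and binary names are not given in the commit message.
BASE_FAMILIES = {
    "ubuntu": "debian",
    "debian": "debian",
    "rhel": "rhel",
    "centos": "rhel",
    "fedora": "rhel",
    "sles": "suse",
    "suse": "suse",
    "opensuse": "suse",
    "azurelinux": "azl",
}


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the optional shell-style quotes around the value.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def base_distribution(text):
    """Return the base family for an os-release file, or None if unknown."""
    fields = parse_os_release(text)
    # Try the distribution's own ID first, then each ID_LIKE entry in order.
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for candidate in candidates:
        family = BASE_FAMILIES.get(candidate.lower())
        if family:
            return family
    return None
```

Falling back to `ID_LIKE` is what lets a derivative distribution reuse the binary built for its parent without being listed explicitly.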
Ajay GunaShekar 0118184d22 SWDEV-554678 - Navi44 on windows (#936)
* SWDEV-554678 - Navi44 on windows

* SWDEV-554678 - Navi44 in palsettings
2025-09-22 08:52:41 -07:00
systems-assistant[bot] 05c0a38732 SWDEV-508776 - Validate VGPRs (#620)
Add a test to verify VGPRs.
Make hipInfo show maxAddressableVgprsPerThread.

Change-Id: Ibfc2c912a54ccd1686a3930a1008c472a8465136

Co-authored-by: taosang2 <tao.sang@amd.com>
2025-09-22 11:28:05 -04:00
David Yat Sin b7095616b9 fix: fix -Wunused-parameter (#1017)
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2025-09-22 11:12:44 -04:00
vedithal-amd fa31650298 [rocprofiler-compute] make pmc files deterministic (#1066)
* make pmc files deterministic
2025-09-22 10:48:35 -04:00
itrowbri 9910705685 Disable validation tests when execution test is disabled (#1060) 2025-09-22 09:18:32 -05:00
Sunday Clement f3e1db176a rocrtst: Reduce host memory limit to 70% (#905)
* rocrtst: Reduce host memory limit to 70%

Reducing the upper bound for rocrtstFunc.Memory_Max_Mem to 70% from
90% to help reduce test execution time.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocrtst: Add ROCRTST_LIMIT_POOL_SIZE env var

Add environment variable to override the memory pool sizes when running
tests.

Co-authored-by: David Yat Sin <David.YatSin@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
2025-09-22 09:39:00 -04:00
Shadi Dashmiz 9b350754cc SWDEV-555084: Fix the python script (#996)
- no need to manually update the newly generated hip_prof_str.h

Signed-off-by: shadi <shadi.dashmiz@amd.com>
2025-09-22 08:41:19 -04:00
abchoudh-amd a927f246f6 Fix test failures (#1059)
* Test fix

* Added path not exists check
2025-09-22 18:07:13 +05:30
systems-assistant[bot] 63a723a287 GFX12 PC Sampling support (#186)
The GFX12 host-trap PC sampling support in SDK and V3.
Introducing parser tests specific to GFX12.

Co-authored-by: vlaindic_amdeng <vladimir.indic@amd.com>
2025-09-22 13:17:00 +02:00
Venkateshwar Reddy Kandula 997b36f5bc [rocprofiler][navi4] Remove navi4x support on rocprofv2. (#307)
* Remove navi4x support on rocprofv2.

* remove gfx12 from build scripts.

* bug fix.

* address comments.

* update changelog

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* address comments

Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>

---------

Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>
2025-09-22 03:17:29 -05:00
MachineTom 25922d08c3 SWDEV-539145 - Return error when ext_fine_grain_pool unavailable (#877)
Return error when ext_fine_grain_pool is unavailable for
hipHostMallocUncached, hipHostAllocUncached and
hipExtHostRegisterUncached.
Disable related tests on Navi4x where
ext_fine_grain_pool is unavailable
2025-09-21 19:25:28 -04:00
MachineTom c6c2fa212c SWDEV-1 Fix a bug of VGPRs (#1000)
Fix a bug of VGPRs due to a previous patch:
SWDEV-546223 - Get image support info from ISA meta
2025-09-21 19:23:12 -04:00
systems-assistant[bot] 69d96d9e0a SWDEV-491267 - add stream capture test for Semaphore APIs (#572)
Co-authored-by: Li, Todd tiantuo <Toddtiantuo.Li@amd.com>
2025-09-21 15:13:30 -07:00
hkasivis 5e7210980e Users/hkasivis/add ais support v2.1 (#928)
* libhsakmt: Update hsakmt_fmm_get_handle to support address range

Currently, hsakmt_fmm_get_handle works only if the address is the allocated
(starting) value. Update it so it can find the handle if the address falls in
the valid allocated range. This is useful for the AMD Infinity Storage
feature, where data needs to be transferred to any memory within the
allocated range.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* libhsakmt: Introduce AMD Infinity Storage (AIS) API

Add hsaKmtAisReadWriteFile() API to support AMD Infinity Storage. The
API moves data directly from GPU VRAM to a file.

v2: Add in/out ioctl arguments to provide more status information to
user space. Modify hsaKmt API also accordingly.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* rocr: Initial implementation of AMD Infinity Storage (AIS)

Implement first two API: hsa_amd_ais_file_write and hsa_amd_ais_file_read

v2: Change API from hsa_amd_ to hsa_amd_ais_
    Change API to take in handle instead of fd for compatibility across
     different platforms

Original Author: Chris Freehill <Chris.Freehill@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

---------

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-09-20 11:30:05 -04:00
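The first libhsakmt change in the commit above (#928) lets the handle lookup succeed for any address inside an allocated range, not only the starting address. A minimal sketch of that kind of range lookup, assuming non-overlapping allocations kept in a sorted list; the `AllocationTable` class and its methods are hypothetical and do not reflect libhsakmt's actual data structures.

```python
import bisect


class AllocationTable:
    """Toy table of non-overlapping allocations keyed by start address."""

    def __init__(self):
        self._starts = []   # sorted start addresses
        self._entries = {}  # start address -> (size, handle)

    def add(self, start, size, handle):
        bisect.insort(self._starts, start)
        self._entries[start] = (size, handle)

    def get_handle(self, address):
        """Return the handle whose range [start, start + size) contains address."""
        # Index of the last allocation that starts at or before `address`.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        start = self._starts[i]
        size, handle = self._entries[start]
        return handle if address < start + size else None
```

An exact-match lookup would only consult `self._entries[address]`; the binary search plus the bounds check is what extends it to interior addresses.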
Todd tiantuo Li 7137c7f3d8 SWDEV-541478 - return hipSuccess for hipTexObjectCreate TypePitch2D with zero width or height (#712) 2025-09-19 20:48:01 -07:00
Stella Laurenzo 2e93b9f6cb [clr] Only enable comgr dynamic loading if it is a shared lib. (#1065)
Previously, we were enabling dynamic loading mode if BUILD_SHARED_LIBS was set, but this is not correct. We should only be loading dynamically if the amd_comgr library itself is shared.

Background: we have a configuration where we use a static linked comgr stub in order to achieve LLVM isolation (it dynamically loads the comgr and compiler into a dedicated link namespace) in an otherwise dynamic linked clr.
2025-09-19 16:10:15 -07:00
systems-assistant[bot] 3cfdfe30b2 SWDEV-544502 - Skip opengl tests on devices with no image support (#542)
Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
2025-09-19 22:51:55 +01:00
Jatin Chaudhary e79eaaa8a5 SWDEV-546287 - Implement hipLibrary load/unload (#975) 2025-09-19 22:23:49 +01:00
ywang103-amd 775ac73d25 change interval for host_trap in unit test to adapt to single kernel (#1064) 2025-09-19 17:21:02 -04:00
Venkateshwar Reddy Kandula ec4d4b8a0d add SetGRBMToBroadcast in sqtt_builder.h (#1061) 2025-09-19 15:56:55 -05:00
Venkateshwar Reddy Kandula d16e7adf13 [rocprofiler-sdk][CI] Nightly build testing for rocprofiler-sdk (#949)
* Implement nightly tests mode

* Update run-ci.py
2025-09-19 14:32:11 -05:00
Ajay GunaShekar 300cd09f0b SWDEV-555665 - skip failing tests on Windows (#1045)
* SWDEV-555665 - skip failing tests on Windows

* SWDEV-555665 - skip failing tests on Windows
2 missed tests added to the skip list

* SWDEV-555665 - typo in Unit_test_generic_target_in_regular_fatbin testname
2025-09-19 12:24:47 -07:00
JonathanLichtnerAMD f31afe1d20 [HIP CLR] Make hipMemPtrGetInfo consistent with malloc and hipMalloc (#1005)
hipMemPtrGetInfo was returning the error hipErrorInvalidValue if it
was called on a nullptr.  However, this does not match the malloc
convention where a nullptr has size zero;  for example,
malloc_usable_size() returns zero if called on a nullptr.

This commit changes hipMemPtrGetInfo to set the size to zero and
return hipSuccess when called with a nullptr.  (This also fits with
hipMalloc and hipFree usage, since hipMalloc of size zero results in a
nullptr, and hipFree of a nullptr is successful.)
2025-09-19 12:53:41 -06:00
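The nullptr convention adopted by the hipMemPtrGetInfo commit above (#1005) can be illustrated with a toy model. This is only a sketch in Python, not HIP code: `None` stands in for a nullptr, the `allocations` dict and the function name are hypothetical, and returning an error for an unknown non-null pointer is an assumption that the commit message does not state.

```python
HIP_SUCCESS = "hipSuccess"
HIP_ERROR_INVALID_VALUE = "hipErrorInvalidValue"


def mem_ptr_get_info(allocations, ptr):
    """Toy model returning (status, size) for a pointer.

    `allocations` maps pointer values to their allocated sizes.
    """
    if ptr is None:
        # New behavior from the commit: a nullptr has size zero and succeeds,
        # matching malloc_usable_size() on a nullptr.
        return HIP_SUCCESS, 0
    if ptr not in allocations:
        # Assumption for illustration only; not described in the commit.
        return HIP_ERROR_INVALID_VALUE, 0
    return HIP_SUCCESS, allocations[ptr]
```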
Julia Jiang 1c10592be2 SWDEV-546376 - Fix CTS profiling failure (#976) 2025-09-19 13:38:28 -04:00
Tony G c34c9826c3 rocr: Remove QueueProxy (#700)
Because the base QueueWrapper class copies the wrapped queue's
amd_queue_v2_t queue descriptor struct the QueueProxy seems
superfluous as it will have the same effect as calling the
underlying methods on the wrapped queue itself.

Additionally, because the QueueProxy needs to access the wrapped
queue's queue descriptor it breaks the Queue API which is meant
to abstract the underlying agent's queue implementation.

This makes it easier to generalize the core::Queue as well as
the InterceptQueue.

Signed-off-by: Tony Gutierrez <anthony.gutierrez@amd.com>
2025-09-19 09:07:28 -07:00
German Andryeyev ea89ddd589 SWDEV-547108 - Add dll loader for Windows build (#1004)
The build of ROCR backend will be enabled by default in Windows.
It requires the dll loader until the ROCR dll is always available in Windows for any configuration.
2025-09-19 11:25:30 -04:00
jamessiddeley-amd 18b4b84a9f update TCC_EA0_RDREQ_sum def (#1039) 2025-09-19 11:21:45 -04:00
systems-assistant[bot] 3f001b0305 [rocpd] Refactor to use python to convert rocpd to CSV + add CSV tests + remove old cpp implementation (#159)
* Write agent info to CSV

* Write kernel to CSV

* Write memory copy to CSV

* Write memory allocation to CSV

* Write hip api to CSV

* Write hsa api to CSV

* Write marker api to CSV

* Write counters to CSV

* Write scratch memory to CSV

* Write rccl api to CSV

* Write rocdecode api to CSV

* Write rocjpeg api to CSV

* Remove info_process joins

* Format agent id

* Compose full file name in sql writer function

* Add missing fields to kernel traces csv

* Rename vgpr_count to arch_vgpr_count

* Fix kernel name

* Skip empty query results

* Format csv.py

* Delete c++ CSV writer

* Add CSV header comparison test

* Fix comment spacing in csv.py

* Change ALLOC to ALLOCATE in memory allocation writer

* Do not append trace to agent info file name

* Revert changes for VGPR_Count

* Fix csv validation test

* Add sorting by guid

* Use EXISTS to check query results are not empty

* Merge API-specific queries

* Optimize regions query

* Column name mapping for agent info

* Pass config to sql writer

* Move agent id string building to a separate function

* add titled_headers argument

* Remove titled-columns argument

* Improvements for regions csv

* fix CSV validation test

* improve CSV validation test

* remove roctxMarkA from csv validation test

* fix capability field titles in agent info

* remove filter.py from query as that is still experimental

* Remove some aliases, now that query will auto-title the column headers

---------

Co-authored-by: Aleksei Tumakaev <atumakae@amd.com>
Co-authored-by: Young Hui <young.hui@amd.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
2025-09-19 10:15:57 -04:00
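Two steps from the rocpd CSV commit above (#159), "Skip empty query results" and "Use EXISTS to check query results are not empty", can be sketched with Python's standard `sqlite3` and `csv` modules. This is an illustration under assumptions, not the rocpd `csv.py` implementation: the helper names are hypothetical, and the queries are trusted strings supplied by the caller.

```python
import csv
import io
import sqlite3


def has_rows(conn, query):
    """Return True if `query` yields at least one row, using SQL EXISTS."""
    # EXISTS lets the database stop at the first matching row
    # instead of materializing the whole result set.
    (exists,) = conn.execute(f"SELECT EXISTS({query})").fetchone()
    return bool(exists)


def query_to_csv(conn, query):
    """Return the query results as CSV text, or None when there are no rows."""
    if not has_rows(conn, query):
        return None  # skip empty results: no header-only output
    cursor = conn.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Column names come from the cursor description of the executed query.
    writer.writerow(col[0] for col in cursor.description)
    writer.writerows(cursor)
    return out.getvalue()
```

A caller would write a file only when `query_to_csv` returns text, so a trace category with no records produces no CSV at all.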
Ioannis Assiouras 9def133275 SWDEV-555798 - Added sync after graphLaunch for Unit_hipGetProcAddress_GraphAPIs tests (#1052) 2025-09-19 15:10:33 +01:00
German Andryeyev 913743d433 Add windows build support into ROCr (#912)
Make sure ROCR can be compiled under windows. Extra setup for the windows build environment is required. The change should not have any functional changes under Linux.
2025-09-19 10:10:17 -04:00
David Yat Sin 96a0d16eda rocr: Fix hsa_amd_pointer_info regression (#719)
Fix for hsa_amd_pointer_info returning only
HSA_EXT_POINTER_TYPE_RESERVED_ADDR for SVM allocations.
2025-09-19 10:09:22 -04:00
Jason Bonnell eebf5ead8c Replace cmake-format with gersemi in rocprofiler-compute-formatting.yml (#1053)
* Replace cmake-format with gersemi in rocprofiler-compute-formatting.yml

* Run gersemi formatting on CMakeLists.txt files

* Remove .cmake-format.yaml, add .gersemirc file

* Add more options to .gersemirc

* Add new line to .gersemirc

* Add new line to CMakeLists.txt

* Run gersemi again with new options
2025-09-19 08:42:40 -04:00
Godavarthy Surya, Anusha 538528d1e5 SWDEV-548417 - Fix Memleaks in Graph (#973)
Commands enqueued on the graph internal stream are not released; add stream during graphExec release

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-09-19 17:45:01 +05:30
Godavarthy Surya, Anusha ce560304a8 SWDEV-548417 - Fix Memleaks in Graph (#713)
Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
2025-09-19 17:39:36 +05:30
Jaydeep 9f5b390db4 SWDEV-555484 - getQueueId uses hsa_queue's id, which is not necessarily bounded by GPU_MAX_HW_QUEUES, so accessing the array beyond its size causes data corruption. (#1040) 2025-09-19 14:31:27 +05:30
Jaydeep 99613f1009 SWDEV-555484 - Invalidate capturing stream only for null/legacy stream. (#1032) 2025-09-19 14:31:17 +05:30
Gopesh Bhardwaj 470b7d7ccd SWDEV-553065 palamida scan fix (#1010) 2025-09-19 01:01:12 -04:00
Mark Meserve bf49039005 [rocprofiler-sdk][rocprofiler-register] Initial Attachment Support (#316)
* attach: milestone: API tracing

- This pairs with another commit in rocprofiler-sdk to fully
  function
- Add ptrace entry points for tool attachment
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup

- Remove hardcode for loading of tool library
- Make invoke registration functions public again

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-sdk

* attach: prestore overhaul

- Must be paired with commit in rocprofiler-sdk

* attach: add dispatch table rework

- Register will load the prestore library and provide entrypoints to sdk

* attach: formatting and cleanup

* attach: revise dispatch table scheme

* attach: formatting

* attach: milestone: API tracing

- This change must be paired with a change in rocprofiler-register to
  fully function.
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup and comments

* attach: Formatting and crash fixes

* attach: add attach duration

- Add option attach-duration-msec for attachment

* Formatting + sglang hang fix via signal handling

* Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-register

* Allow null agents for scratch output

* attach: improve queue library interface

- Significant changes to force exported interfaces back to C
- Fixes bug with unknown agents at attachment
- Code objects' names may still be incorrect

* attach: add code_object support

- Kernel traces will now have names and all other information for launches
- Add capture of hsa_executable to the queue library
- Various logging improvements

* attach: rename queue library to prestore

* attach: prestore overhaul

- Must be paired with commit from rocprofiler-register
- Massive overhaul of code organization in prestore library
  - Separates registrations for different object types
  - Sets up future changes for initialization

* attach: add prestore dispatch table

- Removes linkage to prestore library from sdk

* attach: cleanup

* attach: formatting

* attach: fix input prompt not appearing

* attach: fix component name in cmake

* attach: revert change to export level

* Make prestore API public

* attach: update sdk attachment library WIP

- This commit is NONFUNCTIONAL

- Changes around structure to remove classes
- Separate C linkage where needed
- Still needs updates to register for correct usage

* attach: update register with dispatch table WIP
- This commit is NONFUNCTIONAL

- Changes rocprofiler_register to handle dispatch table from attach
  library.
- Still needs changes in SDK with dispatch table usage

* attach: dispatch table wip
- This commit is NONFUNCTIONAL

* attach: move attach component into core

* attach: rename to rocprofv3-attach

* attach: add callbacks for new queues and code objects

* attach: finish dispatch table implementation

- Fixes kernel tracing

* attach: add cmake variable for attachment support

* feat: Add --attach alias for rocprofv3 with comprehensive attachment tests

- Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py
- Create comprehensive attachment test suite with CSV and JSON output validation:
- New attachment-test application for testing dynamic profiling scenarios
- Unified test script supporting both CSV and JSON output formats
- Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info
- Add CMake integration for automated attachment testing
- Support parameterized output directory and filename specification
- Implement proper environment setup for attachment queue registration

Tests verify successful attachment to running processes and capture of:
- Kernel dispatch traces with workgroup/grid dimensions
- Memory copy operations (H2D/D2H) with size validation
- HSA API call traces across multiple domains
- GPU/CPU agent information and capabilities

* Documentation Update

* attach: make attach script callable

* Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name

* attach: revert metrics library path changes

* Generic Attachment in Register (#942)

Remove tool references in register

* Add second param to attach call in rocprof register

* Add experimental reattachment support for ROCprofiler-SDK

This commit introduces experimental reattachment functionality allowing tools
to dynamically reattach to running processes with comprehensive design changes
to support multiple attach/detach cycles:

**Core Reattachment API:**
- Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks
- Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports
- Implement reattachment tracking in rocprofiler_register_attach to differentiate
initial attachment from reattachment cycles
- Add rocprofiler_register_invoke_reattach for handling reattachment requests

**Design Changes - Registration System Flow:**
The registration system now supports a dual-path initialization:

1. Initial Attachment Flow:
    - rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations()
    - Full tool initialization with complete context setup
    - Sets prev_attached atomic flag to track state

2. Reattachment Flow:
    - rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach()
    - Bypasses full re-initialization, calls client reattach callbacks instead
    - Preserves existing contexts and buffers, only reactivates profiling services

**Design Changes - Tool Library Loading:**
Enhanced rocprofiler-register library loading with function pointer resolution:
- Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers
- Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions
- Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability

**Design Changes - Context Management:**
Introduced dual context systems for attachment scenarios:
- get_contexts() - Original contexts for standard tool initialization
- get_attach_contexts() - Separate context map for attachment-specific lifecycle
- attach_init() - Creates contexts for ALL buffer tracing services using existing buffers
- attach_start() - Selectively starts contexts based on configuration options
- attach_detach() - Cleanly stops and destroys attachment contexts

**Design Changes - Buffer Management:**
Added reset_tmp_file_buffer() template for clean reattachment state:
- Properly closes and removes old temporary files
- Deletes existing file_buffer instances to prevent stale file position tracking
- Creates fresh file_buffer instances for clean reattachment cycles
- Addresses core issue where file position metadata becomes stale between cycles

**Design Changes - Environment Variable Injection:**
Added ROCP_REGISTERED_TOOL_ATTACH environment variable:
- Distinguishes attachment-loaded tools from LD_PRELOAD scenarios
- Enables registration system to apply attachment-specific logic
- Helps tools adapt behavior for attachment vs standard initialization

**Attachment Context Management:**
- Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle
- Add reset_tmp_file_buffer template for clean reattachment state management
- Implement get_attach_contexts() for tracking active attachment contexts

**Test Infrastructure:**
- Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite
- Include reattachment test scripts with unified attachment/detachment cycles
- Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info
- Add conftest.py for JSON and CSV data loading utilities

**Configuration Updates:**
- Update CMakeLists.txt to include reattachment tests in build system
- Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking
- Enhance rocprofiler-register library loading with reattach/detach function resolution

**Flow Impact Analysis:**
This design enables robust multi-cycle attachment by:
1. Preventing duplicate initialization on reattachment
2. Maintaining separate context lifecycles for attachment vs standard operation
3. Ensuring clean temporary file state between attachment cycles
4. Providing tools with explicit reattach/detach callback hooks
5. Supporting both programmatic and environment-based tool configuration

The experimental nature allows for iteration on the API while establishing
the foundation for production-ready dynamic profiling capabilities.

* Fix misc clang-tidy warnings/errors

* CMake Option and Environment Variable Updates

- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->

* Source reorganization

* Formatting + new lines at EOF

* Fix flake8 F841: local variable is assigned to but never used

* Update attachment test

- get rid of 5 second start delay
- add roctx

* Rework implementation

- Remove rocprofiler_tool_configure_result_experimental_t in lieu of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst

* Handle re-attachment options

- inherit options from previous attachment
- check previous options do not modify data collection services

* Fix support for tools w/o rocprofiler_configure_attach

- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool

* attach: remove unknown agent handling

- Change was from earlier commit, no longer needed

* attach: add error for attaching without library loaded

* attach: revise version numbering

* attach: register header revisions

* attach: clang format register

* attach: formatting

* attach: fix build failure

- Remove cross dependency into rocprofiler-sdk, fixes build on some systems

* attach: revise register library detection

* Update rocprofiler-register and attach library

- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()

* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion

* Fix output support for rocprofiler-sdk-tool

* Fix formatting

* Fix clang tidy errors

* Misc rocprofiler-sdk-attach fixes

* attach: add sigint handling to attach python

* tool README.md formatting

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix buffered output issue

* attach: add errors for tool attach

* CI Fixes

* Rework tests

* attach: improve library loading in rocprofv3 attach

* formatting

* Update tests to use pytest framework

* Fix test_attachment_hsa_api_trace

* attach: catch ctypes exceptions

* attach: fix leak in registration

* attach: fix sanitizer tests

* attach: fix sanitizer tests further

* attach: disable attach asan tests

* attach: disable ubsan test

* attach: fix permissions in installed test package

* attach: formatting

---------

Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-09-18 18:10:45 -05:00
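The dual-path registration flow described in the attachment commit above (#316), where the first attach runs full initialization and later attaches only invoke reattach callbacks, can be sketched as follows. This is a toy model in Python, not the rocprofiler-register code: the `AttachRegistry` class and its callbacks are hypothetical, and a lock stands in for the atomic `prev_attached` flag.

```python
import threading


class AttachRegistry:
    """Toy model of the initial-attach versus reattach dispatch."""

    def __init__(self, full_init, reattach):
        self._full_init = full_init    # run once, on the first attachment
        self._reattach = reattach      # run on every later attachment
        self._prev_attached = False
        self._lock = threading.Lock()  # stands in for the atomic flag

    def attach(self):
        # Test-and-set the flag under the lock so that exactly one caller
        # takes the full-initialization path.
        with self._lock:
            first = not self._prev_attached
            self._prev_attached = True
        if first:
            self._full_init()
            return "init"
        self._reattach()
        return "reattach"
```

Keeping the flag separate from the callbacks is what allows repeated attach and detach cycles without repeating the full tool initialization.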