Commit Graph

2062 Commits

Author SHA1 Message Date
Sjoerd Simons 7e6b7cb50b Improve path readability check (#2967)
On modern systems, render nodes, for example, are made accessible via the udev
uaccess functionality, which adds the logged-in user to the ACL of the
device. This means that just checking for user and group is bound to give
false positives. Instead, use os.access as a first check.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-30 11:34:39 -06:00
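The check described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the pull request: the helper name `is_readable` is hypothetical, and only `os.access` itself is taken from the commit message.

```python
import os


def is_readable(path: str) -> bool:
    """Return True if the calling process may read `path`.

    os.access() delegates the decision to the operating system (using the
    real uid/gid), so access granted through an ACL entry, such as the one
    added by udev's uaccess mechanism, is taken into account. Comparing the
    file's owner and group against the caller by hand would miss that case.
    """
    return os.access(path, os.R_OK)
```

A caller would typically use this as a first, cheap check and still handle a failing `open()` afterwards, since permissions can change between the check and the actual access.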
systems-assistant[bot] 327778ef18 Add nccl_debug variable and values (#2756)
* nccl-debug variables table test

* spacing

* spacing

* RCCL variable edits from SME

* Update projects/rccl/docs/api-reference/env-variables.rst

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: Matt Williams <Matt.Williams+amdeng@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Matt Williams <matt.williams@amd.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-30 12:00:18 -05:00
David Yat Sin 00e8a67165 rocr: Restore mmap flags back to MAP_PRIVATE (#2886)
Change mmap flags back to MAP_PRIVATE as MAP_SHARED increases allocation
time. Transparent huge pages are disabled for MAP_SHARED by default.
2026-01-30 08:36:05 -08:00
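For readers unfamiliar with the two flags, the following sketch creates an anonymous `MAP_PRIVATE` mapping from Python. It only illustrates the flag choice named in the commit above; the actual change lives in the rocr C/C++ sources, and the helper name `map_private_anonymous` is an assumption. The `flags` and `prot` arguments are Unix-only.

```python
import mmap


def map_private_anonymous(length: int) -> mmap.mmap:
    """Create an anonymous, private (copy-on-write) mapping of `length` bytes.

    Passing -1 as the file descriptor requests memory that is not backed by
    a file. MAP_PRIVATE keeps writes local to this process, in contrast to
    MAP_SHARED, where they are visible to other processes sharing the mapping.
    """
    return mmap.mmap(
        -1,
        length,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
```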
systems-assistant[bot] 1211790607 Direct Reduce Scatter Implementation (#2765)
* Add new implementation of direct send/recv reduce scatter

* Resolved conflicts

* Add multiple channels support to the reduction kernel of direct reduce scatter and adjust offset into buffer to utilize multiple channels.

* Resolve validation issue when number of elements is not divisible by number of channels, leaving elements unaccounted for in the reduction.

* fix proxy hang

* set maxSrcs to 64 in reduceCopy

* optimize multi-channel code

* fix validation issue in single node MI300

* Tune the message size range for 2,4, and 8 Nodes

* Move Direct RS into separate kernel

* Add Copyright

* resolve review comments

* resolve review comments

* fix merge build issue

* revert move Direct RS into separate kernel

* address review comments

* address review comments

---------

Co-authored-by: KawtharShafie <kawtharshafie@gmail.com>
Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2026-01-30 09:27:27 -06:00
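The divisibility problem mentioned in the commit above (elements left out of the reduction when the element count is not a multiple of the channel count) can be illustrated with a small partitioning sketch. This is an assumed scheme for illustration only, not RCCL's kernel code; the function name `split_across_channels` is hypothetical.

```python
def split_across_channels(count: int, n_channels: int) -> list[tuple[int, int]]:
    """Return one (offset, length) pair per channel covering all `count` elements.

    A plain `count // n_channels` per channel would drop the remainder, so the
    first `count % n_channels` channels each take one extra element.
    """
    if n_channels <= 0:
        raise ValueError("n_channels must be positive")
    base, remainder = divmod(count, n_channels)
    ranges = []
    offset = 0
    for channel in range(n_channels):
        length = base + (1 if channel < remainder else 0)
        ranges.append((offset, length))
        offset += length
    return ranges
```

With 10 elements and 4 channels, the ranges have lengths 3, 3, 2, and 2, so every element is assigned to exactly one channel.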
systems-assistant[bot] 055909d335 Set default max channels to 48 for MI350 multi-node (#2759)
* make 48 the default max channels for MI350

* address review comments

---------

Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-30 09:22:42 -06:00
Alysa Liu 13091e18ad libhsakmt: Add THEROCK_SANITIZER support for ASAN builds (#2978)
Add THEROCK_SANITIZER support for ASAN builds.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2026-01-30 10:02:10 -05:00
Jin Jung 25d0107d24 SWDEV-575867: Fix error code for mapped graphics resources (#2662) 2026-01-30 07:47:13 -05:00
Alexandra Sidorova 8800e03058 [CLR] Added missed ostream include to amd_hip_bfloat16.h (#2960) 2026-01-30 07:42:38 -05:00
Jan Stephan 20a745962f Fix graph API binary paths (#2884)
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
2026-01-30 10:51:37 +01:00
Junhua Shen 0d98c3bdd5 libhsakmt: Implement per-context topology for multi-context KFD support (#2405)
This enhances libhsakmt's capabilities for multi-context KFD support by implementing per-context topology management.

Changes:
* Add hsaKmtGetClockCountersCtx for multi-context support
  - Add context-aware version of hsaKmtGetClockCounters
  - Original API is retained as a wrapper calling the ctx-version with primary context

* Enable independent debug sessions across multiple KFD contexts
  - Create hsa_kfd_debug_context, introduce context-aware debug APIs, shift debug state to per-context

* Add perf sub-context for per-context performance counter management
  - Introduce hsa_kfd_perf_context, move counter properties, add context-aware perf APIs, and update initialization

* Refactor FMM for per-context resource management
  - Refactor multiple global variables related to FMM, including
    GPU ID arrays, svm, cpuvm_aperture, and mem_handle_aperture to hsa_kfd_fmm_context

* Implement per-context topology for complete context isolation
  - Migrate global topology data (g_system, g_props, map_user_to_sysfs_node_id)
     to per-context hsa_kfd_topology_context structure
  - Update all topology functions to accept HsaKFDContext parameter for
     context-aware operations (validate_nodeid, get_node_props, get_iolink_props, etc.)
  - Refactor topology snapshot management for per-context isolation
  - Add context-aware PMC trace access APIs

Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
2026-01-30 09:42:25 +08:00
vedithal-amd a838b0c07b [rocprofiler-compute] Fix test case for MI 308 (#2934)
* Fix test case for MI 308

* Use consistent naming of GPUs in comment
2026-01-29 18:54:52 -05:00
pghoshamd adaaa883b2 SWDEV-576090 Fix mem leaks and double free of signals (#2817) 2026-01-29 16:53:27 -05:00
Mark Meserve 94c246eb9e attach: fix typos and older names in documentation (#2684) 2026-01-29 16:46:24 -05:00
vedithal-amd 4b364df43b [rocprofiler-compute] Enable panel level csv files for roofline panel (#2887)
* Enable panel level csv files for roofline panel

* Fixed comments
2026-01-29 15:32:54 -05:00
Rahul Manocha c4f7593001 clr: Update signal count and pool size for staging buffer (#2889)
* clr: Update signal count and pool size for staging buffer

* Change to naming of variables etc

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-29 10:34:00 -08:00
systems-assistant[bot] 58c203e252 Fix channel overuse for 1 rank comms (#2760)
* Fix channel overuse for 1 rank comms

* limit channels when warpSpeed is enabled but not used

* enable std::min check against # of CUs for maxChannels computation when warpSpeed is enabled

---------

Co-authored-by: Mustafa Abduljabbar <muabdulj@amd.com>
Co-authored-by: isaki001 <ioannissakiotis@gmail.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-29 12:13:46 -06:00
Benjamin Welton b509e9bd77 [rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations (#2941)
* [rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations

The HIP runtime API now has 515+ operations (as of ROCm 7.x), but
domain_ops_padding was set to 512. This caused std::out_of_range
exceptions when checking operations >= 512 via std::bitset::test().

Changes:
- Increase domain_ops_padding from 512 to 1024
- Add compile-time static_assert to validate padding is sufficient
  for all API domains (HIP, HSA, marker, RCCL, rocDecode, rocJPEG)

Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com>

* Update projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/context/domain.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [rocprofiler-sdk] Apply clang-format-11 to domain.cpp

Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com>

* Rework implementation to ensure coverage of all operation enums

* Fix compiler error in unit test for enum_string.cpp

* Fix data types of domain_ops_padding values

* Revert some changes in domain.cpp

---------

Co-authored-by: Claude (claude-opus-4.5) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-29 12:26:33 -05:00
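The failure mode described in the commit above can be modelled with a fixed-size bitset whose `test` method rejects out-of-range positions, as `std::bitset::test()` does in C++. The sketch below is a Python analogue under that assumption. The class, the helper `check_padding`, and any operation counts other than the 515 quoted in the commit message are hypothetical.

```python
class FixedBitset:
    """Fixed-capacity bitset that rejects out-of-range positions."""

    def __init__(self, size: int) -> None:
        self._size = size
        self._bits = 0

    def _check(self, pos: int) -> None:
        if not 0 <= pos < self._size:
            raise IndexError(f"bit {pos} out of range for size {self._size}")

    def set(self, pos: int) -> None:
        self._check(pos)
        self._bits |= 1 << pos

    def test(self, pos: int) -> bool:
        self._check(pos)
        return bool((self._bits >> pos) & 1)


def check_padding(padding: int, op_counts: dict[str, int]) -> None:
    """Fail early if any domain has more operations than the padding allows.

    This plays the role of the compile-time static_assert named in the commit.
    """
    for domain, count in op_counts.items():
        if count > padding:
            raise ValueError(f"{domain}: {count} operations exceed padding {padding}")
```

Checking the padding once, up front, turns a runtime exception on a rarely used operation ID into an immediate and obvious failure.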
Yiannis Papadopoulos 66ee941fea rocr/aie: Throw exception for malformed packets and packet submission errors (#2528) 2026-01-29 10:52:42 -05:00
pghoshamd bc20b51f40 SWDEV-561708 Counted queue size from env var (#2844)
* SWDEV-561708 Counted queue size from env var

* use counted_queue_size for test

* remove rocrtst changes; add a const for default queue size

* Remove env var from test; use queue->size

* Improve env var documentation

* Correct type
2026-01-29 10:00:37 -05:00
Venkateshwar Reddy Kandula a7c3e8392a [rocprofiler-sdk] Use venv for fixing CI docker image workflow (#2955)
* use python virtual env for aws cli

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* use 7.2 amdgpu for ubuntu

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-29 09:53:15 -05:00
David Galiffi 4c458fae9c [rocprofiler-systems] Fix ROCM_VERSION guard used for the scratch_memory_record structure (#2948)
- Fix ROCM_VERSION guard used for the scratch_memory_record structure
- This fixes a rocm/7.0.2 build failure

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-29 09:34:27 -05:00
moniljethva b5e4074c78 Adding support for GFX 11.5 in AQL Profiler (#2340)
* Adding support of AQL Profiler for GFX 11.5

* Removing hard coded value for sa_number

* Adding instance count for WGP block, removing hard coded values.

* Fixed SQ counter block and TD counter block instances
2026-01-29 17:39:12 +05:30
Jaydeep 190d9a8e27 SWDEV-561273 - hip samples on TheRock build using HIP LANGUAGE and hip-lang package. (#1794) 2026-01-29 09:15:58 +01:00
Bindhiya Kanangot Balakrishnan fa6f071751 [SWDEV-574637] Avoid redundant hive gpu resets (#2657)
Mode-1 GPU reset affects entire XGMI hive. Added
xgmi_hive_id check to reset only once for same-hive
GPUs while preserving separate resets for different
hives or no hives.
 - Example:
   `sudo amd-smi reset -G` or `sudo amd-smi reset -G -g 0`
   on MI300 will reset all GPUs only once.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2026-01-28 22:59:17 -06:00
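The deduplication described in the commit above amounts to resetting one GPU per XGMI hive while still resetting every GPU that belongs to no hive. A minimal sketch follows. It does not call the amd-smi library; the function name `plan_resets` and the use of `None` to mean "no hive" are assumptions made for illustration.

```python
from typing import Optional


def plan_resets(gpus: list[tuple[int, Optional[int]]]) -> list[int]:
    """Pick the GPU indices that actually need a reset call.

    `gpus` holds (gpu_index, xgmi_hive_id) pairs. A mode-1 reset affects the
    whole hive, so only the first GPU seen for each hive ID is selected.
    GPUs without a hive (None here) are always reset individually.
    """
    seen_hives: set[int] = set()
    targets: list[int] = []
    for index, hive_id in gpus:
        if hive_id is None:
            targets.append(index)
            continue
        if hive_id in seen_hives:
            continue  # this hive is already covered by an earlier reset
        seen_hives.add(hive_id)
        targets.append(index)
    return targets
```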
Sumanth Gavini e9c72b06b0 [ROCM-1036] Dynamic fan support detection in set -h (#2721)
Show "N/A" for ASICs without fan support
`amd-smi set -h` fan help text will be dynamic instead of "0-255 or 0-100%"

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
2026-01-28 22:44:25 -06:00
koushikbillakanti-amd e9b143323a [SWDEV-498649] Fix reset cli AttributeError (#2203)
* Fix SWDEV-498649: Handle missing attributes safely in set_gpu

---------

Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-28 22:39:50 -06:00
Yazen AL Musaffar 19725abbf4 [SWDEV-560702] Per process MEM usages does not add up to per GPU MEM usage. (#2888)
* Update Python docs for process memory usage
* Added comment for processes total memory usage

---------

Signed-off-by: yalmusaf <Yazen.ALMusaffar@amd.com>
2026-01-28 22:34:20 -06:00
Loganaden Velvindron bf36e5f620 Fix disabled fortify source security flag (#2570)
Fix spurious character that caused CI issue.
2026-01-28 22:30:24 -06:00
peterjunpark 159e751788 docs(amdsmi): add link to amd-smi-virt (#2543)
Update install page virt references
Signed-off-by: Peter Park <peter.park@amd.com>
2026-01-28 22:24:55 -06:00
Joseph Narlo 48a4cda75c [SWDEV-552552] Provide CLI testing within amd-smi-lib-tests install (#2485)
* Add common module
* Added information to help with unknowns
* Allow paring of cmds
* change cmd print default
* Reduce cmds to be tested

---------

Signed-off-by: amd-josnarlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <joseph.narlo@amd.com>
2026-01-28 22:16:01 -06:00
Adam Pryor cf3e283d85 [FMDEV-170733] Remove amd-smi ptl set check (#2933)
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
2026-01-28 22:12:17 -06:00
systems-assistant[bot] 27be824745 [SWDEV-565483] Add power profile set/get to amd-smi CLI (#1905)
* Fix exception handling in power profile commands
* Update CHANGELOG.md
* Update amdsmi_parser.py for the single character argument for --profile as -o

---------

Co-authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com>
Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-28 22:00:18 -06:00
Gopesh Bhardwaj 680a92769c Fixing aqlprofile ASM statement (#2881)
* Fixing aqlprofile ASM statement

* Removing f16 tests
2026-01-29 09:01:41 +05:30
Tao Sang 66a1e38387 SWDEV-577011 Fix missing ais symbols in Windows (#2871)
Fix missing ais symbols in rocr in Windows
2026-01-28 22:29:30 -05:00
Yazen AL Musaffar 0c54f1d6f6 [AMD-SMI] [SWDEV-572092] amd-smi does not redirect output to file when --json option is used. (#2389)
* Fix for amd-smi json file redirection is broken

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* merge branch develop into SWDEV-572092

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

---------

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
2026-01-28 13:54:44 -06:00
German Andryeyev a5ada1e6e3 SWDEV-567852 - Clean-up HIP events (#2708)
* SWDEV-567852 - Clean-up HIP events

Removed unused fields, optimized memory allocation, improved encapsulation, modernized with C++11 auto, added documentation
2026-01-28 13:34:07 -05:00
Swati Rawat 9de4a2ebb1 Correct rocprofv3 usage instructions (#2925)
* Correct rocprofv3 usage

* Apply suggestion from @SwRaw

* Apply suggestion from @SwRaw

* Update .gitignore
2026-01-28 22:46:19 +05:30
Sajina PK e265e0e24f [rocprofiler-systems]: Add documentation for communication API tracing (#2478)
Add documentation for communication runtime tracing for MPI, UCX, RCCL.

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2026-01-27 23:48:27 -05:00
SaleelK 5c7c549301 clr: Fix some nullptr checks and prints (#2825) 2026-01-27 16:45:17 -08:00
vedithal-amd 996202f560 [rocprofiler-compute] Backport documentation changes from ROCm 7.1 release branch (#2894)
* Backport documentation changes from ROCm 7.1 release branch

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review comments

---------

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 17:22:41 -05:00
vedithal-amd 717cdde126 Update test_metric_validation.py to handle MI325X (#2866) 2026-01-27 16:12:05 -05:00
vedithal-amd 93407271df [rocprofiler-compute] Fix docker file for testing (#2883)
* Fix docker file for testing

* Add correct WORKDIR
2026-01-27 16:11:29 -05:00
cfallows-amd 4d7f709510 [rocprofiler-compute] Update baseline comparison notes in documentation (#2878)
* Update baseline comparison with anchor, text, samples, image in CLI page. Fixes broken 404 links after grafana was removed.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update options in list to full name, correct gpu id option.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Formatting and broken intersphinx fixed

* Indentation formatting fixed

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2026-01-27 16:04:21 -05:00
Yiannis Papadopoulos fdb19e5a4c rocr: Format script skips non-existing files in sparse checkouts (#2360) 2026-01-27 15:58:53 -05:00
Shadi Dashmiz b816d10802 Fix for pntr attri query from a peer device (#2722)
* Fix for pntr attri query from a peer device

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>

* SWDEV-577116 : Fix query on peer device

- if access is disabled query should return error.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 15:25:14 -05:00
sluzynsk-amd f37b100c34 SWDEV-563777 - further reduce compilation warnings (#2331)
This change resolves some of the warnings generated during clr builds.
Quiet regular output of doxygen.
Disable non-documented warnings of doxygen.

Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>
2026-01-27 20:51:16 +01:00
Yazen AL Musaffar b7829db10a [AMD-SMI] [SWDEV-553392] Removed Driver Reload capability from amd-smi cli only. (#2665)
* Removed Driver Reload capability from amd-smi cli only

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* Updates

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* updates

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* Update CHANGELOG.md

---------

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-27 13:33:03 -06:00
Ioannis Assiouras a66c6ca156 Removed extra marker when syncing graph streams back to the launch stream (#2823) 2026-01-27 19:26:48 +00:00
Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.
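The timestamp-based rule stated above (retirement timestamp >= max(end_timestamps) for records with the same correlation ID) can be sketched independently of arrival order. The record layout and the function name `premature_retirements` are hypothetical; the real samples and tests use the rocprofiler-sdk record structures.

```python
def premature_retirements(
    records: list[tuple[int, int]],
    retirements: dict[int, int],
) -> list[int]:
    """Return correlation IDs whose retirement precedes one of their records.

    `records` holds (correlation_id, end_timestamp) pairs in any order.
    `retirements` maps correlation_id to its retirement timestamp.
    Only timestamps are compared; arrival order is ignored on purpose.
    """
    latest_end: dict[int, int] = {}
    for corr_id, end_ts in records:
        latest_end[corr_id] = max(latest_end.get(corr_id, end_ts), end_ts)
    return sorted(
        corr_id
        for corr_id, end_ts in latest_end.items()
        if corr_id in retirements and retirements[corr_id] < end_ts
    )
```

In this sketch, a correlation ID with no retirement record is not reported, since nothing can be concluded about it from timestamps alone.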

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00
vstojilj 9a8942a89c SWDEV-558836, SWDEV-558837 - Add hipMemSetMemPool and hipMemGetMemPoo… (#1349)
* SWDEV-558836, SWDEV-558837 - Add hipMemSetMemPool and hipMemGetMemPool implementation

* Add managed allocation type for mem pools

* Update rocprofiler-sdk with API declarations
2026-01-27 18:45:28 +01:00