Commit Graph

74781 Commits

Author SHA1 Message Date
pghoshamd bc20b51f40 SWDEV-561708 Counted queue size from env var (#2844)
* SWDEV-561708 Counted queue size from env var

* use counted_queue_size for test

* remove rocrtst changes; add a const for default queue size

* Remove env var from test; use queue->size

* Improve env var documentation

* Correct type
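
As an illustration of the pattern the bullets above describe (a queue size taken from an environment variable, with a named constant as the default), here is a minimal Python sketch. The commit message names neither the variable nor the default value, so `EXAMPLE_COUNTED_QUEUE_SIZE` and `4096` are hypothetical placeholders, and falling back to the default on invalid input is an assumption, not the ROCR-Runtime behavior.

```python
import os

# Hypothetical names and values: the commit message does not state them.
ENV_VAR = "EXAMPLE_COUNTED_QUEUE_SIZE"
DEFAULT_COUNTED_QUEUE_SIZE = 4096


def counted_queue_size(environ=None):
    """Return the queue size from the environment, or the default constant."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    if raw is None:
        return DEFAULT_COUNTED_QUEUE_SIZE
    try:
        value = int(raw)
    except ValueError:
        # Assumption: unparsable input falls back to the default.
        return DEFAULT_COUNTED_QUEUE_SIZE
    # Assumption: non-positive sizes are rejected in favor of the default.
    return value if value > 0 else DEFAULT_COUNTED_QUEUE_SIZE
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in a test without touching the process environment.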
2026-01-29 10:00:37 -05:00
Venkateshwar Reddy Kandula a7c3e8392a [rocprofiler-sdk] Use venv for fixing CI docker image workflow (#2955)
* use python virtual env for aws cli

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* use 7.2 amdgpu for ubuntu

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-29 09:53:15 -05:00
David Galiffi 4c458fae9c [rocprofiler-systems] Fix ROCM_VERSION guard used for the scratch_memory_record structure (#2948)
- Fix ROCM_VERSION guard used for the scratch_memory_record structure
- This fixes a rocm/7.0.2 build failure

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-29 09:34:27 -05:00
moniljethva b5e4074c78 Adding support for GFX 11.5 in AQL Profiler (#2340)
* Adding support of AQL Profiler for GFX 11.5

* Removing hard coded value for sa_number

* Adding instance count for WGP block, removing hard coded values.

* Fixed SQ counter block and TD counter block instances
2026-01-29 17:39:12 +05:30
Jaydeep 190d9a8e27 SWDEV-561273 - hip samples on TheRock build using HIP LANGUAGE and hip-lang package. (#1794) 2026-01-29 09:15:58 +01:00
Bindhiya Kanangot Balakrishnan fa6f071751 [SWDEV-574637] Avoid redundant hive gpu resets (#2657)
Mode-1 GPU reset affects entire XGMI hive. Added
xgmi_hive_id check to reset only once for same-hive
GPUs while preserving separate resets for different
hives or no hives.
 - Example:
   `sudo amd-smi reset -G` or `sudo amd-smi reset -G -g 0`
   on MI300 will reset all GPUs only once.
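
The once-per-hive behavior described above can be sketched roughly as follows. This is not the amd-smi implementation: `plan_resets` and the `(gpu_index, hive_id)` input shape are hypothetical, and treating a hive ID of `None` or `0` as "no hive" is an assumption made for the example.

```python
def plan_resets(gpus):
    """Return the GPU indices on which a reset should actually be issued.

    gpus: list of (gpu_index, hive_id) pairs. A falsy hive_id (None or 0)
    is assumed to mean the GPU does not belong to any XGMI hive.
    """
    seen_hives = set()
    targets = []
    for index, hive_id in gpus:
        if not hive_id:
            # No hive: this GPU always gets its own reset.
            targets.append(index)
            continue
        if hive_id in seen_hives:
            # Another GPU in the same hive already triggered the reset.
            continue
        seen_hives.add(hive_id)
        targets.append(index)
    return targets
```

For example, with two GPUs in one hive, one GPU in a second hive, and one GPU outside any hive, the sketch issues three resets rather than four.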

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2026-01-28 22:59:17 -06:00
Sumanth Gavini e9c72b06b0 [ROCM-1036] Dynamic fan support detection in set -h (#2721)
Show "N/A" for ASICs without fan support
`amd-smi set -h` fan help text will be dynamic instead of "0-255 or 0-100%"

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
2026-01-28 22:44:25 -06:00
koushikbillakanti-amd e9b143323a [SWDEV-498649] Fix reset cli AttributeError (#2203)
* Fix SWDEV-498649: Handle missing attributes safely in set_gpu

---------

Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-28 22:39:50 -06:00
Yazen AL Musaffar 19725abbf4 [SWDEV-560702] Per process MEM usages does not add up to per GPU MEM usage. (#2888)
* Update python docs for process memory usage
* Added comment for processes total memory usage

---------

Signed-off-by: yalmusaf <Yazen.ALMusaffar@amd.com>
2026-01-28 22:34:20 -06:00
Loganaden Velvindron bf36e5f620 Fix disabled fortify source security flag (#2570)
Fix spurious character that caused CI issue.
2026-01-28 22:30:24 -06:00
peterjunpark 159e751788 docs(amdsmi): add link to amd-smi-virt (#2543)
Update install page virt references
Signed-off-by: Peter Park <peter.park@amd.com>
2026-01-28 22:24:55 -06:00
Joseph Narlo 48a4cda75c [SWDEV-552552] Provide CLI testing within amd-smi-lib-tests install (#2485)
* Add common module
* Added information to help with unknowns
* Allow paring of cmds
* change cmd print default
* Reduce cmds to be tested

---------

Signed-off-by: amd-josnarlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <joseph.narlo@amd.com>
2026-01-28 22:16:01 -06:00
Adam Pryor cf3e283d85 [FMDEV-170733] Remove amd-smi ptl set check (#2933)
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
2026-01-28 22:12:17 -06:00
systems-assistant[bot] 27be824745 [SWDEV-565483] Add power profile set/get to amd-smi CLI (#1905)
* Fix exception handling in power profile commands
* Update CHANGELOG.md
* Update amdsmi_parser.py for the single character argument for --profile as -o

---------

Co-authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com>
Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-28 22:00:18 -06:00
Gopesh Bhardwaj 680a92769c Fixing aqlprofile ASM statement (#2881)
* Fixing aqlprofile ASM statement

* Removing f16 tests
2026-01-29 09:01:41 +05:30
Tao Sang 66a1e38387 SWDEV-577011 Fix missing ais symbols in Windows (#2871)
Fix missing ais symbols in rocr in Windows
2026-01-28 22:29:30 -05:00
Copilot 14f9f2537a Add artifact upload steps to AMDSMI CI workflow for PR builds (#2936) 2026-01-28 18:14:47 -05:00
David Yat Sin 99d88827fb Update CODEOWNERS for ROCR-Runtime (#2790) 2026-01-28 15:53:23 -05:00
Yazen AL Musaffar 0c54f1d6f6 [AMD-SMI] [SWDEV-572092] amd-smi does not redirect output to file when --json option is used. (#2389)
* Fix for amd-smi json file redirection is broken

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* merge branch develop into SWDEV-572092

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

---------

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
2026-01-28 13:54:44 -06:00
German Andryeyev a5ada1e6e3 SWDEV-567852 - Clean-up HIP events (#2708)
* SWDEV-567852 - Clean-up HIP events

Removed unused fields, optimized memory allocation, improved encapsulation, modernized with C++11 auto, added documentation
2026-01-28 13:34:07 -05:00
Swati Rawat 9de4a2ebb1 Correct rocprofv3 usage instructions (#2925)
* Correct rocprofv3 usage

* Apply suggestion from @SwRaw

* Apply suggestion from @SwRaw

* Update .gitignore
2026-01-28 22:46:19 +05:30
Jason Bonnell d917259953 Add --verbose to ctest to get more output (#2928) 2026-01-28 22:43:14 +05:30
Sajina PK e265e0e24f [rocprofiler-systems]: Add documentation for communication API tracing (#2478)
Add documentation for communication runtime tracing for MPI, UCX, RCCL.

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2026-01-27 23:48:27 -05:00
SaleelK 5c7c549301 clr: Fix some nullptr checks and prints (#2825) 2026-01-27 16:45:17 -08:00
vedithal-amd 996202f560 [rocprofiler-compute] Backport documentation changes from ROCm 7.1 release branch (#2894)
* Backport documentation changes from ROCm 7.1 release branch

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review comments

---------

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 17:22:41 -05:00
vedithal-amd 717cdde126 Update test_metric_validation.py to handle MI325X (#2866) 2026-01-27 16:12:05 -05:00
vedithal-amd 93407271df [rocprofiler-compute] Fix docker file for testing (#2883)
* Fix docker file for testing

* Add correct WORKDIR
2026-01-27 16:11:29 -05:00
cfallows-amd 4d7f709510 [rocprofiler-compute] Update baseline comparison notes in documentation (#2878)
* Update baseline comparison with anchor, text, samples, image in CLI page. Fixes broken 404 links after grafana was removed.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update options in list to full name, correct gpu id option.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Formatting and broken intersphinx fixed

* Indentation formatting fixed

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2026-01-27 16:04:21 -05:00
Yiannis Papadopoulos fdb19e5a4c rocr: Format script skips non-existing files in sparse checkouts (#2360) 2026-01-27 15:58:53 -05:00
Shadi Dashmiz b816d10802 Fix for pntr attri query from a peer device (#2722)
* Fix for pntr attri query from a peer device

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>

* SWDEV-577116 : Fix query on peer device

- if access is disabled query should return error.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 15:25:14 -05:00
sluzynsk-amd f37b100c34 SWDEV-563777 - further reduce compilation warnings (#2331)
This change resolves some of the warnings generated during clr builds.
Quiet regular output of doxygen.
Disable non-documented warnings of doxygen.

Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>
2026-01-27 20:51:16 +01:00
Yazen AL Musaffar b7829db10a [AMD-SMI] [SWDEV-553392] Removed Driver Reload capability from amd-smi cli only. (#2665)
* Removed Driver Reload capability from amd-smi cli only

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* Updates

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* updates

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* Update CHANGELOG.md

---------

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-27 13:33:03 -06:00
Ioannis Assiouras a66c6ca156 Removed extra marker when syncing graph streams back to the launch stream (#2823) 2026-01-27 19:26:48 +00:00
Venkateshwar Reddy Kandula 7f5e443e44 format rocprofiler-sdk via black. (#2703)
Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
2026-01-27 13:30:50 -05:00
Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations
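
To make the pool idea in the list above concrete, here is a small Python sketch (the SDK itself is C++, and this design was later reverted in the same PR according to the message). `BufferPool`, its method names, and the two-lock layout are illustrative assumptions, not the rocprofiler-sdk code. Full buffers are queued in the order they fill, and a flush lock ensures callbacks are delivered in that same order.

```python
import threading
from collections import deque


class BufferPool:
    """Toy buffer pool: grows on demand and flushes in FIFO order."""

    def __init__(self, capacity, on_flush):
        self._capacity = capacity
        self._on_flush = on_flush
        self._state_lock = threading.Lock()  # guards current/pending/free
        self._flush_lock = threading.Lock()  # serializes flush callbacks
        self._free = deque()      # idle buffers available for reuse
        self._pending = deque()   # full buffers, in the order they filled
        self._current = []

    def write(self, record):
        with self._state_lock:
            self._current.append(record)
            if len(self._current) < self._capacity:
                return
            # Buffer is full: queue it and switch to a free or new buffer.
            self._pending.append(self._current)
            self._current = self._free.popleft() if self._free else []
        self._drain()

    def _drain(self):
        # Only one thread flushes at a time, always taking the oldest buffer.
        with self._flush_lock:
            while True:
                with self._state_lock:
                    if not self._pending:
                        return
                    buf = self._pending.popleft()
                self._on_flush(list(buf))
                buf.clear()
                with self._state_lock:
                    self._free.append(buf)  # return the buffer to the pool
```

Because buffer selection happens under one lock and callback delivery under another, a writer never has to wait for a flush unless it is the one that filled a buffer. The sketch does not support a callback that writes back into the same pool.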

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.
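
The timestamp-based check described above can be sketched as follows. The record shapes and the name `find_premature_retirements` are hypothetical; the only rule taken from the commit message is that a retirement timestamp should be greater than or equal to the maximum end timestamp of records sharing the same correlation ID.

```python
def find_premature_retirements(records, retirements):
    """Return correlation IDs retired before their latest record ended.

    records: iterable of (correlation_id, end_timestamp)
    retirements: iterable of (correlation_id, retire_timestamp)
    Arrival order of either input is irrelevant; only timestamps matter.
    """
    latest_end = {}
    for corr_id, end_ts in records:
        latest_end[corr_id] = max(latest_end.get(corr_id, end_ts), end_ts)

    premature = []
    for corr_id, retire_ts in retirements:
        # Retirements without any matching record are ignored in this sketch.
        if corr_id in latest_end and retire_ts < latest_end[corr_id]:
            premature.append(corr_id)
    return sorted(premature)
```

Note that the inputs can be passed in any order, which is the point of the change: the validation no longer depends on the order in which buffer records arrive.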

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null
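
The guard pattern in the changes above, where construction yields no correlation ID once finalization has started and every caller returns early, looks roughly like this in Python. The class and function names are stand-ins for illustration and do not mirror the C++ types in rocprofiler-sdk.

```python
class CorrelationService:
    """Stand-in for a service that stops issuing IDs during finalization."""

    def __init__(self):
        self.fini_status = 0
        self._next_id = 0

    def construct(self):
        if self.fini_status > 0:
            # Finalization has started: do not create a new correlation ID.
            return None
        self._next_id += 1
        return {"internal": self._next_id}


def task_create_callback(service, log):
    corr_id = service.construct()
    if corr_id is None:
        # Early return instead of dereferencing a missing correlation ID.
        return
    log.append(corr_id["internal"])
```

Without the early return, the callback would fail on `corr_id["internal"]` after finalization begins, which is the Python analogue of the null dereference the commit describes.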

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00
vstojilj 9a8942a89c SWDEV-558836, SWDEV-558837 - Add hipMemSetMemPool and hipMemGetMemPoo… (#1349)
* SWDEV-558836, SWDEV-558837 - Add hipMemSetMemPool and hipMemGetMemPool implementation

* Add managed allocation type for mem pools

* Update rocprofiler-sdk with APIs declaration
2026-01-27 18:45:28 +01:00
Rahul Manocha 324a864bc4 SWDEV-558848 - Move DRM calls to thunk for better abstraction (#1912)
* SWDEV-558848 - Move DRM calls to thunk for better abstraction

* Use thunk device handle instead of drm inside agent

* Update IPC functions with new thunk calls

* create hsaKmtHandleImport interface to support ipc

* Reset metadata inside hsaKmtMemHandleFree

* remove whitespaces and NULL usage

* Add thunk apis to libhsakmt.ver

* Add comments to new structs in thunk

* Minor fixes to declarations

* resolve merge conflicts in amd_kfd_driver

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-27 08:56:57 -08:00
systems-assistant[bot] 3a479a25ad 8 bytes mem leak fix (#2764)
* 8 bytes mem leak fix

* Adding a missing free()

* Clean up commented lines

* Add strdup fail check, memory ownership info

* Add strdup fail check, memory ownership info

---------

Co-authored-by: PJAvinash <avinashindian2.0@gmail.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Avinash <44542533+PJAvinash@users.noreply.github.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2026-01-27 08:29:16 -07:00
Joseph Narlo baf676f003 [SWDEV-572968] Readonly test failures on gfx1151 (#2697)
Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2026-01-27 08:29:19 -06:00
ggottipa-amd 77f7541755 [rocprofiler-compute] Adding --torch-trace option for SWDEV-559789 (#2089)
* Adding --torch-operator option in rocprof-compute. Creates csv file for
each operator that has gpu activity, showing operator to counter values
mapping.

* --torch-operators flag added to rocprofiler-sdk

* Adding ctest for --torch-operators.

* Adding pytest markers.

* Corrections in ctest and message logging.

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Adding a check for pytorch installation only when --torch-operators is passed.

* moving inject_roctx.py into src/utils.

* rebase

* Updating docs and changelog.

* Update projects/rocprofiler-compute/src/argparser.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Removing special characters.

* Minor corrections.

* Setting default value for torch_operators_enabled.

* Updating the number of files according to the number of passes.

* Adding rocpd support.

* Adding a warning message to be shown when profiling a non-python workload.

* copilot suggestions, rocpd+native tool fix

* Fixed the incorrect usage of dispatch_id as event_id in the function update_rocpd_pmc_events()

* ruff format fix

* ruff formatting

* Deleting torch_trace.csvs after consolidating the operator data.

* Removing checks since *torch_trace.csv files are deleted.

* Fixing file deletion.

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/utils/utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/tests/test_profile_general.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Using default options in the testcase.

* Adding test for overhead measurement.

* Corrections in docs.

* doc updates.

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Handling potential empty frames.

* Corrected the test cases.

* Changing the flag to --torch-trace

* Fixed helper_app path issues

* Path issues

* process_torch_trace_output() now takes csv file paths as input + allows default usage.

* Replaced pandas with sqlite3

* Adding marker_trace extraction to rocpd_data.py

* Allowing all workloads to use --torch-trace option. Assuming the workload is user verified.

* Modified help section for the flag.

* Added the difference in runtimes of the longest-running kernel in each profiling run to the overhead measurements.

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Removed the accesses to the tables.

* Ruff fixes.

* ruff

* Ruff Fixes

* Adding getattr for args.torch_trace to handle mock args.

* Fix for 'Missing guid in counter collection data - in csv mode'

* Sending output_format to process_torch_trace_output

* Warning for self contained binaries.

* Ruff

* Ruff

* Measuring longest_running_kernel_baseline instead of worst_kernel_increase, because very small kernel runtimes blow up the worst_kernel_increase metric.

* Minor fixes in input arguments

* Ruff

* Logging PyTorch version

* Fix ruff formatting for PyTorch version logging
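
One bullet in the list above switches the overhead measurement from worst_kernel_increase to longest_running_kernel_baseline because very small kernels distort the former. The arithmetic behind that can be sketched as below. The commit does not define either metric, so the formulas are assumptions (a relative runtime increase, taken either as the maximum over all kernels or for the kernel with the longest baseline runtime), and the kernel timings are made up.

```python
def relative_increase(baseline_us, profiled_us):
    """Fractional runtime increase of a kernel under profiling."""
    return (profiled_us - baseline_us) / baseline_us


# Hypothetical runtimes in microseconds: name -> (baseline, with tracing).
kernels = {
    "tiny_kernel": (1.0, 3.0),
    "large_kernel": (1000.0, 1010.0),
}

# Assumed "worst increase": maximum relative increase over all kernels.
worst_increase = max(relative_increase(b, p) for b, p in kernels.values())

# Assumed "longest-running baseline": increase of the longest kernel only.
longest = max(kernels.values(), key=lambda pair: pair[0])
longest_increase = relative_increase(*longest)
```

With these numbers the tiny kernel reports a 200% increase from a 2 microsecond difference, while the kernel that dominates the runtime grows by about 1%.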

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 19:50:25 +05:30
amilanov-amd cac67a0f32 SWDEV-521760 - Fix and enable disabled HIP tests from cooperative groups group (#2027)
* Reworked Unit_hipLaunchCooperativeKernel_Basic and Unit_hipLaunchCooperativeKernelMultiDevice_Basic
* Introduce reduction_factor for coop groups tests. Fix Unit_Coalesced_Group_Tiled_Partition_Sync_Positive_Basic
* Fix always false requirement by adding a cast
* Change data type to unsigned long long to align with cuda
* Change literal type to double to ensure proper type casting
* Remove formatting comments
2026-01-27 11:51:08 +01:00
Jatin Chaudhary c4a9567492 Simplify and remove stride based access of managed variable test (#2677) 2026-01-27 10:48:49 +00:00
Jan Stephan 1b55de002a Fix path to roofline figure (#2718)
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
Co-authored-by: Istvan Kiss <neon60@gmail.com>
2026-01-27 09:05:31 +01:00
Istvan Kiss 00132294f8 Update HIP programming guide images (#2794)
Update images of HIP documentation
2026-01-27 09:04:36 +01:00
systems-assistant[bot] f05be9efb3 AICOMRCCL-82 AICOMRCCL-85 Switched MSCCLPP.cmake to use targets (#2774)
* Initial refactoring work, including using build targets, and settable MSCCLPP_ROOT, MSCCLPP_SOURCE, MSCCLPP_APPLY_PATCHES.

* Another large refactor of MSCCLPP cmake to make all portions targets with appropriate dependencies. This should include all paths to the final target: starting with a full mscclpp install, starting with custom mscclpp and/or json source code, or from submodules + optional patches.

* Update whitespace Findmscclpp_nccl_static.cmake

---------

Co-authored-by: Corey Derochie <corey.derochie@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-26 23:12:16 -07:00
Kian Cossettini 0eac446cb0 [rocprofiler-systems] - Implement subset of CTests into PyTests (#2666)
Convert a subset of the ctest to pytest to be used in TheRock CI.
Create a new cmake flag `ROCPROFSYS_INSTALL_TESTING` to control test suite installation.
- pytest package will be installed to share/rocprofiler-systems/tests
- all compiled examples are put in share/rocprofiler-systems/examples
- all test relevant scripts are put in share/rocprofiler-systems/tests
- see README.md in share/rocprofiler-systems/tests
2026-01-26 23:10:01 -05:00
Kapil S. Pawar 922762e9b9 Rename inspector plugin library (#2815) 2026-01-26 16:38:42 -07:00
vedithal-amd 9a3f0ef113 [rocprofiler-compute] Pin dependencies version in requirements-test.txt (#2861)
* Pin versions in requirements-test.txt

- Validated compatibility to version pins in requirements.txt
- Validated compatibility with pytest, ctest, automatic test suite
- Validated compatibility with Python 3.9, 3.10, 3.11, and 3.12.

* Remove unused mock dependency
2026-01-26 18:38:09 -05:00
marandje 5cda2a496e SWDEV-568260 - Validate sub-buffer coverage in hipMemSetAccess (#2451) 2026-01-26 23:09:46 +01:00
Jason Bonnell 1255ba2bcc rocprofiler-compute Docker Images in GHCR (#1195)
* Initial cleanup of compute workflows and skeleton of ghcr workflow

* Add containers-ci.yml, update opensuse and rhel dockerfiles

* rename id in rocprofiler-compute-ghcr.yml

* Add new line to end of containers-ci.yml

* Update action versions for rocprofiler-compute-ghcr.yml

* Switch back to SHA for action versions

* Add conda set solver classic fix to compute CI dockerfiles

* Update conda install for compute Dockerfiles

* Change opensuse version to 15.6 in containers-ci.yml

* Add fix for ubuntu noble to compute Dockerfile.ubuntu.ci

* Add default distro and version to Dockerfile.ubuntu.ci

* Updated regex for tarball version

* Remove Python3.8 from compute CI Dockerfiles

* Change RHEL 9.4 to 9, add retry for compute workflow

* Revert name change for compute rhel workflow

* update path naming

* Remove binutils-gold from Dockerfile.opensuse.ci

* Remove conda python installs from Dockerfile.ci files in compute

* Change CMake version to 3.21 in compute Dockerfile.ci files

* Update checkout actions from v4 to v5
2026-01-26 17:06:20 -05:00