231 Commits

Author SHA1 Message Date
ammallya aa840563a7 Migration of rocdecode and rocjpeg complete (#2998) 2026-01-30 14:37:28 -08:00
Venkateshwar Reddy Kandula dea3da3a6f [rocprofiler-sdk][CI] Fix rocprofiler-sdk CI change rocm version to 7.2.0 (#2979)
* Update aqlprofile-continuous_integration.yml to use rocm-7.2.0

* Update rocprofiler-sdk-continuous_integration.yml to use rocm-7.2.0

* Update rocprofiler-sdk-docs.yml to use rocm-7.2.0
2026-01-29 17:28:52 -06:00
Venkateshwar Reddy Kandula a7c3e8392a [rocprofiler-sdk] Use venv for fixing CI docker image workflow (#2955)
* use python virtual env for aws cli

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* use 7.2 amdgpu for ubuntu

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-29 09:53:15 -05:00
Copilot 14f9f2537a Add artifact upload steps to AMDSMI CI workflow for PR builds (#2936) 2026-01-28 18:14:47 -05:00
David Yat Sin 99d88827fb Update CODEOWNERS for ROCR-Runtime (#2790) 2026-01-28 15:53:23 -05:00
Jason Bonnell d917259953 Add --verbose to ctest to get more output (#2928) 2026-01-28 22:43:14 +05:30
Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations
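
A minimal C++ sketch of the pool idea listed above. It is not the code from this change (which a later commit in this PR reverts). `pooled_buffer`, `record_t`, and `flush_callback` are hypothetical names, and for brevity a single mutex guards both writes and flushes, so callbacks are delivered in fill order:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical record type and callback signature, for illustration only.
using record_t       = int;
using flush_callback = std::function<void(const std::vector<record_t>&)>;

class pooled_buffer
{
public:
    pooled_buffer(std::size_t capacity, flush_callback cb)
    : m_capacity{capacity}
    , m_callback{std::move(cb)}
    {
        assert(m_capacity > 0);
        m_current.reserve(m_capacity);
    }

    // Append a record; when the active buffer fills, swap in a spare and flush.
    void emplace(record_t rec)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_current.push_back(rec);
        if(m_current.size() >= m_capacity) flush_locked();
    }

    // Flush whatever is pending, e.g. at shutdown.
    void flush()
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        if(!m_current.empty()) flush_locked();
    }

    // Number of idle buffers currently held by the pool.
    std::size_t spare_count() const
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        return m_spares.size();
    }

private:
    // Caller must hold m_mutex. The callback runs under the lock, so it must
    // not call back into this object.
    void flush_locked()
    {
        std::vector<record_t> full = std::move(m_current);
        if(m_spares.empty())
        {
            // Pool exhausted: grow by allocating a fresh buffer.
            m_current = std::vector<record_t>{};
            m_current.reserve(m_capacity);
        }
        else
        {
            m_current = std::move(m_spares.front());
            m_spares.pop_front();
        }
        m_callback(full);
        full.clear();
        m_spares.push_back(std::move(full));  // return the buffer to the pool
    }

    std::size_t                       m_capacity;
    flush_callback                    m_callback;
    std::vector<record_t>             m_current;
    std::deque<std::vector<record_t>> m_spares;
    mutable std::mutex                m_mutex;
};
```

Holding the lock across the callback is the simplest way to get FIFO delivery, at the cost of blocking writers during a flush; that trade-off is one reason a production design might look different.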

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.
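
As a rough illustration of the timestamp-based check described above, the following C++ sketch validates that each retirement timestamp is at least the largest end timestamp seen for the same correlation ID, independent of arrival order. The record structs and the function name are hypothetical simplifications, not the SDK's actual record types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified record layouts; real SDK records carry more fields.
struct api_record
{
    std::uint64_t corr_id;
    std::uint64_t end_timestamp;
};

struct retirement_record
{
    std::uint64_t corr_id;
    std::uint64_t timestamp;
};

// Returns true when every retirement timestamp is >= max(end_timestamp) of the
// API records sharing its correlation ID. Arrival order is never consulted.
inline bool
retirements_are_temporally_valid(const std::vector<api_record>&        apis,
                                 const std::vector<retirement_record>& retirements)
{
    std::unordered_map<std::uint64_t, std::uint64_t> max_end;
    for(const auto& rec : apis)
    {
        auto& latest = max_end[rec.corr_id];  // value-initialized to 0 on first use
        latest       = std::max(latest, rec.end_timestamp);
    }

    for(const auto& ret : retirements)
    {
        auto itr = max_end.find(ret.corr_id);
        if(itr != max_end.end() && ret.timestamp < itr->second) return false;
    }
    return true;
}
```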

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).
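
The failure mode and the fix can be sketched with a small stand-in class. `owned_stream` below is hypothetical and much simpler than the real output_stream; it only shows the pattern of nulling the source pointer with `std::exchange` so that the moved-from object's destructor has nothing left to close:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <utility>

// Hypothetical stand-in for a type that owns a raw stream pointer.
class owned_stream
{
public:
    owned_stream() = default;
    explicit owned_stream(std::ostream* s)
    : m_stream{s}
    {}
    ~owned_stream() { close(); }

    owned_stream(const owned_stream&)            = delete;
    owned_stream& operator=(const owned_stream&) = delete;

    // A defaulted move would copy the pointer and leave both objects owning
    // it. std::exchange hands the pointer over and nulls the source.
    owned_stream(owned_stream&& rhs) noexcept
    : m_stream{std::exchange(rhs.m_stream, nullptr)}
    {}

    owned_stream& operator=(owned_stream&& rhs) noexcept
    {
        if(this != &rhs)
        {
            close();  // release whatever this object currently owns
            m_stream = std::exchange(rhs.m_stream, nullptr);
        }
        return *this;
    }

    void close()
    {
        delete m_stream;  // deleting a null pointer is a no-op
        m_stream = nullptr;
    }

    bool          is_open() const { return m_stream != nullptr; }
    std::ostream* get() const { return m_stream; }

private:
    std::ostream* m_stream = nullptr;
};
```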

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null
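
The early-return pattern in these changes can be sketched as follows. All names here (`fini_status`, `construct_correlation_id`, `on_task_create`) are hypothetical stand-ins rather than the SDK's real functions, and the sketch assumes that construction yields a null pointer once the finalization status is greater than zero:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified correlation ID.
struct correlation_id
{
    std::uint64_t internal = 0;
};

// Stand-in for the finalization status: 0 = running, > 0 = finalizing/finalized.
inline std::atomic<int>&
fini_status()
{
    static std::atomic<int> status{0};
    return status;
}

// Returns nullptr once finalization has started, mirroring the behavior
// described in the commit message.
inline std::unique_ptr<correlation_id>
construct_correlation_id()
{
    static std::atomic<std::uint64_t> next_id{1};
    if(fini_status().load() > 0) return nullptr;
    auto id      = std::make_unique<correlation_id>();
    id->internal = next_id.fetch_add(1);
    return id;
}

// A callback that guards against the null case instead of dereferencing
// blindly. Returns 0 when the event is dropped because of finalization.
inline std::uint64_t
on_task_create()
{
    auto corr_id = construct_correlation_id();
    if(!corr_id) return 0;  // finalization in progress: return early
    return corr_id->internal;
}
```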

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00
Jason Bonnell 1255ba2bcc rocprofiler-compute Docker Images in GHCR (#1195)
* Initial cleanup of compute workflows and skeleton of ghcr workflow

* Add containers-ci.yml, update opensuse and rhel dockerfiles

* rename id in rocprofiler-compute-ghcr.yml

* Add new line to end of containers-ci.yml

* Update action versions for rocprofiler-compute-ghcr.yml

* Switch back to SHA for action versions

* Add conda set solver classic fix to compute CI dockerfiles

* Update conda install for compute Dockerfiles

* Change opensuse version to 15.6 in containers-ci.yml

* Add fix for ubuntu noble to compute Dockerfile.ubuntu.ci

* Add default distro and version to Dockerfile.ubuntu.ci

* Updated regex for tarball version

* Remove Python3.8 from compute CI Dockerfiles

* Change RHEL 9.4 to 9, add retry for compute workflow

* Revert name change for compute rhel workflow

* update path naming

* Remove binutils-gold from Dockerfile.opensuse.ci

* Remove conda python installs from Dockerfile.ci files in compute

* Change CMake version to 3.21 in compute Dockerfile.ci files

* Update checkout actions from v4 to v5
2026-01-26 17:06:20 -05:00
Jason Bonnell ebc1980768 Add ROCM_VERSION env var, update repo URLs (#2870) 2026-01-26 16:00:16 -05:00
Aravind Ravikumar cc1a7c3c82 Enable CI to work on release branches (#2571)
* Commit to enable CI for release branches

* Pin back to specific SHA in therock

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>
2026-01-26 14:50:05 -05:00
jamessiddeley-amd dbd26a88b4 [rocprof-compute] Fix silent failures in Continuous-Integration CI workflow (#2797)
* fix silent failures in rocprof-compute continuous-integration CI workflow

* CDash uploads complete before the script fails
2026-01-26 12:16:17 -05:00
Nilesh M Negi cba2e3de2f Add rccl-reviewers to CODEOWNERS file (#2816)
* Add rccl-reviewers to CODEOWNERS file
* Re-order lines based on copilot suggestion
2026-01-22 16:29:48 -05:00
akolliasAMD 9caa248bfd added rocshmem-reviewers on CODEOWNERS file (#2809) 2026-01-22 14:12:56 -05:00
ammallya ea94716e23 Migrating rccl and rccl-tests (#2750)
* Migrating rccl and rccl-tests

* Adding missing submodules for rccl
2026-01-21 18:16:19 -08:00
ammallya 1dfc679821 Migration of rocshmem (#2742) 2026-01-21 13:31:26 -08:00
ammallya 7cab5ea514 Add rocshmem to labeler (#2724) 2026-01-21 12:47:46 -08:00
JeniferC99 50e00d1b94 Update CODEOWNERS (#2705): add /project/amdsmi owner 2026-01-20 15:49:59 -08:00
jamessiddeley-amd 25090e003f [rocprof-compute] Pin ruff version for consistent formatting (#2680)
* pin ruff versions each to current latest

* Update rocprofiler-compute-formatting.yml

* Downgrade .pre-commit-config.yaml to match develop
2026-01-19 19:10:02 -05:00
Gopesh Bhardwaj b18db05091 [rocprofiler-sdk] Fixing docs build (#2608) 2026-01-14 10:13:17 -05:00
dsclear-amd d5f490fa2f Sets heavy GitHub CI workflows to not trigger on text documentation-only changes. (#2417)
Sets heavy GitHub CI workflows to not trigger on docs-only changes.

Specifically, sets azure-ci-dispatcher.yml and therock-ci.yml, as well as many rocprofiler workflows, to not trigger when the change consists entirely of docs-only files.
2026-01-12 18:31:30 -05:00
Jason Bonnell 95a31b10cd Fix aqlprofile-continuous_integration.yml workflow (#2582)
* Fix typo in matrix definition for aqlprofile-continuous_integration.yml

* Update ROCM_VERSION to 7.1.1

* Minor changes to core-rpm step

* Add working-directory to test steps

* Revert changes

* Add set -v to rpm test step

* Remove Python venv line from rpm test step
2026-01-12 15:53:04 -05:00
Mythreya Kuricheti 36d9d33d90 Users/mkuriche/rocprofiler sdk fmt build fix memory header (#2537)
* [rocprofiler-sdk] Fix fmt::join build errors

- remedy use of fmt::join without include <fmt/ranges.h>

* include memory header

* Disable FMT build for SDK CI

* Add -DROCPROFILER_BUILD_FMT=OFF to sanitizer steps

* Add temporary workaround for rccl.h issue

* Add ROCPROFILER_INTERNAL_RCCL_API_TRACE to SDK CI builds

* disable clang-tidy for vendored includes

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
2026-01-12 12:59:47 -05:00
Jason Bonnell 788bcdddd0 Update SDK Dockerfile.ci for Ubuntu (#2539)
* Add verbose output for submods step

* Remove git config setting

* Determine git version

* Try different git install

* Update Dockerfile.ci

* Revert git location in Ubuntu jobs

* Update RHEL and SLES sections to use 2.52 as well

* Add git --version to each step, fix typo in SLES Docker
2026-01-09 12:10:36 -05:00
David Yat Sin 7178747ebc Update CODEOWNERS for ROCR-Runtime (#2521) 2026-01-07 14:22:11 -05:00
Jason Bonnell 1d5a6e9bfe Update rocprofiler workflows to use new mi325 runner names (#2467)
* Update rocprofiler workflows to use new runner naming for mi325

* Add input options to workflow_dispatch for rocprofiler-systems CI workflow

* Update runner name on therock-ci-linux.yml as well
2026-01-05 15:41:01 -05:00
Joseph Macaranas 11d9472e5f Bump TheRock SHA for CI 20251230 (#2466)
* Bump TheRock SHA for CI 20251230
* Remove patch and align workflows between OS
2026-01-05 13:00:37 -05:00
ammallya c2c4d4c1f5 Revert "Adding full build capability to theROCK for HIP changes (#2003)" (#2441)
This reverts commit 0a52f5c101.

Reverts #2003

MIOpen build failures on windows causing blockers on unrelated file changes.
2025-12-23 13:01:08 -08:00
Ammar ELWazir 1f8e8e3fbf Add CODEOWNERS for rocprofiler-sdk project (#2427)
## Motivation

Missing CODEOWNERS for ROCProfiler-SDK

## Technical Details

Add CODEOWNERS for rocprofiler-sdk project

## JIRA ID

## Test Plan

## Test Result

## Submission Checklist

- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2025-12-22 12:16:09 -05:00
ammallya 0a52f5c101 Adding full build capability to theROCK for HIP changes (#2003)
## Add Full Build Capability to theROCK for HIP

### Summary
This PR adds full build support to **theROCK** for HIP-related changes, ensuring that all components are built.

### Changes
- Enabled full build coverage for the following projects:
  - `projects/clr`
  - `projects/hip`
  - `projects/hip-tests`
  - `projects/rocr-runtime`
- Updated build configuration to include all targets for the above projects.
- Ensured rocm-libraries is pulled to build optional components.

### Motivation
These changes are required to support HIP development and testing within theROCK by ensuring all components are built together. This improves reliability and integration testing.
2025-12-22 05:31:32 -08:00
Geo Min 3635953cd8 Revert "Adding org var and dynamic selection of targets (#2317)" (#2416)
This reverts commit c9ac018395.
2025-12-19 14:51:53 -08:00
Jason Bonnell 112b4fd413 [rocprofiler-compute] Add SDK dependency to rocprofiler-compute-tarball.yml workflow (#2329)
* Install rocm-dev in rocprofiler-compute-tarball.yml workflow

* Update paths for push and PR for rocprofiler-compute-tarball.yml

* Add ROCm dependencies to disttest job

* cmake fix binary link creation and fix format

* Use python3 instead of python3.9 in RHEL 8 and RHEL 9 workflows

* set default python3 to python3.9 in rhel8

* Try alternatives setup for python3 in RHEL8 env

* Add pip install cmake to debug RHEL8 issue

* Remove python3.11 in RHEL8 workflow

* Add back comment regarding RHEL8

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-12-18 11:56:23 -05:00
amd-juwillia 3a3738ad98 Added AMDSMI CI to rocm-systems (#2074)
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
2025-12-16 13:52:42 -06:00
Geo Min c9ac018395 Adding org var and dynamic selection of targets (#2317) 2025-12-16 10:46:59 -08:00
systems-assistant[bot] b002c6a739 SWDEV-538607 - Add SIMDe as a build dependency, remove naked intrinsic use. (#500)
Co-authored-by: Alex Voicu <alexandru.voicu@amd.com>
Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
2025-12-15 17:40:51 +00:00
Ioannis Assiouras 13561fb8cd Bump hash for theRock to 2025-12-12 commit (#2289) 2025-12-13 23:56:08 +00:00
jamessiddeley-amd 81720183ad [rocprof-compute] Merge CDash Nightly and Continuous workflow files (#2279)
* merged code-coverage and continuous workflow files

* fixed runner typos and added build mode

* add actor name to Continuous build

* improve error handling and remove redundant verbose

* fixed workflow file log output

* revert logs output in run_ci.py

* ruff format
2025-12-12 17:04:56 -05:00
Eiden Yoshida a9de523e0d Add rccl and rccl-tests to auto-labeler yaml (#2286) 2025-12-12 12:47:20 -07:00
Dominic Widdows 2073cf2172 Skip running TheRock CI on docs-only changes (#2246)
Following the pattern from ROCm/rocm-libraries#2679, add logic to skip
CI builds when only documentation files are modified.

Changes:
- Add SKIPPABLE_PATH_PATTERNS for docs, markdown, and .gitignore files
- Return empty projects list when only skippable paths are modified
- No workflow changes needed - existing projects != '[]' check handles it
- Add unit tests for doc-filtering logic
- Fix existing tests with proper subprocess mocking
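
The filtering rule listed above can be illustrated with a short sketch. It is written in C++ only for illustration; the repository's actual check is not shown here, the patterns below are guesses at what "docs, markdown, and .gitignore files" might cover, and treating an empty change list as "do not skip" is an assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical patterns; the real SKIPPABLE_PATH_PATTERNS may differ.
inline bool
is_skippable(const std::string& path)
{
    static const std::vector<std::regex> patterns = {
        std::regex{R"(^docs/.*)"},
        std::regex{R"(.*\.md$)"},
        std::regex{R"(.*\.gitignore$)"},
    };
    return std::any_of(patterns.begin(), patterns.end(), [&path](const std::regex& re) {
        return std::regex_match(path, re);
    });
}

// CI is skipped only when every changed path is skippable. A single
// non-documentation file is enough to require a build.
inline bool
should_skip_ci(const std::vector<std::string>& changed_paths)
{
    return !changed_paths.empty() &&
           std::all_of(changed_paths.begin(), changed_paths.end(), is_skippable);
}
```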

Reference: https://github.com/ROCm/rocm-libraries/pull/2679
2025-12-12 08:30:59 -08:00
David Galiffi fbaeb74107 [rocprof-sys] Update nightly CI workflow (#2263)
Update ROCm version to 7.1.0

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2025-12-11 12:23:26 -05:00
Mario Limonciello 0c4d08f38d Revert correcting the VGPR size for GFX 11.5.1 (#2268)
Although the value is correct, there is no source of truth between
kernel and userspace.  This leads to problems if the kernel has strict
restrictions (such as kernel 6.17 or earlier). The restrictions were
lifted in 6.17.9 and 6.18, but there is no guarantee userspace is
using this.

So short term this value will be wrong.  But on newer kernels the kernel
will communicate the right size and rocr-runtime will be adjusted to
use that.

Link: https://github.com/ROCm/TheRock/pull/2505

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-11 07:59:19 -06:00
David Galiffi 70562eb854 Add ROCm 7.1 to workflows (#2256)
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2025-12-10 13:41:22 -05:00
Geo Min 879d010974 Bumping commit hash for TheRock (#2244) 2025-12-09 14:56:20 -08:00
Jason Bonnell 74cf2a85ec Fix rocprofiler-sdk CI workflow tarball install errors (#2225)
* Add ls statement for debugging /opt directory file naming

* Update ROCM_VERSION from 7.0.0 to 7.1.1 in SDK CI

* Update amdgpu debian package for Ubuntu in Dockerfile.ci

* disable HIP/CLR build in codeql (#2242)

---------

Co-authored-by: Venkateshwar Reddy Kandula <Venkateshwarreddy.Kandula@amd.com>
2025-12-09 16:06:35 -05:00
Mario Limonciello 6a899b5f6d Run pre-commit's whitespace related hooks on .github and .azuredevops (#2129)
In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-08 14:39:42 -06:00
Ioannis Assiouras a101df369c Bumping theROCK submodule 2025-12-04 commit (#2167)
* Bumping theROCK submodule 2025-12-04 commit

* Update container image in therock-ci-linux.yml

---------

Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
2025-12-05 18:33:32 +00:00
Jason Bonnell 3b875cc0ee [rocprofiler-compute] Add Nightly and CI on MI355/MI325 Runners (#1455)
* Initial work in progress for compute CI workflow

* Update run-ci.py script location, enable test creation

* Add new lines to files

* Add coverage file argument to run-ci.py

* Remove run-ci.py script usage from rocprofiler-compute-continuous-integration.yml workflow

* Add --break-system-packages parameter

* Add --ignore-installed to pip install

* Checkout specific branch until amdclang issue fixed in develop

* Add missing slash to path for cxx compiler

* Remove specific branch from checkout action

* Use run-ci.py in rocprofiler-compute-continuous-integration.yml

* Update install python requirements step

* Fix typo in build-name

* Update run-ci.py to have toggle for code coverage

* Apply ruff formatting

* Ruff again

* Exclude live attach detach and roofline tests in CI

* Add ctest args

* Revert run-ci.py changes

* Try new run-ci-2.py

* Update type of pytest-numprocs argument

* Try casting arg to str

* Fix typo in arg reference

* upgrade pip before running python installs

* Use jammy instead of noble for CI

* Remove python nproc arg from run-ci-2.py

* Switch to MI325 runners for CI

* Fix spacing issue

* Rename run-ci.py to run-code-coverage.py, add new run-ci.py

* Update to ROCm version 7.1.0 to debug sdk issues

* Testing out tarball install again

* Update regex on tarball version

* Update tarball regex on compute

* ruff formatting

* Revert change to systems CI file

* Switch back to rocm-dev install

* ruff formatting again

* Add ld_lib_path for rocm_sysdeps

* Remove excluded tests temporarily

* Add back excluded tests, add timeout for test step

* Address PR feedback

* Add git safe directory lines

* Revert dependencies change to debug new failures

* Exclude roofline again, rework dependencies

* Add in hip-runtime-amd dependency

* Install hip dev package

* Add TEST_FROM_INSTALL cmake arg to compute CI workflow

* Remove test_from_install for now

* Enable roofline tests again
2025-12-05 11:43:47 -05:00
Jason Bonnell 463126770a Update build docker container workflow, opensuse dockerfiles (#1883)
## Motivation

- __Reduced Code Duplication__: Version parsing logic moved from individual Dockerfiles to the central build script
- __Improved Edge Case Handling__: Better handling of ROCm versions with and without patch numbers (e.g., `6.2` vs `6.2.0`)
- __Easier Maintenance__: Future version-related changes only need to be made in one place
- __Cleaner Dockerfiles__: Simplified Dockerfiles focus on package installation rather than complex shell logic
- __Updated Platform Support__: Refreshed container matrix to reflect current platform/ROCm version combinations
- __Fix OpenSUSE Docker Generation__: OpenSUSE container generation fails due to a change to the `binutils-gold` package
- __Error Handling__: Fix bug where errors in docker image build were being masked, allowing workflow to pass anyway.


## Technical Details

- Updated `Dockerfile.opensuse` and `Dockerfile.opensuse.ci` docker files to remove `binutils-gold`
  - Not needed since we build `binutils` with systems anyways
- Updated `rocprofiler-systems-containers.yml` to remove `pushd/popd` commands and just run the shell scripts
  - There was a silent failure observed here, which I verified in this PR before adding the fix for openSUSE
- Refactor ROCm version parsing. Move this logic to the `build-docker.sh` script to reduce duplication.
  - Fix bug that caused ROCm 7.0 to fail installation. The trailing `.0` was being trimmed.
- Fixed inconsistencies in `containers.yml` that led to invalid ROCm-OS_VERSION combinations.
- Formatting fixes
  - Removed trailing whitespace
  - Fix docker build warnings. Use an `=` rather than ` ` when assigning an environment variable.
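
To illustrate the version-parsing fix described above (the real logic lives in the `build-docker.sh` shell script, which is not reproduced here), the following C++ sketch parses a ROCm version with or without a patch number and formats it without ever trimming a trailing `.0`. The struct and function names are hypothetical:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical parsed version; a missing patch number defaults to 0.
struct rocm_version
{
    int major_num = 0;
    int minor_num = 0;
    int patch_num = 0;
};

// Accepts "7.0", "6.2", or "6.2.0"; components that are absent stay at 0.
inline rocm_version
parse_rocm_version(const std::string& text)
{
    rocm_version       v;
    char               sep = '\0';
    std::istringstream in{text};
    in >> v.major_num;
    if(in >> sep >> v.minor_num) in >> sep >> v.patch_num;
    return v;
}

// "7.0" stays "7.0": the minor component is always printed, never trimmed.
inline std::string
major_minor(const rocm_version& v)
{
    return std::to_string(v.major_num) + "." + std::to_string(v.minor_num);
}

// Full three-component form, so "6.2" and "6.2.0" normalize to the same text.
inline std::string
full_version(const rocm_version& v)
{
    return major_minor(v) + "." + std::to_string(v.patch_num);
}
```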
2025-12-04 23:33:15 -05:00
Mythreya Kuricheti 3e5749ea59 [rocprofiler-sdk][CI] Update codeql rocm version (#1909)
* [rocprofiler-sdk][CI] Update codeql rocm version

* Build HIP from source
2025-12-02 12:17:59 -06:00
vedithal-amd 5bbb72f516 Add JIRA ID to pull request template (#2099) 2025-12-02 10:14:27 -05:00
cfallows-amd 29a7591791 Update RHEL8/9 workflow with latest rocm 7.1.1 links (#2060)
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-12-01 11:14:33 -05:00