Commit Graph

68527 Commits

Author SHA1 Message Date
cfallows-amd 29a7591791 Update RHEL8/9 workflow with latest rocm 7.1.1 links (#2060)
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-12-01 11:14:33 -05:00
marantic-amd 3b11e01716 Perfetto traces from cached data (#1704)
## Motivation

The idea is to unify how and where we store our traces. The current implementation uses `trace_cache` for rocpd traces, but perfetto is inlined inside each module. This change gives us a single point in the code where we collect data, process it, and store it in the desired format. This lets us declutter the code further and have a single point of responsibility and a single point of failure.

## Technical Details

A new `processor` (perfetto_post_processing.cpp) is added to the `trace_cache`; its purpose is to use the cached data to populate perfetto tracks. The cache manager owns the instance of this processor and manages its lifetime.
2025-12-01 09:59:16 -05:00
Kian Cossettini b506c75f28 [rocprof-sys] Fix roctx wall clock tree, change timemory push/pop to use proper category, and add roctx as valid domain choice (#2062)
When doing this ticket, I also noticed the program would SEGFAULT when ROCPROFSYS_ROCM_DOMAINS=roctx even though the docs tell us we can do this. Went ahead and fixed that.

Also noticed that timemory push/pop in rocprofiler-sdk.cpp was always using category::rocm_marker_api instead of CategoryT. Fixed that as well.
2025-12-01 09:50:58 -05:00
Kian Cossettini ae29018bb0 [rocprofiler-systems] Enable HOST OMPVV runtime-instrumentation CTests (#1970)
* Enable HOST ompvv runtime-instrumentation ctests

* Fix rocprofiler-systems-avail-regex-negation test failure

* Exclude problematic function from instrumentation

* Make push pop skip an env option for ctests

* Remove SKIP_PUSH_POP_CHECK from argument parse

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-12-01 09:26:24 -05:00
vstojilj 77f58ceb9f SWDEV-558557 - Remove duplicate nodes when capturing hipMemcpyAsync (#1226) 2025-12-01 11:25:13 +01:00
Milan Radosavljevic ee7305e795 [rocprof-sys] Add test cleanup fixtures for binary-rewrite and runtime-instrument tests (#2012)
- Added `binary-rewrite-cleanup` and `runtime-instrument-cleanup` tests that remove instrumented binaries and output directories using `cmake -E rm -rf`
- Implemented CMake test fixtures (`FIXTURES_SETUP` and `FIXTURES_CLEANUP`) to establish proper test ordering:
  - `binary-rewrite` sets up the `binary-rewrite-fixture`
  - `binary-rewrite-run` and validation tests require this fixture
  - `binary-rewrite-cleanup` performs cleanup for this fixture
  - Same pattern applied for `runtime-instrument`
- Extended `ROCPROFILER_SYSTEMS_ADD_PYTHON_TEST` to accept `FIXTURES_REQUIRED` parameter
- Updated validation tests to require appropriate cleanup fixtures based on test name pattern matching
- Added fixture requirements to Python code-coverage tests
2025-11-28 18:51:54 -05:00
abchoudh-amd fd61b0f507 Add CU Utilization and deprecate Active CUs (#1822)
* ChangeLog

* Deprecation notice in old arch

* Deprecation notice current arch

* New config hash

* Added Config deltas

* Added metric description
2025-11-28 11:32:25 -05:00
vedithal-amd 3f2fbc18e9 [rocprofiler-compute] Only depend on amdsmi in profile phase (#2044)
* Only depend on amdsmi in profile phase

* amdsmi interface tests should have common prefix for easier testing
2025-11-28 11:32:00 -05:00
Honglei Huang 68c8e111ae rocr/format: Fix clang-format-diff.py path in format script (#1757)
Update the format script to use absolute path for clang-format-diff.py
instead of relative path. This ensures the script works correctly
regardless of the current working directory when executed.

- Change from './clang-format-diff.py' to '${root}/projects/rocr-runtime/clang-format-diff.py'
- Improves script reliability and portability

Signed-off-by: Honglei Huang <honghuan@amd.com>
2025-11-28 09:21:12 +08:00
Honglei Huang aaa06e1609 libhsakmt/virtio: add non SVM mode in libhsakmt virtio driver and many fixes (#1756)
* libhsakmt/virtio: change shmem size to 80

Some DGPU props have a lot of information,
so it is necessary to increase the size of shmem.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: use BO handle instead of pointer in memory registration

Change vhsakmt_map_to_gpu() return type from void* to vhsakmt_bo_handle
to properly handle buffer object information. This allows access to
both the host address and resource ID needed for memory registration.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Improve memory mapping logic

- Update vhsakmt_mappable() to check NoAddress flag and require HostAccess
- Remove mappable checks in cpu_map/unmap to allow all BOs to be mapped
- Set BO flags properly in vhsakmt_alloc_memory and scratch memory creation
- Ensure scratch memory is correctly flagged for proper handling

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: add no svm mode for libhsakmt virtio

Add a non-SVM mode to the libhsakmt virtio driver. In non-SVM mode, userptrs
must be managed by the UMD, so add an interval tree to manage them.

New Features:
- Add augmented red-black tree based interval tree implementation
  * Implement RB-tree insertion, deletion, and color balancing
  * Provide interval query for fast overlapping range lookup
  * Based on Linux kernel's augmented rbtree implementation

- Improve userptr memory management
  * Use interval tree to efficiently track userptr memory regions
  * Support finding registered memory within given address ranges
  * Optimize memory mapping and unmapping performance
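
The overlapping-range lookup described above can be sketched in a few lines. This is only an illustration of the augmentation idea (each node caches the largest end address in its subtree), written in Python rather than the driver's C; it is an unbalanced binary search tree and deliberately omits the red-black insertion, deletion, and recoloring that the commit implements. The names `Node`, `insert`, and `overlaps` are hypothetical and do not come from libhsakmt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    start: int                      # first address of the registered range
    end: int                        # one past the last address (half-open range)
    max_end: int                    # largest `end` in this subtree (the augmentation)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], start: int, end: int) -> Node:
    """Insert [start, end), keyed on start. No rebalancing in this sketch."""
    if root is None:
        return Node(start, end, end)
    if start < root.start:
        root.left = insert(root.left, start, end)
    else:
        root.right = insert(root.right, start, end)
    # Keep the cached subtree maximum up to date on the way back up.
    root.max_end = max(root.max_end, end)
    return root


def overlaps(root: Optional[Node], start: int, end: int) -> List[Tuple[int, int]]:
    """Return every stored range that overlaps the query range [start, end)."""
    found: List[Tuple[int, int]] = []
    if root is None or root.max_end <= start:
        # Nothing in this subtree ends after the query starts.
        return found
    found += overlaps(root.left, start, end)
    if root.start < end and start < root.end:
        found.append((root.start, root.end))
    if root.start < end:
        # Keys in the right subtree are >= root.start, so it can only
        # contain overlaps while root.start is still below the query end.
        found += overlaps(root.right, start, end)
    return found


def build(ranges: List[Tuple[int, int]]) -> Optional[Node]:
    """Convenience helper: build a tree from a list of (start, end) pairs."""
    root: Optional[Node] = None
    for s, e in ranges:
        root = insert(root, s, e)
    return root
```

For example, after `build([(0x1000, 0x2000), (0x3000, 0x4000), (0x1800, 0x2800)])`, a query for `[0x1900, 0x1A00)` returns the first and third ranges, while the `max_end` check lets the search skip subtrees that end before the query begins.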

Signed-off-by: Honglei Huang <honghuan@amd.com>

---------

Signed-off-by: Honglei Huang <honghuan@amd.com>
2025-11-28 09:20:43 +08:00
Pratik Basyal 792ecc1a83 Formatting fixed (#1691) 2025-11-27 18:55:45 -05:00
Giovanni Lenzi Baraldi 0e04fdd571 Workaround for SWDEV-559598. Enabling more thread trace tests. (#1336)
* Workaround for SWDEV-559598

* gfx11 fix
2025-11-27 20:03:38 +01:00
vstojilj 259010f2d5 SWDEV-491253 - Create stream capture test for kernel APIs (#1189) 2025-11-27 17:40:11 +01:00
vstojilj 1c09c87cc7 SWDEV-564927 - Allow sizeBytes to be 0 when hipMemsetAsync is captured (#1849) 2025-11-27 17:13:33 +01:00
Kian Cossettini 63713f01e0 [rocprofiler-systems] Add Fortran MPI CTests (#1172)
* Add MPI CTests (use gfortran)

* Add proper regex check

* Skip Runtime-Instrument due to incompatibility with MPI

---------

Co-authored-by: Sajina Kandy <sputhala@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-27 10:32:09 -05:00
vedithal-amd 2e10041210 Fix sL1D values in memory chart (#2037) 2025-11-27 09:13:19 -05:00
Godavarthy Surya, Anusha 2e1c37a926 SWDEV-490861 - Remove recursion and extra loop in hipGraphLaunch (#1792) 2025-11-27 10:25:08 +00:00
Ammar ELWazir ed42157c31 Fixing Code Object Data Race and Thread Safety & Adding validation test (#2014) 2025-11-26 20:28:52 -06:00
Alysa Liu 5b75ec6a09 rocr: Fix error when internal signal is destroyed (#1845)
Fix error when we destroy internal signals during shutdown.
Fix init dependency on uninitialized value.
2025-11-26 16:22:57 -08:00
Ioannis Assiouras a598f9138b Fix flaky test Unit_hipStreamAddCallback_StrmSyncTiming (#2022) 2025-11-26 22:52:58 +00:00
Shadi Dashmiz 962b99f925 SWDEV-567514: Remove default stream wait (#1977)
- when virtual map command is called

- can create deadlock

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2025-11-26 15:11:52 -05:00
Kian Cossettini 76a23eab14 [rocprofiler-systems] Add support for ompt_callback_thread_begin (#1681)
* Add thread_begin callback

* Make OMPT callbacks that are instant have start_ts = end_ts
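
As a rough illustration of the second bullet, an instant callback can be recorded as a zero-duration event by reading the clock once and using that value for both timestamps. The sketch below is Python, not the project's C++, and `TraceEvent` and `instant_event` are hypothetical names; only the `start_ts = end_ts` rule comes from the commit message.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TraceEvent:
    name: str
    start_ts: int   # nanoseconds
    end_ts: int     # nanoseconds


def instant_event(name: str, now_ns: Callable[[], int] = time.monotonic_ns) -> TraceEvent:
    """Record an instant callback: one clock read, so start_ts == end_ts."""
    ts = now_ns()
    return TraceEvent(name, ts, ts)
```

Reading the clock once, rather than twice, guarantees the two fields can never differ by a few nanoseconds.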
2025-11-26 13:38:04 -05:00
Rahul Manocha bc6f29c04a Fix and enable VMM tests on cuda (#1855)
* Fix and enable VMM tests on cuda

* Minor syntax fixes

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-11-26 08:48:47 -08:00
AidanBeltonS d849b88aef SWDEV-558080 - Add recommended granularity (#1176)
* Add recommended granularity

* Improve granularity testing

* Update based on feedback
2025-11-26 16:10:58 +00:00
Adam Pryor 422253f871 Implement PTL support (#1957)
* Implement PTL support

Signed-off-by: adapryor <Adam.pryor@amd.com>
(cherry picked from commit 45bc31292e7940a3b8fca044ef7df22047b95733)

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-11-26 08:33:27 -06:00
Matt Arsenault f089217e6a SWDEV-548892 - Stop using ockl steadyctr function (#1882)
Directly use the builtin
2025-11-26 09:29:06 -05:00
amilanov-amd da9bb4efae SWDEV-503089 - Fix and enable disabled HIP tests from math group (#1319)
* SWDEV-503089 - Fix and enable disabled HIP tests from math group

* SWDEV-503089 - Move single precision reduced run to a common function
2025-11-26 10:34:05 +01:00
Todd tiantuo Li ee48f6221d SWDEV-562708 - change default maximum SVM size to 256GB (#1731) 2025-11-25 23:59:39 -08:00
Matt Arsenault 9fbb062505 SWDEV-548892 - Stop using ocml isinf wrapper (#1854) 2025-11-25 22:21:37 -05:00
Karthik Jayaprakash 740a06d567 SWDEV-559267 - Use CLPrint to DevLogPrintf with Log Level - detail debug. (#1160) 2025-11-25 19:25:32 -05:00
German Andryeyev 93682f2f75 SWDEV-567852 - Clean-up hip::init() (#1948) 2025-11-25 19:05:41 -05:00
cadolphe-amd cce94f6ee0 SWDEV-557412 - Incorporate proper chunk offset when remapping virtual memory (#1848)
* SWDEV-557412 - Incorporate proper offset when remapping virtual memory

* Fix condition to check if VMHeap allocation address matches a chunk address

* Move offset calculation outside if/else block

---------

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-25 18:05:25 -05:00
marantic-amd daf8596ce9 [rocprof-sys] Process all information regarding agents and store them as extdata in rocpd database (#1880)
## Motivation

Resolved: SWDEV-566226

The current implementation of agents inside rocprof-systems keeps only the minimal set of information required for populating the `info_agent` table in the rocpd database. A significant amount of data is left out of the database, so this change fixes that and stores the additional agent information as an `extdata` row inside the `info_agent` table.

## Technical Details

This PR introduces an additional field in the `agent` structure that holds a JSON-formatted string of all the additional information we can acquire about a particular agent. This data is processed and added during the initial fetching of agents, and afterwards pushed into the database.
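
A minimal sketch of the idea, assuming `extdata` is stored as JSON text alongside the core agent fields. Only the names `info_agent` and `extdata` come from this PR description; the other columns, the helper names, and the sample property are hypothetical and are not the real rocpd schema. The sketch uses Python's standard `sqlite3` and `json` modules rather than the project's C++.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, hypothetical info_agent table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_agent (id INTEGER PRIMARY KEY, name TEXT, extdata TEXT)"
    )
    return conn


def store_agent(conn: sqlite3.Connection, agent_id: int, name: str,
                extra: Dict[str, Any]) -> None:
    """Serialize the additional agent properties to JSON and store them as extdata."""
    conn.execute(
        "INSERT INTO info_agent (id, name, extdata) VALUES (?, ?, ?)",
        (agent_id, name, json.dumps(extra, sort_keys=True)),
    )


def load_extdata(conn: sqlite3.Connection, agent_id: int) -> Optional[Dict[str, Any]]:
    """Read the extdata JSON back, or None if the agent is unknown."""
    row = conn.execute(
        "SELECT extdata FROM info_agent WHERE id = ?", (agent_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keeping the extra properties in one JSON field means new agent attributes can be recorded without changing the table layout.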

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-25 17:33:12 -05:00
itrowbri 304c2b82b0 Updated rocprofv3.py to ignore old attach duration msec value (#1980) 2025-11-25 16:30:54 -06:00
Victor Zhang ede71ca3b0 SWDEV-567829 - populateFormatStringHashMap: relax printf hash collisi… (#1944)
* SWDEV-567829 - populateFormatStringHashMap: relax printf hash collision check for duplicate format strings

* function optimized by ai
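
One way to read "relax the collision check for duplicate format strings" is sketched below: a repeated hash is only an error when the two format strings actually differ. This is an assumption about the intent, written in Python rather than the runtime's C++; `populate_format_string_map` and the CRC32 stand-in hash are hypothetical and are not the runtime's real function or hash.

```python
import zlib
from typing import Callable, Dict, Iterable


def _crc32(fmt: str) -> int:
    """Stand-in hash for the sketch; the real runtime hash is not shown here."""
    return zlib.crc32(fmt.encode("utf-8"))


def populate_format_string_map(
    format_strings: Iterable[str],
    hash_fn: Callable[[str], int] = _crc32,
) -> Dict[int, str]:
    """Map hash -> format string, tolerating exact duplicates."""
    table: Dict[int, str] = {}
    for fmt in format_strings:
        key = hash_fn(fmt)
        existing = table.get(key)
        if existing is None:
            table[key] = fmt
        elif existing != fmt:
            # Same hash, different text: a genuine collision.
            raise ValueError(f"hash collision between {existing!r} and {fmt!r}")
        # Same hash and same text: a duplicate format string, accepted silently.
    return table
```

Passing the hash function as a parameter is only there to make the collision path easy to exercise in a test.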
2025-11-25 17:19:27 -05:00
anujshuk-amd 85b5c03f36 [rocprof-sys] Fix test build failure on RHEL 10 (#1955)
## Motivation

To solve: SWDEV-566076 
FFmpeg versions >= 58.134 no longer expose read_seek and read_seek2 function pointers in AVInputFormat,
requiring alternative seek detection methods. This pull request updates the `VideoDemuxer` class to improve compatibility with newer versions of FFmpeg. The main change is how the code determines whether the input file is seekable, addressing differences in FFmpeg API versions.


## Technical Details

In `video_demuxer.h`, added a conditional check for `USE_AVCODEC_GREATER_THAN_58_134` to set `is_seekable_` to `true` for newer FFmpeg versions, since `read_seek` and `read_seek2` are no longer exposed in `AVInputFormat`. For older versions, the previous method of checking these fields remains in place. The conditional compilation now assumes seek capability is available for newer FFmpeg versions.
2025-11-25 15:25:05 -05:00
Bindhiya Kanangot Balakrishnan e8c3b22734 [SWDEV-556483] Fix runtime PM suspend causing test failures (#1931)
Added runtime PM detection and DRM ioctl-based device wake
to handle GPUs in BACO state. Modified tests to wake
suspended devices before reading sysfs files.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-11-25 13:36:45 -06:00
usrihari123 47e53ec6f3 Update rocpd docs (#1276) 2025-11-25 22:33:12 +05:30
German Andryeyev 2c5754844f SWDEV-465041 - Enable direct dispatch under Linux by default. (#1934) 2025-11-25 11:30:32 -05:00
Victor Zhang 92fcc928b6 SWDEV-526773 - Modify LaunchDelayKernel to set a hard coded WallClock… (#1911)
* SWDEV-526773 - Modify LaunchDelayKernel to set a hard-coded WallClock value when it's not available

* Change the hard-coded clock rate to units of kHz.
2025-11-25 11:21:03 -05:00
Ethan Trinh 2042191e23 Suppress deprecated-declaration warnings (#1817)
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-25 10:31:30 -05:00
Ethan Trinh bef946de1c SWDEV-555551 - Remove hip-test warnings in linux (#1031)
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-25 10:31:15 -05:00
Jason Bonnell e68873c170 Gersemi formatting for rocprofiler-compute (#1997)
* Run gersemi formatting on cmake files in compute

* Run gersemi again but on updated version
2025-11-25 09:49:16 -05:00
Gerardo Hernandez 8abfee9f26 SWDEV-541351 - fix use of uninitialized memory in Unit___hip_atomic_compare_exchange tests (#1976) 2025-11-25 11:02:14 +00:00
solaiys 3466ec5458 Added PCIE Atomic Operations enable check. (#1746)
* Added PCIE Atomic Operations enable check.

Tests if atomic operations are enabled for GPU devices.
Displays the Atomic routing capability via Link capability and status.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
2025-11-25 14:29:30 +05:30
Gerardo Hernandez c87014a54c SWDEV-534207 fix order of kernel launch parameters when calling notifiedKernel in some tests: kernel<<<gridDim, blockDim>>> instead of kernel<<<blockDim, gridDim>>>. This was causing out of bounds accesses (#1860) 2025-11-25 06:37:47 +00:00
Pengda Xie 6c31785eaf SWDEV-562761 - Cleanup static fatbin on runtime teardown (#1873) 2025-11-24 21:57:46 -08:00
darren-amd 16e7ee32e6 [rocm-smi-lib] Add iomanip include to frequencies_read (#1797) 2025-11-24 16:38:21 -05:00
Young Hui - AMD a4f533fa92 [rocpd] Fix rocpd convenience scripts to accept --automerge-limit parameter (#1926)
* remove double RocpdImportData calls from execute() in each module

* formatting fix
2025-11-24 14:50:27 -05:00
Maisam Arif 1f7fc8d8a7 Fixed wrapper to respect symlink pathing (#1984)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-11-24 13:14:46 -06:00