Commit Graph

10781 commits

Author SHA1 Message Commit Date
systems-assistant[bot] 7e0f02eec6 Merge commit '190562e8c65c9872c5c22391400931da8e4b5dae' into develop 2025-07-30 15:21:36 +00:00
systems-assistant[bot] c59ab2f572 Merge commit '1ba08cd4dfb7fe99a51765019210947dfcd199f7' into develop 2025-07-30 15:21:36 +00:00
systems-assistant[bot] 50286202d5 Merge commit '56d040156e1d18479295536f8c4ada37ce34932d' into develop 2025-07-30 15:21:35 +00:00
systems-assistant[bot] 1df639fbb2 Merge commit '8f3a2326136caefa935876155eddb61177ad362c' into develop 2025-07-30 15:21:33 +00:00
systems-assistant[bot] a7cb68e38d Merge commit '51c5343bf891848443cab2230615fdb287e3b918' into develop 2025-07-30 15:21:32 +00:00
Joseph Macaranas a1568172c9 Setting up json for syncs from individual repos 2025-07-30 11:15:21 -04:00
fxmarty-amd 56d040156e bugfix to make amd-smi usage backward compatible (#836)
* Update soc_base.py

Fixes https://github.com/ROCm/rocprofiler-compute/issues/835

Signed-off-by: fxmarty-amd <felmarty@amd.com>

* address comments

---------

Signed-off-by: fxmarty-amd <felmarty@amd.com>
2025-07-30 09:40:04 -04:00
Baraldi, Giovanni 1ba08cd4df Removing ATT buffer size limitation (#534)
* Removing SQTT buffer size limitation

* Update source/lib/rocprofiler-sdk/thread_trace/core.cpp

* Added testing for buffer size. Formatting.

* Add test as unstable

* Increase default buffer size

* Apply suggestions from code review

Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>

* Fix typo from code review

* Update tests/thread-trace/agent.cpp

---------

Co-authored-by: Giovanni <gbaraldi@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
2025-07-29 22:47:40 +02:00
Indic, Vladimir 2d8936362e PCS test: cast agent name to str (#546)
* PCS test: cast agent name to str
2025-07-29 12:11:15 -07:00
David Galiffi 190562e8c6 Update VERSION to 1.2.0 (#299)
Bump version now that `release/rocm-rel-7.0` has been created.
2025-07-29 14:04:48 -04:00
David Galiffi 8ad2aa55f2 Update VERSION to 3.3.0 (#838)
Bumping version now that `release/rocm-rel-7.0` has been created

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2025-07-29 13:02:20 -04:00
Shweta Khatri a5de07d1b8 rocr: Remove ISA check to disable stochastic support for GFX12.0 in ROCR
Feature support should be determined by KFD via the query-capabilities
IOCTL, not in ROCR.
2025-07-29 11:18:36 -04:00
David Galiffi de6120daf9 Fix avail-regex-negation ctest (#298)
Adjusted the regex to filter out new "PAGE*" domains added by the
SDK. This was causing the passing regex to fail.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2025-07-29 02:44:40 -04:00
Xiaogang Chen 996e8bbfb7 hsakmt: Use udmabuf to allocate system memory
This patch uses the udmabuf driver instead of the amdgpu driver to allocate system
memory on APUs. This lets an application's system-memory consumption be accounted
through the cgroup mechanism. The feature is enabled by the environment variable HSA_USE_UDMABUF.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
2025-07-28 14:11:17 -07:00
vedithal-amd 80ea339217 Fix test cases (#837)
* Fix formatting
2025-07-28 16:19:45 -04:00
Aleksandar Djordjevic 26ae543012 ROCpd support [Part 1] (#279)
- Add rocpd support for
 - cpu_frequency
 - amd_smi
 - sampling
2025-07-28 11:33:52 -04:00
vedithal-amd 03d27c0ba0 Enable rocpd output format with rocprofiler sdk (#790)
* Add `rocpd` choice for `--format-rocprof-output` option
* Add rocpd_data.py which defines SQL queries to extract data from rocpd database
* Use sqlite3 package to read the database
* Add `--retain-rocpd-output` option in profile mode to retain raw
  rocpd database
* Add warning notice to say `--format-rocprof-output rocpd` will be
  default in future release

For rocpd output:
* Use only `pmc_perf.csv` instead of reading individual coll_level results csv files
* Post process csv files using pandas in analysis mode instead of profile mode
* Use ACCUM counters instead of SQ_ACCUM_PREV_HIRES

* Add test cases for rocpd output format
* Fix code formatting issues
* Update CHANGELOG
2025-07-28 11:02:28 -04:00
vedithal-amd 6885cb068d add description for MI100 counters (#834) 2025-07-26 15:33:23 -04:00
Yiannis Papadopoulos b7cd5cc7f1 rocr: Adding conversion function from hsa_amd_vmem_alloc_handle_t to ThunkHandle 2025-07-26 00:55:21 -04:00
Yiannis Papadopoulos f5120bfe68 rocr: DmaBufExport support for other agent types 2025-07-25 21:49:35 -04:00
Yiannis Papadopoulos ccaac9045b rocr/aie: XdnaDriver::ExportDMABuf implementation 2025-07-25 21:49:35 -04:00
Yat Sin, David 0dec2ab43b Update runtime/hsa-runtime/core/runtime/amd_blit_sdma.cpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Yat Sin, David <David.YatSin@amd.com>
2025-07-25 14:50:40 -04:00
David Yat Sin d3f70910e1 rocr: Remove SDMA code for gfx7 and gfx8
Remove deprecated SDMA code for gfx7 and gfx8 ASICs
2025-07-25 14:50:40 -04:00
vedithal-amd bb44e90b2d Unified configuration for metrics (#726)
* Show description of metrics during analysis
    * Use `--include-cols Description` to show the Description column in analyze mode (it is hidden by default)
    * Remove tips field from analysis config

* Align metric names in analysis config and documentation

* Add unified config utils/unified_config.yaml

* Add python script utils/split_config.py to auto generate analysis configuration and documentation metrics description
   * Add test case to ensure unified config is older than auto-generated config
   * Auto generate analysis config and documentation metrics description

* Update CONTRIBUTING.md to add instructions to build documentation assets
    * Add docker image and compose file to build documentation

* Update CHANGELOG and Documentation

* Use jinja template instead of hardcoding metric tables in documentation
2025-07-25 14:01:34 -04:00
Tony Gutierrez 5285c24657 rocr: Remove unused member of GPUAgent
The ape1_size_ member was left over after the removal
of KV and is no longer used.

Remove it to silence some compiler warnings.

Signed-off-by: Tony Gutierrez <anthony.gutierrez@amd.com>
2025-07-25 10:43:28 -04:00
Vaddireddy, Sushma 51c5343bf8 Crash issue fix on MI100 (#160)
* Crash issue fix on MI100

---------

Co-authored-by: Sushma Vaddireddy <svaddire@amd.com>
2025-07-24 15:24:37 -07:00
Hui, Young 3954cedd25 [rocpd] Adding summary module to generate summaries from rocpd database + query submodule + rocpd command-line tools (#488)
* adding summary.py to generate tmp <category_region>_summary views

* migrating CSV summary to SDK method of writing CSVs

  - Add domain_view to summary.py
  - omit the C++ code for writing CSV because it gets reverted later anyway

* Add summary subparser and write_sql_view_to_csv function

* adding all <>_summary views generation to summary.py

* add summary_per_rank feature

* add --summary-per-rank

* reconstruct generate_summary_view and create_domain_view

- introduce by_rank

* remove sqr and variance in summary views

* use RocpdImportData instead of connection

* two fixes on summary.py

- modify the generate_summary_view function to return a tuple with the view name and SQL code

add if_not_exits parameter to generate_summary_view

* Refactor summary.py to allow output path and filename args, and apply time_window
- clean up summary table column headers
- only generate by-rank views if that param is specified

* Add ProcessID to Hostname output and csv, so users can identify the system in the by-rank summaries

* Summary.py, just add hostname to by-rank summaries, instead of creating mapping table

* Summary - migrate csv writer to pandas, for more future flexibility

* Adding a few simple tests for summary.py

* Linting fixes

* add region_categories to summary options

  -  Automatically retrieve region categories from the database if argument is None

* add backticks for view_names

* fix tests after rebase

* Made code review changes
- fixed whitespace in CMakeLists.txt
- adding query.py module & subparser in __main__.py
- refactor summary function to return query
- used query.py to output csv
- used query.py to also output summary to console
- provided new command line options to select summary output to csv or console

* Made fix to jinja template in query.py, as suggested by copilot

* Consolidated output calls to query in export_view function based on feedback
- refactored: helpers, query functions, create view functions
- extended formats to include what query supports (md, html, pdf, json)
- added json format to query, and changed orient=records
- adding jinja2 and reportlab to requirements.txt

* Add version_info for rocpd and roctx

* Add rocpd commandline tool

* Add executable permissions to source/bin/rocpd.py

* Removed rocpd2query, and cleaned up --help examples

---------

Co-authored-by: acanadas <acanadas@amd.com>
Co-authored-by: Jin Tao <jintao12@amd.com>
Co-authored-by: a-canadasruiz <Araceli.CanadasRuiz@amd.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
2025-07-24 16:12:06 -05:00
Madsen, Jonathan 735b5c3d4a [CMake] Fix thread trace sample ENVIRONMENT test property (#544)
Fix thread trace samples set tests properties
2025-07-24 15:23:37 -05:00
Baraldi, Giovanni cec481b4b1 Fix 32bit wrap to att buffer size (#176)
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

Everything but PSDB passed. I'm force merging because 1) this is a minor change and 2) PSDBs seem broken now, they are stuck and pointing to the wrong location: http://rocm-ci.amd.com/job/compute-psdb-staging-profiler-emu/435/
2025-07-24 19:44:32 +02:00
xuchen-amd 99a6e67bcc Improve --time-unit arg (#807) 2025-07-24 12:15:52 -04:00
vedithal-amd dbcaccb9de Fix rocprofv3 supported counters not being detected (#832)
* Fix rocprofv3 supported counters not being detected

* Fix rocprof interface deprecation warning appearing twice
2025-07-24 11:50:07 -04:00
vedithal-amd d4c316a730 Improve baseline comparison (#817)
* Do not force unsupported metrics to be specified in older gpu
  architectures as None

* Remove metrics which are explicitly set to None

* Update CHANGELOG

* Fix analysis configuration to fix baseline comparisons across all gpu
  architectures
    * Add missing 1812 section for gfx908
    * Add missing 1812 section for gfx90a

* Baseline comparison will only show common metrics
   * First workload will be used to set Metric ID index column
2025-07-24 11:49:02 -04:00
Honglei Huang 20806577ce rocr: support multiple driver types in agent initialization
Modify agent initialization to support different driver types,
enabling the KFD_VIRTIO driver for CPU and GPU agents.

1. Add driver_type parameter to CpuAgent and GpuAgent constructors
2. Update topology discovery to handle multiple driver types
3. Fix MakeMemoryResident return value check in VirtioDriver
4. Add helper function IsGPUDriver to check driver types
5. Update agent discovery to iterate through all available drivers

This change makes the runtime more flexible by removing hardcoded KFD
driver assumptions and properly handling different driver backends.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-24 23:20:36 +08:00
Honglei Huang d36cb195da rocr/driver: add virtio driver support for ROCm runtime
This commit adds virtio driver support to the ROCm runtime by:

1. Implementing KfdVirtioDriver class that inherits from core::Driver
2. Adding KFD_VIRTIO to DriverType enum
3. Registering virtio driver discovery function in topology
4. Adding virtio driver source files to CMake build

The virtio driver implementation provides basic memory management and
queue operations for virtualized GPU environments. Some advanced features
like PC sampling and SMI are currently not supported.

Key changes:
- Add new files: amd_kfd_virtio_driver.h/cpp
- Update CMakeLists.txt to include virtio driver
- Add VIRTIO to DriverType enum in driver.h
- Register virtio driver in amd_topology.cpp

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-24 23:20:36 +08:00
Honglei Huang 48d3719dba libhsakmt/virtio: add virtio support for libhsakmt
This patch adds VirtIO support to the libhsakmt library, enabling communication
with AMD GPUs via VirtIO.

Details
- CMakeLists.txt: Added a new CMakeLists.txt file for the VirtIO component
of libhsakmt.
- hsakmt_virtio.c/h: Implemented the core VirtIO functionality, including
VirtIO GPU device initialization, command execution, and memory management.
- virtio_gpu.c/h: Contains the implementation of the VirtIO GPU device,
including ioctl handling, shared memory management, and command execution.
- hsakmt_virtio_events.c: Implements event handling for VirtIO, such as event
creation, destruction, setting, resetting, and querying event states.
- hsakmt_virtio_memory.c: Manages memory operations for VirtIO, including memory
allocation, freeing, mapping, and unmapping.
- hsakmt_virtio_queues.c: Implements queue management for VirtIO, including
queue creation, destruction, and updating.
- hsakmt_virtio_topology.c: Handles system and node properties for VirtIO.
- hsakmt_virtio_vm.c: Manages VM-related operations for VirtIO, such as
reserving and dereserving VA space.
- include/linux/virtgpu_drm.h: Contains DRM definitions for VirtIO GPU.

Key Features
- VirtIO GPU Initialization: The library can now initialize a VirtIO GPU device
and communicate with it.
- Command Execution: Supports executing commands on the VirtIO GPU device.
- Memory Management: Provides functions for allocating, freeing, mapping, and
unmapping memory for VirtIO operations.
- Event Handling: Implements a comprehensive event system for VirtIO.
- Queue Management: Allows for creating, destroying, and updating queues
on the VirtIO GPU device.
- System and Node Properties: Retrieves and manages system and node
properties for VirtIO.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-24 23:20:36 +08:00
U, Srihari 3a36fd13fe [rocprofv3] rocpd doesn't generate output files for counter collection (#480)
* Fix kernel dispatch for counter collection

* Updated change log

* Fix format

* rename output csv file

* Fix warnings

* Address review comment

* Address final review comment
2025-07-24 12:11:36 +05:30
Galantsev, Dmitrii 8f3a232613 Profiler - Update counter definitions to match changed api
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-07-23 23:27:04 -05:00
ajanicijamd 4b4a846b58 Allow events to be grouped by HIP stream ID (#274)
- Correlate memory_copy and kernel_dispatch events with their HIP stream_id and add stream_id as an annotation in Perfetto.
- By default, group memory_copy and kernel_dispatch events in Perfetto output by their stream_id.
- Add option, with the configuration setting ROCPROFSYS_ROCM_GROUP_BY_QUEUE, to group by HSA queue instead.

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-07-23 21:28:26 -04:00
amd-hsivasun 61299d7598 Update formatting.yml for Rocprofiler-sdk (#7)
* Update formatting.yml

Changed runners to ubuntu-latest instead of AMD-ROCm-Internal-dev1

* Updated rocprofiler-sdk formatting workflow

* Added Sparse Checkouts

* Run in folder

* Check the WD

* List all files

* Added Rocprofiler-register

* Removed working dirs and rocprofiler-register

---------

Co-authored-by: Sivasuntharampillai, Haresh <Haresh.Sivasuntharampillai+amdeng@amd.com>
2025-07-23 18:35:36 -04:00
Fei Zheng 137f35e700 Fix L2 read/write/atomic bandwidths on MI350 (#831) 2025-07-23 15:46:19 -06:00
Kuricheti, Mythreya 2c7f260e62 [SDK] Fix context tracing domain bitset overflow (#536)
- Fix context tracing domain bitset overflow
- Previous behavior would enable all flags above ROCPROFILER_BUFFER_TRACING_MARKER_CORE_RANGE_API when this domain was enabled.
2025-07-23 15:52:52 -05:00
vedithal-amd a70ae40ddc Improve block filtering to accept metric ids (#821)
* Fix tests
* Update CHANGELOG and documentation
2025-07-23 16:16:29 -04:00
Sajina PK 67ec52b523 Fix to find MPI symbols from undefined symbols (#293)
* Fix to find MPI symbols from undefined symbols

* Moved condition checks before

* Fixing format

---------

Co-authored-by: Anuj Shukla <anujshuk@amd.com>
2025-07-23 16:02:05 -04:00
cfallows-amd 2a7bbc4cc2 Update standalone roofline intro (#830) 2025-07-23 15:17:00 -04:00
Jessey Harrymanoharan 27db3621df remove staging from branches 2025-07-23 14:15:21 -04:00
Jessey Harrymanoharan 655d975e51 Create rocm_ci_caller.yml 2025-07-23 14:05:44 -04:00
Shweta Khatri 6015ad1016 rocr: GFX12 - Enable host trap PC Sampling 2025-07-23 06:51:53 -04:00
Baraldi, Giovanni e898079a13 Thread trace and Trace Decoder API tests and samples (#416)
* Adding test and samples to decoder

* Fix sample

* Formatting

* Fix multi test

* Disable sample

* Fix tests

* Format

* Version fix

* Locking the decoder

* Add atomic

* Review comments

* Format

* Adding readme

* merge conflict and adding PCS+ATT test

* Review comments

* Properly disable PCS test

* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt

* Adding back env var test

* Name fix

* Preload sample

* Addressing review comments

* Update docs

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-07-22 20:08:12 -05:00
Baraldi, Giovanni 0bb1a61e82 Adding high bits for ATT buffer size (#171)
* Adding high bits for ATT buffer size

* Copilot review comments

* Add buffer limits

* Update src/pm4/sqtt_builder.h

---------

Co-authored-by: Giovanni <gbaraldi@amd.com>
2025-07-22 18:53:55 -04:00
systems-assistant[bot] 53e20372c7 Add 'projects/roctracer/' from commit 'dd745ed9c731cf1c67a182a4ce41ce30afbfb8ca'
git-subtree-dir: projects/roctracer
git-subtree-mainline: d8cba83d42
git-subtree-split: dd745ed9c7
2025-07-22 22:52:51 +00:00